KR19990068417A

KR19990068417A - language studying system which can change the tempo and key of voice data

Info

Publication number: KR19990068417A
Application number: KR1019990017753A
Authority: KR
Inventors: 박주성; 우종식; 김태훈
Original assignee: 오종식; 보이소반도체 주식회사
Priority date: 1999-05-18
Filing date: 1999-05-18
Publication date: 1999-09-06
Also published as: KR100368456B1

Abstract

The present invention relates to an encoding/decoding method for a language learning apparatus and to the language learning apparatus itself. It receives voice data compressed in the MP3 format, varies the playback speed without degrading sound quality, and implements pitch conversion and text caption functions. To this end, the encoding/decoding method according to the present invention comprises: a first step of compressing and encoding audio data in the MP3 format; a second step of decoding the compressed data; and a third step of varying the pitch and speed of the decoded data, amplifying it, and displaying the sentences of the voice data as text. The language learning apparatus according to the present invention comprises: a modified encoder that compresses and encodes audio data in the MP3 format; a modified decoder that decodes the data compressed by the encoder; a buffer, connected to the encoder, that buffers the MP3-encoded audio data for a repeat-learning mode; a speed/pitch changer that varies the speed and pitch of the voice data from the decoder; a text/image processor for displaying the sentences of the voice data as text and for describing the situation; and a controller that controls the modified encoder, the modified decoder, the buffer, the amplifier, the text/image processor, and the speed/pitch changer.

Description

Language studying system which can change the tempo and key of voice data

The present invention relates to a language learning system, and more particularly to an encoding/decoding method for a language learning apparatus, and to the apparatus itself, that receives and restores voice data compressed in the MP3 format, varies the playback speed without degrading sound quality, and implements speed variation, pitch conversion, and text caption functions for language learning data.

In general, the hardest part of learning a foreign language is not being able to understand exactly what is being said. Fast speech in particular is very difficult to follow, whereas speech that is repeated slowly makes the context easy to grasp. Pictures or video tapes are sometimes used as aids to comprehension, but when the speech is fast they do little to improve listening ability.

Conventional language learning relied on cassette tapes, or on CDs played on a personal computer, so learners could listen to recorded dialogue only at a fixed speed. When the speech rate is changed by varying the speed of the drive motor in such a device, the usable range of speeds is very narrow, so the learning benefit is small.

To improve listening ability, learners listen repeatedly to voice data stored on various media. Conventional audio playback devices adjust the speech rate by driving a mechanical motor. If the motor speed is changed without any signal processing, the speed can be varied to some extent without degrading sound quality, but at the degree of speed change a language learning device requires, the original sound is distorted and becomes hard to understand.

Voice data for language learning is stored in both analog and digital forms. Speed variation is difficult in the conventional approach because the signal is processed in analog form.

When voice data is stored digitally, sound information can be compressed in the time domain with methods such as DPCM and ADPCM, or in the frequency domain with widely used methods such as MPEG Audio and AC3. Time-domain methods need little computation but do not achieve a high compression ratio. Frequency-domain methods need more computation but achieve a high compression ratio. As semiconductor chips have become faster, frequency-domain compression has come into wide use. Among these methods, MPEG Audio compression is widely used in fields that require high sound quality, such as music.

References a, b, and c concern MPEG compression; references d, e, and f concern speed/pitch variation; references g and h concern text representation. They are as follows.

a. ISO/IEC IS 11172 (MPEG-1) Standard

b. Karlheinz Brandenburg and Gerhard Stoll, "ISO-MPEG-1 Audio: A Generic Standard for Coding of High-Quality Digital Audio", J. Audio Eng. Soc., Vol. 42, No. 10, 1994, pp. 780-792

c. Davis Yen Pan, "Digital Audio Compression", Digital Technical Journal, Vol. 5, No. 2, Spring 1993, pp. 1-14

d. E. Hardam, "High Quality Time Scale Modification of Speech Signals Using Fast Synchronized-Overlap-Add Algorithms", IEEE, 1990, pp. 409-412

e. J. L. Wayman and D. L. Wilson, "Some Improvements on the Synchronized-Overlap-Add Method of Time Scale Modification for Use in Real-Time Speech Compression and Noise Filtering", IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-36, No. 1, pp. 139-140, January 1988

f. J. Makhoul and A. El-Jaroudi, "Time-Scale Modification in Medium to Low Rate Speech Coding", IEEE Int. Conf. Acoust., Speech, Signal Processing, 1986, pp. 1705-1708

g. ISO-646 standard

h. ISO-2022 standard

MPEG1 Audio from MPEG (Moving Picture Experts Group) is the ISO/IEC standard for high-quality, high-efficiency, high-compression stereo coding; its official standard number is ISO/IEC IS 11172 (MPEG-1) [references a, b, c]. As with other standards, MPEG1 Audio only defines the standard; the implementation is left to the implementer's discretion.

The MPEG1 Audio compression method uses a psychoacoustic model that models the human auditory system. In this model, the human auditory system perceives a time-domain signal by converting it to the frequency domain, and the sensitivity, or threshold of audibility, differs from one frequency band to another. In addition, when a signal with high energy is present in a particular frequency band, weak signals in neighboring bands cannot be heard; this is the masking effect. Data can be compressed by determining a quantization level whose quantization noise is masked and therefore imperceptible, and then allocating bits accordingly. MPEG1 Audio is coded as Layer 1, 2, or 3 depending on the target bit rate, and the compression ratio ranges from 1/4 to 1/12. Layer 3, which has the highest compression ratio, is called MP3 (MPEG1 Layer 3). In addition to the 32 subbands, MP3 [references a, b, c] uses the MDCT (Modified Discrete Cosine Transform) and Huffman coding, so it can raise the compression ratio to 1/12 while retaining good sound quality.
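As a rough illustration of what the 1/12 ratio means in practice, the following sketch computes the bit rate of uncompressed PCM audio and the rate after compression. The choice of 44.1 kHz, 16-bit stereo as the input is an assumption made for the example, and the function names are hypothetical; neither comes from the text.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000.0


def compressed_bitrate_kbps(pcm_kbps, ratio):
    """Bit rate after compressing to `ratio` of the original size (e.g. 1/12)."""
    return pcm_kbps * ratio


# Assumed input: 44.1 kHz, 16-bit, stereo PCM.
cd_stereo = pcm_bitrate_kbps(44100, 16, 2)
layer3 = compressed_bitrate_kbps(cd_stereo, 1 / 12)   # highest ratio named in the text
layer_low = compressed_bitrate_kbps(cd_stereo, 1 / 4)  # lowest ratio named in the text
```

Under these assumptions the uncompressed stream is about 1411 kbit/s, and the 1/12 ratio brings it to roughly 118 kbit/s.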

Table 1 below shows the performance of MPEG1 Audio Layer 3.

Notes: M = mono, S = stereo. Quality factor: 5 = very good, 4 = good, 3 = not bad, 2 = bad, 1 = very bad.

The actual delay is three times the theoretical delay.

Conventional techniques for varying the pitch and speed of a voice signal generally use the SOLA (Synchronized OverLap and Add) algorithm.

FIG. 1 shows an example of executing the SOLA algorithm in the MP3 encoding/decoding method according to the present invention, in the form of example waveforms. As shown in the figure, the SOLA algorithm cuts the voice data into blocks using a window of fixed size, with successive blocks overlapping at a fixed interval, and then repositions the blocks at the interval required by the speed change and adds them, which synthesizes a voice signal whose speed differs from the original. However, simply changing the interval and adding the signals of different blocks degrades the sound quality and yields a signal that sounds different from the original. To prevent this, when the blocks are rearranged, the cross-correlation, which measures the similarity of two signals, is computed while a fine adjustment offset is varied within a fixed range around the required interval. The block is then shifted by the offset that gives the highest similarity before the two block signals are combined, which preserves the sound quality of the original.

In FIG. 1, a) is the original signal, and b), c), d), and e) are the waveforms cut from the original signal with a fixed-size window while overlapping. h), i), and j) show the signals of c), d), and e) repositioned at the fine-adjusted intervals, indicated between the solid lines in the figure, so that they can be combined with b), c), and d) respectively. Combining them yields k), a speed-changed signal with sound quality similar to the original.
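The block cutting, fine adjustment, and overlap-add described above can be sketched as follows. This is a minimal illustration, not the implementation of the invention: the block length, hop size, search range, linear cross-fade, and the names `sola` and `_similarity` are all assumptions chosen for the example.

```python
import math


def _similarity(a, b):
    """Normalized cross-correlation of two equal-length sample lists."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0 else 0.0


def sola(x, speed, frame=400, ana_hop=100, search=20):
    """Return x played `speed` times faster (speed < 1 slows it down)."""
    syn_hop = int(round(ana_hop / speed))      # nominal block spacing in the output
    if syn_hop + 2 * search >= frame:
        raise ValueError("blocks would not overlap; lower ana_hop or raise speed")
    out = list(x[:frame])                      # first block is copied unchanged
    m = 1
    while m * ana_hop + frame <= len(x):
        block = x[m * ana_hop: m * ana_hop + frame]
        nominal = m * syn_hop
        best_k, best_score = 0, -math.inf
        for k in range(-search, search + 1):   # fine adjustment around the nominal spot
            start = nominal + k
            if start < 0:
                continue
            n = min(len(out) - start, frame)
            score = _similarity(out[start:start + n], block[:n])
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        n = min(len(out) - start, frame)
        for i in range(n):                     # cross-fade over the overlapping part
            w = (i + 1) / (n + 1)
            out[start + i] = (1 - w) * out[start + i] + w * block[i]
        out.extend(block[n:])                  # append the non-overlapping remainder
        m += 1
    return out


# Example: a 100 Hz tone sampled at 8 kHz, played at double and at half speed.
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(4000)]
fast = sola(tone, 2.0)
slow = sola(tone, 0.5)
```

The output length only approximates `len(x) / speed`, because the first block is copied whole and each later block is shifted by its own fine adjustment. The nested search loop is also where most of the computation goes.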

The SOLA algorithm used for speech-rate variation [references d, e, f] is executed as shown in FIG. 1, but it is complex and requires many operations, which makes real-time processing difficult.

Among digital compression methods such as those described above, methods with a high compression ratio have not been used when storing data in digital form, because it is difficult to restore the compressed data in real time.

Furthermore, storing voice data without any compression requires a very large memory, which is a problem for a portable learning device. Because the memory of a portable learning device can hold only a small amount of voice data, it cannot store a large volume of language learning content. Likewise, when voice data is downloaded to a personal computer without compression, the transfer time becomes a problem.

Conventional simple encoding/decoding also damages the data and alters its sound quality, which defeats the purpose of language learning.

Moreover, conventional language learning devices do not provide a learning method that takes the learner's ability into account. For advanced learners, playback much faster than normal speed increases the learning effect, while for beginners and struggling learners the language data must be played very slowly; conventional language learning devices do not properly provide these functions.

Accordingly, an object of the present invention, devised to solve the problems described above, is to provide a language learning apparatus and an encoding/decoding method for it that, in a portable device or a stationary device such as a PC, receives voice data compressed in the MP3 format, varies the speed without degrading sound quality, and implements speed variation, pitch conversion, and text caption functions for language learning data.

FIG. 1 - Diagram showing an example of executing the SOLA algorithm in the MP3 encoding/decoding method according to the present invention.

FIG. 2 - Schematic configuration diagram of the language learning apparatus according to the present invention.

FIG. 3 - Detailed view of the encoder of FIG. 2.

FIG. 4a - Diagram showing the bit-stream format of the language learning apparatus of FIG. 2.

FIG. 4b - Diagram showing the ancillary data format of the language learning apparatus of FIG. 2.

FIG. 5 - Detailed view of the decoder of FIG. 2.

FIG. 6 - Flowchart of the MP3 encoding/decoding method according to the present invention.

FIG. 7 - Diagram for obtaining sample values.

FIG. 8 - Flowchart of the pitch conversion and speed conversion unit.

To achieve the above object, the encoding/decoding method of the language learning apparatus according to the present invention comprises: a first step of compressing and encoding audio data in the MP3 format; a second step of decoding the compressed data; and a third step of varying the pitch and speed of the decoded data.

The first step accepts audio data, encodes it in MP3 form, and then adds the additional information the language learning device needs. The additional information covers voice sentences, text sentences, images/text, speech rate, external control, encryption, and the like. This step can be implemented on-line or off-line.

The second step stores the MP3-encoded language learning data in a buffer and then decodes it. Decoding does not use an existing MP3 decoder unchanged; the decoder is modified to suit the language learning device so that it additionally supports 11.025/22.05 kHz sampling frequencies and can emphasize either the voice or the music in audio data that mixes the two. Audio data for language learning has sufficient sound quality at an 11.025/22.05 kHz sampling frequency with 16-bit samples, so supporting those sampling frequencies allows a higher compression ratio than 44.1 kHz data. The system controller receives the learning-device control information and generates the overall control signals. The control information covers speed, pitch, amplification, text/image, repetition, and the like. The modified MP3 decoder also extracts the text/image information from the input data and sends it to the text/image processor of the third step.

The third step receives the decoded voice data or general audio data and changes its speed and pitch without distorting the sound quality. The present invention uses the SOLA algorithm for speed/pitch variation, but other algorithms may be used. If a slight loss of sound quality is acceptable, the data can also be down-sampled when it is passed from the modified MP3 decoder to the speed/pitch changer. Down-sampling reduces the memory required when the invention is implemented as an integrated circuit, and it is also needed for stationary devices with slow computation.
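The down-sampling option can be illustrated with a very small sketch. The factor of 2 and the pair-averaging filter are assumptions made for the example; the text does not specify how the down-sampling is done, and `downsample_by_2` is a hypothetical name.

```python
def downsample_by_2(samples):
    """Halve the sample rate by averaging each pair of samples.

    The averaging acts as a crude low-pass filter before every second
    sample is dropped; a trailing unpaired sample is discarded.
    """
    return [(samples[i] + samples[i + 1]) / 2.0
            for i in range(0, len(samples) - 1, 2)]


decoded = [1, 3, 5, 7, 9]            # stand-in for decoder output samples
reduced = downsample_by_2(decoded)   # half as many samples to buffer and process
```

Halving the number of samples halves the buffer the speed/pitch changer needs and the work per second of audio, at the cost of discarding the upper half of the signal bandwidth.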

The language learning apparatus according to the present invention comprises: an encoder that compresses audio data in the MP3 format; a data formatter that combines the encoded data with the language-learning control information; a buffer that stores the encoded data; a modified MP3 decoder (Modified MP3 Dec) that restores the MP3 data while additionally performing the functions needed to implement the language learning device; a speed/pitch changer that varies the speed and pitch of the voice; a system controller that receives the learning-device control information and controls the device; an amplifier that adjusts the volume of the voice; and a text/image processor that processes text and images.

Accordingly, voice data compressed in the MP3 format can be received and its speed varied within a certain range without degrading sound quality, and speed variation, pitch conversion, and text caption functions for language learning data can be implemented.

In addition, the language learning apparatus becomes relatively inexpensive, its speed can be varied to suit the learner's ability, and it can be equipped with a text caption function.

The encoding/decoding method of the language learning apparatus according to the present invention is described in more detail below with reference to the accompanying drawings.

FIG. 2 is a schematic configuration diagram of the language learning apparatus according to the present invention. As shown in the figure, the apparatus consists of an MP3 encoder, a data formatter, a buffer, an amplifier, a modified MP3 decoder, a pitch and speed changer, an image and text processor, a system controller, and so on.

The MP3 encoder of the present invention consists of a conventional MP3 encoder and a data formatter. Various additional information (voice data sentences, the text that makes up the voice data sentences, images/text, speech rate, encryption, external control) is added to the data encoded by the MP3 encoder to produce language learning material. This process can be implemented on-line or off-line. In particular, the encoding process supports the 11.025/22.05 kHz sampling frequencies that a conventional MP3 encoder does not support. The encoder is a general MP3 encoder with an added block for inserting control data, text, and background images, so that language learning data can be processed off-line.

The decoder of the language learning device of the present invention consists of a modified MP3 decoder, a pitch and speed changer, an image and text processor, a system controller, a buffer that supports repeat learning, and an amplifier. The decoder therefore handles text and background images, varies the speed of the voice signal, and converts its pitch, and it operates in the five modes described below: a mode that drives speed or pitch variation using voice data for language learning; an MP3 decoder mode for music data; a mode that attenuates the high-frequency components of a music signal and then performs speech-rate variation or pitch conversion; a mode that attenuates the low-frequency components of data mixing music and voice so that mainly the music signal is heard; and a mode that processes background images or text while one of the four modes above is running.

The five operating modes are described in turn below.

1. Mode that drives the speed variation and pitch conversion unit using language learning content

The encoder of the language learning device inserts text, background images, and control data into language data encoded with the conventional MP3 algorithm, and the decoder of the language learning device performs speed variation or pitch conversion on the language data and processes the text and background images.

In FIG. 2, blocks 1) and 2) add the various additional data for language learning at encoding time, and 3) is the buffer that supports the repeat-listening mode. This buffer holds the compressed data before it is restored. 4) is the modified MP3 decoder, which decodes the voice data for language learning and operates as follows.

Speech has almost no high-frequency components, so only the low-frequency signal is needed in the language learning mode. Of the 32 frequency-domain subband samples handled in MP3 decoding, values are therefore computed only for the lower 8 or 16 subband samples, and the upper 24 or 16 subband samples are multiplied by 0 or a small value to reduce them. Only the low-frequency signal remains, and the amount of computation decreases. To apply this, only the 8 or 16 subband samples in the low-frequency region need to be used in the per-subband calculation or in subband synthesis. Thus, whereas a conventional MP3 decoder supports only 32/44.1/48 kHz, data sampled at 11.025/22.05 kHz can also be restored for the language mode. 5) is the system control block: it receives the repeat-learning mode key, stores the start address of the sentence held in memory, and sends a repeat control command to the buffer. 6) in FIG. 2 performs the pitch and speed variation of the voice signal. 7) in FIG. 2 is the amplifier for the voice signal, and 8) and 9) are the blocks that process the additional data for audiovisual teaching, performing image and text processing.
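The subband selection described above can be sketched as follows. The sketch assumes the 32 subbands are of equal width and span 0 Hz to half the sampling frequency, which is how the MPEG-1 Audio filterbank divides the spectrum; the function names and the `residual_gain` parameter are hypothetical.

```python
NUM_SUBBANDS = 32  # subbands of equal width covering 0 .. fs/2


def keep_low_subbands(subband_samples, keep=8, residual_gain=0.0):
    """Keep the lowest `keep` subband samples; scale the rest by `residual_gain`.

    `residual_gain` is 0 or a small value, matching the text's
    "multiplied by 0 or a small value".
    """
    if len(subband_samples) != NUM_SUBBANDS:
        raise ValueError("expected 32 subband samples")
    return [s if i < keep else s * residual_gain
            for i, s in enumerate(subband_samples)]


def retained_bandwidth_hz(sample_rate_hz, keep):
    """Upper frequency limit that survives when only `keep` subbands are kept."""
    return (sample_rate_hz / 2.0) * keep / NUM_SUBBANDS


flat = [1.0] * NUM_SUBBANDS
voice_only = keep_low_subbands(flat, keep=8)
bw_8 = retained_bandwidth_hz(44100, 8)    # 5512.5 Hz
bw_16 = retained_bandwidth_hz(44100, 16)  # 11025.0 Hz
```

For a 44.1 kHz stream, keeping 8 subbands retains content up to about 5.5 kHz and keeping 16 retains content up to about 11 kHz, which is consistent with the 11.025/22.05 kHz output rates the text names.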

2. MP3 decoder mode for music data

This mode performs the same function as a portable MP3 player or an MP3 decoder program on a PC. It is used mainly for listening to music data and acts as a general MP3 decoder. In this mode, encoder blocks 1) and 2) are the same as in the first mode, 4) performs the conventional MP3 decoding process, and blocks 5), 8), and 9), the system controller and text/image processor, operate as in the first mode.

3. Mode that attenuates the high-frequency components of a music signal and performs speed variation or pitch conversion using the low-frequency components

This mode is provided for speed variation or pitch conversion of voice data with background music. When the user presses an external key for speed variation or pitch conversion, the operation is performed using only the low-frequency components of the music data. In this mode the entire system of FIG. 2 operates. The SOLA algorithm that the present invention uses for pitch and speed variation works in the low-frequency band where the voice data lies, so the high-frequency components are removed. The removal method used in the present invention decodes only the lower 8 subbands and does not decode the 24 subbands that correspond to the high-frequency components.

The energy of voice data is concentrated in the low-frequency region, but the energy of a music signal is spread evenly across the whole frequency band, so removing the upper 24 subbands at 4) in FIG. 6 reduces the total energy and lowers the volume. The processed low-frequency part must therefore be amplified at 7) in FIG. 6 to bring it close to the volume of the original signal. The speed/pitch changer performs speed and pitch conversion on the low-frequency components that remain after the high-frequency components are attenuated, and the amplifier restores the volume lost with the reduced total energy.
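One way to choose the amplifier gain is to match the energy of the band-limited signal to that of the original. The text only says the low-frequency part is amplified to a similar volume, so the energy-matching rule below is an assumption, and `energy` and `makeup_gain` are hypothetical names.

```python
import math


def energy(samples):
    """Sum of squared sample values."""
    return sum(s * s for s in samples)


def makeup_gain(original, band_limited):
    """Gain that restores the band-limited signal to the original's energy."""
    e_low = energy(band_limited)
    if e_low == 0:
        return 1.0  # nothing to amplify; leave the signal as it is
    return math.sqrt(energy(original) / e_low)


original = [2.0, -2.0]
band_limited = [1.0, -1.0]          # stand-in for the signal after subband removal
gain = makeup_gain(original, band_limited)
amplified = [gain * s for s in band_limited]
```

Because energy scales with the square of amplitude, the amplitude gain is the square root of the energy ratio.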

4. Mode for listening mainly to the music signal by attenuating the low-frequency components of data in which music and voice are mixed

This is the opposite of mode 3: the voice frequency band of the music signal is attenuated, the high-frequency components are amplified, and the result is smoothed with a filter, so that the user hears crystallized music. This mode operates similarly to mode 3, but because the user is mainly listening to music, the speed/pitch changer is not used.

5. Mode for processing background images or the text representation of voice data sentences while performing the operations of modes 1, 2, 3, and 4

The operation mode for text representation of voice data sentences and for background image processing is intended to increase the audio-visual educational effect. Voice data sentence information, the character information that makes up the voice data sentence, and additional image data are inserted in 2) of FIG. 6 for the language learner encoder, separated by the modified MP3 DEC of 4), and realized through the system control block of 5) and the image and text processors of 8) and 9).

Text representation of a voice data sentence uses the additional data inserted by the encoder to find the start information of the characters, and uses the additional data that follows to find the number of characters in the voice data sentence.

Background images are divided into still images and moving images. Still images form a simple background screen, and moving images are used to show the author lecturing and to explain the situation. Image processing finds the moving/still image indicator specified at encoding time and uses the address data that follows it to consult a pre-stored lookup table of image start addresses. The entry read from the lookup table points to the actual start address of the image, from which the image data is processed.

The implementation of the encoder for the language learner will now be described with reference to FIG. 3.

The language learner encoder consists of a conventional MP3 encoder and a data formatter for inserting additional data. The operations of the MP3 encoder are described in references a, b, and c. For each frame, additional-data insertion places the items of FIG. 4b) in the ancillary data of FIG. 4a): voice data sentence information for the repetitive language learning mode, the characters that make up the voice data sentence, data for selecting the background image and the language used, and external CPU instructions. Various sampling frequencies are also supported for the language learner. FIG. 3 shows the encoder components of FIG. 2; data insertion takes place after the bitstream-formatting step of the MP3 encoding process of FIG. 3. Encoding is performed on-line and off-line, and language material and music data are mixed with the additional data and encoded in the data formatter. The information inserted at encoding time is shown in FIG. 4b.

FIG. 4a) shows the bit-stream of the MP3 format, and FIG. 4b) shows the format used to represent language learning sentences, character and image indicators, and control data in the ancillary data part of FIG. 4a).

The voice data sentence information indicator is defined so that language learning can be processed sentence by sentence. When the information indicator of a voice data sentence is found in the ancillary data part during format analysis, the start and end of the sentence unit and the number of frames that make up the sentence are known, which makes sentence-by-sentence repetition easy.

The nationality indicator and country code indicate the written language of the characters that make up the voice data sentence. The character start/end indicators are found during format analysis, and the character table address that follows points to the start address of the lookup table in which the language-learning character data is stored. Since MP3 consists of 32 frames per second on average, the speed at which each character is to be pronounced is expressed as a number of frame syncs: 1 when a character is pronounced at a 1/32 beat, and 2 when it is pronounced at a 1/16 beat. The background image indicator represents a still or moving image, and the image start address points to the start address of each image by indirect addressing through a lookup table. The external CPU instruction indicator is for controlling specific CPU operations and allows CPU instructions to be used.

Next, the implementation of the language learner decoder will be described with reference to FIG. 5.

The language learner decoder consists of a storage buffer for repetitive learning, a modified MP3 decoder, a speed/pitch changer, an amplifier, a text/image processor, and a system controller. 1) of FIG. 5 is the repetitive learning buffer, which stores voice data in sentence units. The modified MP3 decoder consists of the conventional MP3 decoder corresponding to 3), 4), 5), 6), 8), and 9) of FIG. 5, the additional data analyzer of 2), and the frequency band selector of 7). The speed/pitch changer of 10) in FIG. 5 performs pitch and speed conversion on language-learning voice data, and 11) is an amplifier that compensates the energy of the voice signal after frequency band selection. 12), 15), and 16) of FIG. 5 are the text/image processor, and 13) and 14) are the system controller. The operation of the conventional MP3 decoder corresponding to 3), 4), 5), 6), 8), and 9) of FIG. 5 is summarized in references a, b, and c.

FIG. 6 is a flowchart of the operation of the complete language learner decoder. The left side of the flowchart is the routine that performs the modified MP3 decoding process and the voice speed/pitch variation algorithm. Both must run simultaneously in real time, which is difficult. The present invention therefore improves the processing method so that conventional MP3 decoding and the voice speed/pitch variation algorithm can run in real time, and implements a language learner that processes additional data.

In step 1) of FIG. 6, the encoded data corresponding to 1) of FIG. 4 is input. In step 2) of FIG. 6, the additional data separator separates the main data from the additional data inserted in the ancillary data part. The main data passes through steps 3) to 13) on the left side of the flowchart of FIG. 6 in order to carry out operation modes 1 to 4 above. If the additional data separated in step 2) of FIG. 6 contains an indicator, steps 14) to 21) of FIG. 6 are performed, and the information that follows the indicator is used to process additional data such as the voice data sentence, the text representation of the sentence, and the background image. If the separated additional data contains the start indicator of a voice data sentence but no end indicator, the sentence that began at the start indicator has not yet ended. If there is no start indicator, conventional MP3 decoding is executed so that music data can be listened to.

The implementation of the modified MP3 decoder is described in detail below.

In step 1) of FIG. 6, data encoded in the form of FIG. 4 a) is input; each frame consists of sync, side information, and main information. The start and end of the main information, however, do not coincide with the frame syncs. The main data can therefore be read contiguously only if the sync and side information are stripped from the input and the remaining main information is stored separately. The prior art requires buffering memory at least as large as the maximum main information length of one frame.

In FIG. 4a the main information of one frame consists of two granules. The present invention buffers and decodes one granule and then buffers and processes the other. Because buffering is done per granule, or in smaller units, and only when needed, the memory required for buffering can be halved.

Step 4) of FIG. 6 performs Huffman decoding. Huffman decoding obtains samples from the main data using the Huffman code table specified by table-select[2:0] in the side information of the bit-stream. The main data is read one bit at a time until the bits read so far match a Huffman code.

Because Huffman-encoded data is stored back to back, the Huffman decoder reads it bit by bit, compares it against the Huffman code table, and keeps reading until a matching value is found. The prior art reads one bit at a time from the bitstream and decodes by comparing whether the bit is "0" or "1". In step 4) of FIG. 6 of the present invention, 8 bits (1 byte) of data are read and, instead of extracting bits one at a time, the AND-mask of Table 2 is applied to the bit to be read and the result is tested for "0". This is implemented easily and quickly with a conditional branch, an instruction that branches to a prepared address depending on whether the comparison result is 0, so Huffman decoding time is reduced and real-time processing becomes possible.

Table 2 shows the AND-mask.
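The AND-mask bit test can be illustrated with a short sketch. Table 2 itself is not reproduced in this text, so the mask values below are an assumption (one single-bit mask per bit position, most significant bit first), and the tiny prefix-code tree is a hypothetical stand-in for a real MP3 Huffman table.

```python
# Assumed mask table: one single-bit mask per position, MSB first.
AND_MASK = [0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01]

# Hypothetical toy prefix code as a binary tree of (zero_child, one_child)
# tuples with symbols at the leaves: A=0, B=10, C=110, D=111.
TOY_TREE = ("A", ("B", ("C", "D")))


def decode(data, count, tree=TOY_TREE):
    """Decode `count` symbols, testing each bit with (byte & mask) != 0."""
    out = []
    node = tree
    for byte in data:
        for mask in AND_MASK:
            # The byte is never shifted; the mask selects the bit in place
            # and the zero / non-zero result picks the branch.
            node = node[1] if byte & mask else node[0]
            if not isinstance(node, tuple):  # reached a leaf
                out.append(node)
                if len(out) == count:
                    return out
                node = tree
    return out


# Bits 0 10 110 111 packed MSB first: 0101_1011 1000_0000
symbols = decode(bytes([0x5B, 0x80]), 4)
```

On a DSP or microcontroller the `byte & mask` test maps onto one AND instruction followed by a branch-if-zero, which is the saving the text describes relative to shifting each bit out and comparing it.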

For MP3 decoding, step 5) of FIG. 6 applies the following operations to the sample values (is) obtained with the Huffman code table. The variables in Equations 1, 2, and 3 follow references a, b, and c.

$$xr(i) = is(i)^{4/3} \cdot \text{gain} \cdot \text{scale} \tag{1}$$

$$\text{gain} = 2^{\,0.25\,\left(\text{global\_gain}[gr] - 64 - 8\,\text{subblock\_gain}[window][gr]\right)} \tag{2}$$

$$\text{scale} = 2^{\,0.25\,\left(1 + \text{scalefac\_scale}[gr]\right)\cdot 2 \cdot \left(-\text{scalefac}[cb][window][gr] - \text{preflag}[gr]\,\text{pretab}[cb]\right)} \tag{3}$$

is(i): Huffman decoding output

To obtain xr from is, Equations 1, 2, and 3 are transformed into Equations 4, 5, 6, 7, and 8, which are used for real-time processing.

$$\text{gain} = \text{gain1} \cdot \text{gain2} \tag{4}$$

$$\text{gain1} = 2^{\,0.25\,\left(\text{global\_gain}[gr] - 64\right)} \tag{5}$$

$$\text{gain2} = 2^{\,0.25\,\left(-8\,\text{subblock\_gain}[window][gr]\right)} = 2^{\,-0.5\,\left(4\,\text{subblock\_gain}[window][gr]\right)} \tag{6}$$

$$\text{scale} = 2^{\,0.25\,\left(1 + \text{scalefac\_scale}[gr]\right)\cdot 2 \cdot \left(-\text{scalefac}[cb][window][gr] - \text{preflag}[gr]\,\text{pretab}[cb]\right)} \tag{7}$$

$$\text{scale} = 2^{\,-0.5\,\left(1 + \text{scalefac\_scale}[gr]\right)\left(\text{scalefac}[cb][window][gr] + \text{preflag}[gr]\,\text{pretab}[cb]\right)} \tag{8}$$

Obtaining gain1 requires the exponential 2^(0.25 × integer), so the results are stored as a lookup table to shorten execution time. global_gain is 8 bits wide and ranges from 0 to 255, so if a ROM table of length 256 holding the result for each value is built, 2^(0.25 × (global_gain[gr] - 64)) can be read directly using the global_gain value. In the present invention, -210 rather than -64 is used as the global gain fine adjustment interval. To obtain gain2 and scale, the results of 2^(-0.5 × integer) are likewise stored in table form. The integer in 2^(-0.5 × integer) takes only values from 0 to 64. A 64-bit-wide lookup table corresponding to each value therefore yields 2^(-0.5 × integer) directly.
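A minimal sketch of the two tables, assuming floating-point entries for readability (a hardware ROM would hold fixed-point words). The table and function names are illustrative, and the offset of 64 follows Equation 5 as printed; the text notes that the invention substitutes 210.

```python
GAIN_OFFSET = 64  # Equation 5 as printed; the text says 210 is used instead

# 256-entry table indexed directly by the 8-bit global_gain value.
GAIN1_LUT = [2.0 ** (0.25 * (g - GAIN_OFFSET)) for g in range(256)]

# Table of 2^(-0.5 * n) for the integer range 0..64 stated in the text.
HALF_POW_LUT = [2.0 ** (-0.5 * n) for n in range(65)]


def gain(global_gain, subblock_gain):
    """Equations 4 to 6: gain1 * gain2 using table lookups only."""
    return GAIN1_LUT[global_gain] * HALF_POW_LUT[4 * subblock_gain]


def scale(scalefac_scale, scalefac, preflag, pretab):
    """Equation 8: a single lookup of 2^(-0.5 * n)."""
    n = (1 + scalefac_scale) * (scalefac + preflag * pretab)
    return HALF_POW_LUT[n]


g = gain(100, 3)        # 2^(0.25 * (100 - 64 - 24)) = 2^3
s = scale(1, 3, 1, 2)   # 2^(-0.5 * 2 * 5) = 2^-5
```

Both exponent computations reduce to one integer multiply or add followed by an indexed read, which is the point of rewriting Equations 2 and 3 as Equations 4 to 8.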

The results of is(i)^(4/3) are also stored in a table, so that is(i)^(4/3) is obtained by looking up the value of is(i). In the MP3 decoder, is ranges from 0 to 8191, but most is values are below 1024 and larger values hardly ever occur, so storing 1K words is sufficient. For is values of 1024 and above, the range is divided into regular intervals, a representative value and a slope are stored for each interval, and the remaining values are approximated by linear interpolation from the representative value and slope. The lookup table then needs only slightly more than 1K words. To reduce the table further, only 512 or 256 entries may be stored instead of 1024, with the rest approximated in the same way. FIG. 7 shows is^(4/3) as is varies from 1 to 8191. In FIG. 7 the curve is nearly straight within any given interval above 1500, so the remaining values can be approximated from a slope and a representative value using a straight-line equation.
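The table-plus-interpolation scheme might look like the sketch below. The interval width of 64 and all names are assumptions chosen for illustration; the text specifies only that values above the directly stored range use a representative value and a slope per interval.

```python
TABLE_SIZE = 1024   # directly stored entries (1K words, per the text)
MAX_IS = 8191       # largest Huffman-decoded magnitude
STEP = 64           # hypothetical interval width above TABLE_SIZE

# Exact entries for 0..1023.
POW43_LUT = [i ** (4.0 / 3.0) for i in range(TABLE_SIZE)]

# One (representative value, slope) pair per interval above the table.
NUM_SEGMENTS = (MAX_IS + 1 - TABLE_SIZE) // STEP
SEG_BASE = []
SEG_SLOPE = []
for k in range(NUM_SEGMENTS):
    lo = TABLE_SIZE + k * STEP
    f_lo = lo ** (4.0 / 3.0)
    f_hi = (lo + STEP) ** (4.0 / 3.0)
    SEG_BASE.append(f_lo)
    SEG_SLOPE.append((f_hi - f_lo) / STEP)


def pow43(value):
    """Approximate value^(4/3) for 0 <= value <= 8191."""
    if value < TABLE_SIZE:
        return POW43_LUT[value]
    k, r = divmod(value - TABLE_SIZE, STEP)
    return SEG_BASE[k] + SEG_SLOPE[k] * r
```

With these assumed sizes the extra storage is 112 value/slope pairs on top of the 1024 exact entries, consistent with the statement that only slightly more than 1K words is needed.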

For a language learner, voice data has almost no high-frequency components, so sampling at 11.025 kHz or 22.05 kHz is sufficient. To support this at encoding time, either the psychoacoustic model is modified, or the voice data is sampled at 44.1 kHz and the frequency band selector of 9) is used to set the upper (high-frequency) subband values to 0. For the 11.025 kHz case, only the lower (low-frequency) 8 subband values are kept and the remaining upper subbands are set to 0; for the 22.05 kHz case, only the lower 16 subband values are kept and the remaining upper subbands are set to 0. Assigning values so that the upper subbands become 0 during compression raises the compression ratio. At decoding time, synthesis is skipped for the upper subbands that are 0 and performed only for the lower subbands, which saves a considerable amount of computation. A transmission rate of 128 Kbps is normally required, but with this process voice data allows high-quality language learning at about 64 Kbps.

13) of FIG. 6 is provided for the speed/pitch variation of the language learner; its implementation is described with reference to FIG. 8.

FIG. 8 is a flowchart of the process of changing the speed and converting the pitch of voice data. 1) of FIG. 8 is the data input step, in which the PCM data restored by the MP3 decoder is input. The MP3 decoder restores and outputs the data at one of the 11.025/22.05/32/44.1/48 kHz sampling rates. Processing MP3 decoding and voice speed variation in real time requires a large amount of memory, so a method of reducing the amount of data to be processed is devised in order to reduce data memory and increase processing speed. The 11.025/22.05 kHz sampling rates, which conventional MP3 does not support, are handled as follows. The human voice has relatively low frequencies, so the signal can be down-sampled to 11.025/22.05 kHz and processed without problems. Down-sampling takes 1 sample out of every 4 (11.025 kHz) or every 2 (22.05 kHz). Down-sampling reduces the memory size, and for voice data, 1 (11.025 kHz) or 2 (22.05 kHz) data items output by the MP3 decoder correspond to the 1 data item needed by the SOLA algorithm used in the speed/pitch changer of the present invention, which has the advantage of resolving the synchronization problem. In particular, when the SOLA algorithm is implemented as an integrated circuit in this way, the memory capacity is greatly reduced. Since the MP3 decoder restores and outputs 1 granule (576 samples) at a time, after down-sampling only 576/4 = 144 (11.025 kHz) or 576/2 = 288 (22.05 kHz) samples are stored and processed in the speed/pitch changer.
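The granule arithmetic above can be checked with a short sketch. It uses plain decimation (keep one sample in every N), which is what the text describes; the function name is illustrative, and no anti-aliasing filter is shown on the assumption that the upper subbands were already zeroed by the band selector.

```python
GRANULE_SAMPLES = 576  # PCM samples produced per decoded granule


def downsample(pcm, factor):
    """Keep one sample out of every `factor` samples."""
    return pcm[::factor]


granule = list(range(GRANULE_SAMPLES))   # stand-in for decoded PCM
voice_11k = downsample(granule, 4)       # 44.1 kHz -> 11.025 kHz
voice_22k = downsample(granule, 2)       # 44.1 kHz -> 22.05 kHz
```

Because 576 is divisible by both 2 and 4, every granule yields a whole number of samples for the speed/pitch changer, so no samples have to be carried over between granules.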

2) of FIG. 8 is the step that obtains approximate values for the synthesis reference point, the buffer size, the window size, and so on, for speed and pitch conversion. When a speed adjustment key is input from outside, the approximated synthesis reference point (win_var) corresponding to the requested voice speed is obtained from a pre-stored lookup table. In general, the buffer size needed for the cross-correlation calculation in the SOLA algorithm is about twice the window size. In the present invention, to minimize the buffer size, data is processed one third of the window size (win_fix) at a time, so the buffer need only be 4/3 of the window size.

The fine adjustment interval around the synthesis reference point for a given speed change is found by cross-correlation. When the speed is lowered, the fine adjustment interval is set so that it is biased to the left of the synthesis reference point; when the speed is raised, it is set so that it is biased to the right. Biasing the fine adjustment interval to one side means that similarity need not be measured for every fine adjustment position on both sides, so the amount of computation drops sharply.

3) of FIG. 8 is the step that cuts the input data into pieces of a fixed size while moving the window at regular intervals, in the same way as conventional SOLA.

Step 4) of FIG. 8 finds the optimized fine adjustment interval around the synthesis reference point. If the two waveforms are joined at the fine adjustment point where their similarity is greatest, a generally satisfactory speed conversion result is obtained, although some loss of sound quality is caused. The fine adjustment interval is varied, and the cross-correlation (Rxy) is computed over the interval in which the two window blocks being compared overlap. Finding the fine adjustment interval with the maximum Rxy and combining the two signals at that position yields a waveform that is less distorted and close to the original, with a changed tempo.

Most of the computation time of the SOLA algorithm is spent computing Rxy. A large overlap interval for a given window size means a large amount of data to process, and therefore a long time to compute each Rxy. The interval needs to be shortened in order to cut computation time. The overlap reduction used in the present invention relies on a property of the waveform, namely its periodicity. Since the two window blocks come from the same waveform, a large Rxy over part of the overlap means that they largely share the same shape, so it is assumed that similarity stays high over the rest of the overlap as well.

Even when Rxy is computed over only part of the overlap, it still involves a combination of additions, multiplications, and divisions, so the amount of computation remains large. A method of reducing the number of operations for Rxy is therefore devised.

$$R_{xy} = \frac{\left(\sum XY\right)^{2}}{\sum X^{2} \cdot \sum Y^{2}} \tag{9}$$

In general, voice waveform data keeps the same sign for some period, so computing the formula on only a few samples picked at a fixed rate gives the same effect as computing it over the whole partial overlap interval.
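Equation 9 with the sample-skipping idea can be sketched as follows. The `stride` parameter and the function name are illustrative assumptions; the text says only that a few samples taken at a fixed rate are used.

```python
import math


def rxy(x, y, stride=1):
    """Normalized squared cross-correlation (Equation 9) on every
    `stride`-th sample pair of the overlap region."""
    xs = x[::stride]
    ys = y[::stride]
    sxy = sum(a * b for a, b in zip(xs, ys))
    sxx = sum(a * a for a in xs)
    syy = sum(b * b for b in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


# One block and a scaled copy of it: similarity should be 1 either way.
block = [math.sin(2 * math.pi * n / 40) for n in range(120)]
scaled = [0.5 * v for v in block]
full = rxy(block, scaled)
sparse = rxy(block, scaled, stride=4)  # a quarter of the multiplications
```

Because the numerator is squared, this measure does not distinguish in-phase from anti-phase blocks, so a sign check on the sum of products is still needed, as the text goes on to note for negative products.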

A method of speeding up the division is also devised. If the data consists of 16-bit samples, the samples X and Y are each normalized by 2^16, the denominator is subtracted from the numerator of Equation 9, and the result is shifted right by the value corresponding to ("leading non-zero" - 1) of the data values of X and Y. Comparing the Rxy values obtained in this way shows that a result closer to 0 indicates a higher similarity. The normalization and the ("leading non-zero" - 1) step are easily implemented as shift operations, and the rest as subtraction. In addition, when the product of the two signals in the numerator term is continually negative while Rxy is being computed, the corresponding data in the two window blocks have opposite signs, so it is acceptable to move on to the next fine adjustment interval and continue from there.

The present invention also devises a way to reduce the time needed to find the fine adjustment interval. Each position of the fine adjustment interval gives a different Rxy value, but immediately neighboring positions tend to have fairly similar Rxy values. Using this property, the search moves 2 points at a time instead of 1 point at a time. In addition, a threshold is applied: as soon as an Rxy above a certain level occurs, that position is selected as the optimal fine adjustment interval, which further reduces the search time.
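The coarse search with early exit can be sketched as below. The function names, the threshold of 0.9, and the treatment of negative correlation as zero similarity are assumptions made for illustration; the text gives only the 2-point step and the existence of a threshold.

```python
import math


def similarity(x, y):
    """Equation 9, with anti-phase blocks (negative sum of products)
    treated as zero similarity so they are never selected."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    if sxy <= 0 or sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def best_offset(prev_tail, new_block, max_offset, overlap,
                step=2, threshold=0.9):
    """Search offsets 0, step, 2*step, ... and stop at the first offset
    whose similarity reaches `threshold`."""
    best_k, best_r = 0, -1.0
    for k in range(0, max_offset + 1, step):
        x = prev_tail[k:k + overlap]
        if len(x) < overlap:
            break
        r = similarity(x, new_block[:overlap])
        if r > best_r:
            best_k, best_r = k, r
        if r >= threshold:
            break  # good enough: skip the remaining offsets
    return best_k, best_r


# Periodic test signal (period 20); the new block starts 6 samples in.
signal = [math.sin(2 * math.pi * n / 20) for n in range(200)]
k, r = best_offset(signal[0:100], signal[6:46], max_offset=40, overlap=40)
```

In this example the search visits offsets 0, 2, 4, and 6 and stops at 6, where the two blocks line up, instead of evaluating all 41 candidate offsets.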

In step 5) of FIG. 8, two consecutive blocks are synthesized based on the fine adjustment interval found in 4) of FIG. 9.

In step 6) of FIG. 8, buffer space for data output is set up. If a large amount of data is stored before it goes to the sound output, the timing load on the speed/pitch changer is reduced and the whole system operates more stably. The actual buffer size depends on the sampling rate at which the data is output.

The additional data processing unit will now be described.

The additional data processing unit supports voice data sentence information, text representation information for voice data sentences, still/moving image information, system control, and the repetitive learning function. The additional data used by this language learner is shown in b) of FIG. 4. The start indicator of a voice data sentence is hex(01), the "Start of Heading" character, and the end indicator is hex(04), the "End of Transmission" character. These indicators mark the start and end of one sentence of voice data. The system controller interprets the start address of the sentence and counts frame syncs until it meets the end indicator, storing the data in the repeat mode buffer for repetitive learning mode operation.
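The frame counting between the two indicators might be sketched as follows. The representation is an assumption: each frame's ancillary data is modelled as a `bytes` object, and an indicator is taken to be present when its byte value occurs anywhere in it. The actual field layout of FIG. 4b is not given in this text, and a real parser would have to avoid mistaking payload bytes for indicators.

```python
SOH = 0x01  # sentence start indicator (ASCII Start of Heading)
EOT = 0x04  # sentence end indicator (ASCII End of Transmission)


def sentence_spans(ancillary_per_frame):
    """Return (start_frame, frame_count) for each complete sentence."""
    spans = []
    start = None
    for idx, anc in enumerate(ancillary_per_frame):
        if start is None and SOH in anc:
            start = idx
        if start is not None and EOT in anc:
            spans.append((start, idx - start + 1))
            start = None
    return spans


# Hypothetical stream of six frames: one 3-frame sentence and one
# sentence that starts and ends in the same frame.
frames = [b"", b"\x01", b"", b"\x04", b"", b"\x01\x04"]
spans = sentence_spans(frames)
```

Each returned pair tells the controller which frames to copy into the repeat mode buffer; a sentence whose end indicator has not yet arrived is simply not reported, matching the rule that a start indicator without an end indicator means the sentence is still in progress.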

The method of representing the characters that make up a voice-data sentence is currently specified by ISO [Refs. g, h]. ISO defines the internationally used ASCII code for character representation in the ISO-646 and ISO-2022 standards. The language learner uses ISO-646 and ISO-2022 for character representation. The additional data used to represent the characters of one sentence comprises a nationality indicator and country code identifying the language, start and end indicators for the characters of the voice-data sentence, and the number of characters in the sentence. The nationality indicator is hex(06), the ASCII "Acknowledge" character, and 8 bits are allocated to the country code that identifies the country. Regardless of the country code, the characters of a voice-data sentence begin with hex(02), the ASCII "Start of Text" character, which serves as the character header, and hex(03), the ASCII "End of Text" character, is recognized as the end of the whole character string. The additional data that follows the character start indicator is the address of a lookup table that stores only characters. Once the language is identified, the character lookup table and the font lookup table are used to display on the screen as many characters as the character count in the additional data specifies.
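A minimal sketch of how such a character field might be parsed is given below. The byte order is an assumption made for illustration (nationality indicator, country code, character count, character start indicator, one-byte lookup-table addresses, character end indicator); the actual layout is the one in FIG. 4, which is not reproduced here. `parse_caption` and `char_table` are hypothetical names.

```python
ACK = 0x06  # nationality indicator ("Acknowledge")
STX = 0x02  # character start indicator ("Start of Text")
ETX = 0x03  # character end indicator ("End of Text")

def parse_caption(data, char_table):
    """Decode one caption field into (country_code, text).

    Assumed layout: ACK, country(8 bit), count, STX, addr * count, ETX.
    char_table maps country code -> {lookup-table address: character}.
    """
    if len(data) < 5 or data[0] != ACK or data[3] != STX:
        raise ValueError("missing nationality or character-start indicator")
    country = data[1]
    count = data[2]                     # number of characters in the sentence
    end = 4 + count                     # position where ETX is expected
    if len(data) <= end or data[end] != ETX:
        raise ValueError("missing character-end indicator")
    addresses = data[4:end]             # lookup-table addresses, one per character
    text = "".join(char_table[country][addr] for addr in addresses)
    return country, text
```

Using the character count to locate the end indicator avoids misreading an address byte that happens to equal hex(03).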

The speed indicator expresses the pronunciation speed of a character and the beat of music, and is assigned hex(07), the ASCII "Bell" character. For one character, a duration of 1/32 beat (0.031 s), which corresponds to one frame sync, is assigned the value 1, and a duration of 1/16 beat (0.063 s) is assigned the value 2. The system controller counts the number of frame syncs given by the speed value and processes the characters one at a time.
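The relation between speed values and frame syncs can be illustrated with a short sketch. It assumes that one speed unit equals exactly 1/32 s (which the text rounds to 0.031 s) and that characters are processed back to back; the function names are hypothetical.

```python
UNIT_SECONDS = 1 / 32  # one frame sync per speed unit; the text rounds this to 0.031 s

def speed_to_seconds(speed):
    """Duration of one character whose speed value is `speed` frame syncs."""
    return speed * UNIT_SECONDS

def character_schedule(speeds):
    """Frame-sync index at which each character starts.

    speeds: per-character speed values (1 = 1/32 beat, 2 = 1/16 beat, ...).
    """
    starts = []
    frame = 0
    for speed in speeds:
        starts.append(frame)   # this character begins at the current frame sync
        frame += speed         # the controller counts `speed` frame syncs before the next one
    return starts
```

For example, speed values 1, 2 and 4 place the three characters at frame syncs 0, 1 and 3.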

The images of the language learner are divided into moving images and still images. For still images, the start address of each image is first stored in a lookup table; for moving images, the start address of the first image data of each video is stored there. The addresses stored in the lookup table are then used to access the actual image-data storage medium.
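The indirection through the lookup table can be sketched as below. The storage medium is modelled as a byte string and every image is assumed to have a known size in bytes; neither detail is given in the text, and `load_image` is a hypothetical name.

```python
def load_image(storage, lookup_table, index, size):
    """Return `size` bytes of image data for lookup-table entry `index`.

    storage: bytes-like model of the image-data storage medium (assumption).
    lookup_table: start addresses, one per still image or per video.
    """
    start = lookup_table[index]          # start address kept in the lookup table
    return storage[start:start + size]   # access the actual image data
```

For a moving image, the returned data would be the first image of the video, with later images read sequentially from there.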

External CPU commands are added to the encoded data in order to set a branch, an interrupt, or the like at a specific frame, or to describe the start time or operation of a processor for sound-effect and image-effect processing; the external CPU command indicator for this purpose is shown in FIG. 12. The external CPU command indicator is assigned hex(1a), the ASCII "Substitute" character.
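One way to picture the frame-accurate command mechanism is the sketch below. It assumes that each command is a single byte immediately following the indicator, which the text does not state; the real format is the one in FIG. 12. `extract_commands` is a hypothetical name.

```python
SUB = 0x1A  # external CPU command indicator ("Substitute")

def extract_commands(ancillary_per_frame):
    """Return (frame_index, command_byte) pairs found in the additional data.

    ancillary_per_frame: list of bytes objects, one per frame.
    A one-byte command after each indicator is an assumption for illustration.
    """
    commands = []
    for frame_index, ancillary in enumerate(ancillary_per_frame):
        pos = 0
        while pos < len(ancillary) - 1:
            if ancillary[pos] == SUB:
                commands.append((frame_index, ancillary[pos + 1]))
                pos += 2                 # skip the indicator and its command byte
            else:
                pos += 1
    return commands
```

Because each command is tagged with its frame index, a controller could trigger a branch, interrupt, or effect at exactly that frame.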

Accordingly, digital signal processing techniques make it possible to vary the speed over a wide range. In addition, thanks to advances in semiconductor and personal computer technology, even data compressed with a high compression ratio can be processed in real time by using advanced decoding techniques. Furthermore, because a real-time language learning system that varies the pitch and speed of MP3 data, which did not previously exist, has been developed, text, background images, and image effects can be provided as additional functions of the language learning system, and the problem of synchronizing the MP3 audio signal with the text can be solved.

The language learner described above can use a variety of storage media and is a new concept of language learning system that is largely free of constraints of time and place. It is also a new language-learner concept in which a low-cost language learning system can be built over a communication network using a PC; given how rapidly communication networks are developing, online use of language learners is expected to increase sharply. The various processing parts of the present invention can be used not only for processing MP3 voice and music data but also for varying the speed and pitch of voice signals, and for processing music data, encoded with other compression algorithms.

In the examples described and illustrated above, the MP3 data encoding/decoding method according to the present invention is described as being used in a language learning system. The method is not limited to this, however, and can of course be applied to any field that requires varying the speed and pitch of sound, such as KOD (karaoke on demand) and AOD (audio on demand).

Therefore, the MP3 data encoding/decoding method of the language learning apparatus according to the present invention, and the language learning apparatus using it, provide high sound quality, high efficiency, high functionality, and large capacity; they receive voice data compressed in the MP3 format and can vary its speed without degrading sound quality, so that speed variation, pitch conversion, and text captioning of language-learning data can be implemented.

In addition, the language learning apparatus becomes relatively inexpensive, the speed can be varied to suit the learner's ability, and a text caption function becomes available.

Claims

In the encoding / decoding method of a language learning apparatus having a modified MP3 encoder for compressing and encoding audio data and a modified MP3 decoder for decoding the encoded data,

A first step of the modified MP3 encoder compressing and encoding audio and the additional data by an MP3 method;

A second step of separating the additional data from the compressed data by the modified MP3 decoder and decoding the main data;

And a third step of varying the pitch and speed of the decoded main data and expressing a character of the additional data.

The method of claim 1,

Decoding using linear interpolation to reduce storage capacity when calculating is^(4/3) in the modified MP3 decoder;

Implementing encoding / decoding by transforming the lower / higher subbands with a frequency band selector to support 11.025 KHz and 22.05 KHz in sampling frequency for language learning voice data;

Amplifying the energy of the music or voice selected by the frequency band selector; and

Storing the compressed data in the buffer in units of sentences for repetitive learning.

The method according to claim 1 or 2,

Supporting 11.025/22.05 kHz by downsampling in the speed/pitch changer, which performs voice speed and pitch conversion, thereby reducing memory use;

Improving the speed of storing the synthesized reference point as a lookup-table;

Deflecting around the composite reference point of the fine tuning interval according to the speed change; And

A method of encoding / decoding a language learner further comprising using a normalization calculation, a subtraction operation, and non-leading zero for cross-correlation (Rxy).

A modified MP3 encoder for compressing and encoding audio data;

A modified MP3 decoder for decoding data encoded by the modified MP3 encoder;

An iterative learning buffer for storing compressed data;

An amplifier for amplifying the energy attenuated by the frequency band selector;

Subband selector to support 11.025 / 22.05KHz;

A speed/pitch changer for down-sampling PCM voice data from the modified MP3 decoder and processing speed and pitch variation of the voice signal; and

A system controller for controlling the encoder, the decoder, the buffer, the frequency band selector, and the speed/pitch changer.

5. The apparatus of claim 4, wherein the modified MP3 encoder comprises a data formatter that inserts and encodes additional data comprising: a voice-data sentence start/end indicator; the number of frames making up a voice-data sentence; a nationality indicator; a country code; character start/end indicators for the voice-data sentence; a character table address; the number of characters in the voice-data sentence; a pronunciation speed indicator; the number of frames indicating the pronunciation speed of one character; a still/moving image indicator; a still/moving image start address; an external CPU command indicator; and a CPU command.