KR20030000400A

KR20030000400A - Method and apparatus for real-time modification of audio play speed

Info

Publication number: KR20030000400A
Application number: KR1020010036161A
Authority: KR
Inventors: 박주식
Original assignee: 주식회사 보이스텍
Priority date: 2001-06-25
Filing date: 2001-06-25
Publication date: 2003-01-06

Abstract

PURPOSE: A method and an apparatus for changing voice playback speed in real time are provided to minimize degradation of timbre, loss of the voice signal, and playback delay. CONSTITUTION: An apparatus for changing voice playback speed in real time includes a user interface (7) that receives the information required by the user, a digital voice signal storage medium (5) that stores digital voice signals, a voice signal processing unit (1) that receives the digital voice signals and converts them into PCM data, a DC component removal unit (2) that removes the DC component of the voice signal using a weighted average so that sound quality does not degrade when the playback speed of the converted voice signal is adjusted, a playback speed conversion unit (3), and a real-time playback control unit (4) that performs multitasking to control real-time playback of the entire voice signal without loss of sound quality.

Description

Method and apparatus for real-time conversion of voice playback speed {Method and apparatus for real-time modification of audio play speed}

The present invention relates to a method and apparatus for playing back a digital voice signal in real time at a variable playback speed. More particularly, it relates to a method and apparatus for real-time conversion of voice playback speed that use the SOLA (Synchronized Overlap-and-Add) algorithm, a time-scale modification technique, together with multitasking, to play back a voice signal in real time without the change in timbre caused by loss or distortion of the signal.

In general, among the methods for modifying a voice signal on the time axis, the method based on the SOLA algorithm is the simplest, requires the least computation, gives good sound quality, and is well suited to real-time playback.

The SOLA algorithm was first introduced by Roucos, S. & Wilgus, A. as "High Quality Time-Scale Modification for Speech" in the 1985 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 2 (pp. 493-496).

The SOLA algorithm shifts the starting point of a new voice signal frame against the preceding voice signal within a specific interval and finds the position with the highest cross-correlation coefficient. At that position it overlaps and averages the frames, in the manner of the sampling method used when converting an analog signal into a digital signal, to produce a new voice signal. In other words, the method finds the optimal similarity between successive frames within a neighboring interval.

Here, the cross-correlation coefficient is the value obtained by applying the cross-correlation function to two voice signals. It takes a value between -1 and 1, and a coefficient of 1 means that the two signals are very highly correlated. For reference, a correlation function is defined to express the correlation between two signals mathematically: for a single signal it is called the autocorrelation function, and for two different signals it is called the cross-correlation function. In the time domain the correlation function describes how the signals are correlated, and in the frequency domain it describes the distribution of the power or energy contained in the signal. This relationship is known as the Wiener-Khintchine theorem. The cross-correlation function R_xy(τ) of two signals x(t) and y(t) is the product of x(t) at time t and y(t+τ) at time t+τ, averaged over a sufficiently long time.
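The coefficient described above can be sketched for two finite frames of samples. This is a minimal illustration, assuming the common normalized form (sum of products divided by the square root of the product of the two energies); the function name `correlation_coefficient` is hypothetical and the patent text does not give this exact formula.

```python
import math

def correlation_coefficient(x, y):
    """Normalized cross-correlation of two equal-length sample sequences.

    Returns a value between -1 and 1; values near 1 mean the two
    frames are highly similar. Assumed form, not taken from the patent.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # An all-zero frame has no defined correlation; report 0.0 instead.
    return num / den if den else 0.0
```

A frame compared with a scaled copy of itself gives a coefficient of about 1, and a frame compared with its negation gives about -1.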

Frames combined in this way by the SOLA method preserve the time-dependent characteristics of the voice signal, namely pitch, magnitude, and phase. Because the method requires no pitch extraction, no frequency-domain computation, and no phase unwrapping, it is simple and efficient.

In general, when the playback speed of a playback device such as a tape player or MP3 player is changed, the timbre of the digital voice signal stored on the storage medium changes on playback. Speeding playback up produces an unpleasantly high-pitched voice, and slowing it down produces a drawn-out sound. This happens because changing the speed of the voice signal changes the frequency and pitch components of the stored signal.

Conventional methods for preventing this include the following. First, the period of the input digital voice signal is found using the average magnitude difference function (AMDF); voiced and unvoiced sounds are separated using that period; part of the separated voiced sound is copied or removed to change the length of the voice signal; and the result is synthesized with the previously separated unvoiced sound. Second, part of the digital signal is copied and added, or removed, on the time axis.

However, various problems can arise when these conventional methods are used.

If the AMDF fails to find the correct period of the input digital voice signal, the distortion becomes worse. Copying and adding or removing part of the digital signal on the time axis causes excessive loss of the digital voice signal when playback is sped up, and in severe cases whole words are skipped. In addition, when the conventional methods change the playback speed in real time, the entire voice signal to be played must be divided into blocks of a fixed size for processing. Excessive loss or distortion of the voice signal at the joins between blocks can then cause reverberation or ticking clicks, and a certain time delay can occur.

The object of the present invention is to solve these problems by providing a method and apparatus for real-time conversion of voice playback speed that use the SOLA time-scale modification algorithm and multitasking so that the playback speed of a digitized, stored voice signal can be changed in real time during playback, that prevent the resulting alteration and loss of timbre, and that adjust the playback speed in real time so that the signal sounds as if a person were speaking quickly or slowly.

FIG. 1 is a system configuration diagram according to the present invention.

FIG. 2 is a diagram of the task processing in the system carried out under the control of the real-time playback control unit.

FIG. 3 is a flowchart of the work in the voice signal processing unit.

FIG. 4 is a flowchart of the work in the DC component removal unit.

FIG. 5 is a flowchart of the work in the playback speed conversion unit.

FIG. 6 is a diagram of the method for preventing loss and distortion of the voice signal.

<Description of Reference Numerals>

Voice signal processing unit (1), DC component removal unit (2), playback speed conversion unit (3), real-time playback control unit (4), digital voice signal storage medium (5), digital voice signal playback device (6), user interface (7), storage of voice signal samples (20), join between voice signals (30), initial playback start command (A), command to change the playback speed during playback (A')

The present invention relates to a method and apparatus for playing back a digital voice signal in real time while varying its playback speed within a range of 0.5x to 2x. In the present invention, a voice signal means not only the human voice but any acoustic signal, including those produced by musical instruments.

The overall configuration of the present invention is described with reference to FIG. 1.

The system according to the present invention for playing back a digital voice signal in real time at a variable playback speed comprises: a user interface (7) that receives the information required by the user (selection of the file to play, the new playback speed, start playback, pause, stop, and so on); a digital voice signal storage medium (5) that stores digital voice signals; a voice signal processing unit (1) that receives a digital voice signal from the digital voice signal storage medium (5) and converts it into a specific PCM (Pulse Code Modulation) data format; a DC component removal unit (2) that removes the DC component of the voice signal using a weighted average of the input digital voice signal, to prevent the sound quality degradation caused by adjusting the playback speed; a playback speed conversion unit (3) that converts the playback speed of the voice signal using the SOLA (Synchronized Overlap-and-Add) time-scale modification algorithm; a real-time playback control unit (4) that performs multitasking to control real-time playback of the entire input voice signal without loss of sound quality; and a digital voice signal playback device (6) that outputs the voice signal from the playback speed conversion unit (3) under the control of the real-time playback control unit (4).

Here, the sound of a record or radio uses an analog scheme in which the strength of the sound is transmitted and reproduced directly as the strength of an electric current. By contrast, converting the strength of the sound into a combination of binary '1' and '0' signals, that is, into a digital signal, for transmission is called PCM (Pulse Code Modulation). The SOLA algorithm, the time-scale modification technique used in the present invention to convert the playback speed of the voice signal, is as described in the background above.

The operation of the apparatus configured in this way is described with reference to FIG. 2, covering the overall processing from the viewpoint of the control operation of the real-time playback control unit (4).

The system first receives, through the user interface (7), the information required by the user (selection of the file to play, the new playback speed, start playback, pause, stop, and so on). The received information is acted on under the control of the real-time playback control unit (4), which controls the system as a whole. When the initial playback start command (A) arrives, the voice signal processing unit (1) reads the voice signal from the digital voice signal storage medium (5) in blocks of a fixed size and converts it into a specific PCM data format. The converted voice signal data is passed to the DC component removal unit (2) and the playback speed conversion unit (3): the DC component removal unit (2) removes the DC component of the voice signal, and the playback speed conversion unit (3) converts the voice signal using the SOLA time-scale modification algorithm. The DC component removal unit (2) and the playback speed conversion unit (3) thus perform DC removal and speed conversion while the voice signal is still being delivered from the voice signal processing unit (1). The converted voice signal is output through the digital voice signal playback device (6), such as a sound card.

However, if a 'pause' or 'stop' command arrives from the real-time playback control unit (4) while the voice signal processing unit (1) is working, the work is either suspended, which pauses playback of the voice signal, or terminated.

In addition, if a command (A') to change the playback speed (Speed Rate) arrives at the user interface (7) during playback, the speed change is made in real time by bypassing the voice signal processing unit (1): under the control of the real-time playback control unit (4), the work is done by the DC component removal unit (2) and the playback speed conversion unit (3), and the voice signal is then output through the digital voice signal playback device (6).

For the system configured as above, the method for real-time conversion of voice playback speed is described component by component with reference to the flowcharts.

(1) Voice signal processing unit

FIG. 3 is a flowchart of the processing in the voice signal processing unit (1).

As shown in FIG. 3, the voice signal processing unit (1) performs the following steps:

examining the data format of the digital voice signal stored on the digital voice signal storage medium (5) designated by the real-time playback control unit (4), that is, whether it can be converted to the PCM data format, in response to a command to start playback of a voice signal stored in a specific file or at a specific memory address [100];

if the examination shows that conversion to the PCM data format is possible, reading the data in blocks of a fixed size [101];

if the examination shows that conversion to the PCM data format is not possible, notifying the real-time playback control unit (4) instead [101'].

The step [102] of converting the read data into the PCM data format is then performed. In the present invention, the data is converted into a 16-bit PCM format with a sampling rate from 11.025 kHz to 22.05 kHz.

The converted data is divided into blocks of a fixed size suited to real-time playback, and the blocks are sent to the DC component removal unit (2) until the entire voice signal has been played or a command such as stop or pause is received [103]. The playback speed (speed rate) is sent to the playback speed conversion unit (3) (not shown).

The entire voice signal is divided into blocks because, under the multitasking performed by the real-time playback control unit (4), a single unit of work that occupies system resources for a long time would create a bottleneck and cause a time delay.

(2) DC component removal unit

FIG. 4 is a flowchart of the processing in the DC component removal unit (2).

Under the control of the real-time playback control unit (4), the DC component removal unit (2) removes the DC component from each block of the voice signal received from the voice signal processing unit (1), through the following steps:

assigning fixed values to the minimum (Min) and maximum (Max) for the block of the voice signal received from the voice signal processing unit (1), with Min = -32767 and Max = 32767 [200];

searching the input voice signal samples for the smallest sample value below the minimum (R_Min) and the largest sample value above the maximum (R_Max) [201];

calculating the average of the samples in the input voice signal block and adding a weight (= 0.5) to that average to obtain the weighted average [202].

When the weighted average is determined, if the minimum sample value (Real Min, R_Min) minus the weighted average is smaller than the initially assigned minimum (Min), the initially assigned maximum (Max) is added to the weighted average. If the maximum sample value (Real Max, R_Max) minus the weighted average is larger than the initially assigned maximum (Max), the initially assigned minimum (Min) is added to the weighted average [203].

Expressed as formulas:

IF (R_Min - weighted average) < Min

THEN weighted average = weighted average + Max

IF (R_Max - weighted average) > Max

THEN weighted average = weighted average + Min

Once the weighted average has been determined, the DC component is removed by subtracting the weighted average from each sample [204],

and the block of the voice signal with the DC component removed is sent to the playback speed conversion unit (3) [205].

Removing the DC component with the weighted average before the voice signal is modified in the time domain improves the sound quality of the playback.
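Steps [202] and [204] can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the 0.5 weight is interpreted here as a round-to-nearest offset, and the range adjustment of step [203] is replaced by plain clipping to the 16-bit range given in step [200]. The function name `remove_dc` and its parameters are hypothetical.

```python
import math

def remove_dc(block, weight=0.5, lo=-32767, hi=32767):
    """Subtract a block's (weighted) mean from every sample.

    Assumptions of this sketch: the mean plus 0.5 is floored to an
    integer offset, and results are clipped to [lo, hi] rather than
    adjusted as in step [203] of the text.
    """
    if not block:
        return []
    weighted_mean = sum(block) / len(block) + weight   # step [202]
    offset = math.floor(weighted_mean)
    # Step [204]: subtract the offset, then keep samples in range.
    return [max(lo, min(hi, s - offset)) for s in block]
```

For example, a block whose samples hover around 100 comes out centered on zero, which keeps a constant offset from being carried into the overlap-and-add stage.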

(3) Playback speed conversion unit

FIG. 5 is a flowchart of the processing in the playback speed conversion unit (3).

First, the equations used in the playback speed conversion unit (3) and their explanations are as follows.

Equation 1 calculates the step time applied when searching for the maximum cross-correlation coefficient, using the window size and the desired playback speed ratio (Speed Rate). Equation 2 calculates the duration of a single sample from the frequency of the input voice signal. Equation 3 calculates the input frame step size used when fetching the input frames to be speed-converted. Equation 4 calculates the size of one window for applying the Hanning window function, and Equation 5 calculates the size and the minimum and maximum range used for the overlap-and-add processing of the SOLA algorithm. Equation 6 calculates the total playback time of the input voice signal block, and Equation 7 is the Hanning window function.

The playback speed conversion unit (3), controlled by the real-time playback control unit (4), converts the playback speed of the voice signal block from which the DC component has been removed, using the SOLA algorithm, through the following steps.

The first step [300] defines and calculates the time scale parameters, such as the window size of the voice signal samples.

In the present invention, the window size (Window Time) of the input voice signal is fixed at 0.032 s (32 ms), and the synchronization time (+/- Sync. Time), the search range used for synchronization, is fixed at 0.01 s (10 ms).
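As a worked example of these fixed parameters, the sketch below converts the 32 ms window and the 10 ms synchronization range into sample counts at the two sampling rates named in step [102]. The helper name `seconds_to_samples` and the choice to truncate fractional samples are assumptions of this sketch; the text does not say how fractional samples are handled.

```python
WINDOW_TIME = 0.032   # window size in seconds, from the text
SYNC_TIME = 0.010     # +/- synchronization search range in seconds

def seconds_to_samples(seconds, sample_rate_hz):
    """Convert a duration to a whole number of samples (truncating)."""
    return int(seconds * sample_rate_hz)

for rate in (11025, 22050):
    window = seconds_to_samples(WINDOW_TIME, rate)
    sync = seconds_to_samples(SYNC_TIME, rate)
    print(f"{rate} Hz: window = {window} samples, sync = +/-{sync} samples")
```

At 11.025 kHz the window covers a few hundred samples, so each correlation search in the later steps stays small enough for block-by-block real-time processing.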

After the parameters have been defined and calculated, the total playback time (End Time) of the block of the voice signal from which the DC component removal unit (2) has removed the DC component is calculated [301]. End Time is obtained from Equation 6.

After the total playback time has been calculated, the playback time processed so far (Current Processing Time) is compared with the total playback time (End Time) of the input voice signal block [302].

The steps from [303] onward are repeated until the Current Processing Time reaches the End Time of the input voice signal block. Once it does [303'], the voice signal is output directly through the digital voice signal playback device (6).

The step [303] of searching for the best position among the voice signal frames, using the cross-correlation function and its coefficient, uses Equation 1.

After the search, the Hanning window function is applied [304].

Here, the Hanning window function is a technique for smoothing equally spaced data with weights of 0.25, 0.5, and 0.25, and is applied to reduce the abrupt truncation effect of the window. Equations 4 and 7 are used when applying it.
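Equation 7 is not reproduced in this text, so the sketch below uses the standard textbook form of the Hann (Hanning) window, w(n) = 0.5 - 0.5 cos(2πn / (N - 1)), as an assumption about what is intended. The function name `hann_window` is illustrative.

```python
import math

def hann_window(n):
    """Return n coefficients of the standard symmetric Hann window.

    The window rises smoothly from 0 to 1 and falls back to 0, so a
    frame multiplied by it has no abrupt edges.
    """
    if n < 1:
        raise ValueError("window length must be at least 1")
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame):
    """Multiply a frame of samples by a Hann window of the same length."""
    return [s * w for s, w in zip(frame, hann_window(len(frame)))]
```

Tapering the frame edges this way is what softens the truncation effect mentioned above when neighboring frames are overlapped.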

After the Hanning window function has been applied, the voice signals are overlapped and added using the window size and the overlap-and-add position found by the search [305],

and the two voice signals that have been windowed, overlapped, and added are synthesized [306].

This synthesis uses the sampling method used when converting an analog signal into a digital signal.

The synthesized voice signal frames are written to a memory buffer for output through the digital voice signal playback device (6), and the Current Processing Time is updated [307].

After the update, processing returns to step [302] to handle the next frame [308], and the Current Processing Time is again compared with the total playback time of the input voice signal block.
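The loop of steps [302] to [308] can be sketched as a basic SOLA routine. This is a simplified illustration under stated assumptions, not the patented implementation: Equations 1 to 7 are not available in this text, so the hop sizes are assumed (half a frame on output, scaled by the speed rate on input), and a linear cross-fade is swapped in for the Hanning-weighted overlap-and-add. The names `sola`, `frame`, and `seek` are hypothetical; the defaults correspond to 32 ms and 10 ms at 11.025 kHz.

```python
import math

def _corr(a, b):
    """Normalized cross-correlation of two equal-length sequences."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den else 0.0

def sola(x, rate, frame=352, seek=110):
    """Time-scale the sample list x by `rate` (2.0 = twice as fast)."""
    if not 0.5 <= rate <= 2.0:
        raise ValueError("rate must be within 0.5 to 2.0")
    if frame < 2 or not 0 <= seek < frame // 2:
        raise ValueError("need frame >= 2 and 0 <= seek < frame // 2")
    hop_out = frame // 2                          # assumed synthesis hop
    hop_in = max(1, int(round(hop_out * rate)))   # assumed analysis hop
    out = list(x[:frame])
    m, pos_in = 1, hop_in
    while pos_in + frame <= len(x):
        seg = x[pos_in:pos_in + frame]
        nominal = m * hop_out
        best_k, best_c = 0, -2.0
        # Search around the nominal position for the best-matching offset.
        for k in range(-seek, seek + 1):
            start = nominal + k
            if start < 0 or start >= len(out):
                continue
            ov = min(len(out) - start, frame)
            c = _corr(out[start:start + ov], seg[:ov])
            if c > best_c:
                best_c, best_k = c, k
        start = nominal + best_k
        ov = min(len(out) - start, frame)
        # Cross-fade the overlapping region, then append the remainder.
        for i in range(ov):
            w = (i + 1) / (ov + 1)
            out[start + i] = (1.0 - w) * out[start + i] + w * seg[i]
        out.extend(seg[ov:])
        m += 1
        pos_in += hop_in
    return out
```

Because frames are repositioned rather than resampled, the output is shorter or longer than the input while the local waveform, and hence the pitch, is left largely unchanged.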

Finally, the method for preventing loss and distortion of the voice signal is described with reference to FIG. 6.

As shown in FIG. 6, when the frames at the end of a voice signal block are processed in real time, loss or distortion of the voice signal can occur at the join (30) between blocks. To prevent this, the present invention stores (20) voice signal samples amounting to half the window size and, as shown in the figure, attaches them to the front of the next block when that block is processed.
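The carry-over described above can be sketched as a generator that prefixes each block with the tail of the previous one. This is a minimal illustration; the function name `blocks_with_carry` and its parameters are hypothetical, and the block size in the test is kept tiny for readability rather than the 1024 * N used in the claims.

```python
def blocks_with_carry(samples, block_size, window):
    """Yield successive blocks, each prefixed with the last window // 2
    samples of the previous block, so frames that straddle a block
    boundary can still be overlapped and added."""
    half = window // 2
    carry = []
    for start in range(0, len(samples), block_size):
        block = list(samples[start:start + block_size])
        yield carry + block
        # Save the tail of this block for the start of the next one.
        carry = block[-half:] if half else []
```

With a block size of 4 and a window of 4, the samples 0 to 9 are delivered as [0, 1, 2, 3], [2, 3, 4, 5, 6, 7], and [6, 7, 8, 9], so every join is covered by two carried-over samples.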

As described above, the method and apparatus for real-time conversion of voice playback speed according to the present invention can change the speed of a digital voice signal input to a voice playback device in real time. They minimize the degradation of timbre in the played-back voice signal, as well as the signal loss and playback delay that can result from dividing the signal into blocks for real-time playback, so that the playback speed can be varied in real time. The improved playback signal is therefore useful wherever audio has to be heard faster or slower in real time, for example in a dedicated MP3 player that plays MPEG-compressed voice signals or in foreign language study, because large amounts of voice signal data can be played back and listened to quickly or slowly.

Claims

An apparatus for reproducing in real time by varying the reproduction speed of a digital audio signal,

A user interface for receiving information required by the user,

Digital voice signal storage medium for storing the digital voice signal,

A voice signal processor for receiving a digital voice signal from the digital voice signal storage medium and converting the digital voice signal into a PCM data format;

DC component removal unit for removing the DC component of the voice signal using a weighted average value to prevent sound quality degradation due to the control of the playback speed of the voice signal converted by the voice signal processor,

Playback speed conversion unit for converting, using the SOLA algorithm, which is a time scale modification technique, the playback speed of the voice signal from which the DC component has been removed by the DC component removal unit;

A real-time playback control unit for performing multitasking to control the entire voice signal so that it is reproduced in real time without degradation of sound quality;

And a digital voice signal reproducing apparatus for outputting, under the control of the real-time playback control unit, the voice signal that has passed through the playback speed conversion unit.

The apparatus of claim 1, wherein the entire speech signal is processed in block units (1024 * N; N is an integer greater than 0).

The apparatus of claim 1, wherein the playback speed of the voice signal reproduced through the digital voice signal reproducing apparatus is variable within a range of 0.5 times to 2 times normal speed.

The apparatus of claim 1, wherein, under the control of the real-time playback control unit, an initial playback start command input to the user interface causes processing to start from the voice signal processor, and a playback speed change command input during playback of the voice signal causes processing to start from the DC component removal unit and the playback speed conversion unit.

The apparatus of claim 1, wherein frames of voice signals in block units are connected to each other in order to prevent loss and distortion of the voice signals.

A method for reproducing a digital voice signal in real time while varying its playback speed, comprising:

1) receiving the information required by the user through the user interface,

2) reading the voice signal from the digital voice signal storage medium designated by the real-time playback control unit and converting the voice signal into a PCM data format;

3) removing the DC component by using a weighted average value in the DC component removing unit to prevent the sound quality degradation due to the conversion of the reproduction speed of the voice signal transmitted from the voice signal processor;

4) converting, in the playback speed conversion unit using the SOLA algorithm, the playback speed of the voice signal from which the DC component has been removed by the DC component removal unit;

5) outputting the voice signal through the digital voice signal reproducing apparatus under the control of the real-time playback control unit.

The method of claim 6, wherein step 2) comprises:

Checking whether the input digital voice signal can be converted into the PCM data format [100];

Reading data in block units of a predetermined size when the change to the PCM data format is possible [101];

Converting the read data into the PCM data format [102];

And dividing the converted data into blocks of a predetermined size, and then transmitting the converted data to a DC component removing unit [103].
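The block division in steps [101] to [103] can be illustrated with a short sketch. It assumes the input has already been decoded to raw 16-bit signed little-endian PCM bytes (the byte order and the function name `pcm16_blocks` are assumptions for illustration) and uses the 1024 * N block size stated in the claims.

```python
import struct
from typing import Iterator, List


def pcm16_blocks(raw: bytes, n: int = 1) -> Iterator[List[int]]:
    """Split raw 16-bit PCM bytes into blocks of 1024 * n samples."""
    if n <= 0:
        raise ValueError("n must be an integer greater than 0")
    block_len = 1024 * n
    count = len(raw) // 2  # two bytes per 16-bit sample; a trailing odd byte is ignored
    # "<h" = little-endian signed 16-bit (assumed byte order)
    samples = struct.unpack("<%dh" % count, raw[:count * 2])
    for start in range(0, count, block_len):
        yield list(samples[start:start + block_len])  # last block may be shorter


raw = struct.pack("<2500h", *range(2500))
blocks = list(pcm16_blocks(raw, n=1))
print([len(b) for b in blocks])  # [1024, 1024, 452]
```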

The method of claim 6, wherein step 3) comprises:

Assigning fixed values, a minimum value (Min) of −32767 and a maximum value (Max) of 32767, for the voice signal transmitted in block units [200];

Searching the voice signal samples for the minimum sample value (R_Min) smaller than the minimum value (Min) and the maximum sample value (R_Max) larger than the maximum value (Max) [201];

Calculating an average value of the voice signal samples and then calculating a weighted average value by adding a weight to the calculated average value [202];

If the value obtained by subtracting the weighted average value from the minimum sample value (R_Min) is smaller than the initially assigned minimum value (Min), obtaining the weighted average value by adding the initially assigned maximum value (Max) to the weighted average value, and if the value obtained by subtracting the weighted average value from the maximum sample value (R_Max) is larger than the initially assigned maximum value (Max), obtaining the weighted average value by adding the initially assigned minimum value (Min) to the weighted average value [203];

Removing the direct current component by subtracting the weighted average value from each sample of the audio signal [204];

And transmitting the block-unit voice signal from which the DC component has been removed to the playback speed conversion unit.
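The DC removal of steps [200] to [204] can be sketched as below. This is an interpretation under stated assumptions, not the claimed procedure itself: the weight is assumed to scale the block average, and step [203], whose translated wording is ambiguous, is assumed to limit the offset so that subtracting it cannot push a sample outside the fixed Min..Max range. The function name `remove_dc` is hypothetical.

```python
from typing import List

MIN_VAL = -32767  # fixed minimum assigned in step [200]
MAX_VAL = 32767   # fixed maximum assigned in step [200]


def remove_dc(block: List[int], weight: float = 0.5) -> List[int]:
    """Subtract a weighted block average from every sample of one block."""
    if not block:
        return []
    r_min = min(block)                 # [201] actual minimum sample value
    r_max = max(block)                 # [201] actual maximum sample value
    mean = sum(block) / len(block)     # [202] plain average of the block
    offset = weight * mean             # [202] assumption: the weight scales the average
    # [203] assumption: limit the offset so shifted samples stay within Min..Max
    if r_min - offset < MIN_VAL:
        offset = r_min - MIN_VAL
    if r_max - offset > MAX_VAL:
        offset = r_max - MAX_VAL
    return [int(round(s - offset)) for s in block]  # [204] subtract from each sample


print(remove_dc([1000, 1200, 800, 1000]))              # [500, 700, 300, 500]
print(remove_dc([1000, 1200, 800, 1000], weight=1.0))  # [0, 200, -200, 0]
```

With a weight of 1.0 the full block average is removed; with 0.5 only half of the measured offset is removed per block.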

The method of claim 6, wherein step 4) comprises:

Defining and calculating various time element parameters such as a window size of a voice signal sample [300];

Calculating a total reproduction time of the audio signal [301];

Comparing the currently processed voice signal reproduction time with the total reproduction time of the input voice signal block [302];

Searching for the most suitable overlap position between voice signal frames using a cross-correlation function and its coefficient [303];

Applying a Hanning window function [304];

Overlapping and adding the voice signals using the window size and the overlap-and-add position found by the search [305];

Synthesizing the two voice signals that have undergone the Hanning window function and the overlap-and-add process [306];

And recording the synthesized voice signal frames into a memory buffer and updating the currently processed voice signal playback time [307].
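The core of steps [303] to [307] can be sketched as a basic SOLA loop. This is a simplified illustration and not the claimed implementation: the parameter names `hop_out` and `max_shift`, their sample-count defaults, and the use of the rising half of a Hann window as the cross-fade weight are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def _correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized cross-correlation coefficient of two equal-length segments."""
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den if den > 0.0 else 0.0


def sola(x: Sequence[float], speed: float, window: int = 256,
         hop_out: int = 128, max_shift: int = 32) -> List[float]:
    """Time-scale x by `speed` (0.5 to 2.0) with synchronized overlap-and-add."""
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be within 0.5 to 2.0")
    if window <= hop_out + 2 * max_shift:
        raise ValueError("window too short to guarantee an overlap region")
    hop_in = max(1, round(hop_out * speed))   # analysis hop in the input
    out = [float(v) for v in x[:window]]      # first frame is copied unchanged
    m = 1
    while m * hop_in + window <= len(x):      # [302] stop when the input is used up
        frame = x[m * hop_in: m * hop_in + window]
        nominal = m * hop_out                 # nominal write position in the output
        best_k, best_score = 0, -2.0
        for k in range(-max_shift, max_shift + 1):   # [303] search the best position
            start = nominal + k
            overlap = min(len(out) - start, window)
            if start < 0 or overlap <= 0:
                continue
            score = _correlation(out[start:start + overlap], frame[:overlap])
            if score > best_score:
                best_k, best_score = k, score
        start = nominal + best_k
        overlap = min(len(out) - start, window)
        for i in range(overlap):              # [304]-[306] Hann-weighted cross-fade
            w = 0.5 - 0.5 * math.cos(math.pi * (i + 1) / (overlap + 1))
            out[start + i] = (1.0 - w) * out[start + i] + w * frame[i]
        out.extend(float(v) for v in frame[overlap:])  # [307] append to the buffer
        m += 1
    return out


tone = [math.sin(2 * math.pi * 220 * n / 11025) for n in range(8000)]
fast = sola(tone, 2.0)   # roughly half as many samples as the input
slow = sola(tone, 0.5)   # roughly twice as many samples as the input
```

Because each frame is placed where it correlates best with the signal already written, the cross-fade joins waveforms that are approximately in phase, which is what keeps the pitch and timbre from changing with the speed.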

The method according to any one of claims 6 to 9, wherein the entire speech signal is processed in block units (1024 * N; N is an integer greater than 0).

The method according to any one of claims 6 to 9, wherein the playback speed of the voice signal reproduced through the digital voice signal reproducing apparatus is variable within a range of 0.5 times to 2 times normal speed.

The method of claim 6, wherein, under the control of the real-time playback control unit, an initial playback start command input to the user interface causes processing to start from the voice signal processor, and a playback speed change command input during playback of the voice signal causes processing to start from the DC component removal unit and the playback speed conversion unit.

The method of claim 6, wherein frames of voice signals in block units are connected to each other in order to prevent loss and distortion of the voice signals.

The method of claim 6 or 7, wherein, when the voice signal processor converts the data into the PCM data format, the sampling rate and resolution are in the range of 11.025 kHz, 16 bits to 22.05 kHz, 16 bits.

The method according to claim 6 or 8, wherein the weight used in calculating the weighted average value in the DC component removal unit is 0.5.

The method of claim 6 or 9, wherein the playback speed conversion unit fixes the window size to 0.032 s (32 ms) and the synchronization time to 0.01 s (10 ms).
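For orientation, the fixed timing values can be converted to sample counts at the two sampling rates named in the claims. The sketch below is plain arithmetic; the helper name `ms_to_samples` and the choice to round down to a whole sample are assumptions.

```python
def ms_to_samples(duration_ms: int, sample_rate_hz: int) -> int:
    """Convert a duration in milliseconds to a whole number of samples (rounded down)."""
    return duration_ms * sample_rate_hz // 1000


for rate in (11025, 22050):
    window = ms_to_samples(32, rate)   # 32 ms window size
    sync = ms_to_samples(10, rate)     # 10 ms synchronization time
    print(rate, window, sync, window // 2)
# 11025 352 110 176
# 22050 705 220 352
```

The last column is one half of the window size, the number of samples carried over between blocks in the description.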

The method of claim 6 or 9, wherein the synthesis, in the playback speed conversion unit, of the two voice signals overlapped and added using the Hanning window function uses a sampling method like that used when converting an analog signal into a digital signal.