KR20040013729A

KR20040013729A - Audio signal time-scale modification method using variable length synthesis and reduced cross-correlation computations

Info

Publication number: KR20040013729A
Application number: KR1020020046792A
Authority: KR
Inventors: 최원용
Original assignee: 주식회사 코스모탄
Priority date: 2002-08-08
Filing date: 2002-08-08
Publication date: 2004-02-14
Also published as: KR100547444B1

Abstract

PURPOSE: A method of modifying the time scale of an audio signal using variable-length synthesis and reduced cross-correlation computation is provided, in order to reduce the amount of calculation needed to find the maximum value of the cross-correlation. CONSTITUTION: An analysis window composed of a first number of audio samples of an input stream is determined (S16). The similarity between first audio samples of the analysis window and second audio samples of an output signal is calculated using third and fourth audio sample blocks, which consist of audio samples down-selected from the first and second audio samples at a predetermined ratio. The similarity is calculated each time the analysis window is shifted within a predetermined search range. The shift value of the analysis window that yields the maximum of the calculated similarity values is then obtained (S18).

Description

Audio signal time-scale modification method using variable length synthesis and reduced cross-correlation computations

The present invention relates to time-scale modification (TSM) of audio signals, and more particularly to a method that operates in the time domain, can run in real time even when the input audio signal has a high sampling rate, and performs time-scale modification while minimizing distortion of the pitch information of the original audio signal.

To play an audio signal faster or slower than normal speed, its time scale must be modified. Time-scale modification methods for audio signals fall broadly into frequency-domain methods and time-domain methods. Frequency-domain methods rely on the fast Fourier transform (FFT), which makes them difficult to implement and computationally expensive, so they are generally regarded as unsuitable for applications that require real-time processing. Time-domain methods, by contrast, are regarded as delivering good sound quality with comparatively little computation, and are therefore considered well suited to applications in which time-scale modification must run in real time.

The basic concept of time-domain processing was introduced under the name overlap-add (OLA). This method divides the input audio signal into a number of partially overlapping frames, then changes the distance (that is, the time) between adjacent frames according to the desired time scale and adds the frames to the output audio signal. The part of each added frame that overlaps the output audio signal is synthesized by applying a weighting function, and the remaining part is appended as it is. Since OLA was introduced, various improved methods have appeared, aimed at making the sound quality of the TSM-processed signal closer to that of the input signal and at reducing the amount of computation. Representative examples are synchronized overlap-and-add (SOLA) and waveform-similarity-based overlap-and-add (WSOLA).
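The OLA procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the linear cross-fade used as the weighting function, and the rule that the first frame is copied unchanged are choices made for this sketch and are not specified in the text.

```python
def linear_fade(n):
    """Fade-in weights rising from 0 toward 1 over n samples (assumed weighting function)."""
    return [i / n for i in range(n)]


def ola_time_scale(x, frame_len, sa, ss):
    """Plain overlap-add: cut a frame every `sa` input samples and place it
    every `ss` output samples. All parameter names are illustrative."""
    overlap = frame_len - ss              # samples shared by consecutive frames in the output
    if overlap <= 0:
        raise ValueError("frame_len must exceed ss so that frames overlap")
    fade = linear_fade(overlap)
    y = list(x[:frame_len])               # the first frame is copied unchanged
    m = 1
    while m * sa + frame_len <= len(x):
        frame = x[m * sa: m * sa + frame_len]
        start = m * ss                    # where this frame begins in the output
        # Overlapping part: cross-fade the existing output with the new frame.
        for i in range(overlap):
            y[start + i] = (1.0 - fade[i]) * y[start + i] + fade[i] * frame[i]
        # Remaining part: appended as it is.
        y.extend(frame[overlap:])
        m += 1
    return y
```

With `sa` smaller than `ss` the output is longer than the input, and with `sa` larger than `ss` it is shorter. Because the frames are placed without regard to waveform phase, this plain form can introduce the discontinuities that SOLA and WSOLA were designed to reduce.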

SOLA modifies the time scale of the input signal in two stages, analysis and synthesis. In the analysis stage, as in OLA, the input signal is divided into partially overlapping windows; each window has a fixed length N, and successive windows are spaced a fixed analysis length Sa apart. In the synthesis stage, the windows obtained in the analysis stage are repositioned at intervals of the synthesis length Ss, so that each window partially overlaps the output signal, which is the synthesis of the preceding windows. To reduce the signal discontinuity caused by the difference between the analysis length and the synthesis length, each window is aligned with the output signal at the point where their similarity, that is, their cross-correlation, is highest; overlap with this alignment is called synchronized overlap. The overlapping section is synthesized and the non-overlapping section simply appended, exactly as in OLA. Because SOLA modifies the time scale in a way that preserves the original pitch information as far as possible, the sound quality of its output is much better than that of OLA. However, while SOLA searches for the point of maximum similarity, the alignment point Km changes, and the overlapping section changes with it, so a new cross-correlation must be computed each time. This makes the cross-correlation calculation very complicated and the amount of computation very large, so SOLA is not suitable for applications that require real-time processing. Moreover, SOLA provides no way to make the length ratio between the input signal and the time-scale-modified output signal match the desired time scale exactly.
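The alignment search that makes conventional SOLA expensive can be sketched as follows. This is a minimal illustration of the prior-art search, not the method of the invention; the function name, the argument layout, and the use of a normalized cross-correlation are assumptions made for the sketch.

```python
import math


def sola_best_shift(y, frame, pos, kmax):
    """Return (k, r): the shift k in [0, kmax] that maximizes the normalized
    cross-correlation r between the output tail y[pos + k:] and the start of frame."""
    best_k, best_r = 0, float("-inf")
    for k in range(kmax + 1):                 # outer loop: candidate alignment points
        tail = y[pos + k:]                    # the overlapping section changes with k
        n = min(len(tail), len(frame))
        if n == 0:
            break
        # Inner loop: sample-by-sample products over the overlapping section.
        num = sum(tail[i] * frame[i] for i in range(n))
        den = math.sqrt(sum(v * v for v in tail[:n]) * sum(v * v for v in frame[:n]))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_k, best_r = k, r
    return best_k, best_r
```

Every candidate shift recomputes all products over the overlapping section from scratch, which is the cost the paragraph above attributes to SOLA.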

WSOLA finds the point of maximum correlation on the basis of the waveform similarity of adjacent frames, in order to ensure signal continuity at waveform segment boundaries; in particular, the time-scale-modified synthesized signal keeps maximum local similarity to the original signal in the neighborhood of all relevant sample indices. WSOLA causes less frequency distortion and requires less computation than SOLA, but because it has to use a short-time Fourier transform (STFT) to determine waveform similarity, there is a limit to how far its computation can be reduced, and it is not easy to apply to applications that require time-scale modification in real time.

One of the important issues in time-scale modification of audio signals is reducing the amount of computation. If the computation is heavy, time-scale modification cannot be performed in real time, which severely limits its range of application. Nearly all time-domain TSM methods, such as SOLA, WSOLA, and their variants, work as follows when overlapping and adding a particular window (or frame) of the input signal onto the output signal: to match the spectral characteristics or pitch periods of the audio signal as closely as possible to the original signal (hereinafter the 'input signal'), they find the point that gives the maximum cross-correlation between that window and the time-scale-modified signal (hereinafter the 'output signal') and synthesize at that point. Finding the point of maximum cross-correlation, however, involves a great deal of calculation. When time-scale modification is performed by the conventional methods, roughly 95% or more of the load on the processor (or CPU) arises in the search for this point.

Moreover, the amount of cross-correlation computation grows steeply as the sampling rate of the input signal rises. Computing the maximum cross-correlation requires a double loop, and the work done in each loop is proportional to the sampling rate of the input signal. The first loop multiplies together all the samples belonging to a particular section (the overlapping section) of the analysis window of the input signal and to the corresponding section of the output signal. The second loop repeats the multiplications of the first loop while shifting the analysis window one sample at a time over the search range. Because the two loops are nested and each is proportional to the sampling rate, the total computation grows with the square of the sampling rate. Consequently even WSOLA, which is known to need less computation than SOLA and other methods, still requires many operations: it can be used on a personal computer (PC) with a high-performance CPU, but is difficult to use on an embedded processor of comparatively low performance.

For a 20 ms packet (segment) of an audio signal sampled at 8 kHz, conventional TSM requires 24,000 multiplications and 24,000 additions. On a 733 MHz Intel Pentium III, the TSM computation for one 20 ms packet of 8 kHz audio takes about 0.35 ms. For audio sampled at 44.1 kHz, the same processor needs approximately 10.64 ms per 20 ms packet, which means that about 389 MHz of CPU capacity must be devoted entirely to TSM. For DVD audio sampled at 96 kHz, real-time TSM is impossible even on the 733 MHz Pentium III, because processing such a signal takes approximately 50.4 ms per 20 ms packet. Likewise, to process 44.1 kHz audio with ordinary SOLA or WSOLA on an embedded processor, at least 389 MHz of its total processing capacity would have to be allocated to TSM. Even for 16 kHz audio, at least about 51 MHz of processing capacity must be allocated to TSM. The peak performance of commercially available embedded processors has not yet exceeded about 200 MHz. Furthermore, in a system built around an embedded processor, the processor carries the load of many other services besides TSM. Taking this into account, real-time TSM on an embedded processor running a program based on the conventional TSM methods is practically impossible for audio signals sampled at rates above about 8 kHz.
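The timing figures quoted above can be cross-checked with a short calculation. The sketch below assumes that the per-packet time scales with the square of the sampling rate, starting from the quoted 0.35 ms at 8 kHz, and that the reference processor runs at 733 MHz; both assumptions are inferred from the quoted numbers (they reproduce 10.64 ms, 50.4 ms, 389 MHz, and 51 MHz) rather than stated as a formula in the text. The function name and defaults are illustrative.

```python
def tsm_cost(fs_khz, base_ms=0.35, base_fs_khz=8.0, cpu_mhz=733.0, packet_ms=20.0):
    """Scale the quoted 8 kHz timing quadratically and convert it to a CPU budget."""
    ms = base_ms * (fs_khz / base_fs_khz) ** 2   # estimated time per 20 ms packet
    mhz = cpu_mhz * ms / packet_ms               # share of the reference CPU needed for real time
    return ms, mhz


for fs in (8.0, 16.0, 44.1, 96.0):
    ms, mhz = tsm_cost(fs)
    print(f"{fs:5.1f} kHz: {ms:6.2f} ms per packet, about {mhz:.1f} MHz")
```

Whenever the estimated time per packet exceeds the 20 ms packet duration, as it does at 96 kHz, real-time processing on the reference processor is not possible.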

In particular, reflecting the demand for higher sound quality, the sampling rates of audio signals have been rising. The WAV format used on personal computers now mostly uses a 44.1 kHz sampling frequency, and MPEG mono-type audio is produced at the same sampling frequency. DVDs go further, using a sampling rate of 96 kHz or even twice that, 192 kHz. To perform TSM in real time on audio with such high sampling rates, the amount of computation must be reduced to within the processing capacity the processor can allocate. The known TSM methods offer no solution to this.

If the amount of computation can be reduced significantly, part of the processor capacity thus freed can be devoted to processing that improves sound quality, so that better sound quality than with the conventional TSM methods may be obtained. Because human speech does not have a wide frequency bandwidth, conventional TSM methods that synthesize at the maximum cross-correlation do not greatly distort the pitch information of the original sound. Music, however, has a relatively wide frequency bandwidth, so the output of conventional TSM methods shows comparatively more pitch distortion and more noise. Music signals therefore need additional processing to improve sound quality.

In view of the above, a first object of the present invention is to provide a method that, in time-domain processing for time-scale modification of an audio signal, significantly reduces the amount of computation needed to find the maximum cross-correlation, so that TSM can be performed in real time on audio signals with high sampling rates.

A second object of the present invention is to provide a method in which, when an analysis window taken from the input signal is overlap-added onto the output signal, the overlap position of the analysis window and the overlapping section between the analysis window and the output signal are determined by considering not only the cross-correlation but also a further measure, the coefficient of correlation, so that the pitch information of the output signal stays closer to that of the input signal.

FIG. 1A illustrates how, in the TSM method of the present invention (reduced computations and variable synthesis based TSM: RCVS-TSM), the analysis window of the input signal is shifted to determine the point of maximum cross-correlation between the analysis window and the output signal.

FIG. 1B illustrates how the overlapping section between the analysis window of the input signal and the output signal is determined, and how the two are synthesized, in the RCVS-TSM method of the present invention.

FIG. 2 illustrates how the cross-correlation over the overlapping section between the analysis window of the input signal and the output signal is calculated in the RCVS-TSM method of the present invention.

FIG. 3A shows how analysis windows are defined in the input signal when the desired time scale is 2 (α = 2).

FIG. 3B shows how the output signal is synthesized by overlap-adding the analysis windows defined in FIG. 3A, using the determined maximum cross-correlation and overlapping section.

FIG. 4A shows how analysis windows are defined in the input signal when the desired time scale is 0.5 (α = 0.5).

FIG. 4B shows how the output signal is synthesized by overlap-adding the analysis windows defined in FIG. 4A, using the determined maximum cross-correlation and overlapping section.

FIG. 5 is a flowchart of the overall procedure of the RCVS-TSM method according to the present invention.

FIG. 6 is a flowchart showing the detailed procedure of step S18 of the flowchart of FIG. 5 (determining the maximum cross-correlation and the corresponding shift value of the analysis window).

FIG. 7 is a flowchart showing the detailed procedure of step S20 of the flowchart of FIG. 5 (determining the overlapping section on the basis of the correlation coefficient between the analysis window and the output signal).

FIG. 8 is a block diagram of an apparatus equipped with the resources needed to carry out the method of the present invention.

According to the present invention, the first object is achieved by a method of time-scale modification of an audio signal, in which an input signal consisting of an input stream of audio samples is formed into an output signal modified to a desired time scale, the method comprising: determining, in the input stream, an analysis window consisting of a first predetermined number of audio samples; calculating the similarity between Nov first audio samples of the analysis window and Nov second audio samples of the output signal using third and fourth audio sample blocks, which consist of audio samples down-selected at a predetermined ratio from the first and second audio samples respectively, the similarity calculation being repeated each time the analysis window is shifted within a predetermined search range; and obtaining the shift value Km of the analysis window at which the calculated similarity is greatest.

Preferably, the method further comprises determining N+Nm-Nov audio samples as an add frame, on the basis of the shift value Km and of the optimal overlapping section Nm, which is the overlapping section at which the coefficient of correlation between the analysis window and the output signal is at least a predetermined reference value or is at its maximum (where N is the first predetermined number minus the similarity search range Kmax between the analysis window and the output signal).

Preferably, the method further comprises: forming an overlap-add block by overlap-adding, under a weighting function, the first Nm audio samples of the add frame and the last Nm audio samples of the output signal; and substituting the overlap-add block for those last Nm audio samples of the output signal and appending the remaining audio samples of the add frame, unchanged, after the overlap-add block.

According to the present invention, the second object is achieved by a method of time-scale modification of an audio signal, in which an input signal consisting of an input stream of audio samples is formed into an output signal modified to a desired time scale, the method comprising: determining, in the input stream, an analysis window consisting of N+Kmax audio samples (where N and Kmax are constants); while shifting the analysis window within a predetermined search range, calculating the maximum similarity between Nov audio samples of the analysis window and the last Nov audio samples of the output signal, and calculating the coefficient of correlation while varying the value of Nov; determining as an add frame the N+Nm-Nov audio samples starting from the (Km+Nov-Nm)-th audio sample from the front of the analysis window, where Km is the shift value of the analysis window at which the similarity is greatest, Nm is the optimal overlapping section, at which the coefficient of correlation between the analysis window and the output signal is at least a predetermined reference value or is at its maximum, and N is the first predetermined number minus the similarity search range Kmax between the analysis window and the output signal; forming an overlap-add block by overlap-adding, under a weighting function, the first Nm audio samples of the add frame and the last Nm audio samples of the output signal; and substituting the overlap-add block for those last Nm audio samples of the output signal and simply appending the remaining audio samples of the add frame after the overlap-add block.

In the method of the present invention for achieving the first or the second object, the audio samples that make up the third and fourth audio sample blocks have sample indices that differ by M₁ (where M₁ is a natural number greater than 2). The first predetermined number is N+Kmax (where N and Kmax are constants), the search range is a span of Kmax audio samples, and the analysis window is shifted by regularly skipping M₂ audio samples at a time (where M₂ is a natural number of 2 or more). The similarity is preferably determined by calculating the cross-correlation.
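The reduced search summarized above, in which only every M₁-th sample enters the correlation and the window is shifted M₂ samples at a time, might be sketched as follows. The function names, the argument layout, the unnormalized cross-correlation, and the demonstration values are assumptions for illustration; the text does not fix them.

```python
import math


def reduced_xcorr_search(window, out_tail, nov, kmax, m1, m2):
    """Try shifts 0, m2, 2*m2, ... up to kmax; at each shift, correlate only
    every m1-th sample of the nov-sample overlapping section."""
    best_k, best_c = 0, float("-inf")
    for k in range(0, kmax + 1, m2):          # shift loop with stride m2
        c = 0.0
        for i in range(0, nov, m1):           # product loop with stride m1
            c += window[k + i] * out_tail[i]
        if c > best_c:
            best_k, best_c = k, c
    return best_k, best_c


def multiply_count(nov, kmax, m1, m2):
    """Number of multiplications performed by reduced_xcorr_search."""
    return len(range(0, kmax + 1, m2)) * len(range(0, nov, m1))


# Demonstration on a sine of period 32 whose true alignment is a shift of 6 samples.
N, KMAX, NOV = 64, 16, 64
window = [math.sin(2 * math.pi * j / 32) for j in range(N + KMAX)]
out_tail = [math.sin(2 * math.pi * (i + 6) / 32) for i in range(NOV)]
best_k, _ = reduced_xcorr_search(window, out_tail, NOV, KMAX, m1=4, m2=2)
```

In this sketch the number of multiplications falls by roughly a factor of M₁ × M₂ compared with the exhaustive search (`m1 = m2 = 1`). A shift stride larger than one can only find alignments that are multiples of M₂, which is the trade-off such a reduction accepts.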

Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.

The input signal is the original audio signal to be processed by TSM, and the output signal is the audio signal obtained by TSM processing. The input signal is represented as a stream of samples obtained by sampling and quantizing an analog audio signal.

The processing described below is carried out by implementing the RCVS-TSM algorithm as a program and having a processor execute that engine program instruction by instruction. As shown in FIG. 8, an apparatus for carrying out the invention therefore basically needs: a non-volatile memory (84), such as a ROM, that stores the engine program; a processor (80) that reads the engine program and executes its instructions in sequence to perform RCVS-TSM processing of the input signal; and memory (82) resources that provide working space for the processor (80), such as an input buffer memory (82a) that temporarily stores the input signal before RCVS-TSM processing and an output buffer memory (82b) that temporarily stores the output signal after RCVS-TSM processing. In addition, so that RCVS-TSM processing can follow a time scale α specified by the user, the apparatus needs user input means (86), such as an input keypad or a remote control, through which the user can set the desired time scale α, that is, the playback speed factor of the audio signal, and through which the set value is read and applied to the RCVS-TSM processing performed by the processor (80) from that point on. The apparatus also needs an input signal providing unit (88), which supplies the input signal to the input buffer (82a) for RCVS-TSM processing, and an audio reproducing unit (90), which receives the resulting output signal from the output buffer (82b) and processes it for audio playback.

These resources may each exist as a separate chip, as in a personal computer, or may be integrated into one or a few chips. Unless stated otherwise, the RCVS-TSM processing described below should therefore be understood as being performed by these resources working together. The processor (80) can be implemented as, for example, a digital signal processor (DSP), a microcontroller, or a central processing unit (CPU), or it may be a special-purpose audio chip, audio/video chip, MPEG chip, DVD chip, or the like.

The RCVS-TSM method of the present invention consists broadly of analysis and synthesis. Referring to FIG. 3A or 4A, the input signal is divided into successive analysis windows Wm (m = 1, 2, 3, ...), each consisting of N+Kmax samples, and each analysis window Wm serves as the unit of analysis for RCVS-TSM. Each analysis window Wm starts at the (mSa)-th sample of the input signal, so Sa is the spacing between the starting points of successive analysis windows (hereinafter the 'analysis interval'). Here m is the frame index and N is the number of samples in one reference frame (F₀). To find the point that gives the maximum cross-correlation between the analysis window and the output signal, the analysis window Wm is shifted backward along the output signal by a fixed sample interval M₂ per shift step. This shifting takes place within a range of Kmax samples. Kmax is the maximum number of samples by which the analysis window is shifted, that is, the search range examined for the position of maximum cross-correlation. Km is the distance, in samples, by which the analysis window Wm has been shifted when it reaches the point of maximum cross-correlation; Km can never exceed Kmax.
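The division of the input into analysis windows can be sketched as a small generator. The function name, the `first_m` parameter, and the rule of stopping when a full window no longer fits are assumptions for illustration; the text only states that window Wm starts at sample mSa and holds N+Kmax samples.

```python
def analysis_windows(x, n, kmax, sa, first_m=1):
    """Yield (m, Wm), where Wm starts at input sample m * sa and holds n + kmax samples."""
    length = n + kmax
    m = first_m
    while m * sa + length <= len(x):      # stop when a full window no longer fits (assumed)
        yield m, x[m * sa: m * sa + length]
        m += 1
```

Each window carries Kmax extra samples beyond the N-sample reference frame so that it can be shifted by up to Kmax samples during the search without running out of data.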

Referring to FIGS. 1A and 1B, the synthesis interval Ss is the reference when the analysis window Wm is synthesized into the output signal. The synthesis interval Ss is related to the analysis interval Sa by Ss = αSa. Ss is a fixed value, so once the desired time scale α is given, the analysis interval Sa is determined by this relation. The synthesis interval Ss also gives the starting point of the shift used to search for the maximum cross-correlation between the analysis window Wm and the output signal: the computation begins with the front of the analysis window Wm placed at the (mSs)-th sample of the output signal. In synthesis, each analysis window Wm is overlapped with part of the output signal at the position of maximum cross-correlation found in the analysis step, and the two are added. The overlap section is itself varied over several sizes; the correlation coefficient between the analysis window Wm and the output signal is calculated for each size, and the overlap section that satisfies a predetermined condition is adopted as the overlap section Nm actually applied. To simplify implementation in a program, it is preferable to start from the maximum overlap section Nov and shrink it by a constant ratio or step. Once the overlap section Nm is fixed, the additional frame (20) is determined within the analysis window Wm. The last Nm samples (45) of the output signal and the first Nm samples (35) of the additional frame (20) are then summed under a weighting function to form a new synthesis block (40), which replaces the existing samples (45) in the output signal; the remaining samples of the additional frame (20) are simply appended after it.

The overall execution procedure of the RCVS-TSM algorithm of the present invention, based on these basic concepts, is shown schematically in the flowchart of FIG. 5. Execution begins by reading the information recorded in the header of the input signal to obtain what is needed for playback, and by performing the various initializations required for RCVS-TSM processing (step S10). The header carries basic information about the audio input signal, such as the sampling rate and sample size, so the sampling rate can also be used to reduce the amount of computation spent on finding the point of maximum cross-correlation; this is described in detail later. The initialization step sets the following variables. First, suitable values are assigned to N, Nov, Ss, Kmax, and so on. For Kmax in particular, a larger value improves the sound quality of the time-scale-modified signal, but the improvement saturates as the value grows while the amount of computation keeps increasing. It is therefore preferable to choose Kmax near the point where the improvement in sound quality levels off. Given the desired time scale α, the value of Sa is also determined during initialization, using the relation Sa = Ss/α.

After initialization, the first task is to copy the first N samples of the input signal unchanged to the output signal. The copied N samples form the first frame F₀ of the output signal (step S12). Since one frame has now been processed, the frame index m is set to 1 (step S14).

TSM processing of the input signal then proceeds by repeatedly executing the loop made up of steps S16 to S28, incrementing the frame index m by 1 each time, until the end of the input signal is reached. Each pass through this loop extends the output signal by one frame, so the loop may be called the frame loop. The first three steps of the frame loop, S16, S18, and S20, belong to the 'analysis' stage mentioned above, and the next three steps, S22, S24, and S26, belong to the 'synthesis' stage. These are described in detail below.

First, as the opening step of the 'analysis' stage, the samples that make up the analysis window Wm are determined from the input signal, as mentioned above (step S16). The input signal is a stream of audio samples, which is divided into a number of successive analysis windows. The m-th analysis window Wm consists of N+Kmax samples starting from the (mSa)-th sample and is treated as one unit of analysis. The analysis window Wm always consists of the same number of samples, N+Kmax, whether the given time scale α is greater than 1 (playback slower than normal speed, as in FIG. 3A) or less than 1 (playback faster than normal speed, as in FIG. 4A).
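As a rough illustration of step S16, the sketch below slices the m-th analysis window out of a list of input samples. It is a minimal Python sketch under stated assumptions, not the patent's implementation: the function name `analysis_window`, the rounding of Sa = Ss/α to a whole number of samples, and the silent truncation at the end of the input are all choices made for this example.

```python
def analysis_window(x, m, s_s, alpha, n, k_max):
    """Return the m-th analysis window Wm: N + Kmax samples starting at sample m*Sa."""
    # Sa = Ss / alpha; rounding to a whole sample is an assumption of this sketch.
    s_a = round(s_s / alpha)
    start = m * s_a
    # List slicing truncates silently if the input ends early (also an assumption).
    return x[start:start + n + k_max]
```

With Ss = 10 and α = 2 (slower playback), Sa is 5, so successive windows start 5 input samples apart while the output advances by about 10 samples per frame.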

The second step of the 'analysis' stage shifts the analysis window Wm along the output signal within a range of Kmax samples and computes, over the Nov section of the analysis window Wm and the output signal, the maximum value of the cross-correlation and the shift value Km of the analysis window Wm at which that maximum occurs (step S18). The RCVS-TSM algorithm of the present invention introduces techniques that greatly reduce the amount of computation in this step.

The first technique for reducing computation is to reduce the number of samples that take part in the cross-correlation calculation. The second is to enlarge the shift interval M₂ of the analysis window Wm. Either technique may be applied alone or both together; naturally, the reduction is greatest when both are applied.

The cross-correlation is calculated between the samples of the analysis window Wm and the samples of the output signal that fall within the overlap section (17), whose width can hold Nov samples. Because the output signal is fixed and only the analysis window Wm is shifted, the output samples taking part in the calculation are always the last Nov samples (15), whereas the Nov section of the analysis window Wm taking part in the calculation moves to the right by M₂ samples with each shift. That is, the first calculation is performed between the first Nov samples (10) of the analysis window Wm and the last Nov samples (15) of the output signal (see FIG. 1A). The analysis window Wm is then shifted to the left by M₂ samples, and the cross-correlation is calculated again between the Nov samples of the analysis window Wm and the Nov samples of the output signal that now fall within the overlap section (17). This process is repeated until just before the accumulated shift of the analysis window Wm leaves the search range Kmax.

In calculating the cross-correlation, the present invention does not use all of the samples of the analysis window Wm and of the output signal that fall within the overlap section (17); it selects only some of them and calculates the cross-correlation on the selected samples alone. In FIG. 2, where the overlap section (17) consists of 10 samples, the samples taking part are those picked by skipping three samples at a time: the cross-correlation is calculated between the three samples (x_m0, x_m4, x_m8) selected from the analysis window Wm (50) and the three samples (y_m0, y_m4, y_m8) selected from the output signal (55). The ratio by which computation is reduced can be chosen to suit the performance of the processor running the RCVS-TSM engine of the present invention, the sampling rate of the input signal, and similar factors. Because fewer samples take part, this subsampled calculation may locate the point of maximum cross-correlation less accurately than the conventional TSM approach, which calculates the cross-correlation over every sample in the overlap section, but the loss is small enough to ignore.

Because the cross-correlation is calculated from samples picked one at a time at a constant sample interval M₁ of 2 or more, the actual waveform pattern of the input signal is largely preserved. In addition, the error in the location of the maximum-correlation point does not exceed M₁/2 samples, and the noise arising from an error of this size is imperceptible to human hearing and can be ignored.

Likewise, setting the shift value M₂ of the analysis window Wm to a natural number greater than 2 yields a similar reduction in computation. The separation interval M₁ and the shift value M₂ need not be set to the same value.

The separation interval M₁ and the shift value M₂ may be assigned arbitrary values when the RCVS-TSM engine program is written, but they may also be determined as follows. One approach is to divide the actual sampling rate of the input signal by a reference sampling rate of predetermined size, take either of the two integers closest to the quotient, and use that integer as the separation interval M₁ and/or the shift value M₂. The reference sampling rate should be set as low as possible, provided it still conveys the information in the audio signal properly; setting it high would defeat the purpose of reducing the processor load. Given the performance of commercially available processors, a reference sampling rate of, for example, 8 kHz is judged to cause no particular problem. The reference sampling rate may, however, be raised as processor performance improves. With this approach, the processor can always be tuned to carry the optimal computational load it can handle, regardless of the sampling rate of the input signal.
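The rule for deriving M₁ and/or M₂ from the sampling rate can be illustrated with a short Python sketch. The function name `decimation_step`, the `round_up` switch for choosing between the two nearest integers, and the lower bound of 1 (no subsampling when the input is already at or below the reference rate) are assumptions of this example rather than details given in the text.

```python
import math


def decimation_step(sample_rate_hz, reference_rate_hz=8000, round_up=False):
    """Return an integer step (usable as M1 and/or M2) near sample_rate / reference_rate."""
    ratio = sample_rate_hz / reference_rate_hz
    # The text allows either of the two integers closest to the quotient.
    step = math.ceil(ratio) if round_up else math.floor(ratio)
    # Never go below 1; a step of 1 means every sample is used (assumption).
    return max(1, step)
```

For a 44.1 kHz input and the 8 kHz reference, the quotient is 5.5125, so either 5 or 6 would be an acceptable step under this rule.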

Another approach is to predefine a value of the separation interval M₁ and/or the shift value M₂ for each of the sampling rates in common use for audio signals, and to apply the value mapped to the sampling rate read from the header of the input signal.

The computation-reduction techniques described above are widely applicable to TSM methods that search for the optimal overlap point of the analysis window on the basis of cross-correlation, such as SOLA and WSOLA, and to modified or improved TSM methods that share the same basic concept while aiming at better sound quality or less computation.

The flowchart of FIG. 6 details the procedure of step S18 as it would be implemented in an actual program, assuming that both techniques are applied: reducing the number of samples taking part in the cross-correlation calculation and enlarging the shift interval of the analysis window Wm.

To find the point Km that gives the maximum cross-correlation, at which the analysis window Wm can be overlapped and added to the output signal with the greatest continuity, the variable K, which represents the shift of the analysis window Wm, and the maximum cross-correlation variable c(m, Km) are initialized to 0 (step S40). The cross-correlation variable Corr, the variable Denom, which is the denominator used to normalize the cross-correlation, and the variable j, which is the sample index within the overlap section (17), are each set to 0 (steps S42 and S44).

Then, while the sample index j is increased by the separation interval M₁ in each cycle (where M₁ is a natural number greater than 2) (step S48) until it exceeds Nov−1 (step S50), the cross-correlation Corr and the normalization variable Denom are accumulated using the following two equations (step S46).

$$\mathrm{Corr} = \mathrm{Corr} + x(mS_a + j)\cdot y(mS_s + K + j) \qquad (1)$$

$$\mathrm{Denom} = \mathrm{Denom} + x(mS_a + j)\cdot x(mS_a + j) \qquad (2)$$

Once the two equations above have been evaluated over the whole overlap section (17), Corr is divided by Denom to obtain the normalized cross-correlation c(m, K). This c(m, K) is compared with c(m, Km), the largest cross-correlation obtained so far, and the larger of the two becomes the current maximum cross-correlation c(m, Km). This process (steps S42 to S52) is repeated for each new value of K, with K increased by the shift interval M₂, a natural number greater than 2, as long as K does not exceed Kmax−1. When K becomes greater than Kmax−1, that is, when the analysis window Wm has been shifted across the entire search range Kmax, the value Km that gives the maximum cross-correlation c(m, Km) is the result sought in step S18 and is the shift value to be applied when the analysis window Wm is 'synthesized' into the output signal.
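The double loop of FIG. 6 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's code: the function name and arguments are hypothetical, `window` holds the N+Kmax samples of Wm, and `out_tail` holds the last Nov samples of the output signal. The shift K is applied to the window index, following the description that the output section stays fixed while the window's Nov section moves; equations (1) and (2) write the shift on the output index instead. As in the text, the running maximum starts at 0 and each correlation is normalized as Corr/Denom.

```python
def find_best_shift(window, out_tail, k_max, m1=4, m2=4):
    """Return (Km, c(m, Km)) using subsampled, strided cross-correlation."""
    n_ov = len(out_tail)
    best_c, best_k = 0.0, 0              # step S40: maximum starts at 0
    for k in range(0, k_max, m2):        # outer loop: K = 0, M2, 2*M2, ... <= Kmax-1
        corr = 0.0
        denom = 0.0
        for j in range(0, n_ov, m1):     # inner loop: j = 0, M1, 2*M1, ... <= Nov-1
            xv = window[k + j]
            corr += xv * out_tail[j]     # accumulation in the spirit of equation (1)
            denom += xv * xv             # accumulation in the spirit of equation (2)
        if denom > 0.0:                  # skip all-zero segments (guard is an assumption)
            c = corr / denom
            if c > best_c:
                best_c, best_k = c, k
    return best_k, best_c
```

Compared with a full search, the inner loop touches roughly Nov/M₁ samples and the outer loop roughly Kmax/M₂ shifts, so the multiply-accumulate count drops by a factor of about M₁·M₂.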

The flowchart of FIG. 6 shows that a double loop is executed. It is easy to see that, without the computation-reduction techniques above, running this double loop would require an enormous amount of computation.

After obtaining the shift value Km at which the cross-correlation for the analysis window Wm is maximal, the RCVS-TSM method of the present invention determines the optimal overlap section Nm, which gives the best sound quality without distorting the pitch information of the input signal (step S20).

When the optimal shift value Km was obtained, the overlap section (17) between the analysis window Wm and the output signal had a fixed length Nov. It cannot be said, however, that an overlap section of Nov always gives the best alignment between the analysis window Wm and the output signal. The optimal shift value Km is optimal only relative to an overlap section of the particular size Nov; it is not 'absolutely' optimal when the overlap section changes. Experiments on various kinds of sound sources confirm this.

Table 1. Rock music, played 2x slower (correlation coefficient reference value: 70%)

Nov(i)      Nov(1)      Nov(2)      Nov(3)      Nov(4)      Nov(5)
            5 msec      4 msec      3 msec      2 msec      1 msec
Correlation coefficient (Rxy_m), in %
Rxy_1       100.00      100.00      100.00      100.00      100.00
Rxy_2        37.17       39.84       54.24       84.25       66.10
Rxy_3        92.80       92.80       92.80       92.80       92.80
Rxy_4       100.00      100.00      100.00      100.00      100.00
Rxy_5       -89.94      -84.63      -65.88      -44.29        7.61
Rxy_6       100.00      100.00      100.00      100.00      100.00
Rxy_7       -58.39      -33.17       22.60      -15.21      -25.79
Rxy_8       100.00      100.00      100.00      100.00      100.00
Rxy_9        52.50       32.36       32.53       24.64        4.62
Rxy_10       71.81       71.81       71.81       71.81       71.81
Rxy_11      100.00      100.00      100.00      100.00      100.00
Rxy_12       25.28       18.93       26.89       32.38       11.15
Rxy_13       39.13       41.48       -6.70       -8.43      -24.28
Rxy_14      100.00      100.00      100.00      100.00      100.00
Rxy_15       73.21       73.21       73.21       73.21       73.21
Rxy_16       84.84       84.84       84.84       84.84       84.84
Rxy_17       90.85       90.85       90.85       90.85       90.85
Rxy_18      100.00      100.00      100.00      100.00      100.00
Rxy_19       76.79       76.79       76.79       76.79       76.79
Rxy_20       90.18       90.18       90.18       90.18       90.18
Rxy_21       73.41       73.41       73.41       73.41       73.41
Rxy_22       88.59       88.59       88.59       88.59       88.59
Rxy_23       58.80       51.22       31.33       55.57       57.96
Sum       1,507.01    1,508.50    1,537.48    1,571.40    1,539.85
Average      65.52       65.59       66.85       68.32       66.95

Table 1 summarizes the correlation coefficient Rxy between the samples x of the analysis window Wm and the samples y of the output signal, calculated for an arbitrary passage (23 analysis windows) of rock music, which has a relatively wide frequency bandwidth, with the overlap section set in turn to 5 msec, 4 msec, 3 msec, 2 msec, and 1 msec.

The correlation coefficient Rxy was obtained using the following equation.

$$R_{xy} = \frac{\sum xy}{n\,\sigma_x\,\sigma_y}\cdot 100\% \qquad (3)$$

Here, n is the number of samples of each of the variables x and y taking part in the correlation coefficient calculation, and σ_x and σ_y are the variance values of x and y, respectively.
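A small Python sketch of equation (3) follows. It assumes, beyond what the text states, that x and y enter the sum as deviations from their means and that σ_x and σ_y act as standard deviations, which is what keeps the result within ±100%. The function name `correlation_percent` and the zero return for a constant signal are hypothetical choices for this example.

```python
import math


def correlation_percent(x, y):
    """Correlation coefficient Rxy of two equal-length sample lists, in percent."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the mean (assumed reading of "sum of xy").
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sigma_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sigma_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sigma_x == 0.0 or sigma_y == 0.0:
        return 0.0  # constant segment: correlation undefined, reported as 0 here
    return 100.0 * s_xy / (n * sigma_x * sigma_y)
```

Two segments that are scaled copies of each other give a value close to 100, and a segment paired with its own negation gives a value close to −100.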

The correlation coefficient can range from −100[%] to +100[%]. In general, when the correlation coefficient between two variables x and y is negative, the variables have a so-called negative relationship: as one variable (x) increases (or decreases), the other (y) decreases (or increases). When the correlation coefficient is positive, the two variables change in the same direction, a so-called positive relationship; however, the correlation is regarded as weak if the value lies in the range 0% to 30%, moderate if it lies between 30% and 70%, and high if it lies between 70% and 100%. Accordingly, if the analysis window is overlap-added to the output signal using an overlap section for which the correlation coefficient Rxy between the analysis window and the output signal does not reach 70%, sound quality degrades, for example through distortion of the pitch information.

As Table 1 shows, the correlation coefficient is not always greatest when the overlap section is, say, 5 msec. For the 2nd analysis window, for example, the largest correlation coefficient, 84.25, occurs when the overlap section is 2 msec. The 2nd analysis window therefore achieves the best alignment and signal continuity when it is overlap-added to the output signal with an overlap section of 2 msec, which minimizes distortion of the pitch information.

FIG. 7 shows the detailed procedure, based on this concept, for determining the optimal overlap section Nm for each analysis window (step S20). Suppose that five overlap sizes, 5 msec, 4 msec, 3 msec, 2 msec, and 1 msec, are tried in that order. While the index i of the candidate overlap section Nov(i) is increased from 0 to 4 in steps of 1 (steps S60, S66, and S68), the correlation coefficient Rxy_m(i) for each overlap section is calculated using equation (3) (step S62), and the calculated Rxy_m(i) is checked to see whether it is equal to or greater than the reference value Ref (step S64).

A reference value of 70% is preferable but not mandatory: it may be set higher (for example, 80%) if sound quality matters more, or lower (for example, 60%) if limiting the growth in computation matters more. Table 1 above was obtained with a reference value (Ref) of 70%, so assume that value here. If the calculated correlation coefficient Rxy_m(i) is 70% or more, the overlap section Nov(i) in use at that point is adopted as the optimal overlap section Nm to be applied when the analysis window Wm is actually overlap-added to the output signal (step S72). If the calculated correlation coefficient Rxy_m(i) does not reach 70%, i is increased by 1 (step S66), and the correlation coefficient Rxy_m(i) is calculated again for the next overlap section Nov(i) and checked against 70%.

In Table 1, with an overlap section of 5 msec, the correlation coefficients Rxy_2(0), Rxy_5(0), Rxy_7(0), ... for the 2nd, 5th, 7th, 9th, 10th, 12th, 13th, and 23rd analysis windows fall short of the 70% reference value. For each of these analysis windows, the overlap section is changed to 4 msec and the correlation coefficient Rxy_m(1) (m = 2, 5, 7, 9, 10, 12, 13, 23) is recalculated. If the recalculated value is greater than 70%, the optimal overlap section Nm for that analysis window is fixed at 4 msec; if it is still below 70%, the overlap section Nov(2) is changed to 3 msec and the correlation coefficient Rxy_m(2) is recalculated. The calculation continues in this manner. If the correlation coefficient still does not exceed 70% even when the overlap section has been reduced to 1 msec, the overlap section that gives the largest of the correlation coefficients Rxy_m(i) (i = 0, 1, ..., 4) calculated so far is adopted as the overlap section Nm actually applied. In Table 1, for example, the optimal overlap section N₄ for the 5th analysis window would be determined to be 1 msec.
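The selection rule of FIG. 7 can be sketched in Python as below. For brevity the sketch takes the correlation coefficients as an already computed list, whereas the procedure in the text computes them one candidate at a time and stops at the first that reaches the reference value; the function name `select_overlap` and its arguments are hypothetical.

```python
def select_overlap(candidates_msec, rxy_percent, ref=70.0):
    """Pick the first candidate overlap whose Rxy reaches ref, else the best-scoring one."""
    for size, r in zip(candidates_msec, rxy_percent):
        if r >= ref:
            return size  # step S72: adopt this overlap section as Nm
    # No candidate reached the reference: fall back to the largest coefficient.
    best = max(range(len(rxy_percent)), key=lambda i: rxy_percent[i])
    return candidates_msec[best]
```

Applied to the Rxy_2 row of Table 1 (37.17, 39.84, 54.24, 84.25, 66.10), the rule stops at 2 msec; applied to the Rxy_5 row, where no value reaches 70%, it falls back to 1 msec, the candidate with the largest coefficient (7.61).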

The RCVS-TSM method of the present invention thus determines the optimal overlap section Nm variably. Searching for the optimal overlap section Nm does add computation, but this is not a serious problem if the computation-reduction techniques described above are also applied to the correlation coefficient calculation. For speech, which has a narrow frequency band, the optimal overlap section Nm improves sound quality only modestly compared with not applying it. For music, which has a wide frequency band, applying it yields a marked improvement in sound quality.

Returning to the flowchart of FIG. 5, once the optimal overlap section Nm has been determined, the optimal shift value Km fixed in step S18 is used to select, from the analysis window Wm, the additional frame Fm (20) to be overlap-added to the output signal (step S22). The samples of the analysis window Wm in the interval (Km+Nov−Nm) to (N+Nm−Nov) form the additional frame Fm (20).
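One way to read the interval above is sketched below in Python. The sketch assumes that Km+Nov−Nm is the start index within the window and that N+Nm−Nov is the resulting frame length; this reading is an interpretation, chosen because it agrees with the later statement that the first Km+Nov−Nm samples and the last Kmax−Km samples of the N+Kmax-sample window are discarded. The function name `extract_frame` is hypothetical.

```python
def extract_frame(window, k_m, n, n_ov, n_m):
    """Return the additional frame Fm cut from the analysis window Wm."""
    start = k_m + n_ov - n_m   # samples before this index are discarded
    end = k_m + n              # leaves Kmax - Km samples unused at the back
    frame = window[start:end]
    # Under this reading the frame length is N + Nm - Nov.
    assert len(frame) == n + n_m - n_ov
    return frame
```

When Nm equals Nov, the frame starts exactly at the shift Km and is N samples long, which matches a fixed-overlap scheme; a shorter Nm starts the frame later so that its first Nm samples still line up with the last Nm samples of the output.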

Once the add frame Fm (20) has been determined, it is overlap-added to the output signal (step S24). The first Nm samples of the add frame Fm (20) and the last Nm samples of the output signal are overlapped and combined. In doing so, a weighting function is applied to the Nm samples on each side, the analysis window and the output signal, to form the overlap-add block (40) (step S24). The weighting function is applied so that the end of the output signal joins smoothly to the beginning of the analysis window, minimizing signal discontinuity in the overlap interval. A representative weighting function is the linear ramp g(j) below, although an exponential function or any other suitable function may be chosen.

$$g(j) = 0, \quad j < 0 \qquad (4\text{-}1)$$

$$g(j) = j / N_m, \quad 0 \le j \le N_m \qquad (4\text{-}2)$$

$$g(j) = 1, \quad j > N_m \qquad (4\text{-}3)$$
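The ramp of equations (4-1) to (4-3) can be written directly as a function. The mixing rule in the sketch below is an assumption: this passage does not state how the two weighted sides are combined, so the sketch uses the common cross-fade in which the output-signal side is weighted by 1 - g(j) and the add-frame side by g(j). The function names are hypothetical.

```python
def linear_ramp(j, n_m):
    """Weighting function g(j) of equations (4-1) to (4-3)."""
    if j < 0:
        return 0.0
    if j > n_m:
        return 1.0
    return j / n_m


def overlap_add_block(out_tail, frame_head):
    """Cross-fade the last Nm output samples into the first Nm frame samples.

    Assumed mixing rule: (1 - g(j)) * output + g(j) * frame.
    """
    n_m = len(out_tail)
    if len(frame_head) != n_m:
        raise ValueError("both sides must contain Nm samples")
    return [
        (1.0 - linear_ramp(j, n_m)) * out_tail[j] + linear_ramp(j, n_m) * frame_head[j]
        for j in range(n_m)
    ]


# Fading from a constant 1.0 to a constant 0.0 over Nm = 4 samples.
block = overlap_add_block([1.0] * 4, [0.0] * 4)
```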

The last Nm samples (45) of the output signal are replaced by the newly formed overlap-add block (40). The remaining samples of the add frame Fm (20), that is, all but its first Nm samples (35), are appended unchanged to the end of the overlap-add block (40) (S26). The remaining samples of the analysis window Wm, namely the Km+Nov-Nm samples (30) at the front and the Kmax-Km samples (25) at the rear, are discarded. FIG. 3B shows the result of overlap-adding in the manner described above, using the analysis windows Wm segmented as in FIG. 3A, when the desired time scale α is 2, that is, when the input signal is stretched to twice its length. In every frame period, the lengths of the overlap intervals (60a, 60b, 60c, ...) and of the frames Fm (m = 0, 1, 2, 3, ...) added to the output signal vary. Nevertheless, the total length of the output signal obtained in each frame period grows in exact proportion to the specified time scale α.
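The replace-and-append step can be sketched as follows. The function name `splice_frame` and the test values are hypothetical, and the linear cross-fade is written inline under the same assumption of complementary weights on the two sides.

```python
def splice_frame(output, frame, n_m):
    """Replace the last Nm output samples with the overlap-add block,
    then append the rest of the add frame unchanged."""
    if not 1 <= n_m <= min(len(output), len(frame)):
        raise ValueError("Nm must be between 1 and the shorter signal length")
    tail, head = output[-n_m:], frame[:n_m]
    # Linear cross-fade: weight (Nm - j)/Nm on the output side, j/Nm on the frame side.
    block = [((n_m - j) * tail[j] + j * head[j]) / n_m for j in range(n_m)]
    return output[:-n_m] + block + frame[n_m:]


# Illustrative check with N = 10 and Nov = 4: the add frame has N + Nm - Nov samples.
n, n_ov = 10, 4
output = [0.0] * 20
growths = []
for n_m in (1, 2, 3, 4):
    frame = [1.0] * (n + n_m - n_ov)
    growths.append(len(splice_frame(output, frame, n_m)) - len(output))
```

Whatever value of Nm is chosen, the output grows by N-Nov samples per frame in this sketch, which is consistent with the statement that the output length per frame period does not depend on the variable overlap.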

The same property holds when the input signal is shortened. Taking a time scale α of 0.5 as an example, FIG. 4A shows the input divided into a series of consecutive analysis windows Wm according to the window segmentation method described earlier, and FIG. 4B shows the add frames Fm synthesized by applying the optimal shift Km and the optimal overlap interval Nm determined through the analysis and synthesis described above. When the input signal is shortened, the lengths of the added frames Fm (m = 0, 1, 2, 3, ...) again vary, but the total length of the output signal obtained in each frame period shrinks in exact proportion to the specified time scale α.
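The proportionality can be checked with a short count. This is a sketch derived from the frame definitions in the text, not a formula stated in the patent. It assumes that the nominal overlap satisfies Nov = N - Ss, as suggested by the claim wording that N - Ss is the maximum value of Nov, and that Ss = αSa.

$$
\Delta L_{\text{out}} = (N + N_m - N_{ov}) - N_m = N - N_{ov} = S_s, \qquad \frac{S_s}{S_a} = \alpha
$$

Each analysis window advances the input by Sa samples, while each add frame contributes N+Nm-Nov samples of which Nm overwrite existing output. The net growth is therefore Ss for every frame, independent of Nm, and the output-to-input length ratio per frame period is α for both stretching (α > 1) and compression (α < 1).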

Because the length of the output signal obtained in each frame period increases or decreases in exact proportion to the specified time scale α, the RCVS-TSM method of the present invention, when applied to multimedia playback, can keep the audio exactly synchronized with the video signal at any time scale. During DVD playback, for example, when the playback speed is slowed down or sped up, the audio playback speed can be changed at exactly the same ratio as the video playback speed, so audio and video remain synchronized in both fast and slow playback.

Once one analysis window has gone through "analysis" and "synthesis" as described above, one TSM-processed frame has been added to the output signal. The frame index m is then incremented by 1 so that the next analysis window can be analyzed and synthesized and another TSM-processed frame appended to the output signal (step S28). To determine whether any input remains to be processed, the method checks whether the end of the input signal has been reached (step S30). If it has not, the analysis and synthesis procedure is carried out for the next analysis window. Repeating this frame loop until the end of the input signal yields an output signal time-scale-modified to the desired time scale.
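The frame loop of steps S28 and S30 can be sketched as a generator over analysis windows. The sketch assumes that the m-th window starts at sample m·Sa and holds N+Kmax samples, as the claims state, and that the loop stops when a full window no longer fits. The handling of a final partial window is not specified in this passage, so that stopping rule is an assumption, as is the function name.

```python
def iter_analysis_windows(signal, s_a, n, k_max):
    """Yield (m, Wm) for consecutive analysis windows until the input ends."""
    win_len = n + k_max
    m = 0
    while m * s_a + win_len <= len(signal):   # S30: has the input run out?
        start = m * s_a                       # window m starts at sample m * Sa
        yield m, signal[start:start + win_len]
        m += 1                                # S28: advance the frame index


# Illustrative numbers only: 20 input samples, Sa = 4, N = 6, Kmax = 2.
windows = list(iter_analysis_windows(list(range(20)), s_a=4, n=6, k_max=2))
```

In a full implementation the body of the loop would run the shift search, the overlap selection, and the overlap-add for each yielded window.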

The input signal delivered from the input signal provider (88) to the input buffer (82a) is processed by the processor (80) according to the RCVS-TSM method described above to produce the output signal. The output signal is written to the output buffer (82b) in units of a predetermined size and then passed to the audio playback unit (90) according to a fixed output schedule, where it is played back in real time.

The RCVS-TSM method of the present invention greatly reduces the amount of computation compared with existing TSM methods while keeping sound quality close to that of the original audio signal. Performing TSM on an audio signal sampled at 192 kHz with a conventional SOLA or WSOLA algorithm requires processor performance on the order of 7,366 MHz. A long time must pass before a CPU, and in particular an embedded processor, meeting that level appears, and until then real-time TSM of such high-sampling-rate audio is not possible with those algorithms. With the RCVS-TSM method of the present invention, by contrast, real-time TSM of a 192 kHz audio signal requires only about 28 MIPS (roughly 28 MHz) of processing capacity, so such high-sampling-rate audio can be processed in real time on currently available CPUs, including embedded processors.
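The saving comes from the two reductions recited in the claims: the similarity is computed on every M₁-th sample only, and the analysis window is shifted M₂ samples at a time. The sketch below illustrates that idea under stated assumptions. It uses an unnormalized cross-correlation as the similarity measure and searches shifts 0 to Kmax inclusive. The function name and the test signal are hypothetical.

```python
def find_best_shift(window, out_tail, k_max, m1, m2):
    """Return (Km, score): the shift with the largest decimated cross-correlation.

    window   : analysis window, at least Kmax + Nov samples long
    out_tail : last Nov samples of the output signal
    m1       : sample-index step used inside each correlation sum
    m2       : step between candidate shifts
    """
    n_ov = len(out_tail)
    if len(window) < k_max + n_ov:
        raise ValueError("window too short for the requested search range")
    best_k, best_score = 0, float("-inf")
    for k in range(0, k_max + 1, m2):                  # every M2-th shift only
        score = sum(window[k + j] * out_tail[j]        # every M1-th sample only
                    for j in range(0, n_ov, m1))
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# The output tail is embedded in the window at offset 4, so Km should be 4.
out_tail = [1, -1, 2, -2, 3, -3, 1, 1]
window = [0] * 4 + out_tail + [0] * 2
best = find_best_shift(window, out_tail, k_max=6, m1=2, m2=2)
```

Compared with an exhaustive search, the number of multiply-accumulate operations falls by roughly a factor of M₁·M₂, at the cost of a coarser estimate of the best shift.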

The present invention also gives better sound quality than existing TSM methods. Varying the overlap length between the analysis window of the input signal and the output signal to its optimal value, as in the method of the present invention, guarantees a higher correlation coefficient than fixing it at a constant value. Signal discontinuity is therefore minimized, and as a result the distortion of pitch information caused by time-scale modification is minimized.

The RCVS-TSM method of the present invention may be written as a program and included as a function of a multimedia player for a personal computer. It may also be embedded in a dedicated chip of a playback device that needs to reproduce digital audio signals, such as a DVD player, digital VTR, MP3 player, or set-top box, to enhance the audio playback capability of that device.

Although the present invention has been described above with reference to an embodiment, those skilled in the art to which the invention pertains will readily recognize that various modifications are possible without departing from the spirit of the invention.

Claims

A method of time-scale modification of an audio signal, for forming an output signal modified to a desired time scale from an input signal consisting of an input stream of audio samples, the method comprising:

determining an analysis window consisting of a first predetermined number of audio samples of the input stream;

calculating the similarity between Nov first audio samples of the analysis window and Nov second audio samples of the output signal using third and fourth audio sample blocks consisting of audio samples down-selected at a predetermined ratio from the first and second audio samples respectively, the similarity calculation being repeated each time the analysis window is shifted within a predetermined search range; and

obtaining the shift value Km of the analysis window that gives the maximum of the calculated similarity values.

The method of claim 1, further comprising determining, as an add frame, N+Nm-Nov audio samples of the analysis window (where N is the value obtained by subtracting the similarity search range Kmax between the analysis window and the output signal from the first predetermined number), based on the shift value Km and on an optimal overlap interval Nm at which the coefficient of correlation between the analysis window and the output signal is equal to or greater than a predetermined reference value or is at its maximum.

The method of claim 2, further comprising: forming an overlap-add block by overlapping and summing, with a weighting function applied, the first Nm audio samples of the add frame and the last Nm audio samples of the output signal; and

replacing the last Nm audio samples of the output signal with the overlap-add block, and appending the remaining audio samples of the add frame unchanged to the end of the overlap-add block.

The method of claim 1, wherein the audio samples constituting the third and fourth audio sample blocks differ in sample index by M₁ (where M₁ is a natural number greater than 2).

The method of claim 1, wherein the first predetermined number is N+Kmax (where N and Kmax are constants), the search range is an interval of Kmax audio samples, and the analysis window is shifted by regularly skipping M₂ audio samples at a time (where M₂ is a natural number of 2 or more).

The method of claim 1, wherein the audio samples constituting the third and fourth audio sample blocks differ in sample index by M₁ (where M₁ is a natural number greater than 2), the first predetermined number is N+Kmax (where N and Kmax are constants), the search range is an interval of Kmax audio samples, and the analysis window is shifted by regularly skipping M₂ audio samples at a time (where M₂ is a natural number of 2 or more).

The method of any one of claims 4 to 6, wherein the sample index interval M₁ of the audio samples constituting the third and fourth audio sample blocks and/or the shift interval M₂ of the analysis window is set to one of the two integers closest to the value obtained by dividing the actual sampling rate of the input signal by a reference sampling rate of a predetermined size.

The method of any one of claims 4 to 6, wherein a value for the sample index interval M₁ of the audio samples constituting the third and fourth audio sample blocks and/or for the shift interval M₂ of the analysis window is set in advance for each kind of sampling rate, and the value mapped to the sampling rate determined from the header information of the input signal is applied as the value of M₁ and/or M₂.

The method of claim 1, further comprising accepting, through input means, a value α specified by a user as the desired time scale, wherein the ratio of the length of the output signal to that of the input signal is equal to α.

The method of claim 7, wherein the first audio sample of the m-th analysis window is the mSa-th audio sample from the front of the input stream, the value of Nov is reduced at a predetermined rate from a maximum of N-Ss, Ss is a fixed value, and Sa is determined by Ss = αSa.

The method of claim 1, wherein the similarity is determined by calculating cross-correlation.

A method of time-scale modification of an audio signal, comprising:

determining an analysis window consisting of N+Kmax audio samples of the input stream, where N and Kmax are constants;

calculating, while shifting the analysis window within a predetermined search range, the maximum of the similarity between Nov audio samples of the analysis window and the last Nov audio samples of the output signal, and calculating a coefficient of correlation while changing the value of Nov to various values;

defining as an add frame the N+Nm-Nov audio samples starting from the (Km+Nov-Nm)-th audio sample from the front of the analysis window, where Km is the shift value of the analysis window that gives the maximum of the similarity, Nm is the optimal overlap interval at which the coefficient of correlation between the analysis window and the output signal is equal to or greater than a predetermined reference value or is at its maximum, and N is the value obtained by subtracting the similarity search range Kmax between the analysis window and the output signal from the first predetermined number;

forming an overlap-add block by overlapping and summing, with a weighting function applied, the first Nm audio samples of the add frame and the last Nm audio samples of the output signal, Nm being the optimal overlap interval; and

replacing the last Nm audio samples of the output signal with the overlap-add block, and appending the remaining audio samples of the add frame unchanged to the end of the overlap-add block.

The method of claim 12, further comprising accepting, through input means, a value α specified by a user as the desired time scale, wherein the ratio of the length of the output signal to that of the input signal is equal to α.

The method of claim 12, wherein the first audio sample of the m-th analysis window is the mSa-th audio sample from the front of the input stream, the value of Nov is reduced at a predetermined rate from a maximum of N-Ss, Ss is a fixed value, and Sa is determined by Ss = αSa.

The method of claim 12, wherein the reference value for the correlation coefficient is 0.7 or more.

The method of claim 12, wherein the audio samples used in calculating the similarity and the correlation coefficient are selected from the signals belonging to the Nov interval of each of the analysis window and the output signal, adjacent selected audio samples differing in sample index by M₁ (where M₁ is a natural number greater than 2).

The method of claim 12, wherein the analysis window is shifted by regularly skipping M₂ audio samples at each shift (where M₂ is a natural number of 2 or more), and the total number of audio samples shifted does not exceed the search range consisting of Kmax audio samples.

The method of claim 12, wherein the audio samples used in calculating the similarity and the correlation coefficient are selected from the signals belonging to the Nov interval of each of the analysis window and the output signal, adjacent selected audio samples differing in sample index by M₁ (where M₁ is a natural number greater than 2), the analysis window is shifted by regularly skipping M₂ audio samples at each shift (where M₂ is a natural number of 2 or more), and the total number of audio samples shifted does not exceed the search range consisting of Kmax audio samples.

The method of claim 12, wherein the variables M₁ and/or M₂ are set to one of the two integers closest to the value obtained by dividing the actual sampling rate of the input signal by a reference sampling rate of a predetermined size.

The method of claim 12, wherein the similarity between the Nov audio samples of the analysis window and the Nov audio samples of the output signal is determined using a cross-correlation or the correlation coefficient.