KR20030015579A

KR20030015579A - Time-scale modification method of audio signals of which playback time is substantially accurately proportional to a designated playback-time-varying ratio and apparatus for the same

Info

Publication number: KR20030015579A
Application number: KR1020010049367A
Authority: KR
Inventors: 최원용; 박성제; 임동연
Original assignee: 주식회사 코스모탄
Priority date: 2001-08-16
Filing date: 2001-08-16
Publication date: 2003-02-25

Abstract

PURPOSE: To provide a method and an apparatus for modifying the time scale of an audio signal so that its playback time is precisely proportional to a designated speed-change ratio, so that the audio signal stays accurately synchronized with a video signal because the time-scale-modified signal is steered toward the ideal playback time regardless of the size of the audio signal. CONSTITUTION: The method includes the steps of calculating a synchronization lag each time the time scale is modified frame by frame and accumulating the synchronization lags (S30), and, when the accumulated value exceeds a predetermined tolerance range, modifying the time scale of the audio signal with a new speed-change ratio that reduces the accumulated value (S34-S36).

Description

{Time-scale modification method of audio signals of which playback time is substantially accurately proportional to a designated playback-time-varying ratio and apparatus for the same}

The present invention relates to an improvement in techniques for modifying the time scale of a signal. More particularly, it relates to a method, and an apparatus for the same, for modifying the time scale of an audio signal so that the signal has a playback time substantially proportional to a user-designated speed-change ratio while the pitch information of the original audio signal is kept almost intact.

Methods of processing an audio signal for playback faster or slower than normal speed fall broadly into two groups, according to whether the processing is performed in the frequency domain or in the time domain. For time-domain processing, algorithms based on the principle of time-scale modification (TSM) of audio signals have been widely used. Starting from the overlap-add (OLA) method introduced by Roucos and Wilcox in 1985, TSM algorithms have been improved so that the time scale of an audio signal can be modified for playback at a desired speed while the pitch information of the original sound is preserved. Improved algorithms built on OLA include the synchronized overlap-add method (Synchronized OLA: SOLA), which finds the point of highest correlation (that is, waveform similarity) between adjacent frames and overlaps the frames at that point, and the WSOLA method (an overlap-and-add technique based on waveform similarity). Variants of these include GLS-TSM (global and local search time-scale modification), a SOLA algorithm based on edge detection (EDSOLA), and PSOLA (pitch-SOLA) proposed by Moulines and Charpentier. SOLA, and WSOLA in particular, are regarded as effectively overcoming the weakness of OLA, which badly distorts the pitch information of the original audio signal, and are currently the algorithms receiving the most attention.

The basic idea of the OLA algorithm is to cut the original audio signal x(·) into a number of overlapping frames, change the spacing between adjacent frames according to the designated speed-change ratio (that is, time-scale factor) α_o (expansion when α_o > 1, compression when α_o < 1), and add the adjacent frames with weighting to synthesize a new time-scale-modified signal y(·). FIG. 1 shows how an audio signal is cut into frames and resynthesized when its time scale is modified with the OLA algorithm. The original audio signal x(·) shown in FIG. 1(a) is a digital audio signal. When α_o > 1, it is synthesized into a signal y(·) whose overall length (that is, playback time) is expanded, as in FIG. 1(c); when α_o < 1, it is synthesized into a signal y(·) whose overall length is compressed, as in FIG. 1(d).

Specifically, with the original audio signal x(·) stored in a first memory (not shown) (FIG. 1(a)), the signal is read one frame at a time and written sequentially to a second memory (not shown), and the time scale is modified in the course of this transfer. When reading from the first memory, the signal is divided into consecutive frames (F_0, F_1, F_2, F_3, ...) as shown in FIG. 1(b), and adjacent frames (F_{m-1}, F_m) are read so that they overlap by N-SA samples. Here N is the number of samples in one frame, and SA is an integer smaller than N that gives the sample interval between the start points of adjacent frames of the original audio signal. To modify the time scale according to the designated ratio α_o, each frame read in this way is written to the second memory so that the start points of adjacent frames are SS samples apart, that is, so that adjacent frames overlap by N-SS samples, as shown in FIG. 1(c) or (d). When the frames are repositioned in this way, the samples in the overlapping sections (10a, 10b) of adjacent frames are weighted and summed, and the samples in the section (10c) of the current frame (F_m) that does not overlap the previous frame (F_{m-1}) are copied unchanged, as shown in FIG. 1(e). SS satisfies SS = α_o·SA. It is the sample interval between the start points of adjacent frames of the time-scale-modified audio signal y(·), and its value is chosen appropriately in view of the maximum ratio that can be designated, the number of samples N per frame, the minimum number of samples that adjacent frames must share, and so on.
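The read-and-rewrite procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the function name `ola_time_scale`, the default values of N and SA, the rounding of SS to an integer, and the linear cross-fade used as the weighting are all assumptions made for the example.

```python
def ola_time_scale(x, alpha, N=8, SA=4):
    """Plain OLA sketch: fixed frame spacing, no similarity search."""
    SS = int(round(alpha * SA))          # output spacing, SS = alpha * SA
    if not 0 < SS < N:
        raise ValueError("SS must satisfy 0 < SS < N so adjacent frames overlap")
    y = []
    m = 0
    while m * SA + N <= len(x):
        frame = x[m * SA : m * SA + N]   # frame F_m read from the original signal
        start = m * SS                   # start point of F_m in the output
        overlap = len(y) - start         # N - SS samples for every m >= 1
        for i in range(overlap):
            w = (i + 1) / (overlap + 1)  # linear cross-fade weight (assumed)
            y[start + i] = (1 - w) * y[start + i] + w * frame[i]
        y.extend(frame[overlap:])        # non-overlapping part is copied unchanged
        m += 1
    return y
```

With 40 input samples, N = 8, SA = 4 and α_o = 1.5, the sketch lays 9 frames 6 samples apart and returns 56 samples. For long signals the output-to-input length ratio approaches SS/SA, which is the exact proportionality that the description attributes to OLA.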

Because the OLA algorithm modifies the time scale in exact proportion to the designated speed-change ratio, the playback speed and playback time of the time-scale-modified audio signal also change in exact proportion to that ratio. This property is worth noting: when multimedia data combining audio and video is played at a changed speed, modifying the audio with OLA guarantees exact synchronization between the video and audio signals. However, OLA takes no account of correlation between adjacent frames, such as waveform similarity, and therefore distorts the pitch information of the original audio signal. Overlapping and adding adjacent frames at arbitrary points cannot guarantee that the spectral characteristics or pitch periods of the signal are preserved. Clicks, bursts of noise, and reverberation may also appear in the time-scale-modified signal. As a result, the sound quality of the modified signal is considerably worse than that of the original audio signal.

A method was therefore needed that modifies the time scale while preserving the pitch information of the original audio signal as far as possible, and the SOLA and WSOLA algorithms were proposed as representative examples. SOLA and WSOLA basically follow OLA, but to minimize pitch distortion they examine the waveform similarity between two adjacent frames when the original signal is divided into frames, and overlap the two frames at the position of greatest similarity, that is, the point of best correlation, which minimizes the loss of sound quality. In other words, when dividing the original signal into frames and reading them from the first memory, SOLA or WSOLA does not always read each frame starting at sample mSA, an integer multiple of SA. Instead it applies a displacement to the point at which the correlation between adjacent frames is greatest (this displacement is called the synchronization lag K_m) and reads N-K_m samples starting at sample mSA+K_m. When each frame is written to the second memory, the interval between the start points of adjacent frames is always kept at SS samples, as in OLA. Because the frames of the modified signal stored in the second memory are concatenated at maximum correlation, the pitch information of the original audio signal is preserved almost unchanged.
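The search for the synchronization lag K_m can be illustrated as follows. The sketch assumes normalized cross-correlation as the similarity measure and compares the tail of the already written output with candidate segments of the input; the function name `best_lag` and its parameters are hypothetical, and the description does not prescribe this particular measure.

```python
import math

def best_lag(tail, x, center, L):
    """Return the lag K in [-L, L] whose segment x[center+K : center+K+len(tail)]
    is most similar to `tail` under normalized cross-correlation (assumed measure)."""
    width = len(tail)
    best_k, best_score = 0, -math.inf
    for k in range(-L, L + 1):
        start = center + k
        if start < 0 or start + width > len(x):
            continue                      # candidate would run off the signal
        seg = x[start:start + width]
        num = sum(a * b for a, b in zip(tail, seg))
        den = math.sqrt(sum(a * a for a in tail) * sum(b * b for b in seg))
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k
```

For a sine wave with a period of 16 samples, a tail taken at samples 5 to 12 matches best one period later, so a search centered on sample 19 with L = 4 returns a lag of 2.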

With this method, however, unlike OLA, the length of each frame written to the second memory depends on the size of the synchronization lag K_m, so the overall sample length (that is, playback time) of the time-scale-modified audio signal does not exactly equal the length of the original audio signal multiplied by the designated ratio. In effect the ratio actually applied is not the OLA ratio SS/SA but [SS + (mean of K_m)]/SA, and for each individual frame the effective ratio can be regarded as (SS+K_m)/SA. If the synchronization lags K_m of consecutive frames (F_{m-1}, F_m, F_{m+1}, ...) in some section of the audio signal all have the same sign, the lags do not cancel and the magnitude of their accumulated value keeps growing within that section. The playback length (that is, playback speed) of the corresponding part of the modified signal then deviates substantially from the length proportional to the designated ratio. This irregularity in playback speed may not matter much when only audio is played at a changed speed, but when SOLA or WSOLA is applied to the audio of a multimedia signal combining video and audio, it causes a loss of synchronization between the two. This is because ordinary video speed-change methods adjust the number of video frames in exact proportion to the designated ratio, so the processed video plays at a constant speed exactly proportional to that ratio. If the mismatch is severe, it produces the fatal problem that the speaker's mouth on the screen no longer matches the speaker's voice.
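A short numeric illustration of the effective ratio [SS + (mean of K_m)]/SA may help. The values of SS, SA and the lag sequences below are made up for the example, and the helper names are hypothetical.

```python
def effective_ratio(SS, SA, lags):
    """Average ratio actually applied: [SS + mean(K_m)] / SA."""
    return (SS + sum(lags) / len(lags)) / SA

def accumulated_lag(lags):
    """Accumulated synchronization lag, in samples."""
    return sum(lags)

SS, SA = 600, 500                 # designated ratio SS / SA = 1.2
same_sign = [10] * 100            # every frame lags by +10 samples
balanced = [10, -10] * 50         # lags cancel out on average

print(effective_ratio(SS, SA, same_sign), accumulated_lag(same_sign))
print(effective_ratio(SS, SA, balanced), accumulated_lag(balanced))
```

In the same-sign case the effective ratio is about 1.22 instead of 1.2 and the accumulated lag reaches 1000 samples after 100 frames, whereas in the balanced case the accumulated lag returns to zero. This growing accumulated value is the quantity the invention monitors.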

In view of the above, a first object of the present invention is to provide a method of modifying the time scale of an audio signal such that the time-scale-modified signal preserves pitch information almost intact and yet has a playback time almost exactly proportional to the designated speed-change ratio.

A second object of the present invention is to provide an apparatus capable of carrying out this time-scale modification method.

FIG. 1 shows how an audio signal is cut into frames and resynthesized when its time scale is modified with the OLA algorithm.

FIG. 2 illustrates the concept of modifying the time scale of an audio signal by the time-scale modification method of the present invention when the designated ratio α_o is much larger than 1, that is, when α_o > SS/(SS-L).

FIG. 3 illustrates the concept of modifying the time scale of an audio signal by the time-scale modification method of the present invention when the designated ratio α_o is slightly larger than 1, that is, when α_o < SS/(SS-L).

FIG. 4 is a block diagram showing an example configuration of an apparatus that carries out the time-scale modification method of the present invention.

FIGS. 5A and 5B are flowcharts showing the procedure for carrying out the time-scale modification method of the present invention.

FIG. 6 illustrates the ratio-correction concept by which, while the time-scale modification method of the present invention is executed, the playback time of the time-scale-modified audio signal is forced to converge to the playback time ideally proportional to the designated ratio.

FIG. 7 is a waveform diagram of an original audio signal with a playback length of 20 seconds.

FIG. 8(a) is a waveform diagram of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with a designated ratio α_o of 1.2, using a WSOLA algorithm that applies only the scan-range modification concept.

FIG. 8(b) is a waveform diagram of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with a designated ratio α_o of 1.2, using a WSOLA algorithm that applies both the scan-range modification and the ratio-correction concepts.

FIG. 9(a) is a waveform diagram of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with a designated ratio α_o of 2.0, using a WSOLA algorithm that applies only the scan-range modification concept.

FIG. 9(b) is a waveform diagram of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with a designated ratio α_o of 2.0, using a WSOLA algorithm that applies both the scan-range modification and the ratio-correction concepts.

** Explanation of reference numerals for main parts of the drawings **

100: signal processor

110: speed-change engine

120: first memory

130: second memory

According to one aspect of the present invention for achieving the first object,

in a method of modifying the time scale of an audio signal in which the audio signal is divided into consecutive frames such that adjacent frames overlap by a first number of samples, and the spacing between the adjacent frames is compressed or expanded so that they overlap by a second number of samples, the second number of samples being determined from the product of the first number of samples and an initial speed-change ratio designated by the user, further adjusted by a synchronization lag obtained at the point of maximum correlation between the adjacent frames,

there is provided a time-scale modification method characterized in that, each time the time scale is modified by one frame, a synchronization lag is calculated and added to the lags accumulated so far to obtain an accumulated value, and, when the accumulated value exceeds a predetermined tolerance range, the time scale of the audio signal is modified with a new speed-change ratio capable of reducing the accumulated value.

The tolerance range is preferably set so that, when a multimedia signal combining an audio signal and a video signal is played, the mismatch between the picture and the sound stays small enough not to be perceived as unnatural.

Also, when time-scale modification with the new ratio reduces the accumulated value so that it falls within a reset range defined inside the tolerance range, the new ratio is preferably restored to the initial ratio.

In this method, the point of maximum correlation is the point within a predetermined scan range at which the waveform similarity between two adjacent frames is highest.

Furthermore, the scan range is preferably made wide enough to find the maximum waveform similarity between two adjacent frames, taking into account the upper limit of the initial ratio and the like, and is set so that the start point (mSA) of the current frame of the audio signal lies outside the scan range.

According to another aspect of the present invention for achieving the first object,

in a method of modifying the time scale of an audio signal using a predetermined algorithm that applies the overlap-and-add principle to consecutive frames in order to modify the time scale at a user-set speed-change ratio and, at the same time, concatenates two adjacent frames at the point of maximum correlation in order to preserve the original pitch information of the audio signal,

there is provided a time-scale modification method comprising: each time the time scale is modified by one frame with the predetermined algorithm, finding a synchronization lag, which is the sample interval between the center of a predetermined scan range and the point within that range showing the best cross-correlation between the last frame of the time-scale-modified signal and the current frame of the audio signal;

adding the synchronization lag thus found to the accumulated value of the previous lags to calculate a new accumulated value; and

comparing the accumulated value with a predetermined tolerance range and, if it falls outside the range, modifying the time scale from the following frame onward with the applied ratio corrected to a new ratio that reduces the accumulated value and makes it converge within the tolerance range.

In this method, the predetermined algorithm is preferably SOLA, WSOLA, or an equivalent algorithm that slides two adjacent frames over each other within the scan range and takes the point of highest waveform similarity as the point of maximum correlation.

In this method, the difference SS-SA between the start-point sample interval SS of adjacent frames of the time-scale-modified signal and the start-point sample interval SA of adjacent frames of the audio signal is calculated. When |SS-SA| > L, the scan range is preferably defined as the range containing L samples on either side of sample mSA, the sample reached by skipping SA samples from the start point of the current frame, that is, mSA-L to mSA+L. Otherwise it is preferably defined as mSA-(SS-SA) to mSA+2L-(SS-SA). Here L is half the width of the scan range and m is the frame index.
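The two cases of the scan-range rule can be written as a small helper. This sketch takes the formulas of the preceding paragraph at face value (centered on sample mSA); the function name `scan_range` and the example numbers are assumptions for illustration only.

```python
def scan_range(m, SA, SS, L):
    """Return (first, last) sample of the scan range for frame index m.

    |SS - SA| > L : L samples on either side of sample m * SA.
    otherwise     : the same 2L-wide window shifted by -(SS - SA).
    """
    center = m * SA
    d = SS - SA
    if abs(d) > L:
        return center - L, center + L
    return center - d, center + 2 * L - d
```

For example, with m = 3, SA = 500 and L = 100, a spacing of SS = 1000 gives the symmetric range (1400, 1600), while SS = 550 gives the shifted range (1450, 1650). Both ranges span 2L samples.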

The method preferably further comprises, while the time scale of the audio signal is being modified with the new ratio, periodically calculating the accumulated synchronization lag and readjusting the ratio to be applied from the next period according to whether the calculated value is converging into the tolerance range.

The method preferably further comprises restoring the user-set ratio when time-scale modification with the new ratio has reduced the accumulated value so that it falls within a reset range, defined as a narrower range inside the tolerance range.

Furthermore, the method preferably further comprises: setting a correction mode to off by default; switching the correction mode on when the magnitude of the accumulated synchronization lag exceeds the tolerance range; and, while the correction mode is on, if the accumulated lag has increased relative to the accumulated lag up to the previous period, causing the next period to check whether the accumulated lag exceeds the tolerance range, and if it has decreased, keeping the correction mode on so that the ratio applied in the current period continues to be applied.

According to still another aspect of the present invention for achieving the first object,

in a method of modifying the time scale of an audio signal in which an original audio signal is stored in a first memory, read from the first memory one frame at a time, and rearranged and stored in a second memory so that the time scale of adjacent frames is modified according to an initial ratio set by the user,

there is provided a time-scale modification method comprising: reading the original audio signal in the first memory one frame at a time, N+L samples being read for the current frame starting from sample mSA-L so that the current frame overlaps the previous frame by a predetermined number of samples, where N is the number of samples in a unit frame, SA is defined as SS/α, SS is the number of samples between the start points of consecutive frames of the time-scale-modified audio signal, α is the ratio applied in each frame period (either the initial ratio or a corrected ratio), m is the frame index, and L is the number of samples in half of a predetermined scan range;

calculating the waveform similarity between the last frame of the time-scale-modified audio signal and the current frame read in, while sliding the current frame within the scan range;

calculating a synchronization lag K_m in every frame period, K_m being defined as the difference in number of samples between the point of maximum calculated waveform similarity and the start point (mSS) of the time-scale-modified signal for that period;

appending N-K_m samples of the current frame, taken with the calculated synchronization lag K_m applied, to the last frame in the second memory starting at sample position mSS, the samples where the current frame and the last frame overlap being summed with a weighting function;

each time a new synchronization lag K_m is calculated, adding it to the accumulated lag up to the previous frame period to obtain the accumulated lag up to the current frame period; and

checking whether the new accumulated lag falls outside a predetermined tolerance range and, if so, adjusting the applied ratio from the next frame period so as to force the accumulated lag back within the tolerance range.

In the above method, it is preferable that, when the accumulated synchronization lag grows in the negative direction and leaves the tolerance range, the applied time-scale ratio α is changed to a value smaller than the initial time-scale ratio α_o, and that, when the accumulated synchronization lag grows in the positive direction and leaves the tolerance range, the applied time-scale ratio α is changed to a value larger than the initial time-scale ratio α_o.

Also, in the above method, in order to force the accumulated synchronization lag to stay within the tolerance range, it is preferable to define part of the tolerance range as a reset range and, after the accumulated synchronization lag has left the tolerance range, to restore the applied time-scale ratio to the initial time-scale ratio α_o once time-scale modification with the modified ratio has reduced the accumulated synchronization lag enough to bring it into the reset range.
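The tolerance range and the narrower reset range together act as a hysteresis band around zero. The following is a minimal Python sketch of that behavior, not the patented implementation. The class name `RatioController`, the field names, and in particular the fixed relative adjustment `step` are assumptions made for illustration; the text only states the direction in which the applied ratio is changed, not by how much.

```python
from dataclasses import dataclass


@dataclass
class RatioController:
    """Hypothetical hysteresis controller for the applied time-scale ratio."""
    alpha0: float            # initial ratio designated by the user
    e_min: int               # lower bound of the tolerance range (samples)
    e_max: int               # upper bound of the tolerance range (samples)
    r_min: int               # lower bound of the reset range (samples)
    r_max: int               # upper bound of the reset range (samples)
    step: float = 0.02       # assumed relative adjustment; not given in the text
    alpha: float = 0.0       # applied ratio, set in __post_init__
    correcting: bool = False

    def __post_init__(self) -> None:
        self.alpha = self.alpha0

    def update(self, err: int) -> float:
        """Return the ratio to apply from the next frame period onward."""
        if not self.correcting:
            if err < self.e_min:
                # Accumulated lag left the tolerance range on the negative side.
                self.alpha = self.alpha0 * (1.0 - self.step)
                self.correcting = True
            elif err > self.e_max:
                # Accumulated lag left the tolerance range on the positive side.
                self.alpha = self.alpha0 * (1.0 + self.step)
                self.correcting = True
        elif self.r_min <= err <= self.r_max:
            # Back inside the reset range: restore the user's ratio.
            self.alpha = self.alpha0
            self.correcting = False
        return self.alpha
```

Because the ratio is restored only inside the reset range rather than at the edge of the tolerance range, the controller does not toggle on every frame when the accumulated lag hovers near a tolerance bound.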

Furthermore, in the above method, it is preferable that the scan range is defined as the range containing L samples on either side of the point mSS, that is, mSS-L to mSS+L, when |α_o| > SS/(SS-L), and as mSS-(SS-SA) to mSS+2L-(SS-SA) when |α_o| ≤ SS/(SS-L).

Meanwhile, according to one aspect of the present invention for achieving the second object,

in an apparatus that modifies the time scale of an audio signal according to a time-scale ratio set by a user, on the principle of sliding two adjacent frames within a predetermined scan range to find the best correlation and reflecting it in the time-scale modification,

memory means for storing the original audio signal and the audio signal obtained by time-scale modifying the original audio signal according to the time-scale ratio set by the user; and

signal processing means that divides the audio signal into a plurality of consecutive frames such that adjacent frames overlap by a first number of samples, and modifies the time scale of the audio signal by compressing or expanding the spacing of the adjacent frames such that they overlap by a second number of samples, wherein, each time one frame is time-scale modified, the signal processing means calculates a synchronization lag, which is the sample distance between the center point of a predetermined scan range and the point within that range showing the best cross-correlation between the last frame of the time-scale-modified signal and the current frame of the audio signal, adds it to the synchronization lag accumulated up to the previous frame period to obtain an accumulated synchronization lag, compares the accumulated value with a predetermined tolerance range, and, if the accumulated value leaves the tolerance range, modifies the time scale from the following frame onward using a new time-scale ratio corrected so as to reduce the accumulated synchronization lag and make it converge within the tolerance range. A time-scale modification apparatus for an audio signal characterized by comprising these means is provided.

In the above apparatus, the signal processing means calculates the difference SS-SA between the start-point sample interval SS of adjacent frames of the time-scale-modified signal and the start-point sample interval SA of adjacent frames of the audio signal. When |SS-SA| > L, the scan range is defined as the range containing L samples on either side of the sample mSA, reached by skipping SA samples from the start point of the current frame, that is, mSA-L to mSA+L; otherwise it is defined as mSA-(SS-SA) to mSA+2L-(SS-SA). Here the integers L and m denote half the scan range and the frame index, respectively.

Also, in the above apparatus, it is preferable that the signal processing means continues to calculate the accumulated synchronization lag periodically while modifying the time scale of the audio signal with the new time-scale ratio, and readjusts the ratio to be applied from the following period onward according to whether the calculated accumulated value is converging into the tolerance range.

Furthermore, in the above apparatus, it is preferable that, when time-scale modification with the new time-scale ratio has reduced the accumulated value so that it enters a reset range, defined as a narrower range inside the tolerance range, the signal processing means restores the new time-scale ratio to the ratio set by the user and applies it.

According to the present invention, the time scale of the audio signal is modified on the basis of the initial time-scale ratio designated by the user, the accumulated playback-time error arising in this process is checked in every period, and, when its magnitude exceeds a preset tolerance range, the ratio applied from the next period onward is modified so that the accumulated playback-time error is forced to converge back within the tolerance range promptly. As a result, the playback time (that is, the playback speed) of the time-scale-modified signal deviates hardly at all, at any point in time, from the ideal playback time (that is, playback speed) proportional to the initial ratio designated by the user. Therefore, when the invention is applied to variable-speed playback of a multimedia signal combining video and audio signals, the reproduced sound and picture can be kept accurately synchronized.

Hereinafter, the audio-signal processing method according to the present invention will be described in more detail with reference to the accompanying drawings.

FIG. 4 is a block diagram illustrating an example configuration of an apparatus for applying the time-scale modification method for an audio signal according to the present invention. The method of the present invention can be executed in an apparatus comprising memories 120 and 130 capable of storing a digital audio signal, and a signal processor 100 programmed to read the digital audio signal stored in the memory and perform the signal processing for time-scale modification (or a signal processor 100 capable of loading a program having such a signal-processing function from a data storage unit 140, such as a hard disk, and executing it). The signal processor 100 may be, for example, a digital signal processor (DSP), a microcomputer, or a central processing unit (CPU).

The original audio signal x(·) is a digital signal representing sound intensity over time as a stream of integers, and is preferably normalized by its maximum value and put in the form of a bit stream. This audio signal is stored in the first memory 120 before time-scale modification. The time-scaling engine 110 is a program capable of time-scale modification according to the present invention and is executed by the signal processor 100. The signal processor 100 reads the audio signal stored in the first memory 120 sequentially, one frame at a time, processes it for time-scale modification according to the designated ratio α_o, and writes each processed frame sequentially to the second memory 130.

FIG. 5 is a flowchart outlining the time-scale modification method of the present invention as executed by the time-scaling engine 110. The method is described in more detail below with reference to this flowchart.

First, the number of samples N constituting one frame, the variable SS, and so on are set to appropriate constant values A, B, based on the concepts described above. The scan range for searching for the maximum correlation is also defined. The initial scan range is set with lower limit K_min = mSS-L and upper limit K_max = mSS+L. Here the variable m is the frame index and the variable L is the number of samples making up half the scan range, a value chosen appropriately in view of the maximum designatable ratio, the number of samples N per frame, the minimum number of samples that must overlap between adjacent frames, and so on. The ratio α_o designated by the user through an input means (not shown) is assigned to the ratio variable α, the variable Err representing the accumulated value of the per-frame synchronization lags K_m is initialized to 0, and the correction mode (rev_mode), which indicates whether the ratio is to be corrected, is set to off. In addition, the value of the variable SA is calculated in advance using the relation SA = SS/α (step S10). If a new ratio α is entered during time-scale modification, the time-scaling engine 110 is interrupted and the above initialization is repeated on the basis of the new ratio.
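The initialization of step S10 can be sketched as follows. This is an illustrative Python sketch only: the function name `init_state` and the dictionary layout are hypothetical, and rounding SA to the nearest integer is an assumption, since the text gives SA = SS/α without saying how a fractional value is handled.

```python
def init_state(alpha0: float, N: int, SS: int, L: int) -> dict:
    """Hypothetical step-S10 initialization; names mirror the variables in the text."""
    return {
        "alpha0": alpha0,          # ratio designated by the user
        "alpha": alpha0,           # ratio actually applied (starts equal to alpha0)
        "N": N,                    # samples per frame
        "SS": SS,                  # frame spacing in the time-scale-modified signal
        "L": L,                    # half the scan range, in samples
        "SA": round(SS / alpha0),  # frame spacing in the original signal (rounding assumed)
        "Err": 0,                  # accumulated synchronization lag
        "rev_mode": False,         # correction mode off
    }
```

When the user enters a new ratio, calling the function again with that ratio corresponds to the re-initialization described above.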

Next, whether the scan range needs modification is determined from the magnitude of the designated ratio, and, if modification is needed, the initially set scan range is modified (steps S12, S14).

To this end, the difference between the value of the variable SS and the value of the variable SA (lag = SS-SA) is first compared with L, half the size of the initially set scan range, and the final scan range is fixed according to the result. If the absolute value of the difference, |SS-SA|, is greater than L, the scan range needs no modification and the initially set scan range is kept. If, however, |SS-SA| does not exceed L, the scan range is modified as follows so that time-scale modification proportional to the designated ratio can be achieved.

K_min = mSS - lag = mSS - (SS-SA)

K_max = mSS + 2L - lag = mSS + 2L - (SS-SA)
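The choice between the initial and the shifted scan range (steps S12, S14) can be sketched as below. This is a minimal Python illustration under the definitions given in the text; the function name `scan_range` is hypothetical, and SS, SA and L are assumed to be integers.

```python
def scan_range(m: int, SS: int, SA: int, L: int) -> tuple[int, int]:
    """Return (K_min, K_max) for frame index m, following steps S12 and S14."""
    lag = SS - SA
    if abs(lag) > L:
        # The initial range, centered on mSS, is kept.
        return m * SS - L, m * SS + L
    # Otherwise the whole range is shifted to the right by L - lag.
    return m * SS - lag, m * SS + 2 * L - lag
```

For example, with SS = 100, SA = 95 and L = 20, the lag is 5, so for m = 3 the range moves from 280 to 320 to 295 to 335. Its width of 2L samples is unchanged.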

The scan range has to be modified because, with the original scan range, a time-scale modification effect corresponding to the designated ratio cannot be obtained. If the scan range were unrestricted, the point of maximum correlation would be the point before any frame sliding is performed to search for the maximum correlation, that is, the overlap point in the original signal (point 32 in FIG. 3, hereinafter called the intrinsic maximum-correlation point). When |SS-SA| ≤ L, that is, when the designated ratio α_o ≤ SS/(SS-L), the intrinsic maximum-correlation point 32 lies inside the original scan range 34 (see FIG. 3(b)), so the actual maximum-correlation point is inevitably found at the position coinciding with the intrinsic point 32. The synchronization lag K_m at this point 32 becomes 0 and the ratio actually applied becomes 1, so the desired time-scale modification effect cannot be obtained. Hence, for time-scale modification corresponding to the designated ratio to take place, the preset scan range 34 must be modified so that the intrinsic maximum-correlation point 32 lies outside the scan range. As one example of how to modify the scan range, the whole of the original scan range 34 is shifted to the right by at least L-lag (where lag = SS-SA), as shown in FIG. 3(c). The new scan range 36 obtained in this way is mSS-lag to mSS+2L-lag; the intrinsic maximum-correlation point 32 then always lies outside the new scan range 36, and the actually applied ratio no longer converges to 1.
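The equivalence between the condition on the lag and the condition on the ratio follows directly from SA = SS/α_o. A short derivation, assuming α_o > 1 and SS > L > 0 (these assumptions are added here so that the inequalities can be inverted safely):

```latex
\begin{aligned}
SA &= \frac{SS}{\alpha_o}, \qquad
\mathrm{lag} = SS - SA = SS\left(1 - \frac{1}{\alpha_o}\right), \\
\mathrm{lag} \le L
  &\iff 1 - \frac{1}{\alpha_o} \le \frac{L}{SS}
   \iff \frac{1}{\alpha_o} \ge \frac{SS - L}{SS}
   \iff \alpha_o \le \frac{SS}{SS - L}.
\end{aligned}
```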

After these preliminary steps, the signal processor 100 begins the signal processing for time-scale modification by reading the first frame F_0 of N samples from the first memory 120, in which the original audio signal x(·) is stored, and copying it unchanged to the second memory 130 (step S16). Since one frame has now been processed, the frame index variable m is set to 1 (step S18).

Thereafter, each pass through the loop processes one frame. The current frame F_m of the original audio signal x(·) is fetched, and the synchronization lag K_m is calculated from its highest correlation with the time-scale-modified signal y(·). K_m is found by sliding the current frame F_m around y(m*SS) and locating the point at which the current frame F_m and the time-scale-modified signal are most highly correlated. At the point of maximum correlation, the waveform similarity of the two frames is greatest. The current frame F_m is then placed at that point and written to the second memory 130 so that the time-scale modification takes effect. The part overlapping the previous frame is added with weights applied, and the rest of the current frame F_m is simply copied. The time scale is modified frame by frame by repeating this loop. This is described in more detail below.

First, the signal processor 100 reads frames one at a time from the first memory 120, starting with the second frame, and calculates the synchronization lag of each. Specifically, when the second frame F_1 is read, its start point is the (SA-L)-th sample, and N+L samples are read from this point. Generalizing, the current frame F_m that the signal processor 100 reads from the first memory 120 to find the synchronization lag K_m consists of the samples from mSA-L to mSA+N (see FIG. 2(a) or FIG. 3(a)). That is, at least L extra samples are read in addition to the N samples (step S19). The synchronization lag K_m is then obtained for the current frame F_m (step S20). The synchronization lag K_m of the current frame F_m is the number of samples between the point of maximum correlation with the last frame of the time-scale-modified signal y(·) and the center point mSS of the scan range. To obtain K_m, the correlation between the current frame F_m and the last frame F_m-1 of the time-scale-modified signal is therefore computed first. The correlation is computed with the correlation formula below while sliding the current frame F_m against the last frame F_m-1 of the time-scale-modified signal within the set scan range.

Here, Lo denotes the number of samples in the overlapping part between the adjacent frames y(mSS+j) and x(mSA+j+K_m) when the current frame is appended to the time-scale-modified signal.

Among the calculated correlation values, the point having the maximum correlation value is found using the expression below, and the sample distance between that point and the center point mSS of the scan range is taken as the synchronization lag K_m (see FIG. 2(b) or FIG. 3(b)).

The maximum correlation is obtained using the correlation formula below.
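The correlation formula itself is not reproduced in this text. As an illustration only, the search for K_m can be sketched with a normalized cross-correlation, which is a common choice in SOLA-type methods; whether the patent uses exactly this normalization is an assumption. The function name `best_lag`, its parameters, and the use of a fixed overlap length (the text lets Lo depend on the lag) are likewise hypothetical simplifications.

```python
import math


def best_lag(x, y, x_start, y_start, k_lo, k_hi, overlap):
    """Return (k, r): the offset k in [k_lo, k_hi] that maximizes the normalized
    cross-correlation r between x[x_start + k + j] and y[y_start + j]."""
    best_k, best_r = k_lo, -math.inf
    for k in range(k_lo, k_hi + 1):
        num = den_x = den_y = 0.0
        for j in range(overlap):
            xi, yi = x_start + k + j, y_start + j
            if xi < 0 or xi >= len(x) or yi >= len(y):
                break  # stop at the end of either buffer; use the partial sums
            num += x[xi] * y[yi]
            den_x += x[xi] * x[xi]
            den_y += y[yi] * y[yi]
        den = math.sqrt(den_x * den_y)
        r = num / den if den > 0.0 else 0.0
        if r > best_r:  # strict comparison keeps the first of any tied offsets
            best_k, best_r = k, r
    return best_k, best_r
```

In terms of the text, `x_start` plays the role of mSA, `y_start` the role of mSS, and the returned offset the role of the synchronization lag K_m measured from the center of an unshifted scan range.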

Once the synchronization lag K_m has been determined, the signal processor 100 appends only the N-K_m samples starting from the (mSA+K_m)-th sample (20b in FIG. 2 or 30b in FIG. 3) to the time-scale-modified signal in the second memory 130, starting at its mSS-th sample point, and discards the remaining samples (20a in FIG. 2 or 30a in FIG. 3). The combination of the previous frame of the time-scale-modified signal and the current frame of the original audio signal is expressed by the following equations.

Here, g(j) denotes the weighting function used for the weighted summation over the overlapping interval. A typical example is the following linear ramp function, although an exponential function or any other suitable function may be chosen.

g(j) = 0, j < 0;

g(j) = j/Lo, 0 ≤ j ≤ Lo;

g(j) = 1, j > Lo
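The weighted summation over the overlap can be sketched as a cross-fade. The mixing equations are not reproduced in this text, so the rule used below, y = (1 - g)·y + g·x over the overlap followed by a plain copy, is the usual overlap-add form and is an assumption rather than the patent's exact equation. The names `ramp` and `overlap_add` are hypothetical, and `pos` is assumed not to exceed the current length of `y`.

```python
def ramp(j: int, lo: int) -> float:
    """Linear ramp weight g(j): 0 before the overlap, rising to 1 at j = lo."""
    if j < 0:
        return 0.0
    if j >= lo:
        return 1.0
    return j / lo


def overlap_add(y: list, seg: list, pos: int) -> list:
    """Append seg to y starting at index pos, cross-fading where samples already exist."""
    lo = max(0, len(y) - pos)  # number of overlapping samples
    for j, s in enumerate(seg):
        if j < lo:
            g = ramp(j, lo)
            y[pos + j] = (1.0 - g) * y[pos + j] + g * s
        else:
            y.append(s)  # past the overlap the samples are simply copied
    return y
```

The old frame fades out as the new frame fades in, which avoids a discontinuity at the splice point.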

Each time a frame has been processed, the frame index m is incremented by 1 (step S24). The loop is repeated until the end of the original audio signal x(·) is reached (step S26). In this way, the original audio signal x(·) is, in principle, synthesized into a signal y(·) whose time scale has been modified according to the ratio α set by the user.

With this processing alone, however, the pitch information of the original audio signal x(·) is preserved almost intact, but there is no guarantee that the playback time (that is, the playback speed) of the time-scale-modified audio signal is exactly proportional to the designated ratio α_o. In particular, the closer α_o is to 1, the more the playback time differs from the playback time almost exactly proportional to α_o (called the ideal playback time). To solve this problem, additional processing is needed so that the actual playback time of the time-scale-modified signal y(·) differs only very slightly from the ideal playback time at any point in time. This is described in detail below.

With the conventional SOLA or WSOLA algorithm, when the designated ratio α_o is close to 1, the accumulated value of the synchronization lag K_m grows steadily as more frames are processed. As explained above, a large K_m is likely to increase the inaccuracy of the time scale, so K_m is preferably searched for within the limited scan range mSS-L to mSS+L. In general, the probability that the sign of K_m for a given frame is negative or positive, and its magnitude, are irregular and cannot be predicted. If K_m happened to alternate evenly between negative and positive values over the whole audio signal, the accumulated lag would not grow much in any interval and synchronization with the video signal would come about naturally, but there is no guarantee that this will always be so. The synchronization mismatch with the video signal is especially pronounced when α_o is close to 1, because in that case the K_m of successive frames are very likely to have the same sign, so that the accumulated K_m tends to grow rather than cancel out as the number of processed frames increases. Eventually, from some point on, the accumulated lag makes the mismatch between the video signal and the audio signal unacceptably large.

The reason such a one-directional synchronization-lag error arises is as follows. According to the basic concept of the SOLA or WSOLA algorithm, the larger the ratio, the longer the interval over which adjacent frames of the original audio signal overlap (compare FIG. 2(a) with FIG. 3(a)), and the value of the synchronization lag K_m is not constant.

On this basis, first consider the case in which the designated ratio α_o is much larger than 1 (see FIG. 2), that is, α_o > SS/(SS-L). As mentioned above, the current frame F_m read from the first memory 120 to find the maximum-correlation point consists of the N+L samples from mSA-L to mSA+N (see FIG. 2(a)). The current frame is slid against the last frame of the time-scale-modified signal within the scan range 24, mSS-L to mSS+L, to find the point of maximum correlation, that is, the synchronization lag K_m (see FIG. 2(b)). Once K_m is determined, only the N-K_m samples 20b starting from the (mSA+K_m)-th sample are appended to the time-scale-modified signal from the point mSS, and the remaining samples 20a are discarded. If the scan range were unrestricted, the maximum-correlation point would in theory be the overlap point 22 in the original signal. Because the scan range 24 is restricted to mSS-L to mSS+L, however, the point 22 lies outside the scan range 24, and the actual maximum-correlation point is necessarily some point inside this range. The probability that the sign of K_m, which represents this actual maximum-correlation point, is negative or positive is random. In this case, therefore, the synchronization-lag error may not keep growing in one direction.

The situation is different when the designated ratio α_o is only slightly larger than 1, that is, when α_o ≤ SS/(SS-L) (see FIG. 3). As in the case above, N+L samples 30a, 30b are taken from the original signal to find the synchronization lag K_m of the current frame (see FIG. 3(a)), K_m is found within the scan range 34 (see FIG. 3(b)), and, once K_m is determined, only the N-K_m samples 30b from the (mSA+K_m)-th sample are appended to the time-scale-modified signal from its mSS-th point while the remaining samples 30a are discarded (see FIG. 3(d)). When α_o ≤ SS/(SS-L), however, the K_m of each frame is far more likely to be positive than negative. This is because the initial scan range 34, mSS-L to mSS+L, has been replaced by the new scan range 36, mSS-lag to mSS+2L-lag (steps S12, S14).

The above description took as its example a designated ratio larger than 1, that is, lengthening of the playback time, but the scan range likewise needs modification when the designated ratio is smaller than 1 by only a small amount. In short, the scan range can be regarded as needing modification whenever the absolute value of the designated ratio is smaller than SS/(SS-L).

When the modified scan range 36 is applied, a time-scale effect corresponding to the designated ratio can now be obtained, but the maximum-correlation point becomes far more likely to lie to the right of the center point mSS of the original scan range 34 than to its left. That is, the synchronization lag K_m is more likely to be positive, so the accumulated value of K_m over the frame periods keeps growing. As the time-scale modification proceeds, the playback time (that is, the playback speed) of the time-scale-modified signal deviates more and more from the ideal playback time, which gives rise to the synchronization mismatch between the video signal and the audio signal mentioned above.

To solve this problem, the cumulative value of the synchronization lag K_m is checked periodically, and if it falls outside a preset tolerance range, the speed-change ratio is modified so that the cumulative value is forced to converge within the tolerance range (see FIG. 5(b)). This is described in detail below.

Each time the synchronization lag K_m is calculated in step S20, it is added to the running total to obtain the cumulative value Err of the synchronization lag (step S30). The cumulative value Err, computed in every frame period, represents the error between the actual playback time up to that period and the ideal playback time. A large Err therefore means a large synchronization mismatch between the video signal and the audio signal. Expressed as a formula:

$$\mathrm{Err} = \sum_{m} K_m$$
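The running sum can be illustrated with a short sketch. The helper names and the conversion from samples to seconds are assumptions for illustration; the text itself only defines Err as the sum of the per-frame lags K_m.

```python
from itertools import accumulate


def cumulative_lag(lags: list[int]) -> list[int]:
    """Value of Err after each frame, given the per-frame lags K_m in samples."""
    return list(accumulate(lags))


def lag_to_seconds(err_samples: int, sample_rate_hz: int) -> float:
    """Convert a cumulative lag in samples to a time error (assumed conversion)."""
    return err_samples / sample_rate_hz


print(cumulative_lag([3, -1, 4]))    # [3, 2, 6]
print(lag_to_seconds(22050, 44100))  # 0.5
```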

Next, the state of the correction mode is checked (step S32). Because the correction mode is initially off, the first pass through the loop always proceeds to the step of checking whether the cumulative value Err exceeds the tolerance range (step S34) (see FIG. 6). The tolerance range (E_min to E_max) is preferably determined from the degree of synchronization mismatch between the video signal and the audio signal that is acceptable during variable-speed playback of a multimedia signal.

If the check in step S34 shows that the cumulative value Err is within the tolerance range, the deviation from the ideal playback time (that is, the ideal playback speed) is negligible, so no special processing is needed and time-scale modification can proceed directly to the next frame. Processing therefore moves to the step of checking whether data of the original audio signal x(·) remains in the first memory (120) (step S26). If even this degree of error is unacceptable, however, the ratio-modification steps for reducing Err (steps S38 to S44) can be executed.

If, on the other hand, the check in step S34 shows that the cumulative value Err is outside the tolerance range, action must be taken so that Err decreases from the next frame period onward. Err can be reduced by changing the speed-change ratio applied from the next frame period to a value that helps Err decrease. For this modification, the correction mode is first set to on and the ratio is corrected (step S36).

Next, the sign of the cumulative value Err is checked (step S38). If Err is negative, the actual playback time (or playback speed) of the time-scale-modified signal y(·) has become longer (or slower) than the ideal playback time (or playback speed) that is exactly proportional to the initially designated ratio α_o, so the speed-change engine (110) corrects the ratio to be applied from the next frame period to a value smaller than the current one. The corrected ratio is then applied to the next frame, and the magnitude of Err decreases (step S40). Conversely, if Err is positive, the actual playback time (or playback speed) has become shorter (or faster) than the ideal, so the ratio is changed to a value larger than the current one, and the absolute value of Err likewise decreases from the next frame period (step S42).

If the ratio is modified by too large an amount, the cumulative value Err can be reduced quickly, but noise may be introduced, or Err may flip sign and grow rapidly. If the amount is too small, Err does not decrease quickly enough. The ratio must therefore be modified by a suitable amount that takes both effects into account. FIG. 5(b) shows a case in which the amount of modification is set, for example, to about 10% of the existing ratio.

After the ratio has been modified in this way, the value of the variable SA is recalculated according to the modified ratio (step S44). Execution then proceeds to step S26, which checks whether there is a next frame of the original audio signal to process.
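Steps S38 to S44 can be sketched as below. The function names are hypothetical, the 10% step is the example value given for FIG. 5(b), and rounding SA to whole samples is an assumption; the relation SA = SS/α is the one used in the claims.

```python
def correct_ratio(alpha: float, err: int, step: float = 0.10) -> float:
    """Steps S38 to S42: adjust the applied ratio according to the sign of Err."""
    if err < 0:
        return alpha * (1.0 - step)  # S40: negative Err, use a smaller ratio
    if err > 0:
        return alpha * (1.0 + step)  # S42: positive Err, use a larger ratio
    return alpha


def analysis_stride(ss: int, alpha: float) -> int:
    """Step S44: recompute SA = SS / alpha (rounded to whole samples, assumed)."""
    return round(ss / alpha)


new_alpha = correct_ratio(1.25, err=-40, step=0.20)  # 1.25 * 0.8 = 1.0
print(analysis_stride(400, 1.25))  # 320
```

A multiplicative step keeps the adjustment proportional to the ratio in use; other step rules would fit the text equally well.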

Meanwhile, if during repeated execution of the loop the correction-mode check (step S32) finds the correction mode set to on, it is checked whether the cumulative value Err lies within a reset range (R_min to R_max) defined inside the tolerance range (step S46). The reset range is chosen so as to prevent the increased load on the system, in particular on the signal processor (100), that frequent ratio modification would cause. It can be set, for example, to about 25% of the width of the tolerance range.

Consider the case where the cumulative value Err grows and exceeds the tolerance range. Step S40 or S42 then changes the existing ratio to a new ratio that helps reduce Err. Once the modified ratio is applied from the next frame, Err decreases and may eventually become small enough to fall within the reset range. If the modified ratio continued to be applied even then, Err would likely change sign and grow again. Furthermore, restoring the initially set ratio as soon as Err re-enters the tolerance range may be inappropriate, given that the sign of the synchronization lag K_m in the next frame period cannot be predicted. Frequent ratio modification forces the signal processor (100) to perform additional processing, such as readjusting the related variables, and the resulting overload can adversely affect the normal operation of the playback system (not shown). It is therefore desirable to modify the ratio only when necessary.

With the reset range in place, the applied ratio changes as follows. Initially the ratio set by the user is applied, but when the cumulative value Err grows and leaves the tolerance range, the corrected ratio is applied. The corrected ratio continues to be applied even after Err has decreased back into the tolerance range, until Err has decreased into the reset range. Once Err enters the reset range, the original, initially set ratio is restored in place of the corrected ratio, and correction stops. In addition, the correction mode is set to off (step S48) so that the step of checking whether Err has exceeded the tolerance range (step S60) can be executed in the next loop.

If the correction mode is on and the cumulative value Err is not within the reset range, Err has a magnitude outside the reset range, and a ratio modification was made in an earlier loop because Err exceeded the tolerance range. In this case, the direction in which Err is changing is checked (step S56). If Err has increased, it may end up outside the tolerance range once the next frame is processed, so the correction mode is set to off in order to check whether this actually happens. If Err has decreased, it may well decrease again in the next loop, so there is no need to force a further adjustment. In this case processing proceeds to step S26 without any special handling, so that the correction mode stays on and no ratio correction is performed in the next loop either (steps S50 and S52).
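The correction-mode logic of steps S30 to S48 can be summarized as a small per-frame controller. This is a sketch under stated assumptions, not the patented implementation: the class name `RatioController` and its field names are hypothetical, "Err has increased" is interpreted as an increase in magnitude, and the 10% step is the example value mentioned for FIG. 5(b).

```python
from dataclasses import dataclass, field


@dataclass
class RatioController:
    alpha_0: float             # ratio designated by the user
    e_min: int                 # tolerance range E_min..E_max, in samples
    e_max: int
    r_min: int                 # reset range R_min..R_max, inside the tolerance range
    r_max: int
    step: float = 0.10         # amount of correction (example value)
    err: int = 0               # cumulative synchronization lag Err
    correcting: bool = False   # correction mode, initially off
    alpha: float = field(init=False, default=0.0)  # ratio currently applied

    def __post_init__(self) -> None:
        self.alpha = self.alpha_0

    def update(self, k_m: int) -> float:
        """Process one frame's lag K_m and return the ratio for the next frame."""
        prev_err = self.err
        self.err += k_m                                      # S30
        if not self.correcting:                              # S32: mode is off
            if not (self.e_min <= self.err <= self.e_max):   # S34
                self.correcting = True                       # S36
                factor = 1 - self.step if self.err < 0 else 1 + self.step
                self.alpha *= factor                         # S38 to S42
        elif self.r_min <= self.err <= self.r_max:           # S46
            self.alpha = self.alpha_0                        # restore user ratio
            self.correcting = False                          # S48
        elif abs(self.err) > abs(prev_err):                  # Err grew (assumed: magnitude)
            self.correcting = False                          # re-check tolerance next loop
        return self.alpha


ctrl = RatioController(alpha_0=1.2, e_min=-100, e_max=100, r_min=-25, r_max=25)
for k in (60, 60, -50, -50):
    ctrl.update(k)
print(ctrl.err, ctrl.correcting)  # 20 False
```

In this run Err leaves the tolerance range on the second frame, the corrected ratio stays in force while Err shrinks, and the user's ratio is restored once Err enters the reset range. The caller would recompute SA = SS/α whenever the returned ratio changes.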

By introducing the ratio-modification routine (steps S30 to S52), which uses ratio correction to immediately reduce the playback-time error arising at each frame, the actual playback time (or playback speed) of the time-scale-modified audio signal is forced, at every point in time, to stay within half the tolerance range of the ideal playback time (or playback speed), even though the synchronization lag K_m takes a random value in each period. When the cumulative value Err of the synchronization lag converges within the tolerance range in this way, the video signal and the audio signal can always be played back in synchronization during variable-speed playback of a multimedia signal.

The audio signal whose time scale has been modified in this way is supplied to audio signal reproducing means (not shown) and played back. For real-time playback, it is preferable to group a number of frames into one packet and process playback packet by packet. To play back the time-scale-modified audio signal in packets, a plurality of output buffers are provided in the audio signal reproducing means, and the packets are written to these output buffers cyclically. For example, assuming four output buffers Buf_0, Buf_1, Buf_2 and Buf_3, consecutive packets are written in the order 'Buf_0 → Buf_1 → Buf_2 → Buf_3 → Buf_0 ...'. Each packet is then played back through a speaker (not shown), in the order in which it was written to the output buffers, after post-processing such as conversion to an analog signal and appropriate amplification.
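The cyclic use of the output buffers can be sketched as a simple ring. The class name `OutputBufferRing` and its interface are hypothetical; the text only specifies that packets are written to the buffers in rotating order.

```python
class OutputBufferRing:
    """Writes packets to a fixed set of output buffers in rotating order."""

    def __init__(self, count: int = 4) -> None:
        self.buffers: list[object] = [None] * count
        self.next_index = 0

    def write(self, packet: object) -> int:
        """Store the packet in the next buffer and return that buffer's index."""
        index = self.next_index
        self.buffers[index] = packet
        self.next_index = (index + 1) % len(self.buffers)
        return index


ring = OutputBufferRing(4)
print([ring.write(p) for p in ("p0", "p1", "p2", "p3", "p4")])  # [0, 1, 2, 3, 0]
```

A real player would also need to make sure a buffer has been played before it is overwritten; that bookkeeping is omitted here.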

The points described above were also confirmed in actual tests. FIG. 7 shows the waveform of an original audio signal with a playback length of 20 seconds.

First, consider the case where the designated ratio α_o is 1.2. Here α_o < SS/(SS-L), so the scan range must be modified, and the scan range actually applied is mSS-(SS-SA) to mSS+2L-(SS-SA). FIG. 8(a) is the waveform of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with α_o = 1.2, by a WSOLA algorithm that applies only the scan-range modification described above. FIG. 8(b) is the waveform of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with α_o = 1.2, by a WSOLA algorithm that applies both scan-range modification and ratio correction.

Because the designated ratio α_o is 1.2, the ideal playback time of the time-scale-modified signal, exactly proportional to this ratio, is 24 seconds (= 20 seconds × 1.2). The actual playback time of the waveform in FIG. 8(a) was measured at 25.2 seconds, an error of about 1.2 seconds from the desired playback time. An error of this size arises even over a very short span of 20 seconds, and, as explained above, the error keeps growing. For a multimedia signal with a long running time, it is therefore easy to foresee that the synchronization mismatch between the audio signal and the video signal would grow over time until the content became unwatchable. This error appears because ratio correction was not applied.

In contrast, the actual playback time of the waveform in FIG. 8(b) was measured at a value very close to 24 seconds. That is, when ratio correction is additionally applied, the time scale is always modified to the desired ideal playback time regardless of the size of the original audio signal, so synchronization with the video signal can be obtained almost exactly.

Next, consider the case where the designated ratio α_o is 2.0. Here α_o > SS/(SS-L), so the scan range does not need to be modified, and the scan range actually applied is mSS-L to mSS+L. FIG. 9(a) is the waveform of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with α_o = 2.0, by a WSOLA algorithm that applies only the scan-range modification concept. FIG. 9(b) is the waveform of the signal obtained when the audio signal of FIG. 7 is time-scale modified, with α_o = 2.0, by a WSOLA algorithm that applies both scan-range modification and ratio correction.

When the designated ratio α_o is 2.0, the ideal playback time of the time-scale-modified signal is 40 seconds. The playback time of the signal in FIG. 9(a) was measured at roughly 36.8 seconds, a large error of about 3.2 seconds relative to the ideal playback time. In contrast, the signal in FIG. 9(b) was measured at roughly 39.8 seconds, differing from the ideal playback time by only 0.2 seconds.
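The figures reported for the two tests can be checked with a few lines of arithmetic. The helper name `playback_error` is hypothetical; the durations are the ones quoted in the text (the FIG. 8(b) measurement is only given as "close to 24 seconds" and is therefore omitted).

```python
def playback_error(original_s: float, alpha: float, measured_s: float) -> tuple[float, float]:
    """Return (ideal playback time, measured minus ideal), both in seconds."""
    ideal_s = original_s * alpha
    return round(ideal_s, 3), round(measured_s - ideal_s, 3)


print(playback_error(20, 1.2, 25.2))  # (24.0, 1.2)   FIG. 8(a)
print(playback_error(20, 2.0, 36.8))  # (40.0, -3.2)  FIG. 9(a)
print(playback_error(20, 2.0, 39.8))  # (40.0, -0.2)  FIG. 9(b)
```

The sign shows the direction of the deviation: the uncorrected result for α_o = 1.2 runs long, while both results for α_o = 2.0 run short.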

The examples above show only cases where the designated ratio is greater than 1, that is, where the playback time is lengthened, but the same effect is naturally obtained when the playback time is shortened.

As can be seen from the above, the present invention makes it possible not only to lengthen or shorten the playback time (that is, the playback speed) of an audio signal according to the ratio designated by the user, but also to change the playback time of the time-scale-modified signal in almost exact proportion to the designated ratio. When the present invention is applied to variable-speed playback of a multimedia signal, the video signal and the audio signal can therefore be played back in almost perfect synchronization.

Although the present invention has been described above with reference to an embodiment, those skilled in the art to which the invention pertains will readily recognize that various modifications are possible without departing from the spirit of the invention.

Claims

A method of modifying the time scale of an audio signal, in which the audio signal is divided into a plurality of consecutive frames such that adjacent frames overlap by a first number of samples, and the time scale is modified by compressing or extending the interval between adjacent frames so that the divided adjacent frames overlap by a second number of samples, the second number of samples being determined by further applying, to the value obtained by multiplying the first number of samples by an initial speed-change ratio designated by a user, a synchronization lag obtained at the point of maximum correlation between adjacent frames,

characterized in that, each time the time scale is modified by one frame, the synchronization lag is calculated and added to the previously accumulated synchronization lag to obtain a cumulative value of the synchronization lag, and, when the cumulative value exceeds a predetermined tolerance range, the time scale of the audio signal is modified by applying a new speed-change ratio capable of reducing the cumulative value.

The method of claim 1, wherein the tolerance range is set within a range in which the degree of synchronization mismatch between the reproduced picture and the reproduced sound is not perceived as unnatural when a multimedia signal combining an audio signal and a video signal is played back.

The method of claim 1, wherein, when the cumulative value decreases as a result of time-scale modification with the new speed-change ratio and falls within a reset range defined inside the tolerance range, the new speed-change ratio is restored to the initial speed-change ratio, which is then applied.

The method of claim 1, wherein the point of maximum correlation is the point, within a predetermined scan range, at which the waveform similarity between two adjacent frames is highest.

The method of claim 4, wherein the scan range is set wide enough to find the maximum waveform similarity between two adjacent frames, taking into account factors such as the upper limit of the initial speed-change ratio, and is determined so that the start point (mSA) of the current frame of the audio signal lies outside the scan range.

The method of claim 4, wherein the scan range consists of a predetermined number of samples to the left and right of a point shifted from the start point of the current frame by the sample interval SA between the start points of consecutive frames of the audio signal.

A method of modifying the time scale of an audio signal at a speed-change ratio designated by a user, using a predetermined algorithm that applies the overlap-and-add principle to successive frames and joins two adjacent frames at the point of maximum correlation so as to preserve the original pitch information of the audio signal, the method comprising:

finding, each time the time scale is modified by one frame by applying the predetermined algorithm, a synchronization lag, which is the sample interval between the center point of a predetermined scan range and the point within that scan range showing the best cross-correlation between the current frame of the audio signal and the last frame of the time-scale-modified signal;

calculating a new cumulative value of the synchronization lag by adding the found synchronization lag to the previous cumulative value of the synchronization lags; and

comparing the calculated cumulative value with a predetermined tolerance range and, if the cumulative value is outside the tolerance range, modifying the time scale by changing the speed-change ratio applied from the next frame to a new speed-change ratio that can reduce the cumulative value so that it converges within the tolerance range in subsequent frames.

8. The method of claim 7, wherein the predetermined algorithm is SOLA, WSOLA, or an equivalent algorithm that takes as the point of maximum correlation the point at which waveform similarity is highest while two adjacent frames are slid against each other within the scan range.

9. The method of claim 7, wherein the difference SS-SA is calculated by subtracting the start-point sample interval SA of adjacent frames of the audio signal from the start-point sample interval SS of adjacent frames of the time-scale-modified signal, and the scan range is defined, when |SS-SA| > L, as the range containing L samples on either side of the sample mSA obtained by advancing SA samples m times, that is, mSA-L to mSA+L, and otherwise as mSA-(SS-SA) to mSA+2L-(SS-SA), where the variables L and m denote half of the scan range and the frame index, respectively.

10. The method of claim 7, wherein the cumulative value of the synchronization lag continues to be calculated periodically while the time scale of the audio signal is modified by applying the new speed-change ratio, and the speed-change ratio to be applied from the subsequent period is adjusted according to whether the calculated cumulative value converges within the tolerance range.

11. The method of claim 7, wherein, when the cumulative value decreases as a result of time-scale modification with the new speed-change ratio and falls within a reset range defined as a narrower range inside the tolerance range, the new speed-change ratio is restored to the speed-change ratio set by the user, which is then applied.

12. The method of claim 7 or 11, wherein a correction mode is set to off by default; the correction mode is changed to on when the magnitude of the cumulative value of the synchronization lag exceeds the tolerance range; while the correction mode is on, if the cumulative value has increased relative to the cumulative value up to the previous period, the correction mode is changed to off so that it is checked in the next period whether the cumulative value exceeds the tolerance range; and if the cumulative value has decreased, the correction mode is kept on so that the speed-change ratio applied in the current period continues to be applied as is.

A method of modifying the time scale of an audio signal, in which an original audio signal is stored in a first memory, the original audio signal is read from the first memory one frame at a time, and the frames are rearranged so that the time scale between adjacent frames is modified according to an initial speed-change ratio set by a user and are stored in a second memory, the method comprising:

reading the original audio signal from the first memory one frame at a time, the current frame being read as N+L samples starting from the (mSA-L)-th sample so that a predetermined number of samples overlap with the previous frame, where the variable N is the number of samples constituting a unit frame, the variable SA is defined as SS/α, the variable SS is the number of samples between the start points of successive frames of the time-scale-modified audio signal, α is the speed-change ratio applied in each frame period and is either the initial speed-change ratio or a modified speed-change ratio, the variable m is the frame index, and L is the number of samples corresponding to half of a predetermined scan range;

calculating the waveform similarity between the last frame of the time-scale-modified audio signal and the read current frame while sliding the current frame within the scan range;

calculating a synchronization lag K_m in each frame period, the synchronization lag K_m being defined as the difference, in number of samples, between the point at which the calculated waveform similarity is maximal and the start point (mSS) of that period in the time-scale-modified signal;

appending to the last frame in the second memory, from the mSS-th sample point, the N-K_m samples of the read current frame to which the calculated synchronization lag K_m has been applied, the samples that overlap between the current frame and the last frame being summed with weights applied;

each time a new synchronization lag K_m is calculated, adding it to the cumulative value of the synchronization lags up to the previous frame period to newly calculate the cumulative value of the synchronization lags up to the current frame period; and

checking whether the newly calculated cumulative value is outside a predetermined tolerance range and, if so, adjusting the speed-change ratio applied from the next frame period so that the cumulative value is forced back within the tolerance range.

14. The method of claim 13, wherein, if the cumulative value of the synchronization lag grows in the negative direction and leaves the tolerance range, the applied speed-change ratio α is corrected to a value smaller than the initial speed-change ratio α_o, and if the cumulative value grows in the positive direction and leaves the tolerance range, the applied speed-change ratio α is corrected to a value larger than the initial speed-change ratio α_o.

15. The method of claim 14, wherein part of the tolerance range is defined as a reset range in order to force the cumulative value of the synchronization lag to stay within the tolerance range, and, after the cumulative value has left the tolerance range, if time-scale modification with the corrected speed-change ratio brings the cumulative value within the reset range, the applied speed-change ratio is restored to the initial speed-change ratio α_o.

The method of claim 13, wherein the scan range is defined as mSS-L to mSS+L, which contains L samples on either side of the point mSS, when |α_o| > SS/(SS-L), and as mSS-(SS-SA) to mSS+2L-(SS-SA) when |α_o| ≤ SS/(SS-L).

An apparatus for modifying the time scale of an audio signal according to a speed-change ratio designated by a user, based on the principle of finding and applying the point of best correlation while sliding two adjacent frames within a predetermined scan range, the apparatus comprising:

memory means for storing the original audio signal and the audio signal whose time scale has been modified based on the speed-change ratio set by the user; and

signal processing means for dividing the audio signal into a plurality of consecutive frames such that adjacent frames overlap by a first number of samples, and modifying the time scale by compressing or extending the interval between adjacent frames so that the divided adjacent frames overlap by a second number of samples, wherein, each time the time scale is modified by one frame, the signal processing means calculates a synchronization lag, which is the sample interval between the center point of a predetermined scan range and the point within that scan range showing the best cross-correlation between the current frame of the audio signal and the last frame of the time-scale-modified signal, adds it to the synchronization lag accumulated up to the previous frame period to obtain a cumulative value, compares the cumulative value with a predetermined tolerance range, and, if the cumulative value is outside the tolerance range, modifies the time scale by changing the speed-change ratio applied from the next frame to a new speed-change ratio that can reduce the cumulative value so that it converges within the tolerance range.

18. The apparatus of claim 17, wherein the signal processing means calculates the difference SS-SA by subtracting the start-point sample interval SA of adjacent frames of the audio signal from the start-point sample interval SS of adjacent frames of the time-scale-modified signal, and defines the scan range, when |SS-SA| > L, as the range containing L samples on either side of the sample mSA obtained by advancing SA samples m times, that is, mSA-L to mSA+L, and otherwise as mSA-(SS-SA) to mSA+2L-(SS-SA), where the integers L and m denote half of the scan range and the frame index, respectively.

19. The apparatus of claim 17, wherein the signal processing means continues to calculate the cumulative value of the synchronization lag periodically while the time scale of the audio signal is modified by applying the new speed-change ratio, and adjusts the speed-change ratio to be applied from the subsequent period according to whether the calculated cumulative value converges within the tolerance range.

20. The apparatus of claim 17, wherein, when the cumulative value decreases as a result of time-scale modification with the new speed-change ratio and enters a reset range defined as a narrower range inside the tolerance range, the signal processing means restores the new speed-change ratio to the speed-change ratio set by the user and applies it.