KR101336137B1 - Method of fast normalized cross-correlation computations for speech time-scale modification

Info

Publication number: KR101336137B1
Application number: KR1020120015992A
Authority: KR
Inventors: 김형순; 임상준
Original assignee: 부산대학교 산학협력단 (Pusan National University Industry-University Cooperation Foundation)
Priority date: 2012-02-16
Filing date: 2012-02-16
Publication date: 2013-12-05
Also published as: KR20130094607A

Abstract

The present invention provides a method that uses sum tables to eliminate redundant computation in the normalized cross-correlation function. This reduces the cost of computing the normalized cross-correlation for speech time-scale modification, while the synthesized output stays identical to that of conventional SOLA and WSOLA, with no loss of sound quality. The method comprises the following steps:
- setting an arbitrary playback rate for a stored speech signal;
- when the set rate is high (speed-up), applying a sum table only to the denominator of the normalized cross-correlation formula, and when the set rate is low (slow-down), applying sum tables to both the numerator and the denominator;
- computing the fast normalized cross-correlation with the generated sum tables to obtain the optimal shift, and determining the similarity according to that shift, thereby eliminating redundant cross-correlation computation;
- after repeating this through the last frame, synthesizing and outputting the output signal.

Description

Method of fast normalized cross-correlation computations for speech time-scale modification

The present invention relates to time-scale modification of speech. More particularly, it relates to a method for reducing the computational load of WSOLA (an overlap-add technique based on waveform similarity), a speech time-scale modification method.

Various methods have been proposed for time-scale modification of speech signals. Among them, time-domain methods are widely used because they require little computation and perform well. Simply stretching or compressing the time axis of a speech signal as a whole distorts its spectral information, so the result sounds like modulated speech. Time-scale modification instead changes only the speaking rate while minimizing this spectral distortion.

Early time-domain rate-conversion methods relied on simple OLA (overlap-add). OLA is easy to implement in hardware but does not sufficiently minimize spectral distortion. Several methods were therefore developed:
- SOLA (synchronized overlap-add);
- WSOLA (an overlap-add technique based on waveform similarity);
- PSOLA (pitch-synchronous overlap-add).

SOLA and WSOLA work in similar ways: they improve output quality by synchronizing the waveforms before overlap-adding them.

SOLA and WSOLA are thus the representative digital signal processing techniques for changing the length of an audio signal, and both can produce high-quality time-scaled output.

Research on reducing the computational load of WSOLA is ongoing. Existing approaches shorten the search interval by exploiting the periodicity of speech. They reduce computation, but at the cost of slightly degrading sound quality relative to the original time-scale modification method.

Mobile devices have recently come into wide use, but many lack sufficient processing power for SOLA- and WSOLA-based speech time-scale modification. To apply SOLA and WSOLA on such devices, the cost of computing the normalized cross-correlation must be reduced without degrading sound quality.

The present invention was devised to solve these problems. Its object is to provide a method that uses sum tables to eliminate redundant computation of the normalized cross-correlation function. The method reduces the cost of computing the normalized cross-correlation for speech time-scale modification, while producing synthesized speech identical to that of conventional SOLA and WSOLA, with no loss of sound quality.

Another object of the present invention concerns where redundancy arises when the normalized cross-correlation is computed:
- at high (speed-up) rates, only the denominator contains redundant computation;
- at low (slow-down) rates, both the denominator and the numerator do.

Taking this into account, the invention reduces computation by applying sum tables to the numerator and denominator according to the degree of redundancy and the time-scale modification rate.

To achieve the above objects, the fast normalized cross-correlation calculation method for speech time-scale modification according to the present invention comprises the following steps:
(A) setting an arbitrary playback rate for a stored speech signal;
(B) according to the set rate, deciding whether to apply a sum table only to the denominator, or to both the numerator and the denominator, of the normalized cross-correlation formula; this formula serves as the criterion for evaluating the speech signal against the output signal in order to find the optimal shift;
(C) when step (B) selects the denominator only, generating a sum table only for the denominator of the current frame, and when step (B) selects both, generating sum tables for the numerator and the denominator of the current frame;
(D) determining the similarity by computing the normalized cross-correlation, with redundant computation removed, using the generated sum tables, and taking the value that maximizes it as the optimal shift;
(E) overlap-adding at the position synchronized according to the similarity determination;
(F) after performing the redundancy removal of step (D) through the last frame, synthesizing the result and outputting it as the output signal.

Preferably, the playback rate is a time-scale modification ratio: a value greater than 1 indicates a high (speed-up) rate, and a value less than 1 indicates a low (slow-down) rate.

Preferably, if the playback rate set in step (B) is a high rate, the sum table is applied only to the denominator; if it is a low rate, sum tables are applied to both the numerator and the denominator.

Preferably, the normalized cross-correlation is given by [equation], where:
- one signal is the reference signal of the k-th frame;
- the other is the signal whose similarity is evaluated in the k-th frame, i.e., the comparison signal shifted by the shift value;
- L is the overlap length;
- k is the index of the synthesis frame.

Preferably, the optimal shift is the value that maximizes the normalized cross-correlation.

Preferably, the denominator sum table generated in step (C) is built in recursive form, as in [equation].

Preferably, the denominator sum table generated in step (C) is generated as [equation].

Preferably, step (E) uses the same method as WSOLA, a speech time-scale modification method.

Preferably, the denominator is the energy of the comparison signal, and the numerator is the cross-correlation of the comparison signal with the reference signal.

According to another aspect, the fast normalized cross-correlation calculation method for speech time-scale modification comprises the following steps:
- setting an arbitrary playback rate for a stored speech signal;
- when the set rate is high, applying a sum table only to the denominator of the normalized cross-correlation formula, and when it is low, applying sum tables to both the numerator and the denominator;
- computing the fast normalized cross-correlation with the generated sum tables to obtain the optimal shift, and determining the similarity according to that shift, thereby eliminating redundant cross-correlation computation;
- after performing this redundancy removal through the last frame, synthesizing and outputting the output signal.

Preferably, the playback rate is a time-scale modification ratio: a value greater than 1 indicates a high rate, and a value less than 1 indicates a low rate.

As described above, the method according to the present invention further reduces the cost of computing the normalized cross-correlation in WSOLA-based speech time-scale modification. High-quality time-scaled output can therefore be produced even on devices with limited processing power. For SOLA and WSOLA among the time-scale modification methods, the method reduces computation without degrading sound quality.

Accordingly, speech data can be played back and searched at high speed without timbre distortion. Because less computation is needed for speech decoding, the system load is also reduced.

FIG. 1 is a flowchart illustrating a fast normalized cross-correlation calculation method for WSOLA-based speech time-scale modification according to an embodiment of the present invention.

Other objects, features, and advantages of the present invention will become apparent from the detailed description of the embodiments with reference to the accompanying drawings.

A preferred embodiment of the fast normalized cross-correlation calculation method for speech time-scale modification according to the present invention is described below with reference to the accompanying drawings. The present invention is not, however, limited to the embodiment disclosed below and may be implemented in various different forms. This embodiment is provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art. The embodiment described in this specification and the configuration shown in the drawings are merely the most preferred embodiment of the present invention and do not represent its entire technical idea. It should therefore be understood that, as of the filing of this application, various equivalents and modifications could replace them.

FIG. 1 is a flowchart illustrating a fast normalized cross-correlation calculation method for WSOLA-based speech time-scale modification according to an embodiment of the present invention.

Referring to FIG. 1, a playback rate is first set for a pre-stored speech signal (S10).

In general, in the computation for speech time-scale modification, the output signal for an input speech signal is given by Equation 1 below.

In Equation 1, L is the overlap length. Because adjacent windows always overlap by half, the window length is N = 2L. k denotes the synthesis frame index, so the shift indexed by k is the optimal shift in the k-th frame. The remaining parameter is the time-scale modification ratio. A value greater than 1 converts to a faster rate (high-speed playback), and a value less than 1 converts to a slower rate (low-speed playback).
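Equation 1 itself is not reproduced in this copy. The standard WSOLA synthesis rule consistent with the definitions above can be written as follows. The symbols are assumed notation, not necessarily the patent's own:
- x(n) is the input and y(n) the output;
- v(n) is a window of length N = 2L;
- α is the time-scale modification ratio;
- Δ_k is the shift of frame k.

```latex
y(n) = \sum_{k} v(n - kL)\, x\bigl(n - kL + \operatorname{round}(\alpha k L) + \Delta_k\bigr),
\qquad v(n) = 0 \ \text{outside}\ 0 \le n < N = 2L,
\qquad \sum_{k} v(n - kL) = 1
```

Segment k is read from the input starting at round(αkL) + Δ_k and written to the output starting at kL. The input therefore advances by about αL samples per L output samples, so α > 1 shortens (speeds up) the speech and α < 1 lengthens (slows down) it.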

To find the optimal shift, the normalized cross-correlation is used as the criterion for evaluating the speech signal against the output signal, as expressed in Equation 2 below.

The two signals compared in Equation 2 can be defined as in Equation 3 below.

In Equations 2 and 3, one signal is the reference signal of the k-th frame. The other is the signal whose similarity is evaluated in the k-th frame, i.e., the comparison signal shifted by the shift value.
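Equations 2 and 3 are not reproduced in this copy. A form consistent with the description, under stated assumptions, is given below. The assumed notation is:
- x(n) is the input and α the time-scale ratio;
- Δ is a candidate shift, and Δ_{k-1} is the shift chosen for the previous frame;
- L is the overlap length;
- x̃_k is the reference signal, taken as the natural continuation of the previously selected segment;
- x_k^Δ is the comparison signal.

```latex
c_k(\Delta) = \frac{\sum_{n=0}^{L-1} \tilde{x}_k(n)\, x_k^{\Delta}(n)}
                   {\sqrt{\sum_{n=0}^{L-1} \tilde{x}_k^{2}(n)\;\sum_{n=0}^{L-1} \bigl(x_k^{\Delta}(n)\bigr)^{2}}}

\tilde{x}_k(n) = x\bigl(n + \operatorname{round}(\alpha (k-1) L) + \Delta_{k-1} + L\bigr),
\qquad
x_k^{\Delta}(n) = x\bigl(n + \operatorname{round}(\alpha k L) + \Delta\bigr)
```

The numerator is the cross-correlation of the two signals. The denominator is the square root of their energies, which matches the later statement that the denominator consists of signal energies.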

The optimal shift is therefore the value that maximizes Equation 2, obtained as in Equation 4 below.
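For reference, a direct implementation of the search in Equation 4 evaluates the full normalized cross-correlation for every candidate shift. This is the baseline whose cost the sum tables reduce, not the patent's method. The sketch assumes the following (the function and parameter names are hypothetical):
- the reference signal occupies input samples `ref_start` to `ref_start + L - 1`;
- the comparison signal for shift `delta` starts at `nominal + delta`;
- the shift ranges over `-delta_max` to `delta_max`.

```python
import math


def ncc(x, ref_start, cmp_start, L):
    """Normalized cross-correlation of x[ref_start:ref_start+L] and x[cmp_start:cmp_start+L]."""
    num = sum(x[ref_start + n] * x[cmp_start + n] for n in range(L))
    e_ref = sum(x[ref_start + n] ** 2 for n in range(L))   # reference energy
    e_cmp = sum(x[cmp_start + n] ** 2 for n in range(L))   # comparison energy
    den = math.sqrt(e_ref * e_cmp)
    return num / den if den > 0 else 0.0


def best_shift(x, ref_start, nominal, L, delta_max):
    """Direct search: return the delta in [-delta_max, delta_max] that maximizes the NCC."""
    best_delta, best_c = 0, -math.inf
    for delta in range(-delta_max, delta_max + 1):
        # Recomputes both energies and the full cross product for every candidate.
        c = ncc(x, ref_start, nominal + delta, L)
        if c > best_c:
            best_delta, best_c = delta, c
    return best_delta
```

Consider a sinusoid with a period of 20 samples. `best_shift(x, 40, 65, 40, 8)` picks -5, which places the comparison signal at sample 60, exactly one period after the reference signal.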

The search range is bounded by a maximum shift; the larger this bound, the greater the computational load.
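Counting multiplications makes the dependence on the search range concrete. The count assumes Equation 2 has the usual form: an L-term cross-product sum divided by the square root of two L-sample energies. Each of the 2Δ_max + 1 candidate shifts then needs L products for the numerator and L squares for the comparison energy, and the reference energy is computed once per frame. The values L = 160 and Δ_max = 80 below are illustrative only (160 samples is 20 ms at 8 kHz).

```latex
C_{\text{direct}} = (2\Delta_{\max} + 1)\cdot 2L + L
\quad\text{multiplications per frame},
\qquad L = 160,\ \Delta_{\max} = 80:\quad 161 \cdot 320 + 160 = 51{,}680.
```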

Equation 2 is thus computed repeatedly, once for every value in the interval over which the shift moves. Reducing the cost of Equation 2 therefore directly reduces the cost of WSOLA. Equation 2 contains redundant computation across adjacent intervals. The number of operations can be reduced with the sum-table method, which avoids recomputing such overlapping terms.

Accordingly, it is decided whether to apply the sum table only to the denominator of Equation 2, or to both the numerator and the denominator (S20):
- Denominator: computing adjacent candidate signals involves redundant computation, which the sum table can eliminate.
- Numerator at high rates: each frame is computed on new signal samples, so there is no redundancy to exploit.
- Numerator at low rates: redundancy does occur, and using a sum table reduces computation.

Accordingly, one of two approaches is selected based on the rate set in step S10: a sum table for the denominator only, or sum tables for both the denominator and the numerator.
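The selection in step S20 reduces to comparing the playback rate with 1. At α > 1, consecutive frames are about αL > L samples apart, so their L-sample reference signals barely overlap and only the denominator benefits. At α < 1 the frames overlap, so the numerator is worth tabulating too. The sketch below uses a hypothetical function name and return values. The source does not say what happens at exactly α = 1; treating it as a high rate here is an assumption.

```python
def sum_table_mode(alpha):
    """Step S20 sketch: decide which parts of the NCC get a sum table.

    alpha > 1 (speed-up): only the denominator (comparison energy) is tabulated.
    alpha < 1 (slow-down): the numerator is tabulated as well.
    alpha == 1 is treated like a high rate here (an assumption, not from the source).
    """
    if alpha <= 0:
        raise ValueError("time-scale ratio must be positive")
    return ("denominator",) if alpha >= 1 else ("numerator", "denominator")
```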

If the selection in S20 is the denominator-only sum table, a sum table is generated only for the denominator of the current frame (S30).

The denominator of Equation 2 consists of the energies of the two signals. The optimal shift is the shift value that maximizes Equation 2. The energy of the reference signal is computed independently of the shift value, so in the search for the optimal shift it can be treated as a constant. It therefore need not be computed in Equation 2.
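The claim that the reference energy can be dropped follows from a one-line argument. The notation is assumed: x̃_k(n) is the reference signal, x_k^Δ(n) is the comparison signal for shift Δ, and E_k(Δ) is the comparison energy. The reference energy is positive for a non-silent frame and the same for every Δ, so dividing by its square root rescales all candidates by one common positive factor and cannot change the maximizer:

```latex
\Delta_k
= \arg\max_{-\Delta_{\max} \le \Delta \le \Delta_{\max}}
  \frac{\sum_{n=0}^{L-1} \tilde{x}_k(n)\, x_k^{\Delta}(n)}
       {\sqrt{\sum_{n=0}^{L-1} \tilde{x}_k^{2}(n)}\,\sqrt{E_k(\Delta)}}
= \arg\max_{-\Delta_{\max} \le \Delta \le \Delta_{\max}}
  \frac{\sum_{n=0}^{L-1} \tilde{x}_k(n)\, x_k^{\Delta}(n)}{\sqrt{E_k(\Delta)}},
\qquad
E_k(\Delta) = \sum_{n=0}^{L-1} \bigl(x_k^{\Delta}(n)\bigr)^{2}
```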

The energy of the comparison signal in Equation 2, on the other hand, is computed redundantly for adjacent intervals. The sum-table method eliminates this redundant computation of adjacent-interval energies. Before any energy is computed, the sum table is built in recursive form, as in Equation 5 below.
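A minimal sketch of a recursively built energy sum table and of reading a window energy from it, assuming Equation 5 is the usual running sum of squared samples. Both function names are hypothetical. The table is offset by one entry (S[0] = 0), so any L-sample energy is a difference of two entries instead of L squarings.

```python
def energy_sum_table(x):
    """Running sum of squared samples, built recursively.

    S[m] holds x[0]**2 + ... + x[m-1]**2, so S[0] = 0 and S[m] = S[m-1] + x[m-1]**2.
    """
    S = [0.0] * (len(x) + 1)
    for m in range(1, len(x) + 1):
        S[m] = S[m - 1] + x[m - 1] * x[m - 1]
    return S


def window_energy(S, start, L):
    """Energy of x[start:start+L] from two table look-ups instead of L squarings."""
    return S[start + L] - S[start]
```

For a candidate shift, the comparison energy is `window_energy(S, start, L)` with `start` set to the first sample of the comparison signal. Adjacent candidates share the same table.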

The sum table thus generated is used to compute the fast normalized cross-correlation, from which the optimal shift is obtained and the best similarity is determined (S40).

That is, using the table of Equation 5, the energy of the comparison signal can be rewritten as in Equation 6 below.

Existing methods that use the previously proposed signal-period estimation to reduce computation assume that the speech signal is periodic. They exploit the relationship between the window size and the period to compute the optimal shift for the best similarity. Once the optimal shift has been computed, the best similarity can be determined from it.

In this way, the optimal shift is computed as the value that maximizes Equation 2, and the best similarity is determined. The denominator of the normalized cross-correlation requires the energies of adjacent intervals. Using this table for those energies reduces the cost of computing the normalized cross-correlation by approximately 50%.
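A sketch of step S40 under stated assumptions shows where a saving of roughly one half comes from:
- the comparison energy of every candidate is read from a sum table built once over the search region;
- the reference energy is dropped because it is the same for every candidate;
- only the numerator is still computed as a full L-term sum.

`fast_best_shift`, `ref_start`, and `nominal` are hypothetical names. The multiplication counter is there only to make the cost visible.

```python
import math
from itertools import accumulate


def fast_best_shift(x, ref_start, nominal, L, delta_max):
    """Shift search with a sum table for the comparison energy (sketch of step S40).

    Assumes nominal - delta_max >= 0 and nominal + delta_max + L <= len(x).
    Returns (best_delta, multiplications performed).
    """
    lo = nominal - delta_max
    region = x[lo:nominal + delta_max + L]               # every sample any candidate touches
    S = [0.0] + list(accumulate(v * v for v in region))  # S[j] = sum of squares of region[:j]
    mults = len(region)                                  # one squaring per table entry
    best_delta, best_score = 0, -math.inf
    for delta in range(-delta_max, delta_max + 1):
        start = nominal + delta
        num = sum(x[ref_start + n] * x[start + n] for n in range(L))
        mults += L                                       # numerator still needs L products
        e_cmp = S[start - lo + L] - S[start - lo]        # comparison energy: two look-ups
        # The reference energy is identical for every delta, so it is left out:
        # dividing by its square root would not change which delta wins.
        score = num / math.sqrt(e_cmp) if e_cmp > 0 else 0.0
        if score > best_score:
            best_delta, best_score = delta, score
    return best_delta, mults
```

Take L = 40 and Δ_max = 8. Direct evaluation of the numerators and comparison energies costs 17 × 2 × 40 = 1,360 multiplications. This version costs 56 squarings for the table plus 17 × 40 = 680 numerator products, 736 in total. The ratio moves toward one half as the number of candidates grows.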

Overlap-add is then performed at the position synchronized according to the similarity determination, which completes the redundancy-free processing of the cross-correlation for this frame (S50). The synchronized overlap-add uses the same method as conventional WSOLA.
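Step S50 reuses the standard WSOLA overlap-add. A minimal sketch of that operation is given below, under these assumptions:
- the window is a periodic Hann window of length N = 2L (the source does not name the window);
- `starts[k]` is the input position chosen for frame k, i.e., the nominal position plus the optimal shift;
- `overlap_add` is a hypothetical name.

```python
import math


def overlap_add(x, starts, L):
    """Overlap-add 2L-sample input segments at output hop L.

    Segment k is x[starts[k] : starts[k] + 2L], weighted by a periodic Hann window
    and added to the output at position k * L.
    """
    N = 2 * L
    # Periodic Hann: win[n] + win[n + L] == 1, so half-overlapped windows sum to one.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]
    y = [0.0] * ((len(starts) - 1) * L + N)
    for k, s in enumerate(starts):
        for n in range(N):
            y[k * L + n] += win[n] * x[s + n]
    return y
```

When consecutive `starts` are exactly L apart, the windows sum to one and the interior of the output reproduces the input unchanged, which makes a quick sanity check.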

This redundancy removal for the cross-correlation is repeated through the last frame (S60), and the result is synthesized and output (S70).

If, instead, the selection in S20 is to use sum tables for both the numerator and the denominator, sum tables are generated for the denominator and the numerator of the current frame (S30).

The sum tables generated for the denominator and the numerator are used to compute the fast normalized cross-correlation with redundancy removed. This determines the similarity, and the value that maximizes it is taken as the optimal shift (S90). The denominator is handled exactly as described above, so only the numerator is described here.

The numerator of Equation 2 is the cross-correlation function. In low-rate conversion, the cross-correlation computations of adjacent frames overlap. Therefore, in the low-rate case only, a sum table is also built for the numerator of the normalized cross-correlation, as in Equation 7 below, to eliminate this redundant computation.
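Equation 7 is not reproduced in this copy. One way to realize the idea is sketched below under assumptions; `LagSumTables` and its bookkeeping are hypothetical and are not the patent's Equation 7:
- Take a reference signal starting at sample r and a comparison signal starting at s. Their numerator is a sum of lagged products x[m]·x[m + d] with d = s - r.
- A running-sum table per lag therefore gives each numerator as the difference of two entries.
- When a later frame needs the same lag over an overlapping range, the products already in the table are reused rather than recomputed.

```python
class LagSumTables:
    """Running-sum tables of lagged products x[m] * x[m + d], one table per lag d.

    For lag d the table covers indices m0, m0 + 1, ...; entry j holds the sum of
    x[m] * x[m + d] for m0 <= m < m0 + j. Tables only grow, so products already
    computed for an earlier frame are reused by later frames.
    """

    def __init__(self, x):
        self.x = x
        self.tables = {}   # lag d -> (m0, list of running sums)
        self.mults = 0     # multiplications actually performed

    def cross(self, r, s, L):
        """Return sum_{n < L} x[r + n] * x[s + n] using the table for lag s - r."""
        d = s - r
        m0, t = self.tables.get(d, (r, [0.0]))
        if r < m0:                         # request starts before the table: rebuild
            m0, t = r, [0.0]
        while m0 + len(t) - 1 < r + L:     # extend until index r + L - 1 is covered
            m = m0 + len(t) - 1
            t.append(t[-1] + self.x[m] * self.x[m + d])
            self.mults += 1
        self.tables[d] = (m0, t)
        return t[r + L - m0] - t[r - m0]
```

In the low-rate case, requests for the same lag cover overlapping ranges, and only the non-overlapping part of each range costs new multiplications. For example, a 40-sample window at lag 10 starting at sample 100 costs 40 products. A second request at lag 10 starting at sample 120 then costs only the 20 products beyond sample 139.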

Using the sum table thus generated, the fast normalized cross-correlation with redundancy removed is computed, the similarity is determined, and the optimal shift is obtained (S90).

Overlap-add is then performed at the position synchronized according to the similarity determination (S100). As mentioned above, in low-rate conversion the sum table of the current frame can overlap that of the previous or next frame. As shown in the figure above, where such an overlap exists, the overlapping portion of the sum table is taken over from the other frame, which avoids recomputing that many values.

To find the overlapping interval of the sum tables, the overlap length between the reference signals of the current and previous frames is first computed from Equation 3. If this value is negative, there is no overlapping interval, so the actual overlap length is given by Equation 8 below.
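The clamping described for Equation 8 is the usual overlap length of two equal-length intervals. A minimal sketch, assuming both signals span L samples and are identified by their start indices (the helper name is hypothetical):

```python
def overlap_length(a_start, b_start, L):
    """Number of samples shared by [a_start, a_start + L) and [b_start, b_start + L).

    The raw value L - |b_start - a_start| is negative when the intervals are
    disjoint, so it is clamped to zero.
    """
    return max(0, L - abs(b_start - a_start))
```

Assume the common WSOLA convention in which the reference signal of frame k starts at round(α(k-1)L) + Δ_{k-1} + L. Consecutive reference signals are then about αL + Δ_k - Δ_{k-1} samples apart. The overlap is nonzero only when that gap is smaller than L, which is why it arises at low rates (α < 1).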

Next, for the other factor of the cross-correlation product, the comparison signal, the interval that overlaps the comparison signal of the previous frame is found.

By Equation 4, each optimal shift is confined to the search range, which bounds the time difference between the comparison signals of consecutive frames. The interval that takes a possible overlap into account is therefore given by Equation 9 below.

The overlapping interval of the reference signals was thus obtained in Equation 8, and that of the comparison signals in Equation 9.

The intervals that overlap according to Equations 8 and 9 contain the same cross-correlation products, so they need not be recomputed: the sum-table values of the previous frame can be reused in the current frame.

The total redundancy in the numerator of the normalized cross-correlation equals the sum, over all frames, of the product of Equations 8 and 9. At low rates, the lower the rate, the more the signals of adjacent frames overlap. Likewise, the smaller the difference between the optimal shifts of adjacent frames, the greater the overlap. The resulting reduction in numerator computation over all frames is given by Equation 10 below.

This redundancy removal for the cross-correlation is repeated through the last frame (S110), and the result is synthesized and output (S70).

The technical idea of the present invention has been described in detail through a preferred embodiment. That embodiment is for description and not for limitation. Those of ordinary skill in the art will understand that various embodiments are possible within the scope of the technical idea of the invention. The true scope of technical protection of the present invention should therefore be determined by the technical idea of the appended claims.

Claims

1. A fast normalized cross-correlation calculation method for speech time-scale modification, comprising:
(A) setting an arbitrary playback rate for a stored speech signal;
(B) according to the set rate, setting whether to apply a sum table only to the denominator, or to both the numerator and the denominator, of the normalized cross-correlation formula, which is used as the criterion for evaluating the speech signal against the output signal in order to obtain the optimal shift;
(C) when step (B) applies the sum table only to the denominator, generating a sum table only for the denominator of the current frame, and when step (B) applies sum tables to both the numerator and the denominator, generating sum tables for the numerator and the denominator of the current frame;
(D) determining the similarity by computing the normalized cross-correlation, with redundancy removed, using the generated sum tables, and calculating the value that maximizes it as the optimal shift;
(E) overlap-adding at the position synchronized according to the similarity determination; and
(F) after performing the redundancy removal of step (D) through the last frame, synthesizing the result and outputting it as an output signal.

2. The method of claim 1,
wherein the playback rate is a time-scale modification ratio, a value greater than 1 indicating a high rate and a value less than 1 indicating a low rate.

3. The method of claim 2,
wherein, if the playback rate set in step (B) is a high rate, the sum table is applied only to the denominator, and if it is a low rate, sum tables are applied to both the numerator and the denominator.

4. The method of claim 1,
wherein the normalized cross-correlation is given by [equation], where one signal is the reference signal of the k-th frame, the other is the comparison signal of the k-th frame (the signal whose similarity is evaluated, shifted by the shift value), L is the overlap length, and k is the index of the synthesis frame.

5. The method of claim 1,
wherein the denominator sum table generated in step (C) is generated in recursive form, as [equation].

6. The method of claim 1,
wherein the denominator sum table generated in step (C) is generated as [equation].

7. The method of claim 1,
wherein step (E) uses the method used in WSOLA, a speech time-scale modification method.