CN1581292A - A Nonlinear Overlap Method for Time Series Transformation - Google Patents
- Publication number
- CN1581292A
- Authority
- CN
- China
- Prior art keywords
- value
- index value
- maximum index
- predetermined number
- critical value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
Description
Technical Field
The present invention relates to a signal synthesis method, and in particular to a nonlinear overlap method applied to time scaling.
Background Art
As technology advances, audio-visual playback devices such as karaoke machines offer more and more functions, for example audio clean-up, dream sound field (dream), and time scaling. Time scaling (also called time stretching, time compression/expansion, or time correction) changes the length of an audio signal, that is, its playback rate (tempo), without affecting its pitch.
At present, most audio-visual devices on the market perform time scaling with one of three methods: the phase vocoder, MPEX (Minimum Perceived Loss Time Expansion/Compression), and Time Domain Harmonic Scaling (TDHS). The phase vocoder first uses the STFT (Short Time Fourier Transform) to convert an audio signal into a frequency-domain signal in complex Fourier representation, and then uses interpolation and the inverse STFT (iSTFT) to convert that frequency-domain signal into a time-scaled audio signal corresponding to the original. MPEX, recently developed by Prosoniq, is a method that models the characteristics of human hearing, similar to an artificial neural network. MPEX takes the audio signal recorded within a specific time period and "learns" its characteristics in order to lengthen or shorten it. TDHS is the more common time scaling method. It first calculates every correlation value (magnitudes of an autocorrelation function) in the correlation table (autocorrelogram) of a first audio signal, then delays the first audio signal by the maximum index value corresponding to the maximum correlation value in that table to generate a second audio signal, and finally copies the first audio signal onto the second audio signal by synchronized overlap-add (SOLA) to produce a third audio signal that is longer than the first.
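The lag search at the heart of TDHS can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used by any particular device: the function name `best_lag`, the lag bounds, and the use of an unnormalised autocorrelation sum are all choices made here for clarity.

```python
def best_lag(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] with the largest autocorrelation value.

    Hypothetical helper: the source only says that the maximum index value
    is the index of the maximum correlation value in the autocorrelogram.
    """
    best, best_value = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Unnormalised autocorrelation of the signal with itself at this lag.
        value = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if value > best_value:
            best, best_value = lag, value
    return best


# A signal with period 4 correlates best with itself at a lag of 4 samples.
periodic = [0, 1, 0, -1] * 8
print(best_lag(periodic, 1, 8))
```

Delaying the signal by the returned lag lines up similar waveform periods, which is what lets the subsequent overlap-add step join the two copies without an audible pitch change.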
In general, the correlation table is built by a digital signal processor (DSP), a processor dedicated to complex mathematical operations such as convolution and the fast Fourier transform (FFT). Even so, having the DSP overlap and synthesize every part of the first audio signal that overlaps the second audio signal in order to form the third audio signal is not only lengthy but, to some extent, unnecessary.
Summary of the Invention
The main object of the present invention is therefore to provide a nonlinear overlap method for time scaling that quickly synthesizes the first audio signal and the second audio signal into the third audio signal without significantly degrading the quality of the third audio signal.
According to the claims, the invention discloses a nonlinear-overlap time scaling method for synthesizing S1[n] and S2[n] into S3[n], where S1[n] contains N1 signals and S2[n] contains N2 signals. The method comprises the following steps: (a) delaying S2[n] by a predetermined number to form S5[n]; (b) building a correlation table of S1[n] and S5[n]; and (c) setting S3[n] to:
S1[n], when 0 <= n < (the predetermined number + the maximum index value corresponding to the maximum correlation value in the correlation table + a first threshold);
S1[n] weighted and combined with S4[n], when (the predetermined number + the maximum index value + the first threshold) <= n < (N1 - a second threshold);
S4[n - (the predetermined number + the maximum index value)], when (N1 - the second threshold) <= n <= N2 + the predetermined number + the maximum index value;
where the first and second thresholds are not both zero, and S4[n] is S5[n] delayed by the maximum index value.
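For reference, step (c) can be written as a single piecewise definition. The symbols Δ (predetermined number), τmax (maximum index value), th1, and th2 (first and second thresholds) follow the symbol list given with the drawings. The weight w[n] is an assumption introduced here only to make "weighted and combined" concrete; the text does not specify the weighting function.

```latex
S_3[n] =
\begin{cases}
S_1[n], & 0 \le n < \Delta + \tau_{\max} + th_1,\\[4pt]
\bigl(1 - w[n]\bigr)\,S_1[n] + w[n]\,S_4[n], & \Delta + \tau_{\max} + th_1 \le n < N_1 - th_2,\\[4pt]
S_4\bigl[n - (\Delta + \tau_{\max})\bigr], & N_1 - th_2 \le n \le N_2 + \Delta + \tau_{\max},
\end{cases}
\qquad 0 \le w[n] \le 1 .
```

Only the middle case requires a multiply-and-add per sample; the first and third cases are plain copies.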
The method weights and combines only a portion of the part of the first audio signal that overlaps the second audio signal to generate the third audio signal, and can therefore improve the operating performance of the computer hosting the DSP that performs the time scaling.
Brief Description of the Drawings
FIG. 1 is a flowchart of the method of the present invention.
FIG. 2 is a schematic diagram of the method synthesizing S1[n] and S2[n] into S3[n].
FIG. 3 is a schematic diagram of the method lengthening an audio signal.
FIG. 4 is a schematic diagram of the method shortening an audio signal.
Description of Reference Symbols
Δ: predetermined number; τmax: maximum index value
th1: first threshold; th2: second threshold
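Detailed Description
After a correlation table corresponding to the first audio signal and the second audio signal (or an audio signal delayed from the second audio signal) has been built, method 100 of the preferred embodiment calculates the third audio signal from the maximum index value corresponding to the maximum correlation value in the table, the first threshold, the second threshold, and the first and second audio signals. Specifically, to save computation time on the DSP that synthesizes the first and second audio signals into the third, method 100, after calculating the maximum index value and delaying the second audio signal by it, does not weight and combine every part of the first audio signal that overlaps the second audio signal. Instead, it weights and combines only a portion of that overlap (namely, the portion lying between the first threshold and the second threshold) with the second audio signal to generate the third audio signal.
Please refer to FIG. 1, a flowchart of method 100 in the preferred embodiment. Method 100 comprises the following steps: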
Step 102: Start;
(S1[n] and S2[n] are to be synthesized into S3[n]; S1[n] and S2[n] are assumed to contain N1 and N2 signals respectively.)
Step 104: Delay S2[n] by a predetermined number Δ to form S5[n];
(To prevent the optical pickup head in the audio-visual playback device from running short of read data (run-in) when reading S3[n], method 100 first delays S2[n] by the predetermined number Δ and only then calculates the maximum index value τmax needed to synthesize S1[n] and S5[n]. In the preferred embodiment, the predetermined number Δ equals [N1/3].)
Step 106: Build the correlation table (crosscorrelogram) of S1[n] and S5[n], and delay S5[n] by the maximum index value τmax corresponding to the maximum correlation value in the table to form S4[n];
(The correlation table contains multiple correlation values (magnitudes of a crosscorrelation function), and each correlation value corresponds to an index value.)
Step 108: Synthesize S1[n] and S4[n] into S3[n];
(S3[n] is set to:
S1[n], when 0 <= n < (predetermined number Δ + maximum index value τmax + first threshold th1);
S1[n] weighted and combined with S4[n], when (predetermined number Δ + maximum index value τmax + first threshold th1) <= n < (N1 - second threshold th2);
S4[n - (predetermined number Δ + maximum index value τmax)], when (N1 - second threshold th2) <= n <= N2 + predetermined number Δ + maximum index value τmax;
where the first threshold th1 and the second threshold th2 are not both zero.)
Step 110: End.
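Steps 104 to 108 can be sketched in a few lines. This is an illustrative reading, not the patented implementation. It makes several assumptions: the delayed signal S4 is represented as the S2 samples placed at offset Δ + τmax in the output timeline; the weighting is a linear cross-fade, which the text does not specify; the correlation is an unnormalised sum; and the final upper bound is treated as exclusive so that no sample beyond the end of S2 is read. The names `max_index` and `nonlinear_overlap` are hypothetical.

```python
def max_index(s1, s2, delta, max_lag):
    """Step 106 (sketch): lag tau in [0, max_lag] that maximises the
    cross-correlation of s1 with s2 delayed by delta + tau."""
    best_tau, best_value = 0, float("-inf")
    for tau in range(max_lag + 1):
        offset = delta + tau
        overlap = min(len(s1) - offset, len(s2))
        value = sum(s1[offset + k] * s2[k] for k in range(overlap))
        if value > best_value:
            best_tau, best_value = tau, value
    return best_tau


def nonlinear_overlap(s1, s2, delta, tau_max, th1, th2):
    """Step 108 (sketch): build s3 from s1 and the delayed copy of s2."""
    n1, n2 = len(s1), len(s2)
    offset = delta + tau_max          # total delay applied to s2
    start = offset + th1              # first blended sample
    stop = n1 - th2                   # one past the last blended sample
    if th1 < 0 or th2 < 0 or not (offset >= 0 and start <= stop <= n1):
        raise ValueError("thresholds leave no valid blending region")
    if stop - offset > n2:
        raise ValueError("s2 is too short to cover the blending region")
    s3 = []
    for n in range(offset + n2):
        if n < start:
            s3.append(s1[n])                      # copy s1 unchanged
        elif n < stop:
            # Assumed linear cross-fade from s1 to the delayed s2.
            w = (n - start + 1) / (stop - start + 1)
            s3.append((1 - w) * s1[n] + w * s2[n - offset])
        else:
            s3.append(s2[n - offset])             # copy delayed s2 unchanged
    return s3


s1 = [1.0] * 12
s2 = [3.0] * 12
delta = len(s1) // 3                  # preferred embodiment: [N1/3]
s3 = nonlinear_overlap(s1, s2, delta, tau_max=2, th1=1, th2=1)
print(len(s3), s3[6:12])
```

With th1 = th2 = 0 the same function blends the whole overlap, which is the behaviour the method is designed to avoid; raising either threshold shrinks the blended region and leaves more samples as plain copies.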
Please refer to FIG. 2, a schematic diagram of synthesizing S1[n] and S2[n] into S3[n] in the preferred embodiment. In FIG. 2, the first part 401 shows S1[n] and S2[n] of step 102 of method 100; the second part 402 shows S1[n] and S5[n] of step 104; the third part 403 shows τmax and S4[n] calculated in step 106; and the fourth part 404 and fifth part 405 show S3[n] synthesized from S1[n] and S4[n] in step 108.
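S3[n] as shown in the fourth part 404 of FIG. 2, for (predetermined number Δ + maximum index value τmax + first threshold th1) <= n < (N1 - second threshold th2), is equal to:
S3[n] as shown in the fifth part 405 of FIG. 2, for (predetermined number Δ + maximum index value τmax + first threshold th1) <= n < (N1 - second threshold th2), is equal to: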
If S1[n] is identical to S2[n], that is, both are taken from the same position of S[n] as shown in FIG. 3, method 100 lengthens S1[n]. Conversely, if S1[n] and S2[n] are not equal, that is, they are taken from different positions of S[n] as shown in FIG. 4, method 100 shortens S1[n], S6[n] (which is discarded), and S2[n] into S3[n].
Compared with known TDHS, the method of the present invention calculates S3[n], synthesized from S1[n] and S2[n], from the maximum index value corresponding to the maximum correlation value in the correlation table and from the first and second thresholds, which are used to shrink the overlapping part of S1[n] and S2[n]. Once the maximum index value has been calculated, the method does not need to compute every value where S1[n] overlaps S2[n] one by one; it only needs to compute the values of S3[n] that lie between the first and second thresholds. This saves the time the DSP spends calculating S3[n] from S1[n] and S2[n] and, in turn, improves the operating performance of the computer hosting the DSP.
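The saving can be made concrete by counting the samples that need a weighted combination. The sketch below assumes that a full overlap-add would blend every sample from Δ + τmax up to N1, and that the delayed S2 is long enough to cover that range. The function name and the example numbers are illustrative only and do not come from the patent.

```python
def blended_sample_counts(n1, delta, tau_max, th1, th2):
    """Return (full_overlap, reduced_overlap) blended-sample counts.

    full_overlap    : samples blended when the whole overlap is combined
    reduced_overlap : samples blended between the two thresholds only
    """
    offset = delta + tau_max
    full_overlap = max(n1 - offset, 0)
    reduced_overlap = max((n1 - th2) - (offset + th1), 0)
    return full_overlap, reduced_overlap


# Made-up example: N1 = 1200, delta = N1 // 3 = 400, tau_max = 50.
full, reduced = blended_sample_counts(1200, 400, 50, th1=300, th2=300)
print(full, reduced)
```

In this example the thresholds remove 600 of the 750 overlapping samples from the weighted step, so only 150 samples need a multiply-and-add while the rest are copied directly.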
The above describes only the preferred embodiments of the present invention; all equivalent changes and modifications made in accordance with the claims of the present invention fall within the scope of this patent.
Claims (19)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 03127827 CN1244901C (en) | 2003-08-11 | 2003-08-11 | A Nonlinear Overlap Method for Time Series Transformation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 03127827 CN1244901C (en) | 2003-08-11 | 2003-08-11 | A Nonlinear Overlap Method for Time Series Transformation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1581292A true CN1581292A (en) | 2005-02-16 |
CN1244901C CN1244901C (en) | 2006-03-08 |
Family
ID=34578871
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 03127827 Expired - Fee Related CN1244901C (en) | 2003-08-11 | 2003-08-11 | A Nonlinear Overlap Method for Time Series Transformation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN1244901C (en) |
- 2003-08-11 CN CN 03127827 patent/CN1244901C/en not_active Expired - Fee Related
Also Published As
Publication number | Publication date |
---|---|
CN1244901C (en) | 2006-03-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
TWI221561B (en) | Nonlinear overlap method for time scaling | |
AU2002242265B2 (en) | Method for time aligning audio signals using characterizations based on auditory events | |
US6073100A (en) | Method and apparatus for synthesizing signals using transform-domain match-output extension | |
McLoughlin | Applied speech and audio processing: with Matlab examples | |
McLoughlin | Speech and Audio Processing: a MATLAB-based approach | |
US9058384B2 (en) | System and method for identification of highly-variable vocalizations | |
CN111916093B (en) | Audio processing method and device | |
CN113314140A (en) | Sound source separation algorithm of end-to-end time domain multi-scale convolutional neural network | |
EP2881944B1 (en) | Audio signal processing apparatus | |
CN1719514A (en) | High-quality real-time voice change method based on speech analysis and synthesis | |
CN101620856A (en) | Method for time scaling of a sequence of input signal values | |
US20040133292A1 (en) | Generalized envelope matching technique for fast time-scale modification | |
EP1074968B1 (en) | Synthesized sound generating apparatus and method | |
CN113113033B (en) | Audio processing method, device and readable storage medium | |
Ferreira-Paiva et al. | A survey of data augmentation for audio classification | |
CN101290775B (en) | Method for rapidly realizing speed shifting of audio signal | |
CN1244901C (en) | A Nonlinear Overlap Method for Time Series Transformation | |
CN112309425B (en) | Sound tone changing method, electronic equipment and computer readable storage medium | |
US7899678B2 (en) | Fast time-scale modification of digital signals using a directed search technique | |
JPH0783752A (en) | Device and method for measuring audio distortion | |
TWI259994B (en) | Adaptive multiple levels step-sized method for time scaling | |
JP2612867B2 (en) | Voice pitch conversion method | |
CN100421151C (en) | Adaptive multi-step time sequence conversion method | |
US8484018B2 (en) | Data converting apparatus and method that divides input data into plural frames and partially overlaps the divided frames to produce output data | |
Saputri et al. | Effect Of Using Window Type On Time Scale Modification On Voice Recording Using Waveform Similarity Overlap and Add |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20060308; Termination date: 20140811 |
EXPY | Termination of patent right or utility model | |