TW305929B

TW305929B - Method of changing the tone and the speed of sounds by using differential mean absolute error

Info

Publication number: TW305929B
Application number: TW85108649A
Authority: TW
Inventors: Ian-Huei Wang; Der-Chwan Wu
Original assignee: Ind Tech Res Inst
Priority date: 1996-07-15
Filing date: 1996-07-15
Publication date: 1997-05-21

Abstract

A method of changing the parameters of sounds includes the following steps: (a) convert an analog signal into a digital signal; (b) divide the digital signal into several intervals; (c) revise the tone and/or the speed of one interval; (d) combine the revised interval with a non-revised interval, the non-revised interval overlapping the end part of the revised one so that a cross-fade is made over the portion whose sound structure is similar to that of the end part, where the similarity of the sound structure is defined by the differential mean absolute error formula; (e) repeat steps (c) and (d) on the other non-revised intervals; (f) convert the revised digital signal into an analog signal.

Description

Field of the Invention

The present invention relates to a method for changing the pitch and playback speed of an audio signal, and in particular to a high-efficiency method that computes a mean absolute error value to find the best splice point, so that the segments of the audio signal can be spliced together to change its pitch and playback speed.

Background of the Invention

Some audio recording applications attempt to change the pitch and playback speed of an audio signal, for example sampling synthesizers, harmonizers, vocoders, language-learning machines, telephone answering machines, and computer music software. To alter a human voice, compression techniques can be used to adjust the amplitude of the signal according to the speaker's pitch. Generally, the adjustable range of the input audio is within an octave. The audio can be adjusted over a total of 24 semitones: 12 downward and 12 upward. A method for adjusting audio must be able to process data in real time on relatively simple hardware, and it must also avoid perceptible distortion of the sound.

Conventionally, split-and-splice methods change the audio by resampling and formatting, but they distort the sound to an unacceptable degree. Resampling mainly changes the sampling frequency, so that not only the amplitude of the audio but also the length of the signal and the shape of the formant envelope change. To preserve the original signal length, some approaches apply compression and expansion after resampling, but the compression/expansion step often produces brief pops, and the change in the shape of the formant envelope also produces high-pitched noise. Split/splice methods use linear prediction filters and Fourier transforms to preserve the formant shape, but they require many computation steps. Others use oscillators and filter banks to change the pitch. These methods not only produce low-frequency and high-frequency noise but also require a great deal of computation.

Accordingly, one object of the present invention is to provide a method that can change the pitch and playback speed of audio without the drawbacks of the prior art. Another object is to provide such a method by computing the mean absolute error of the audio to determine the best splice point. A further object is to provide such a method in which a block binary search is incorporated into the computation of the mean absolute error.

Summary of the Invention

In accordance with the above objects, in a first aspect the present invention provides a method of modifying audio parameters. The method first converts the analog audio into a digital signal, divides the digital signal into several frames, and then modifies the pitch and playback speed of the digital signal within a frame. The modified frame is then spliced to an unmodified frame. When spliced, the unmodified frame must overlap the end region of the modified frame so that the two can be cross-faded. The overlap is placed on the part of the unmodified frame whose sound structure is similar to that of the end region. Similarity of sound structure is determined by the differential mean absolute error of the splice, defined by the following formula, which requires a minimal number of computation steps:

\[
\mathrm{DMAE} = \sum_{m} \Bigl( \bigl|x_1(m) - x_2(m+\tau)\bigr| + \bigl|x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)\bigr| \Bigr)
\]
\[
= \sum_{m} \Bigl( \bigl|x_1(m) - x_2(m+\tau)\bigr| + \bigl|x_1(m+1) - x_2(m+1+\tau) - [x_1(m) + x_2(m+\tau)]\bigr| \Bigr)
\]

where DMAE is the differential mean absolute error of the splice; m is the sample index, running from 0 to cs; cs is the size of the cross-fade region; 0 < τ ≤ sr, where sr is the search range; x_1 is the modified frame; and x_2 is the unmodified frame.

The modify-then-splice steps are applied repeatedly to that unmodified frame and to the remaining unmodified frames of the digital signal to obtain a modified digital signal. Finally, the modified digital signal is converted back to analog form. If the modification lengthens a frame, the excess part of the unmodified frame is discarded so that the playback time stays the same. Conversely, if the modification shortens a frame, the missing part is taken from the original digital signal so that the playback time stays the same.
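The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation. The names `dmae` and `best_lag` are hypothetical. The sketch also makes one interpretive assumption: it reads the second term as the difference between the two first differences, |[x1(m+1) − x1(m)] − [x2(m+1+τ) − x2(m+τ)]|, whereas the printed formula shows a minus sign before the last term. With this reading, an exact copy of the signal gives a DMAE of zero at the correct lag.

```python
def dmae(x1, x2, tau, cs):
    """Differential mean absolute error between x1 and x2 shifted by tau.

    x1: tail of the modified frame (needs at least cs + 1 samples).
    x2: unmodified frame (needs at least cs + tau + 1 samples).
    Only additions, subtractions and absolute values are used.
    """
    total = 0
    for m in range(cs):
        # Sample-level mismatch.
        level = abs(x1[m] - x2[m + tau])
        # Assumed reading: mismatch between the two first differences.
        slope = abs((x1[m + 1] - x1[m]) - (x2[m + 1 + tau] - x2[m + tau]))
        total += level + slope
    return total


def best_lag(x1, x2, cs, sr):
    """Try every lag 0 < tau <= sr and return the one with the smallest DMAE."""
    return min(range(1, sr + 1), key=lambda tau: dmae(x1, x2, tau, cs))


x2 = [0, 3, -1, 4, 1, -5, 9, 2, -6, 5, 3, -5, 8, 9, -7, 9, 3, 2]
x1 = x2[3:10]  # an exact copy of x2 starting at sample 3
print(best_lag(x1, x2, cs=6, sr=8))  # 3
```

The exhaustive loop in `best_lag` is the baseline that the subsampling and block binary search described later in the text are meant to speed up.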

The DMAE values are spaced nτ points apart, where n is an integer determined by the accuracy the computation can tolerate. The search range is first divided into several sections, a DMAE value is determined for each section, and the section with the smallest DMAE value is taken as the best splice position. The number of calculations needed to find the section with the smallest DMAE value is

\[
n\left[\,3 + 2\left(\log_2\frac{MS}{n} - 2\right)\right]
\]

where n is the number of sections and MS is the length of the search range.

In a second aspect, the present invention provides a method of changing audio parameters. The method first converts the analog audio into a digital signal, divides the digital signal into several frames, and then modifies the playback time of one frame. The modified frame is then spliced to an unmodified frame. When spliced, the unmodified frame must overlap the end region of the modified frame so that the two can be cross-faded. The overlap is placed on the part of the unmodified frame whose sound structure is similar to that of the end region. Similarity of sound structure is determined by the differential mean absolute error of the splice, defined by the following formula, which requires a minimal number of computation steps:

\[
\mathrm{DMAE} = \sum_{m} \Bigl( \bigl|x_1(m) - x_2(m+\tau)\bigr| + \bigl|x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)\bigr| \Bigr)
\]
\[
= \sum_{m} \Bigl( \bigl|x_1(m) - x_2(m+\tau)\bigr| + \bigl|x_1(m+1) - x_2(m+1+\tau) - [x_1(m) + x_2(m+\tau)]\bigr| \Bigr)
\]

where DMAE is the differential mean absolute error of the splice; m is the sample index, running from 0 to cs; cs is the size of the cross-fade region; 0 < τ ≤ sr, where sr is the search range; x_1 is the modified frame; and x_2 is the unmodified frame.

The modify-then-splice steps are applied repeatedly to that unmodified frame and to the remaining unmodified frames of the digital signal to obtain a modified digital signal. Finally, the modified digital signal is converted back to analog form. In addition, if the processing would increase the amplitude of the audio, the step of modifying the playback time includes increasing the playback time, so that the playback speed and amplitude of the audio remain unchanged. Likewise, if the processing would decrease the amplitude of the audio, the step of modifying the playback time includes decreasing the playback time, so that the playback speed and amplitude of the audio remain unchanged.

As in the first aspect, the DMAE values are spaced nτ points apart, where n is an integer determined by the accuracy the computation can tolerate. The search range is first divided into several sections, a DMAE value is determined for each section, and the section with the smallest DMAE value is taken as the best splice position. The number of calculations needed to find the section with the smallest DMAE value is

\[
n\left[\,3 + 2\left(\log_2\frac{MS}{n} - 2\right)\right]
\]

where n is the number of sections and MS is the length of the search range.

The present invention also provides an apparatus for changing audio parameters. The apparatus comprises an input amplifier, an output amplifier, first and second low-pass filters, an analog-to-digital converter, a digital-to-analog converter, and a pitch-change processor. The input amplifier is connected in series with the first low-pass filter and the analog-to-digital converter, and provides the input to the pitch-change processor. The digital-to-analog converter is connected in series with the second low-pass filter and the output amplifier at the output of the pitch-change processor. The pitch-change processor comprises an input device connected to an input buffer; an output device connected to an output buffer; a cross-fade data memory, which stores the portion of the audio that needs to be cross-faded; and an address device connected to the input buffer, the output buffer, the cross-fade data memory, a temporary data storage device, a digital processing device that computes the mean absolute error and the cross-fade values, and a control device. The input buffer, the cross-fade data memory, the digital processing device, the control device, and the output buffer are linked in operation through a bus system.

Brief Description of the Drawings

The objects, features, and advantages of the present invention will become apparent from the following description and the accompanying drawings, in which:

Figure 1 illustrates playing audio at the same speed with the number of samples increased and decreased.
Figure 2 illustrates the frame splicing method of the invention for raising the pitch.
Figure 3 illustrates the frame splicing method of the invention for lowering the pitch.
Figure 4 illustrates the range and the search method for finding the best splice position of a frame.
Figure 5 illustrates the binary search of the invention for finding the best splice position.
Figure 6 is a block diagram of the present invention.
Figure 7 is a block diagram of the pitch-change processing device of Figure 6.

Detailed Description

The present invention provides a method for changing the pitch and playback speed of audio without the drawbacks of the prior art.

The simplest way to change the pitch of an audio signal is to produce the effect of a tape played at a faster or slower speed. There are two ways to produce this effect. In the first, the playback speed is held fixed and the number of samples is increased or decreased proportionally, as shown in Figure 1. Signal (10) is the original audio. Signal (12) shows the samples reduced proportionally, which gives the effect of faster playback. Signal (14) shows the samples increased proportionally, which gives the effect of slower playback. In the second, the samples are left unchanged and the playback speed is raised or lowered, on the same principle as playing a tape at a higher or lower speed. Both approaches, however, have the drawback of changing the playback time.
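The first approach, changing the number of samples while the playback rate stays fixed, can be sketched as a small resampler. This is only an illustration: the source does not say how the new samples are computed, so linear interpolation is an assumption, and the name `resample` and its `factor` parameter are hypothetical.

```python
def resample(frame, factor):
    """Resample `frame` so that, at a fixed playback rate, it plays `factor` times faster.

    factor > 1 yields fewer samples (like signal 12 in Figure 1),
    factor < 1 yields more samples (like signal 14).
    Linear interpolation is an assumption made for this sketch.
    """
    n_out = max(1, round(len(frame) / factor))
    last = len(frame) - 1
    out = []
    for i in range(n_out):
        pos = i * factor              # read position in the original frame
        k = int(pos)                  # sample just before the read position
        frac = pos - k                # distance past that sample
        a = frame[min(k, last)]
        b = frame[min(k + 1, last)]   # clamp at the end of the frame
        out.append(a + frac * (b - a))
    return out


ramp = [0, 1, 2, 3, 4, 5, 6, 7]
print(resample(ramp, 2))         # [0, 2, 4, 6]
print(len(resample(ramp, 0.5)))  # 16
```

Both results show the drawback named in the text: the frame now holds a different number of samples, so its playback time has changed.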
To solve this problem, a copy/discard method can be used. The continuous audio is first divided into sections, which are called frames. If a frame becomes longer because the amplitude is reduced, the excess audio is discarded. Conversely, if a frame becomes shorter because the amplitude is increased, the missing audio is filled in with sections from other frames. With this technique the length of every frame stays the same.

The filling of missing audio from other frames works as follows. Suppose a frame plays for M milliseconds. If the frequency is raised by a factor of x, the pitch rises and the playback time shortens, so the output of the frame lasts M/x milliseconds. The frame, now too short at the end of the time scale, is filled with a section of the original audio lying between M/x and M/x + M, which is spliced onto the end of the short frame. Each frame must also carry a short extra audio region (20) for cross-fading. As shown in Figure 2, after the number of samples is reduced proportionally or the sampling frequency is increased, a frame of input audio (16) is shortened to (18). Starting from the end of frame (18), excluding the cross-fade portion (20), the frame is matched against the original audio, as shown at (22) in Figure 2. This step is repeated for the rest of the audio.

Conversely, if the pitch is lowered so that the frequency drops by a factor of x, the total playback time grows to xM milliseconds, as shown in Figure 3. As in the previous case, at the end of the audio playback a frame lying between xM and xM + M milliseconds is taken from the corresponding position in the original audio and joined to the end of the sound output.
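The timing arithmetic above can be checked with a short sketch. The function name `frame_timing` is hypothetical. Mapping a semitone shift to the factor x through the equal-tempered ratio 2^(k/12) is also an assumption: the source only speaks of a factor x and of 12 semitones in each direction.

```python
def frame_timing(m_ms, semitones):
    """Return (ratio, output_ms, fill_window_ms) for one frame of m_ms milliseconds.

    ratio      -- frequency ratio for the shift (equal temperament, an assumption)
    output_ms  -- frame duration after the shift: M/x when raising by x,
                  x*M when lowering by x
    fill_window_ms -- (start, end) of the span of original audio that the
                  text takes to restore the frame length
    """
    ratio = 2.0 ** (semitones / 12.0)
    output_ms = m_ms / ratio
    return ratio, output_ms, (output_ms, output_ms + m_ms)


# Raising a 20 ms frame by one octave: x = 2, output M/x = 10 ms,
# fill material taken from 10 ms to 30 ms.
print(frame_timing(20, 12))   # (2.0, 10.0, (10.0, 30.0))

# Lowering by one octave: frequency drops by x = 2, output x*M = 40 ms,
# window from 40 ms to 60 ms.
print(frame_timing(20, -12))  # (0.5, 40.0, (40.0, 60.0))
```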
Cross-fade sections are joined at the boundary of each frame in a similar way. For example, frame (32) is a section of the input audio. After samples are added or the sampling frequency is lowered, the length of frame (32) increases to (34). The tail of frame (34), excluding the cross-fade portion (36), is then matched against frame (38) of the original audio in Figure 3. This step is repeated until the whole operation is complete.

For audio modified by the present invention, the degree of pitch change depends on the frame size and on the size of the cross-fade region. In general, the higher the pitch is raised, the smaller the frame and the cross-fade region should be, so that perceptible echo is avoided. It was also found that a longer cross-fade region produces less noise, but that the sound quality deteriorates if the cross-fade region is too long.

Even though cross-fading smooths the transition between spliced frames, noise can still arise from differences in the relative positions of the frames. The invention improves on this by finding the part of one frame that is most similar to another frame, so that the two can be spliced without noticeable noise. Figure 4 shows a method for finding such a position. For example, the sub-frame (42) at the tail of frame (40) is compared with section (44) at the head of the second frame (46).
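The cross-fade that joins two frames can be sketched as follows. The source does not give the fade curve, so the linear ramp here is an assumption, and the names `crossfade` and `splice` and their parameters are hypothetical.

```python
def crossfade(tail, head):
    """Blend `tail` (fading out) into `head` (fading in), sample by sample.

    Both lists must have the same length. A linear ramp is assumed.
    """
    if len(tail) != len(head):
        raise ValueError("cross-fade regions must have equal length")
    n = len(tail)
    out = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming frame, rising toward 1
        out.append((1 - w) * tail[i] + w * head[i])
    return out


def splice(modified, unmodified, cs, tau):
    """Join two frames, overlapping the last `cs` samples of `modified`
    with `unmodified[tau:tau + cs]`."""
    keep = len(modified) - cs
    overlap = crossfade(modified[keep:], unmodified[tau:tau + cs])
    return modified[:keep] + overlap + unmodified[tau + cs:]


print(splice([1, 1, 1, 1, 1], [0, 0, 0, 0, 0, 0], cs=3, tau=1))
# [1, 1, 0.75, 0.5, 0.25, 0, 0]
```

The offset `tau` is where the similarity search matters: a poorly chosen offset blends two dissimilar waveforms, which is the source of noise the text describes.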
Sub-frame (42) represents the size of the cross-fade region, which is smaller than section (44) of frame (46). A part of frame (46) whose sound structure is similar to that of sub-frame (42) must therefore be found, so that frame (46) can be spliced to frame (40).

A mathematical method is proposed for finding the most similar splice region of a frame. The method computes the differential mean absolute error of the splice. It requires a minimal number of computation steps and therefore gives a highly efficient splice. The formula is:

\[
\mathrm{DMAE} = \sum_{m} \Bigl( \bigl|x_1(m) - x_2(m+\tau)\bigr| + \bigl|x_1(m+1) - x_1(m) - x_2(m+1+\tau) - x_2(m+\tau)\bigr| \Bigr)
\]
\[
= \sum_{m} \Bigl( \bigl|x_1(m) - x_2(m+\tau)\bigr| + \bigl|x_1(m+1) - x_2(m+1+\tau) - [x_1(m) + x_2(m+\tau)]\bigr| \Bigr)
\]

where DMAE is the differential mean absolute error of the splice; m is the sample index, running from 0 to cs; cs is the size of the cross-fade region; 0 < τ ≤ sr, where sr is the search range; x_1 is the modified frame; and x_2 is the unmodified frame. The larger the number of points m, the better the sound quality. The position with the smallest DMAE is the best splice point. Because the DMAE computation uses only additions and subtractions and no multiplications, it is very simple.

When DMAE is used to find the best splice position, every sample in the frame has to be included in the computation. Because audio is regular, however, the difference between any two adjacent samples is small, so one sample out of every two can be used instead (subsampling). Subsampling halves the total number of calculations without noticeably affecting the accuracy of the result. Table 1 lists the signal-to-noise ratios obtained with DMAE and with DMAE plus subsampling for a male voice, a violin, and electronic music.

Table 1. Signal-to-noise ratio

Source              DMAE        DMAE with subsampling
Male voice          26.25415    26.20773
Violin              31.56789    31.14602
Electronic music    19.85814    19.737

Table 1 shows that subsampling has little effect on the signal-to-noise ratio of the different signals. In actual listening tests, a normal human ear could not detect the difference between the two. As long as the loss of accuracy stays within the tolerated range, one sample out of every three or four can be taken to reduce the number of calculations further.

Another embodiment of the invention uses motion estimation, a technique normally applied to moving images. Incorporating motion estimation into the DMAE computation greatly reduces the total number of calculations needed to locate the smallest DMAE. In other words, the search for the best splice position is reduced from a two-dimensional search to a one-dimensional binary search. To improve the accuracy of the search result, the search range is first divided into a number of sections and a DMAE value is determined for each section. The DMAE values are then compared, and the smallest is selected as the best splice position. This modified method is called the block binary search and is shown in Figure 5.

A frame labeled (52) is divided into four equal parts, with sub-frames (54), (56), and (58) at the 1/4, 2/4, and 3/4 positions respectively. The DMAE values at these positions are computed first, and (58) is found to be the best match. A corresponding sub-frame (60) is then taken as the center, and the best match is sought among it, the sub-frame (62) located 1/8 before (60), and the sub-frame (64) located 1/8 after (60). As shown in Figure 5, sub-frame (62) at the 5/8 position is found to be the best match. The procedure is applied repeatedly until the three neighboring sub-frames are only one point apart, and the best-matching position (66) is taken as the best splice position of the two frames.

If the search range is divided into n sections, the number of calculations needed to find each best match is

\[
n\left[\,3 + 2\left(\log_2\frac{MS}{n} - 2\right)\right]
\]

where n is the number of sections and MS is the length of the search range. For example, with

\[
n = 4, \qquad MS = 10\ \text{ms} \times 22.05\ \text{kHz} = 220.5,
\]

the block binary search reduces the total number of calculations to 42, about 20% of the original count. If subsampling is used as well, the total number of calculations can be halved again. The efficiency of the block binary search is shown in Table 2. As Table 2 shows, the signal-to-noise ratios of the three signals differ only slightly with and without the block binary search. A normal human ear cannot detect these differences.
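The block binary search and its calculation count can be sketched as below. This is a minimal reading of the description, not the patent's code. It assumes that the three-point procedure of Figure 5 is run inside each of the n sections and that the section winners are then compared, which is the reading that reproduces the count formula. All function names are hypothetical, and `cost` stands for any function of the lag τ, such as a DMAE evaluation.

```python
import math


def search_block(cost, start, length):
    """Coarse-to-fine search inside one section [start, start + length), length >= 4."""
    stride = max(1, length // 4)
    # First pass: the 1/4, 2/4 and 3/4 positions (3 evaluations).
    center = min((start + stride, start + 2 * stride, start + 3 * stride), key=cost)
    # Each halving adds one candidate on either side of the center (2 new evaluations).
    while stride > 1:
        stride //= 2
        center = min((center - stride, center, center + stride), key=cost)
    return center


def block_binary_search(cost, search_len, n_blocks):
    """Return (best_lag, number_of_distinct_cost_evaluations)."""
    memo = {}

    def cached(tau):
        # Remember each result so a repeated center is not counted twice.
        if tau not in memo:
            memo[tau] = cost(tau)
        return memo[tau]

    block_len = search_len // n_blocks
    winners = [search_block(cached, b * block_len, block_len) for b in range(n_blocks)]
    return min(winners, key=cached), len(memo)


def evaluation_count(n, ms):
    """The count formula from the text: n * [3 + 2 * (log2(MS / n) - 2)]."""
    return n * (3 + 2 * (math.log2(ms / n) - 2))


# A toy cost with its minimum at lag 37, searched over 64 lags in 4 sections.
print(block_binary_search(lambda tau: abs(tau - 37), 64, 4))  # (37, 28)
print(evaluation_count(4, 64))                                # 28.0
print(round(evaluation_count(4, 220.5)))                      # 42
```

For the toy cost the search uses 28 evaluations instead of 64, in agreement with the formula. A coarse-to-fine search of this kind can miss the true minimum when the cost has several narrow dips, which is presumably why the text compares signal-to-noise ratios with and without it.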
Miscellaneous ratio DMAE DMAE and block binary search method DMAE, block binary search difficulty and sub-sampling male voice 26.25415 25.66386 25.32933 violin sound 31.56789 31.11732 31.06021 electronic music 19.85814 19.60205 19.76816 Table 2 Therefore, the present invention allows the sampling point to be changed by changing the sound To achieve the playback speed. As shown above, the modified sound, although the play time is reduced or increased, can still be played with the same play without changing the pitch. If the frequency of an audio calculation increases the frequency, the amount of audio data $ will increase. Playing at the same speed, the total playing time will increase but the amplitude will not change. Conversely, if the amplitude of the grid is reduced, the amount of audio data will be reduced, making the playback time shorter but the amplitude unchanged. Therefore, with the present invention, audio can be played at a faster or slower speed without changing the pitch. Audio is usually in the form of analog electronic signals. But to deal with these signals must use digital processing methods. These digital signals are converted back to analog signals after processing. Figure δ is a block diagram of the audio signal processing for changing the pitch. First, a microphone converts the sound into an analog signal χ (τ) for processing. The input amplifier σ〇) amplifies this analog signal X (τ) to enhance the signal. 3. The amplified signal passes a low-pass filter (72) to eliminate noise. The filtered signal is sent to an analog / digital converter. The paper standard is applicable to China National Standards (CNS) Α4 specification (210Χ 297 mm) m-------- —LI — ms I-. 111 I- 1. II ^^ 1 --- (please read the note Ϋ on the back first and then fill in this page) Printed 305929 A7 B7 by the Men ’s Consumers Cooperative of the Falcon Bureau of the Ministry of Economic Affairs V. Invention description (?) 
The analog/digital converter (74) converts the analog signal into a digital signal. This PCM signal is then sent to a tone-changing processor (76) for further processing. The processed signal is sent to a digital/analog converter (78), which converts it back into an analog signal. The analog signal is then sent to another low-pass filter (80) and on to an output amplifier (82), so that the pitch-changed audio is delivered through the sound output as sound audible to the human ear.

Figure 7 shows the structure of the tone-changing processor. The sound data is sent from PI (90) to an input buffer (92). The cross-fading data memory (94) stores the end of the previous sound zone, which is needed for cross-fading. The DPU (96) calculates the MAE and the cross-fading values. After processing, the audio is sent to an output buffer (98) and to PO (100) for output.

The many terms used in the above description serve to describe the present invention, not to limit it. The description is only an example and does not define the scope of the invention. The preferred embodiments are disclosed to help those skilled in the art practice the invention. Any change or modification that does not depart from the scope and spirit of the invention still falls within the scope of the patent application.
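The two quantities the DPU is said to compute, the error measure used to pick the joining point and the cross-fading values, can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the processor's actual logic: the function names `dmae`, `best_offset` and `cross_fade` are hypothetical, the error measure follows the DMAE formula as it appears in the claims, the search over τ is exhaustive for simplicity, and the linear fade weights are an assumption, since the text does not specify the fade shape.

```python
import math


def dmae(x1, x2, tau, cs):
    """Differential mean absolute error between the end of the modified zone
    x1 and the unmodified zone x2 shifted by tau, over cs points.

    Requires len(x1) >= cs + 1 and len(x2) >= tau + cs + 1.
    """
    total = 0.0
    for m in range(cs):
        diff_now = x1[m] - x2[m + tau]
        diff_next = x1[m + 1] - x2[m + 1 + tau]
        # amplitude mismatch plus mismatch of the first differences (slopes)
        total += abs(diff_now) + abs(diff_next - diff_now)
    return total


def best_offset(x1, x2, cs, sr):
    """Exhaustive search for the tau in 1..sr with the smallest DMAE."""
    return min(range(1, sr + 1), key=lambda tau: dmae(x1, x2, tau, cs))


def cross_fade(x1, x2, tau, cs):
    """Fade linearly from x1 into x2[tau:] over cs points (assumed fade shape),
    then continue with the remainder of x2."""
    faded = []
    for m in range(cs):
        w = m / cs  # weight of the incoming (unmodified) zone
        faded.append((1 - w) * x1[m] + w * x2[m + tau])
    return faded + list(x2[tau + cs:])


# Toy data: x1 is a copy of x2 starting at offset 5, so tau = 5 matches exactly.
x2 = [math.sin(0.3 * i) + 0.01 * i for i in range(40)]
x1 = x2[5:14]
tau = best_offset(x1, x2, cs=8, sr=10)
joined = cross_fade(x1, x2, tau, cs=8)
print(tau, len(joined))
```

Because the second term compares slopes as well as sample values, a joining point is preferred where the two waveforms both coincide and move in the same direction, which is what keeps the cross-fade from producing an audible discontinuity.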

Claims

1. A method of modifying audio parameters, comprising the following steps:
(a) converting an analog signal into a digital signal;
(b) dividing the digital signal into several sound zones;
(c) modifying the pitch and playback speed of one sound zone of the digital signal;
(d) joining the modified sound zone to an as-yet-unmodified sound zone, the unmodified sound zone overlapping the end of the modified sound zone so that the portion whose sound structure is similar to that of the end region is cross-faded, the similarity of sound structure being determined by the following formula for the differential mean absolute error, which requires a minimum of calculation steps:

$$DMAE = \sum_{m} \Bigl[\, \bigl| x_1(m) - x_2(m+\tau) \bigr| + \bigl| x_1(m+1) - x_1(m) - \bigl( x_2(m+1+\tau) - x_2(m+\tau) \bigr) \bigr| \,\Bigr]$$

$$\phantom{DMAE} = \sum_{m} \Bigl[\, \bigl| x_1(m) - x_2(m+\tau) \bigr| + \bigl| x_1(m+1) - x_2(m+1+\tau) - \bigl( x_1(m) - x_2(m+\tau) \bigr) \bigr| \,\Bigr]$$

where DMAE is the differential mean absolute error of the joint; m is a point number between 0 and cs, cs being the size of the cross-fading area; 0 < τ ≤ sr, sr being the search area; x_1 denotes the modified sound zone and x_2 the unmodified sound zone;
(e) repeating steps (c) and (d) for the unmodified sound zone and the other unmodified sound zones to obtain a modified digital signal; and
(f) converting the modified digital signal back into analog form.

2. The method of modifying audio parameters according to claim 1, wherein, if the modification lengthens the sound zone, the unneeded unmodified sound zone is discarded so that the playing time remains unchanged.

3. The method of modifying audio parameters according to claim 1, wherein, if the modification shortens the sound zone, the missing portion is taken from the original digital signal so that the playing time remains unchanged.

4. The method of modifying audio parameters according to claim 1, wherein the DMAE values differ from one another by nτ points, n being an integer that depends on the accuracy the calculation allows.

5. The method of modifying audio parameters according to claim 1, wherein the search area is divided into several sections, the DMAE value of each section is determined, the DMAE values are compared with one another, and the section with the smallest DMAE value is selected as the best joining position.

6. The method of modifying audio parameters according to claim 5, wherein the number of calculations required to find the section with the smallest DMAE value is

$$n \left[\, 3 + 2 \left( \log_2 (MS/n) - 2 \right) \right]$$

where n is the number of sections and MS is the length of the search area.

7. A method of modifying audio parameters, comprising the following steps:
(a) converting an analog signal into a digital signal;
(b) dividing the digital signal into several sound zones;
(c) modifying the playing time of one sound zone;
(d) joining the modified sound zone to an as-yet-unmodified sound zone, the unmodified sound zone overlapping the end of the modified sound zone so that the portion whose sound structure is similar to that of the end region is cross-faded, the similarity of sound structure being determined by the following formula for the differential mean absolute error, which requires a minimum of calculation steps:

$$DMAE = \sum_{m} \Bigl[\, \bigl| x_1(m) - x_2(m+\tau) \bigr| + \bigl| x_1(m+1) - x_1(m) - \bigl( x_2(m+1+\tau) - x_2(m+\tau) \bigr) \bigr| \,\Bigr]$$

$$\phantom{DMAE} = \sum_{m} \Bigl[\, \bigl| x_1(m) - x_2(m+\tau) \bigr| + \bigl| x_1(m+1) - x_2(m+1+\tau) - \bigl( x_1(m) - x_2(m+\tau) \bigr) \bigr| \,\Bigr]$$

where DMAE is the differential mean absolute error of the joint; m is a point number between 0 and cs, cs being the size of the cross-fading area; 0 < τ ≤ sr, sr being the search area; x_1 denotes the modified sound zone and x_2 the unmodified sound zone;
(e) repeating steps (c) and (d) for the unmodified sound zone and the other unmodified sound zones to obtain a modified digital signal; and
(f) converting the modified digital signal back into analog form.

8. The method of modifying audio parameters according to claim 7, wherein, if the audio processing increases the number of sampling points of the audio, the step of modifying the playing time includes increasing the playing time so as to keep the playback speed and the sampling points of the audio unchanged.

9. The method of modifying audio parameters according to claim 7, wherein, if the audio processing reduces the number of sampling points of the audio, the step of modifying the playing time includes reducing the playing time so as to keep the playback speed and the sampling points of the audio unchanged.

10. The method of modifying audio parameters according to claim 7, wherein the DMAE values differ from one another by nτ points, n being an integer that depends on the accuracy the calculation allows.

11. The method of modifying audio parameters according to claim 7, wherein the search area is divided into several sections, the DMAE value of each section is determined, the DMAE values are compared with one another, and the section with the smallest DMAE value is selected as the best joining position.

12. The method of modifying audio parameters according to claim 11, wherein the number of calculations required to find the section with the smallest DMAE value is

$$n \left[\, 3 + 2 \left( \log_2 (MS/n) - 2 \right) \right]$$

where n is the number of sections and MS is the length of the search area.

13. A device for modifying audio parameters, comprising an input amplifier, an output amplifier, a first low-pass filter, a second low-pass filter, an analog/digital converter, a digital/analog converter, and a tone-changing processor; the input amplifier, the first low-pass filter, and the analog/digital converter are connected in series at the input of the tone-changing processor, and the digital/analog converter, the second low-pass filter, and the output amplifier are connected in series at the output of the tone-changing processor.

14. The device for modifying audio parameters according to claim 13, wherein the tone-changing processor comprises an input device connected to an input buffer, an output device connected to an output buffer, a cross-fading data memory that stores the audio portion to be cross-faded, an address device connected to the input buffer, the output buffer, and the cross-fading data memory, a temporary data storage device, a digital processing device that calculates the mean absolute error and the cross-fading values, and a control device; the input buffer, the cross-fading data memory, the temporary data storage device, the digital processing device, the control device, and the output buffer are connected to one another through a bus system.