TW200407027A - Advanced technique for enhancing delivered sound - Google Patents

Advanced technique for enhancing delivered sound

Info

Publication number
TW200407027A
TW200407027A (application TW92115246A)
Authority
TW
Taiwan
Prior art keywords
sound
signal
sound signal
processing
environment
Prior art date
Application number
TW92115246A
Other languages
Chinese (zh)
Other versions
TWI318531B (en)
Inventor
Thomas Paddock
James Barber
Original Assignee
Sonic Focus Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sonic Focus Inc
Publication of TW200407027A
Application granted
Publication of TWI318531B

Abstract

Techniques and systems for enhancing delivered audio signals are disclosed which may be employed in a delivery system at a server side (212), a client side (252), or both. The techniques include forming a processed audio signal by processing audio signals through multiple pathways (510, 520, 540, 560) which operate on different frequency bands using dynamic processing and other elements, and thereafter providing recording or listening environment enhancements (590, 592) and other sound enhancements (593, 594, 595) to the processed audio signal. Also disclosed are techniques and systems for implementing the multi-pathway processing and environmental and sound enhancements.

Description

200407027 玖、發明說明: 【發明所屬之技術領域】 本發明相關於用以增強傳送聲音信號之改進處理技術, 更特別地係相關於用以增強在有限頻寬連結上傳送音樂之 處理技術。 【先前技術】 網際網路的快速流行已經造成本身急速發展更新、更有 效率用以使用其通信技術的方法,遠超過只有文字為主的 應用。兩種獲得關注之新應用係聲音及視訊廣播。這兩種 應用具有共同的問題:當與該網際網路之連接受限於頻寬 時,它們的功用遭受損害。因為其對於頻寬更多的要求, 視訊廣播對於大多數使用有限頻寬連結之網際網路使用者 (即用戶)係特別有問題。 在網際網路上傳送像是音樂的聲音的常見方法係為將該 等聲音檔案,,下載,,到該好的電腦。數位數音檔案也很常 見於被複製及壓縮成MPEG3或其他格式到光碟片、個人撥 放器或電腦硬碟,相較於串流式聲音,傾聽這些播案係在 更為合適或是輕便的傾聽環境了。 網際網路傳送聲音之另一種形式係串流逝聲音。"串流式 係指邊下載邊收聽。通常地,相對於該好端與網際網辟 的連結,該伺服器具有非當合 ,非㊉同的頻寬連結,在對音樂使用 串流式聲音時,網際網路主機 峪王機站(即词服器,,)能夠透過網際 網路的連結,提供即時I ^ 〒曰樂成曰會、DJ選擇音樂或存檔, 樂給該傾聽終端使用者(即” 用戶麵)。但疋由於用戶端典变 85801 200407027 具有有限頻寬連結,_流式或下載(已壓縮)音樂完全稱不上 是理想的傾聽經驗,特別是習慣CD品質音樂的用戶。 該傾聽經驗之劣化可以追溯到兩個主要來源··為了補償 有限頻寬傳輸要求或降低儲存所需之檔案大小的壓縮信號 所做的妥協,及該用戶端不良傾聽環境。關於後者,網際 網路下載中或已下載音樂通常係藉由該用戶端電腦所附屬 的喇只來傾聽,而通常地,很有有人注意要提供該電腦所 在的地方良好傾聽環境。在近來致力於針對該有限聲道頻 寬問題做出改善的同時,該不良傾聽環境的問題尚未獲得 滿意的解決。因此,提供增強該環境之技術解決方案係有 利的’以讓用戶端在該環境中能夠接收及傾聽經由有限頻 寬連結所接收之聲音信號。再者,提供一種系統能夠補償 由於將聲音檔案壓縮到較小的檔案内所導致之扭曲,這也 是有利的。 【發明内容】 本發明揭示一種改良聲音信號處理方法及系統。該揭示 方法/系統係用以增強要被壓縮及/或已經被壓縮之聲音信 唬的品質。該系統使用一陣列的可調式數位信號處理器, Μ等處理器對該聲音信號饋送執行各種功能。根據一種實 施例’該方法/系統能夠在聲音訊號被壓縮到更小格式之 前,先將該信號,,剥除”(Γιρ)。如同在本發明之先前技術中 斤描it該聲音#號之壓縮係必須的,以便經由有限頻寬 網路連結傳送該信號。為了將聲音信號之複本儲存在諸如 軟碟片、光碟片、快閃記憶體及磁性驅動器之有限儲存空 85801 200407027 間的媒體,壓縮也是必須。該方法/系統之其他實施例係用 以增強解壓縮後之聲音信號。例如,該方法/系統與以用戶 4為主的串流式媒體接收器一起使用,以增強經過一串流 式接收器解壓縮後之聲音信號。根據其他範例,該方法及 系統增強從有限儲存媒體所讀取及解壓縮之該聲音信號。 在一較佳實施例中,該揭示方法/系統使用在該聲音串流之 壓縮端及解壓縮端。然而,本發明也涵蓋該揭示方法/系統 月匕夠專門使用在該聲音串流之壓縮端或解壓縮端之任一 端。 該方法/系統之向上串流(upstream)(即壓縮端)實施例應 用係為一 ”剥除"程式,該程式係以比即時還要更快的速度 來處理該聲音信號。該"剥除”程式對於壓縮電子聲音檔案而 儲存在儲存裝置上之前增強該檔案係有用。因為該"剝除" 程式係以比即時更快的速度來操作,所以可以大大地減少 壓縮該檔案所要求的時間。該方法/系統之向下串流 (downstream)(即解壓縮端)實施例.該向下串流實施例可二 用以當該聲音信號從該儲存媒體讀取及解壓縮時,增強該 信號。該向下串流實施例也可以用以當—串流$音信號被 -接收器所接收時’增強該信號。因為該揭示方法/系統係 以比即時更快速度來操作’所以能夠以最小化時間延遲效 應,有效增強該解壓縮聲音信號。 【實施方式】 本發明揭示-種用以增強經由有限頻寬傳輸系統傳送到 使用者’或是來自壓縮數位檔案的聲音。更特別地,所揭 85801 -8 · 200407027 示的係為I音檔案之用戶端❹的技%,該等檔案可以經 由網際網路或其他方法以串流或下載方式傳送到用戶裝 置,像是CD、可攜式播放器、機頂盒及類似的裝置,及能 夠在具有有限保真度(fidelity)之以電腦為主的聲音系統上 及在具有週遭雜訊或其他不良聲波特性的環境巾播放。同 樣地也揭示以比即時更快速來壓縮聲音信號之技術,使得 該聲音信號能夠在有限頻寬連結上廣播。其他實施例包含 
以用戶為王的應用,其中聲音信號在經過解壓縮之後係被 增強,像是串流式媒體接收器或電子聲音檔案播放器(即 MPEG 3播放器)。因此,該揭示方法/系統能夠使用在該等 下述應用中: • 一伺服器端”剝除器,,,以比即時更快的速度操作; • 一不需事先剝除聲音檔案之用戶端增強裝置; 廣播伺服器’在此處聲音信號係即時增強,· • 一伺服器端1剝除器”,在此處壓縮檔案稍後係在該 用戶端解碼,用以進一步增強品質及清晰度;及 • 一用戶—伺服器配置,在此處該聲音信號在壓縮之 前先於該伺服器端增強,然後在解壓縮之後於該用 戶端再進一步增強。 圖1係為根據一較佳實施例,一說明用以增強聲音資料之 改進技術的流程圖。在步驟102,聲音資料係以數位格式化 信號來編碼。在該處,該數位信號也可以壓縮以待後續傳 輸。在步騾104,一旦該編碼聲音信號變成數位格式,則可 以使用各種加強在後續傳輸期間預期會損失或損毁之頻率 85801 -9- 200407027 及動態的處理技術來增強。之後,在步驟丨〇6,該增強聲音 信號係經由一連結傳輸到一像是網際網路之類的網路,該 連結係只有低等或中等頻寬。抵達用戶端之後,在步騾 1 08,该傳輸聲音信號係進行解碼(如果需要的話,也可以 進行解壓縮)。最後,在步騾11 〇,該目前解碼聲音信號係 受到進一步增強處理,以恢復在傳輸期間預期會損失或損 壞的頻率及動態。 圖2A根據一較佳、實施例,說明發生在網路之伺服器端(即 該主機站)的增強處理。在該主機站2〗〇,音樂係從一音樂 來源202挑選,像是例如從儲存檔案或即時饋送。一增強處 一聲音編解碼器204 理元件212係插入在該音樂來源2〇2與 間。該增強處理元件212在被該傳輸聲音編解碼器2〇4編碼 之則先增強該聲音信號。假如該串流伺服器2〇6係對具有已 知及/或類似傾聽環境之用戶廣播,則增強處理係有利的。 同樣的,當準備廣播之音樂的類型係已知或決定時,或是 總是類似類型時,這也是有利的,因為該增強處理能狗以 最利於該特定類型音樂的方式來調整。 该傳輸音樂編解碼器204透過一編碼器(即一 之另-半傳輸部分)來處理音樂,該編碼器係以針== 端網際網路連結之頻寬來調適的方式來 八术格式化及壓縮該音 樂。 一編解碼器係為一種編碼器/解碼器系統,為了在此討食 的目的’其功能係當作為-聲音資料.壓縮器(編碼器)及_ 聲音/資料解I㈣(解碼H)。-資料壓縮⑽㈣的編㈣ 85801 •10- 200407027 态同樣已知可作為一 ”音幅縮深器"(c〇mpander)。在本揭示 文件中’資料壓縮”係指任何降低資料檔案大小之過程, 而聲音等級壓縮,’係指8Track、Dolby AC3及WMA(UP3)。 在施加該傳輸聲音編解碼器2〇4之後,一串流伺服器2〇6 接著係經由與該網際網路之輸出連結214,將該資料已壓縮 且已格式化音樂的資料傳送到該指定位址。雖然該描述主 要係指音樂之串流及增強,這同樣地也能夠施加於任何聲 音或聲音/視訊的資料。再者,應注意的是該系統及技術能 夠與各種聲音轉換協定一起使用,其包含例如Real Audi〇、 MP3 及 Windows Media。 當在此使用時,,’即時”意指該傾聽用戶聽到該音樂的同 時,實質上該伺服器係正在處理在該聲音編解碼器内的音 樂。當可能存在有些由於與該等喇叭的連接所導致的延遲 要視為”即時”的時候,較佳地在該音樂來源處的音樂串流 與泫用戶正在傾聽的味J Ϊ2八間之音樂的任何片段並沒有大量 緩衝’而連續音樂片段接著係出現該等制队。下載的樓案 係全部地儲存,然後稍後播放,這些檔案較佳地係以如同 串流檔案一樣的方式來壓縮,雖然該壓縮比可能低於即時 串流所使用的壓縮比。 圖2B根據本發明,說明在網路之用戶端(即解碼器端增強) 所發生的增強處理。該類型的增強處理係有利於具有廣泛 的各種傾聽及或音樂類型的情況。透過低等或中等頻寬連 結222,該增強且已編碼信號抵達該用戶站(site)23〇。特別 地,該k號222可以&供給個人電腦244或其他合適的處理 85801 -11- 200407027 平台。在該較佳實施例中,該個人電腦244包含一數據機 242、一與該接收聲音編解碼器246與一增強處理元件252 相結合之處理器244、喇叭驅動器248,及喇叭250。就像在 該伺器站210處所提供的增強處理元件212,當一解碼信號 已經被該接收器聲音編解碼器244解碼後,該增強處理元件 252較佳地係為了該信號之增強做準備。 該用戶接收解碼器246之處理器係與該中央處理器244相 連接,用以執行大部分為該伺服器的傳輸聲音編解碼器244 之反相(inverse)。特別地,該接收編解碼器246將該資料串 流向後轉後成便於使用的音樂格式,及解壓縮該音樂以將 
該音樂盡可能地儲存成該音樂在該音樂來源202處之原有 品質。該接收聲音編解碼器244的程序可以利用在該中央處 理器244上的軟體來執行,或是可以藉由使用添加音效卡, 以硬體的方式來執行。喇队驅動器248也可以在該音效卡中 發現或以軟體來實現。在典型用戶的傾聽環境中的喇队250 包含一對從劣等到中級品質的中等驅動器,及包含低音擴 聲器(woofer)及/或次音擴聲器(sub-woofer)。放置該用戶及 電腦的用戶站230係該傾聽環境之最後構成要素:這相當程 度地影響到所察覺聲音品質,因為該聲音的頻譜反應,像 是共振,及其所引起週遭雜訊。 考慮到在該傳輸聲音編解碼器204及接收聲音編解碼器 246間之連結的頻寬限制,該等編解碼器係設計用以產生一 實質上類似於該輸入信號之輸出。這些編解碼器(204、246) 的資料壓縮程序引進令人討厭的人工及扭曲。這些壓縮程 85801 -12- 200407027 序不一定需要利用下面所述的改進技術來修正。 在圖2B(及圖3)的組態中,該增強處理元件252較佳地係 為與该處理器相連接的軟體。但是對於可供選擇的實施例 也可以想像其他配置。例如,該處理係發生於一專門數位 信號處理器内,該處理器位在一連接裝置附近或是位在該 裝置之上。 圖3根據其他較佳實施例,說明在該網路之用戶端所發生 的增強處理。與圖2B中描述的實施例不同的是,圖3中所描 述的貫施例有一麥克風3〇2係包含在該用戶站3〇〇。該麥克 風302係經由耦合306連接到該增強處理元件252,以將回饋 提供給該元件。基於該回饋,該增強處理元件252能夠提供 該喇队驅動器248之額外控制。 數個改良及技術係被採用,以提供具有只使用適度或典 型功率之優秀的處理效能。一種該技術係使用一延伸位元 深度來進行該聲音處理,以在該系統中產生大動態範圍, 排除強烈輸入限制者的需要而降低截斷錯誤雜訊。 任何類型的處理(例如信號、等化、壓縮等等的混合)會 改變該原有數位資料的程度係隨著該資料的位元解析度反 向變化。為了僅供說明,該等下述技術對該資料處理的各 階段係採用64位元的聲音取樣。然而,它也涵蓋可以使用 其他取樣大小,像是8位元、16位元、32位元及128位元。 圖4係為根據本發明,一說明用以增強聲音信號之信號處 理功说之方塊圖。在圖4中,一聲音信號4〇5係提供給一人 工智慧(AI)動態壓縮器41〇。該AI動態壓縮器41〇係透過信 85801 •13- 200407027 號線路412與該AI動態解壓縮器415 一前一後的作用,以便 將該進入聲音信號405之動態範圍增強到某個要求範圍。在 這兩各處理器410、415中之偏移對該信號產生一整體動態 擴展。在利用該AI動態壓縮器41〇處理之後,該聲音信號係 利用兩個奕行放置的組件來處理:一高頻人工遮罩處理器 420 ;及一清晰處理器(中等)425。該高頻人工遮罩處理器 420包含一可碉式濾波器及一可變時間延遲電路,該處理器 對於來自該進入聲音信號之令人討厭的人工及聲音產生一 遮罩效應。該清澈處理器4 2 5也包含一具有一可變時間延遲 電路之可調式濾波器,該處理器對於在該進入聲音信號中 之令人时厭的中間頻率產生一重新校準(realignment)效 應。在經由這兩個元件處理之後,該聲音信號係利用一混 合器427來結合,然後饋送到一 3D/即時增強器430。該3D/ 即時增強器430將現場及立體音觀感添加該聲音信號之聲 域中。該3D/即時增強器430使用三維模型,以決定信號處 理所發生的程度。在該聲音信號已經被該3D/即時增強器 430處理之後,該信號係利用該記錄環境模擬器435來處 理’該模擬器將擴散(diffusion)、殘響(reverb)、深度、再 生及空間衰減(decay)添加到該聲音信號。該記錄環境模擬 器435在沒有添加共鳴(resonant)模式及節點(n〇de)到虛擬 記錄空間之下,完成這些效應。在經過該記錄環境模擬器 43 5之處理後,該聲音信號係由一語音消除器440來處理, 該消除器能夠有效地消除在該聲音信號中的語音軌(vocal track)。因為大部分語音軌道係居中且在該整體聲音信號中 85801 -14- 200407027 係相當乾澀(dry),所以可以實現該功能。當該等語音信號 移除之後,該聲音信號係由一廣闊立體音增強器445,該增 強器將較廣闊立體音觀感(perspective)添加該聲音信號之 聲域。在該處’該聲音信號係钱送到該AI動態解壓縮器 41 5,在該處信號係以人工智慧法則來處理以確保該聲音信 號之整個動態範圍係被恢復(restore)。該聲音信號經由該 AI動態擴展處理器41 5處理過之後,該信號接著係由一 AI 衰減及扭曲偵測處理器450來處理,該處理器調整該信號之 
等級(即音量),直到達到最佳化增益。該AI衰減及扭曲偵 測處理器450係週適用以動態地調整該聲音信號之增益,使 得前後一致的信號等級能夠持續地傳送到該傾聽者。在該 處’该已處理聲音信號4 5 5係饋送到一驅動器或一組驅動 器,使得人們能夠聽到該信號。 圖5係為根據一較佳實施例,一說明與該有限頻寬音樂之 用戶端增強有關之信號處理功能的方塊圖。在圖5中雖然只 有說明一個處理聲道但是應了解的是也可以採用多個處理 聲道。再者,該等下述解碼及增強程序較佳地係為在處理 器上運作的軟體常式(routines),因而提及信號路徑係指將 資料從一常式傳送到另一常式之常見程式技術。因此,與 該較佳實施例一致,一單一路徑或路徑並非是指一實體連 結;然而,不同連結可以使用在不同實施例中。 該增強程序係以該接收編解碼器246所輸出之聲音信號 開始。初始地,該信號係透過聲道輸入5〇2傳到該限制器 (limite〇504。該限制器504較佳地係為一標準聲音限制器, 85801 -15- 200407027 即’讓該聲音之大聲部分避免由於缺乏動態範圍而覆蓋該 向下串流處理的處理功能。回應該等聲音等級,該限制器 504製造增益變化,該變化對於該聲音具有一美化效應 (coloring effect),像是’’激發 ’’(’’pumping”及·’修剪 ’’(clipping) 。由於限制或解壓縮之結果而發生的增益變化經常對該傾 聽者是明顯,而這係稱作為”激發。”修剪’’係發生於當該 信號超過一系統可用的最大可數值時。 該限制器504之輸出將該信號分成四個分離的路徑 (pathway)或波段(band)。這些係稱作為該全頻寬路徑510、 該低音路徑520、該中等路徑540,及該高音路徑560。每個 路徑較佳地係獨立地處理。該全頻寬路徑51 〇係針對該全頻 寬聲音。對比於下面討論的該等各種濾波波段之處理,該 全波段路徑510較佳地係沒有被聲音等級解壓縮。該音、中 等、及高音路徑(520、540、560)較佳地將該信號濾波到非 重疊頻帶。 應了解的是可以採用一點路徑或少一點的路徑。例如, 對於次低音擴聲器波段存在有一額外路徑,而該中頻波段 係分割成兩個分開中頻波段。當在另一實施例中所使用之 頻率波段的數目係非常高的時候,該濾波較佳地係由一 ARBI濾波器來提供。例如,該限制器504係為一具有動態、 圖形滅波之二百個立體音聲道的ARBI滤波器(因而也要求 三百個聲音等級解壓縮之立體音聲道及三百個時間延遲校 準之立體音聲道)。 處理之前,全頻寬、低音、中等及高音路徑(51〇、52〇、 85801 -16 - 200407027 540、560)之分別輸入係由放大器5〇6a-d來放大。處理之後, 該全頻寬、低音、中等及高音路徑(510、520、540、560) 之分別輸出係由放大器507a-d來放大,然後在該混合器578 結合。 由該等濾波器所形成之每個頻率波段係由圖5中所示及 在該等後續段落中所描述的各種處理元件獨立地處理。 除了該全波段路徑5 1 0之外,每個波段包含一參數等化之 等化器。該等參數等化器係標示為參考數字522、542及 562,分別針對該低音、中等及高音路徑(52〇、542、562)。 每個該參數等化器(522、542、562)提供多個窄頻濾波器, 每個濾波器能夠控制增益、頻寬或”Q”及中央頻寬。該等等 化器(522、542、562)係包含一 Nyquist補償濾波器,該濾波 器能夠降低由於取樣混疊(aliasing)所造成之假信號。 每個頻率波段之特定可程式化的聲音等級擴展或壓縮係 利用動態處理元件來完成,該等元件係包含在該低音、中 等及咼音路控(520、540、560)的每一路徑。該等處理元件 較佳地包含各種濾波器,伴隨著一擴展器及/或壓縮器。該 低音路徑520較佳地包含一高坡型(high_shelf)濾波器524、 一低通濾波器526,及一高通濾波器528,伴隨著一擴展器 530及一壓縮器532。該中等路徑540較佳地包含一高坡型滤 波器544及一帶通滤波器546,伴隨著一擴展器54 8及一壓縮 器550。該高音路徑560較佳地包含一高坡型濾波器564、一 低通濾波器566,及一南通遽波器568,伴隨著一擴展器 570。該全頻寬路徑較佳地受限於一壓縮器512。應了解的 85801 -17· 200407027 是使用在每個路徑中之該等處理元件係基於與該路徑及其 他設計選擇有關之該等波段之數目及類型來變化。 每個波段(包含全頻寬路徑510)較佳地也提供時間延遲 校準元件,以補償該等不同時間延遲,該時間延遲係由該 等前面元件所產生或是在該伺服器端之記錄或處理中就已 經產生。該等時間延遲元件係標示以參考數字514、534、 552及572 
’其分別針對該全頻寬、低音、中等及高音路徑 (510、520、540、560)。典型地,適當校準的時間延遲係屬 於微秒的等級。 處理之後,每個波段輸出係連接到一混合器578。該混合 器57 8提供在該等四個路徑(51〇、520、540、560)之間一信 號平衡,及將該混合信號導入一主等化器580。 該主等化器580提供對於離開該混合器578之信號的參數 等化。它提供該信號之最後寬頻譜整形。該等化信號接著 (可選擇地)係通過高度等化共鳴濾波器,以加強該等次頻 率及低音頻率。該等濾波器較佳地包含一高坡型滤波器 582、一低通遽波器584及一高通遽波器586。 一壁形模擬器590能夠耦合到該高通濾波器586。該壁形 模擬器590使用擴散域矩陣(DFM)技術,以產生模擬來自一 真實階段(stage)之反射的時間延遲。該聲音反射環境模擬 能夠在沒有導入無用的共鳴峰值下,將愉快(liveliness)或 是殘響品質添加該音樂。 傳統DFM技術對非諧波(non-harmonic)、非共鳴 (non-resonant)波反射,使用數值理論法則。例如,1986年 85801 -18- 200407027 由 M.R. Schroeder,Springer-Verlag,Berlin所著之第二版之 科學及通信之數值理論的段落15·8中所描述的二次餘數 (quadratic)及段落13.9中所描述的本質根(primitive root)係 應用在本文中。然而,這些傳統技術只有提供長時間模擬 房間之殘響的反射。較佳地採用一本質根計算,該計算係 基於Schroede〗·所教之該等方法,藉由應用一擴散域矩陣 DFM技術來改良,以提供該聲音先期反射,即在該直接聲 音之5到30微秒之内的反射。 違壁形模擬器590也能夠協助中斷(break-up)、重新整形 (re-shape)或移除強烈週期處理人工或麻討厭週期特徵之 典用效應。使用在該階段(stage)模擬器中之dfm技術沒有 使用再生,即從該輸出到該處理元件之輸入的回饋。該處 理階段之控制參數包含該大小及離該壁之距離。 忒壁形模擬器590之輸出係導入到該房間模擬器592。該 房間模擬器592使用DFM技術,以產生類似自然的房間聲波 (acousdc)之時間延遲及共鳴。該DFM技術類似於該壁形模 擬器590,但是使用再生。該房間模擬器592能夠添加殘響 及衰減強乾遮音樂材料’而進—步混淆編解碼器所 引起微妙的扭曲。該處理階段之其他參數包含房間大小、 房間長寬比,及濕/乾混音模擬器592之另一用法 係用以補償在該傾聽者傾聽環境中之不良的房間聲波。該 等與用以添加自然房間或階段聲波到—乾源信號相同之^ 上面所述的刪技術也能夠用以不再加以強調該傾聽者房 間中疋共鳴或錢,及降低該房間所感知週遭雜訊等級。 85801 •19- 200407027 為了該目的’該傾聽者的总門级 貧的房間聲波係利用放置在該傾聽者 通常傾聽的位置附近的麥克風來取得,及功能上連接到該 CPU’這就如圖3中所示。DFM技術較佳地只使用在該壁形 模擬器590及該房間模擬器Μ〕,a ϊ士、、、 天狹咨’在此處只有茲房間模擬器 592使用再生式組件。 各種濾波器係基於該用戶站或傾聽房間之品質來施加, 這可以利用該房間模擬器592來量測及補償。某—漉波器可 以補償該傾聽房間之聲波,這係基於一轉換一函數κ(ω), 該函數具有一些共鳴。假如該房間的大部分具有軟質表 面,像是地毯、窗簾或靠墊家具,則有可能該房間轉換函 數R(0)會在高頻處向下降。然而,假如該傾聽房間具有許 多硬質表面,則有可能該房間轉換函數κ(ω)之高頻端不會 下降到如此的程度。 用以完成房間共鳴補償初始步驟係為使用該麥克風3〇2 決定該傾聽房間之聲波(參見圖3)。該房間聲波的決定係使 用該等喇叭250(參見圖3)以產生具有一已知頻譜 聲音’然後使用該麥克風來監視該等房間聲波對該等制p八 所產生之聲音的效果。該等喇叭250產生一像是,,白色雜訊,, 之聲音’該聲音在每個頻率上具有相等能量。該麥克風所 轉換(transduced)之信號的頻譜ΝΚω)接著係用以根據下式 來計算該房間轉換函數R(g), Κ(ω)=Νι(ω)/[Ν〇(ω) Μ(ω)] » 此處頻譜Ni( ω)及Ν〇( ω)兩者都是以該SPLA尺度量測其分 貝值,及如同上面所述,Μ( ω)係為該麥克風所產生之轉 85801 -20- 200407027 換。或是,假如Ν〇( ω)係為一,,平坦”白色雜訊頻譜,如在該 較佳實施例中,則 R(6j)=Ni(6J)/[k Μ(ω)], 典型的補償房間濾波器接著係正為該房間頻譜的反相,或 F(6J)=1/R(6J) 此處F(y)係為該傾聽房間之補償濾波器。該濾波器以⑷可 
以在该增強器中實現,不論是在該房間模擬器中592或是該 主等化器580中,或是在兩者之中。 也可以採用其他濾波器來補償週遭雜訊。週遭房間雜訊 補領係利用提兩該音樂之特定頻譜波段到超過週遭房間雜 訊所對應的波段來獲得。該等提升改良該信噪比,因此該 音樂的清晰度就不需要求助於將該整體音量提高。當該雜 訊頻謂基本上沒有變化時,該雜訊降低技術可以執行的很 好。伴隨著該聲波之濾波器,該麥克風3〇2(參見圖3)係被 採用以獲得在該傾聽房間内對週遭雜訊的量測。電聲轉換 係以一麥克風轉換函數Μ(ω)來描述。因此,描述由該麥克 風所引起原始聲音頻譜到該信號之頻譜之轉換的轉換函數 係以下式表示 Μ(ω)·Τ(ω)=Μ(ω)·ΙΙ(ω)·3(ω)·(:(ω)·Ι(ω)·Ρ(ω) 該傾聽者所聽到的聲音藉由將該麥克風302放在靠近該 傾聽者之位置大部分係被精確地監視。用以補償週遭雜訊 遽波器之頻譜典型地係具有與該週遭雜訊頻譜相同的一般 形狀。該濾波器也可以在實現在該增強器中,如在該房間 模擬器592中或是該主等化器580中,或是兩者之中。 85801 -21 - 200407027 進一步増強可以利用補償錄製該音樂之環境或是一模擬 錄製環境(貫際上係不同於錄製該音樂的環境)來獲得。該 用戶係被給予多重錄製環境的選擇。根據該較佳實施例, 該等下列六個模擬錄製環境係可以提供用戶選擇:綠音室 (A、B)、大廳(a、B),及體育場。例如,在一錄音室環境 中,存在有先前反射的增強。或是,在一模擬大廳環境中, 存在有短殘響拍子(times),而一模擬體育場係具有可觀的 較長殘響拍子(times)。就某種意義來說,該使用者會變成” 製作人’·’因為該使用者模擬該音樂如何被錄製。或者,該 模擬錄製環境之應用係單獨地基於錄製該音樂之實際環 境’而不是該使用者的喜好設定。在該範例中,該系統會 從該錄製修正無用的人工,及下載或串流檔案係包含一標 籤,像是該MP3檔案之ID3標籤,該標籤可以辨識該等合適 錄製房間聲波。 該房間模擬器592之輸出係連接到該卡拉OK元件593。該 卡拉OK元件593具有來自立體音聲道之房間模擬的輸入。 這些左聲道信號與右聲道信號係相比較,而在兩邊聲道具 有相同能量的音樂組件,像是語音,則會被移除以提供卡 拉OK的效果。除了該卡拉〇Κ元件593沒有重新導入該等原 始立體聲信號,這係較佳地以類似在該3D增強器595之方 式來完成’這將在下面討論。 該卡拉OK元件593之輸出係連接到該廣闊元件594。該廣 闊元件594比較左右聲道,然後將計算及延遲函數執行於該 等兩聲道,以改變在該等聲道間之所感知距離。該效果改 85801 •22- 200407027 變該音樂之感知立體聲分離範圍。其他試圖產生一增強廣 闊度會導致該信號之低頻部分的損失,而該廣闊元件594 可以產生該分離同時留下該等低頻組件實質上沒有改變。 該效果之處理係整合到標準PL-2處理内,由美國加州,舊 金山,杜比公司所發展及散佈的一種定位準則。特別地, 該卡拉OK元件593、該廣闊元件594,及該3D增強器595(將 於下面討論)係以雙聲道之組合使用來完成PL-2解碼,每個 元件要求該等左右聲道間的互動。 ,該廣闊元件594之輸出係連接到該3D增強器595。該3D增 強器595移除來自該立體聲信號之”相等能量”(一般模式)信 號内容,(通常獨唱聲樂及樂器)將之延遲,然後使用頻域 及時域函數,將之與該原始信號重新混音。這在沒有去局 部化該等能量材料的情形下,提供一”寬闊化’’聲音階段給 該傾聽者。 該3D增強器595之輸出則連接到該等級(leveling)放大器 596。 依次地,該等級放大器596係連接到該AI等級控制器 597。 該AI等級控制597電路功能係用以在峰值事件期間降 低該聲音等級,及在傳送一峰值事件之後將其交回。為了 讓聲音在該等傾聽過程或是當錄製該聲音時免於扭曲,人 性工程師總是會藉由向下移動該引起題儀器或語音的音量 控制,將該音量降低。藉由基本上模擬人性工程師,該AI 等級控制597藉由分析扭曲及信號過載之數位串流以辨識 峰值事件,可以快速地向下移動該聲音等級。接著在該峰 值事件發生之後,在不需要一”總是開啟”的聲音壓縮電路 85801 -23 - 200407027 情形下’它會將該音量回到該初始音量設定,這會不合意 地導致動態邊緣及平坦聲音的損失。 該AI等級控制597之輸出係連接到該主擴展器598,該擴 展器係用以選擇地增加該主要立體聲信號之動態範圍。該 主擴展器598之輸出係連接到一放大器599。 该主擴展器5 9 8控制該系統之最後輸出音量等級。它允許 該傾聽者能夠設定成他或她所喜歡的音量等級,而不需擔 心會過載該喇队驅動器電路或該等喇叭。該特徵可以利用 
藉由監視扭曲取樣來偵測一喇队過載峰值聲音等級之程序 來完成。根據該較佳實施例,該修剪程度的模糊邏輯計數 (fuzzy logic tally)係用以決定該音量等級應該降低的程 度。或者,該程序可以事先查看該音樂串流而預測一喇队 過載峰值聲音等級之抵達。假如該等級係達到或預測將達 到,則該主增益等級係使用一非線性衰減相對時間曲線來 自動地變小,該曲線模擬活生生的人會使用的衰減相對時 該主擴展器598係為該增強處理之最後階段,而提供該增 強信號給聲道輸出504,這依次地連接到該味J p八驅動器電 路。該喇队驅動器電路將該信號之處理器的增強數位表示 轉換成一硬體類比信號,然後提供必要的放大及對該味j P八 的連接。 在此所描述的聲音等級解壓縮提供對該音樂之動態範圍 的擴展,以協助修正該聲音信號之壓縮,該壓縮已經向前 發生在該原始聲音來源之錄製的任何時間上。典型地,該 85801 -24- 200407027 '曰樂之錄製及混合包含許多該等音軌之聲音等級壓縮,以 獲得該錄製媒介之有限動態範圍的好處。同樣地,某些壓 縮係加於後錄製期間,以降低網際網路廣播目的之頻寬。 該後者類型的壓縮實質上係由該接收編解碼器所移除,但 是卻已經不足以修正或是需要進一步擴展,以改良該音樂 之’’愉快(liveness)",或其他主觀上的品質。較佳地,採用 一使用具有不同時間常數及擴展係數之動態的處理特徵。 圖5中所示的各種處理元件係利用一主控制程式來控 制’該程式可以越過任何程序,及可以規定每個程序的參 數。該”表皮”係為允許該用戶能夠控制參數及預先設定之 界面,即該π表皮”係為在該傾聽者個人電腦螢幕上所顯示 之增強程式之視覺及互動部分❶推桿(facjer)控制係可用於 讓該傾聽者能夠規定在該系統中之每個參數,及”單選按益 即開/關切換)係可用以選擇預先設定參數之群組。該等增 強參數係可以分開地調整,或是各種預先設定可以挑選。 該系統係包含一 ’’巨大”控制,其能夠同時地控制該等個 別波段處理器之參數。對於該”巨大”參數處於低值時,則 係發生較小動態處理,而該聲音等級動態範圍係等於該音 樂錄製時的動態範圍。對於該,,巨大”參數係處於較高值 時,每個波段的處理動態係相對於該錄製音樂之聲音等集 動範圍增加。 預先設定參數群組係屬於兩種類型:傾聽者定義及内 建。傾聽者可以從他們自己先前標記群組中選擇預先設 定,或是可以從一内建式菜單的預先設定中選擇。内建的 85801 200407027 預先設定係基於頻寬、編解碼器類型、傾聽者的喇叭及音 樂類型的考慮來設計。一旦傾聽者選擇一内建預先設定, 該傾聽者則會調整任何個別參數或參數群組,以客制化該 内建的預先設定。該調整的參數群組則會被加上標記,然 後存檔成一新的預先設定。例如,假如選擇一内建的預先 設定,則該傾聽者實質上係選擇一組的房間補償參數,該 組參數係施加於該選定的内建預先設定。 圖6係為根據一較佳實施例,說明一 3D增強器之方塊 圖。正如具有其他元件,該元件具有一左側輸入602及一右 侧輸入604,還有一左側輸出650及一右側輸出652。一混合' 器640係有左側輸出650有關,同時其他混合器642係與右側 輸出652有關。 與左側輸入602有關之信號係傳送通過一低通濾波器606 及一高通濾波器608。類似地,與左側輸入604有關之信號 係傳送通過一低通濾波器610及一高通濾波器612。該等低 通濾波器606及610之輸出係分別地傳送通過放大器622及 放大器628,該等放大器之輸出係分別地導入到混合器640 及混合器642。該等高通濾波器608及612之輸出係分別地傳 送通過放大器624及放大器626,該等放大器之輸出係分別 地導入到混合器640及混合器642。該等高通濾波器608及 612之輸出也是在加法器632相加在一起,然後導入到放大 器634。該放大器634之輸出係傳送到混合器640,還有傳送 到時間延遲元件636,該元件之輸出係進一步導入到混合器 642 〇 85801 -26- 200407027 圖7根據一較佳實施例,說明一廣闊(wide)元件之方塊 圖。正如具有其他元件,該元件具有一左侧輸入7〇2及一右 側輸入704,還有一左側輸出750及一右側輸出752。一混合 器74〇係有左側輸出750有關,同時其他混合器742係與右側 輸出752有關。與左側輸入702有關之信號係傳送通過一高 通濾波器706及一低通濾波器708。類似地,與右側輸入704 有關之信號係傳送通過一高通濾波器710及一低通濾波器 7i2。該等低通濾波器708及712之輸出係分別地傳送到放大 器740及放大器742。類似地,該等高通濾波器706及710之 輸出係分別地傳送通過時間延遲元件724及726,該等元件 
之輸出係別地導入到混合器740及混合器742。較佳地,時 間延遲元件724所提供的時間延遲係大於時間延遲元件726 所提供之時間延遲。例如,與元件724有關的時間延遲係為 〇.〇5〜2.00微秒,而與元件726有關的時間延遲係為1〇〜3〇微 秒。 圖8係根據該揭示方法/系統,說明該增強處理器之另一 貫施例之方塊圖。在圖8中所說明之系統包含許多與圖4中 所描述相同的元件,而也是以上述相同的方法來操作。然 而,應注意的是圖8包含該等下列額外的元件:一低音動態 處理器902 ;時間延遲元件905、918及919 ; 壁形模 擬器909 ; —偏移裝置907 ; —波形產生器915 ; —增益視窗 臨界處理器917及一語音”s”偵、測電路918。同樣在圖8中所 描述的是一喇叭921(具有一伴隨放大器920)及一麥克風 922。該低音動態處理器9〇2包含一特殊濾波器,其係與一 85801 -27- 200407027 可變時間延遲電路及壓縮器及擴展器方塊相組合,以增強 一動態低音聲音。該壁形模擬器909執行與該等前面圖示有 關且與上述相同的功能。該波形產生器915係用以在靜音期 間防止Intel FPU’·去正交(denormal)"操作。該偏移裝置9〇7 係用以允許該AI動態壓縮器9 01與該AI動態解壓縮器9〗3之 間進行通信。也應注意的是該AI衰減及扭曲偵測裝置916 可以用以監視該傾聽者環境923而提供回饋,使得能夠施加 一合適的增益等級於該輸出信號。這可以透過使用 Fletcher-Munson查詣表來執行。 雖然該等較佳實施例係在該等伴隨圖式中說明,及從該 等前面詳細描述中描述,但是應了解的是本發明並非受限 於所揭示的實施例,而是要能夠讓數種重新配置、修正及 替代方案不背離在此所提出的本發明之精神及其能在該申 請專利範圍及等同物下定義。 【圖式簡單說明】 圖1係為根據一較佳實施例,一用以增強壓縮聲音資料之 改進技術之流程圖。 圖2 A係為一根據一較佳實施例說明在網路之飼服器端所 發生之增強處理的方塊圖。 圖2B係為一根據一較佳實施例說明在網路之用戶端所發 生之增強處理的方塊圖。 圖3係為一根據一較佳實施例說明在網路之伺服器端所 發生之增強處理的方塊圖。 圖4係為一根據一較佳實施例說明用以增強聲音信號之 85801 •28- 200407027 信號處理功能的方塊圖。 圖5係為一根據一較佳實施例與有限頻寬音樂之用戶端 增強有關之信號處理功能的方塊圖。 圖6係為一根據其他較佳實施例說明用以增強聲音信號 之信號處理功能的方塊圖。 圖7係為一根據其他較佳實施例說明用以增強聲音信號 之信號處理功能的方塊圖。 圖8係為一根據其他較佳實施例說明用以增強聲音信號 之信號處理功能的方塊圖。 【圖式代表符號說明】 202 音樂源 204 聲音編解碼器 206 串流伺服器 210 主機站 212 增強處理元件 214 輸出連接 222 低等或中等頻寬連接;信號 230 該用戶站 242 數據機 244 個人電腦;處理器;接收聲音編解碼器; 246 接收聲音編解碼器 248 喇叭驅動器 250 制P八 252 增強處理元件 300 用戶站 302 麥克風 85801 -29 - 200407027 306 耦合 405 聲音信號 410 人工智慧動態壓縮器 412 信號線路 415 人工智慧動態解壓縮器 420 高頻人工遮罩處理器 425 清晰處理器(中等) 427 混合器 430 3D/即時增強器 435 錄製環境模擬器 440 語音消除器 445 廣闊立體聲增強器 450 人工智慧衰減及扭曲偵測處理器 455 該處理聲音信號 502 聲道輸入 504 限制器 506a_d、507a_d 放大器 510 全部頻寬路徑 520 低音路徑 540 中等路徑 560 高音路徑 546 帶通濾波器 522、542、562 低音、中等、高音參數等化器 524、544、564、582 高坡型濾波器 526、566、584 低通滤波器 528、568、586 高通濾波器 530、548、570 擴展器 -30- 85801 200407027 532、550、512 壓縮器 514、534、552、572 時間延遲元件 578 混合器 580 主等化器 590 壁形模擬器 592 房間模擬器 593 卡拉OK元件 594 廣闊元件 595 3D增強器 596 等級放大器 597 人工智慧等級控制 598 主擴展器 599 放大器 602 左側輸入 604 右側輸入 650 左輸出 652 右輸出 640、642 混合器 606、610 低通遽波器 608、612 向通滤波器 622、624、626、628、634 放大器 632 加法器 636 時間延遲元件 
702 左側輸入 704 右側輸入 750 左輸出 752 右輸出 -31 - 85801 200407027 740 、 742 混合器 706 、 710 南通遽波器 708、712 低通漉波器 726 > 724 時間延遲元件 902 低音動態處理器 905、918、919 時間延遲元件 909 擴散域矩陣壁形模擬器 907 偏移裝置 915 波形產生器 917 增益視窗臨界處理器 918 語音’V’偵測電路 920 放大器 921 920 麥克風 901 人工智慧動態壓縮器 913 人工智慧動態解壓縮器 923 傾聽者環境 916 人工智慧衰減及扭曲偵測裝置 85801 -32 -200407027 (1) Description of the invention: [Technical field to which the invention belongs] The present invention relates to improved processing techniques for enhancing the transmission of sound signals, and more particularly to processing techniques for enhancing the transmission of music over a limited bandwidth connection. [Previous Technology] The rapid popularity of the Internet has led to its rapid development and updating, and more efficient ways to use its communication technology, far exceeding text-only applications. Two new applications that have gained attention are sound and video broadcasting. These two applications have a common problem: when the connection to the Internet is limited by bandwidth, their functionality suffers. Because of its more bandwidth requirements, video broadcasting is particularly problematic for most Internet users (ie, users) using limited bandwidth links. A common method of transmitting sounds like music on the Internet is to download these sound files to a good computer. Digital audio files are also commonly copied and compressed into MPEG3 or other formats to optical discs, personal players, or computer hard drives. Compared to streaming sound, listening to these broadcasts is more suitable or portable. Listening environment. Another form of Internet transmission sound is streaming sound. " Streaming means listening while downloading. Generally, compared to the connection between the good end and the Internet, the server has a non-appropriate, non-different bandwidth connection. 
When streaming audio is used for music, an Internet host station (the "server") can deliver live concerts, DJ-selected music, or archived music over an Internet connection to the listening end user (the "client"). But because the client typically has only a limited-bandwidth connection, streamed or downloaded (compressed) music falls well short of an ideal listening experience, particularly for users accustomed to CD-quality music.

The degradation of the listening experience can be traced to two main sources: the compromises made when compressing the signal to meet limited-bandwidth transmission requirements or to reduce the file size needed for storage, and the client's poor listening environment. Regarding the latter, music streamed or downloaded over the Internet is usually heard through the loudspeakers attached to the client's computer, and rarely does anyone take care to provide a good listening environment where that computer sits. While recent efforts have improved the limited channel-bandwidth problem, the problem of the poor listening environment has not been satisfactorily addressed. It would therefore be advantageous to provide a technical solution that enhances the environment in which the client receives and listens to audio signals delivered over a limited-bandwidth connection. It would further be advantageous to provide a system that compensates for the distortion introduced by compressing audio files into smaller files.

[Summary of the Invention] The present invention discloses an improved audio-signal processing method and system. The disclosed method/system enhances the quality of audio signals that are to be compressed and/or have already been compressed. The system uses an array of adjustable digital signal processors that perform various functions on the audio signal feed.
According to one embodiment, the method/system can "rip" the signal before the audio signal is compressed into a smaller format. As described in the background of the invention, compression of the audio signal is necessary in order to transmit the signal over a limited-bandwidth network link. Compression is also necessary to store copies of audio signals on media with limited storage space, such as floppy disks, optical discs, flash memory, and magnetic drives. Other embodiments of the method/system enhance the audio signal after decompression. For example, the method/system may be used with a client-based streaming media receiver to enhance the audio signal after it has been decompressed by the streaming receiver. In other examples, the method and system enhance the audio signal as it is read and decompressed from a limited storage medium. In a preferred embodiment, the disclosed method/system is used at both the compression end and the decompression end of the audio stream. However, the invention also contemplates that the disclosed method/system may be used exclusively at either the compression end or the decompression end of the audio stream.

The upstream (i.e., compression-end) embodiment of the method/system is applied as a "ripping" program that processes the audio signal faster than real time. The ripping program is useful for enhancing an electronic audio file before it is compressed and stored on a storage device. Because the ripping program operates faster than real time, the time required to compress the file can be greatly reduced. The downstream (i.e., decompression-end) embodiment can be used to enhance the audio signal as it is read from the storage medium and decompressed. The downstream embodiment can also be used to enhance a streamed audio signal as it is received by a receiver.
Because the disclosed method/system operates faster than real time, it can effectively enhance the decompressed audio signal with minimal time-delay effects.

[Embodiment] The present invention discloses techniques for enhancing audio delivered to a user over a limited-bandwidth transmission system or from a compressed digital file. More specifically, disclosed are client-side techniques for audio files that may be streamed or downloaded over the Internet, or delivered by other means to user devices such as CDs, portable players, set-top boxes, and the like, and that are played on computer-based sound systems of limited fidelity, in environments with ambient noise or other poor acoustic characteristics. Also disclosed are techniques for compressing an audio signal faster than real time so that the signal can be broadcast over a limited-bandwidth connection. Other embodiments include client-based applications in which the audio signal is enhanced after decompression, such as a streaming media receiver or an electronic audio file player (e.g., an MP3 player). The disclosed method/system can therefore be used in the following applications:

• a server-side "ripper" operating faster than real time;
• a client-side enhancement device that does not require the audio file to be ripped in advance;
• a broadcast server, where the audio signal is enhanced in real time;
• a server-side "ripper" whose compressed files are later decoded at the client for further enhancement of quality and clarity; and
• a client-server configuration, where the audio signal is enhanced at the server before compression and then further enhanced at the client after decompression.

FIG. 1 is a flowchart illustrating an improved technique for enhancing audio data according to a preferred embodiment. In step 102, the audio data is encoded as a digitally formatted signal.
There, the digital signal may also be compressed for later transmission. At step 104, once the encoded audio signal is in digital form, it can be enhanced using various processing techniques that reinforce the frequencies and dynamics expected to be lost or damaged during subsequent transmission. Thereafter, at step 106, the enhanced audio signal is transmitted over a link to a network such as the Internet, the link having only low or medium bandwidth. After arriving at the client, the transmitted audio signal is decoded (and, if necessary, decompressed) at step 108. Finally, at step 110, the now-decoded audio signal undergoes further enhancement processing to restore the frequencies and dynamics expected to have been lost or damaged during transmission.

Figure 2A illustrates the enhancement processing that occurs at the server side of the network (i.e., the host station) according to a preferred embodiment. At the host station 210, music is selected from a music source 202, for example from a stored file or a live feed. An enhancement processing element 212 is inserted between the music source 202 and an audio codec 204. The enhancement processing element 212 enhances the audio signal before it is encoded by the transmission audio codec 204. Enhancement processing at this point is advantageous if the streaming server 206 broadcasts to users with known and/or similar listening environments. It is likewise advantageous when the type of music to be broadcast is known or predetermined, or is always of a similar type, because the enhancement can then be tuned in the way that best suits that particular type of music.

The transmission music codec 204 processes the music through an encoder (i.e., the transmitting half of a codec) that formats and compresses the music in a manner adapted to the bandwidth of the client's Internet connection. A codec is an encoder/decoder system which, for purposes of this discussion, functions as an audio data compressor (the encoder) and an audio data decompressor (the decoder). An encoder of the data-compression type is also known as a "compander." In this disclosure, "data compression" refers to any process that reduces the size of a data file, while "sound-level compression" refers to the level compression employed by formats such as 8-Track, Dolby AC3, and WMA (MP3).

After the transmission audio codec 204 is applied, a streaming server 206 transmits the compressed and formatted music data over the output connection 214 to the Internet, addressed to the designated recipient. Although this description refers mainly to the streaming and enhancement of music, it applies equally to any audio or audio/video material. Furthermore, it should be noted that the system and techniques can be used with various audio transport protocols, including, for example, RealAudio, MP3, and Windows Media.

As used herein, "real time" means that the server is processing the music in the audio codec at substantially the same time the listening user hears it. While some delay introduced on the way to the speakers may still be considered "real time," preferably the music stream is not heavily buffered between the music source and whatever passage the user is currently hearing, and successive passages of music reach the speakers continuously. Downloaded files are stored in their entirety and played back later; such files are preferably compressed in the same way as streamed files, although the compression ratio may be lower than that used for real-time streaming.

Figure 2B illustrates the enhancement processing that occurs at the client side of the network (i.e., decoder-side enhancement) according to the present invention. This type of enhancement processing is beneficial where there is a wide variety of listening environments and/or music types. Over the low- or medium-bandwidth connection 222, the enhanced and encoded signal arrives at the subscriber site 230.
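The passage above distinguishes "data compression" from "sound-level compression." A minimal numerical sketch of the latter is given below: a static compressor/expander ("compander") pair that shrinks dynamic range on one side and restores it on the other. The power-law characteristic is an illustrative assumption, not the actual transfer curve of any codec named in this document.

```python
import numpy as np

def compress_level(x, ratio=2.0):
    """Static power-law level compressor: shrinks dynamic range by `ratio`."""
    return np.sign(x) * np.abs(x) ** (1.0 / ratio)

def expand_level(y, ratio=2.0):
    """Matching expander: restores the original dynamic range."""
    return np.sign(y) * np.abs(y) ** ratio

# A signal with a quiet passage followed by a loud one.
x = np.concatenate([0.05 * np.sin(np.linspace(0, 20, 400)),
                    0.9  * np.sin(np.linspace(0, 20, 400))])
y = compress_level(x)

# Compression narrows the level gap between the quiet and loud passages...
assert y[:400].max() / y.max() > x[:400].max() / x.max()
# ...and the matching expander undoes it almost exactly.
assert np.allclose(expand_level(y), x, atol=1e-12)
```

A matched pair like this is the basic idea behind the compressor/decompressor offset described later in connection with Figure 4, where deliberately mismatching the two sides yields a net dynamic expansion.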
In particular, the k-number 222 can & provide personal power 244 or other suitable processing platform 85801-11-200407027. In the preferred embodiment, the personal computer 244 includes a modem 242, a processing combined with the received sound codec 246 and an enhanced processing element 252 244, speaker driver 248, and speaker 250. Like the enhanced processing element 212 provided at the server station 210, when a decoded signal has been decoded by the receiver sound codec 244, the enhanced processing element 252 is It is better to prepare for the enhancement of the signal. The processor of the user receiving decoder 246 is connected to the central processing unit 244 to perform the inversion of most of the server's transmission sound codec 244 ( inverse). In particular, the receiving codec 246 converts the data stream back into a convenient music format, and decompresses the music to save the music as much as possible at the music source 202. Original quality. The program of the received sound codec 244 can be executed by software on the central processing unit 244, or it can be added to the hardware by using a sound card. The Rabat driver 248 can also be found in the sound card or implemented in software. The Rabat 250 in a typical user listening environment contains a pair of medium drivers ranging from inferior to intermediate quality, and includes bass reinforcement Woofer and / or sub-woofer. The user station 230 where the user and the computer are placed is the final component of the listening environment: this affects the perceived sound quality to a considerable extent because the sound The spectrum response, such as resonance, and the surrounding noise it causes. Considering the bandwidth limitation of the connection between the transmitting sound codec 204 and the receiving sound codec 246, these codecs are designed to An output is generated that is substantially similar to the input signal. 
The data compression programs of these codecs (204, 246) introduce nasty artifacts and distortions. These compression procedures 85801 -12- 200407027 do not necessarily need to be modified using the improvement techniques described below. In the configuration of Figure 2B (and Figure 3), the enhanced processing element 252 is preferably software connected to the processor. However, other configurations are conceivable for alternative embodiments. For example, the processing occurs in a dedicated digital signal processor, which is located near or on a connected device. Figure 3 illustrates the enhancements that occur at the client of the network according to other preferred embodiments. Different from the embodiment described in Fig. 2B, the embodiment described in Fig. 3 has a microphone 3002 included in the subscriber station 300. The microphone 302 is connected to the enhancement processing element 252 via a coupling 306 to provide feedback to the element. Based on the feedback, the enhanced processing element 252 can provide additional control of the squad driver 248. Several improvements and techniques have been adopted to provide excellent processing performance with only moderate or typical power. One technique uses an extended bit depth to perform the sound processing to generate a large dynamic range in the system, eliminating the need for strong input limiters and reducing truncated error noise. Any type of processing (such as mixing of signals, equalization, compression, etc.) will change the degree of the original digital data inversely with the bit resolution of the data. For illustrative purposes only, these techniques described below use 64-bit sound sampling at each stage of this data processing. However, it also covers the use of other sample sizes, such as 8-bit, 16-bit, 32-bit, and 128-bit. Fig. 4 is a block diagram illustrating a signal processing theory for enhancing a sound signal according to the present invention. In FIG. 
4, a sound signal 405 is provided to an artificial intelligence (AI) dynamic compressor 410. The AI dynamic compressor 410 works in concert, through signal line 412, with the AI dynamic decompressor 415 in order to enhance the dynamic range of the incoming sound signal 405 to a required range. The offset between these two processors 410, 415 produces an overall dynamic spread of the signal. After processing by the AI dynamic compressor 410, the sound signal is processed by two components placed in parallel: a high-frequency artifact masking processor 420 and a clarity processor (mid-band) 425. The high-frequency artifact masking processor 420 comprises a tunable filter combined with a variable time delay circuit, and produces a masking effect on objectionable artifacts in the incoming sound signal. The clarity processor 425 likewise comprises a tunable filter combined with a variable time delay circuit, and produces a realignment effect on time-smeared mid frequencies in the incoming sound signal. After processing through these two components, the sound signals are combined in a mixer 427 and then fed to a 3D/live enhancer 430. The 3D/live enhancer 430 adds live and stereo perspective to the sound field of the sound signal, using a three-dimensional model to determine the extent of the signal processing. After the sound signal has been processed by the 3D/live enhancer 430, it is processed by the recording environment simulator 435, which adds diffusion, reverb, depth, regeneration, and spatial decay to the sound signal. The recording environment simulator 435 accomplishes these effects without adding resonant modes and nodes to the virtual recording space.
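The compressor/decompressor offset described above can be illustrated with a static gain-curve pair. The patent does not disclose its actual "artificial intelligence" rules, so the threshold and ratio below are illustrative assumptions; the sketch only shows how a downstream expander can undo (or exceed) the upstream compression to spread dynamics.

```python
import numpy as np

def compress(x, threshold_db=-24.0, ratio=3.0):
    """Static compression curve: attenuate level above threshold by `ratio`."""
    level_db = 20 * np.log10(np.maximum(np.abs(x), 1e-12))
    over_db = np.maximum(level_db - threshold_db, 0.0)
    return x * 10 ** (-over_db * (1 - 1 / ratio) / 20)

def expand(x, threshold_db=-24.0, ratio=3.0):
    """Inverse curve: with matching settings it exactly restores the
    dynamics the compressor removed; a larger ratio would over-expand."""
    level_db = 20 * np.log10(np.maximum(np.abs(x), 1e-12))
    over_db = np.maximum(level_db - threshold_db, 0.0)
    return x * 10 ** (over_db * (ratio - 1) / 20)
```

A real implementation would use envelope followers with attack and release time constants rather than instantaneous sample levels.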
After processing by the recording environment simulator 435, the sound signal is processed by a voice canceller 440, which can effectively remove the vocal track from the sound signal. This is possible because most vocal tracks are centered and comparatively dry within the overall sound signal. After the vocals are removed, the sound signal passes to a wide stereo enhancer 445, which adds a wider stereo perspective to the sound field of the sound signal. The sound signal is then sent to the AI dynamic decompressor 415, where it is processed by artificial intelligence rules to ensure that the entire dynamic range of the sound signal is restored. After the sound signal is processed by the AI dynamic decompressor 415, it is processed by an AI attenuation and distortion detection processor 450, which adjusts the level (that is, the volume) of the signal until an optimal gain is reached. The AI attenuation and distortion detection processor 450 dynamically adjusts the gain of the sound signal so that a consistent signal level is continuously delivered to the listener. Finally, the processed sound signal 455 is fed to a driver or a group of drivers so that the signal can be heard. Fig. 5 is a block diagram illustrating signal processing functions related to client-side enhancement of limited-bandwidth music according to a preferred embodiment. Although only one processing channel is illustrated in FIG. 5, it should be understood that multiple processing channels may be used. Furthermore, the decoding and enhancement procedures described below are preferably software routines running on a processor, so references to signal paths refer to the common programming technique of passing data from one routine to another.
Therefore, consistent with the preferred embodiment, a path does not refer to a physical connection, although different connections may be used in other embodiments. The enhancement procedure begins with the sound signal output from the receiving codec 246. Initially, the signal is transmitted through the channel input 502 to the limiter 504. The limiter 504 is preferably a standard sound limiter, that is, a processing function that keeps the louder sections of the sound from overwhelming the downstream processing due to lack of dynamic range. In responding to sound levels, the limiter 504 creates gain changes that can have a coloring effect on the sound, such as "pumping" and "clipping". Gain changes that occur as a result of limiting or decompressing are often apparent to the listener. "Clipping" occurs when a signal exceeds the maximum value achievable by a system. The output of the limiter 504 splits the signal into four separate paths or bands, referred to as the full bandwidth path 510, the bass path 520, the mid path 540, and the treble path 560. Each path is preferably processed independently. The full bandwidth path 510 is intended for the full-bandwidth sound; in contrast to the filtered bands discussed below, the full bandwidth path 510 is preferably not sound-level decompressed. The bass, mid, and treble paths (520, 540, 560) preferably filter the signal into non-overlapping frequency bands. It should be understood that more or fewer paths may be used. For example, an additional path may be provided for a sub-woofer band, or the mid band may be divided into two separate mid bands. When the number of bands used in an embodiment is very high, the filtering is preferably provided by an ARBI filter.
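A minimal sketch of the four-path split follows. The patent's filters are time-domain dynamic filters; brick-wall FFT masks are used here purely for brevity, and the 200 Hz / 4 kHz crossover points are assumed for illustration, not taken from the patent.

```python
import numpy as np

def split_bands(x, fs, edges=(200.0, 4000.0)):
    """Split a signal into non-overlapping bass/mid/treble bands plus an
    untouched full-bandwidth copy, mirroring paths 510/520/540/560."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    bass = np.fft.irfft(np.where(freqs < edges[0], spec, 0), len(x))
    mid = np.fft.irfft(np.where((freqs >= edges[0]) & (freqs < edges[1]), spec, 0), len(x))
    treble = np.fft.irfft(np.where(freqs >= edges[1], spec, 0), len(x))
    return {"full": x.copy(), "bass": bass, "mid": mid, "treble": treble}
```

Because the bands do not overlap, summing them reconstructs the input exactly, so each band can be dynamics-processed independently and then recombined in the mixer.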
For example, the limiter 504 may be a three-hundred-band stereo dynamic ARBI filter (thus requiring three hundred sound-level-decompression stereo channels and three hundred time-delay-aligned stereo channels). Before processing, the respective inputs of the full bandwidth, bass, mid, and treble paths (510, 520, 540, 560) are amplified by amplifiers 506a-d. After processing, the outputs of the full bandwidth, bass, mid, and treble paths (510, 520, 540, 560) are respectively amplified by amplifiers 507a-d and then combined in the mixer 578. Each frequency band formed by these filters is shown in Figure 5 and is processed independently by the various processing elements described in the following paragraphs. Apart from the full bandwidth path 510, each band contains a parametric equalizer. The parametric equalizers are labeled with reference numerals 522, 542, and 562, for the bass, mid, and treble paths (520, 540, 560) respectively. Each parametric equalizer (522, 542, 562) provides multiple narrow-band filters, each of which controls the gain, the bandwidth or "Q", and the center frequency of its band. The equalizers (522, 542, 562) include a Nyquist compensation filter, which reduces spurious signals caused by sampling aliasing. Specific programmable sound-level expansion or compression for each frequency band is accomplished using dynamic processing elements included in each of the bass, mid, and treble paths (520, 540, 560). These processing elements preferably include various filters together with an expander and/or compressor. The bass path 520 preferably includes a high-shelf filter 524, a low-pass filter 526, and a high-pass filter 528, together with an expander 530 and a compressor 532. The mid path 540 preferably includes a high-shelf filter 544 and a band-pass filter 546, together with an expander 548 and a compressor 550.
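One narrow-band section of a parametric equalizer such as 522/542/562 can be sketched as a standard peaking biquad with the three controls the text names: gain, bandwidth (Q), and center frequency. The coefficient formulas below are the common audio-EQ-cookbook design, offered as an illustration rather than the patent's actual filter.

```python
import numpy as np

def peaking_biquad(fs, f0, gain_db, q):
    """Biquad peaking-EQ coefficients (b, a) with controllable
    gain, bandwidth (Q), and center frequency f0."""
    amp = 10 ** (gain_db / 40.0)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2.0 * q)
    b = np.array([1 + alpha * amp, -2 * np.cos(w0), 1 - alpha * amp])
    a = np.array([1 + alpha / amp, -2 * np.cos(w0), 1 - alpha / amp])
    return b / a[0], a / a[0]

def response_db(b, a, f, fs):
    """Magnitude response of the biquad at frequency f, in dB."""
    z = np.exp(-2j * np.pi * f / fs)
    h = (b[0] + b[1] * z + b[2] * z * z) / (a[0] + a[1] * z + a[2] * z * z)
    return 20 * np.log10(abs(h))
```

At the center frequency the boost equals gain_db exactly, while frequencies far outside the bandwidth remain near 0 dB, which is what allows several such sections to be cascaded within one band.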
The treble path 560 preferably includes a high-shelf filter 564, a low-pass filter 566, and a high-pass filter 568, together with an expander 570. The full bandwidth path is preferably limited only by a compressor 512. It should be understood that the processing elements used in each path may vary based on the number and type of bands associated with the path and on other design choices. Each band (including the full bandwidth path 510) also preferably provides a time delay alignment element to compensate for the differing time delays generated by the preceding components, or already produced during recording or server-side processing. The time delay elements are labeled with reference numerals 514, 534, 552, and 572, for the full bandwidth, bass, mid, and treble paths (510, 520, 540, 560) respectively. Typically, proper alignment requires time delays on the order of microseconds. After processing, each band output is connected to a mixer 578. The mixer 578 provides signal balance among the four paths (510, 520, 540, 560) and directs the mixed signal to a master equalizer 580. The master equalizer 580 provides parametric equalization of the signal leaving the mixer 578; it provides the final broad-spectrum shaping of the signal. The equalized signal is then (optionally) passed through highly equalized resonance filters to enhance sub-woofer and bass frequencies. These filters preferably include a high-shelf filter 582, a low-pass filter 584, and a high-pass filter 586. A wall simulator 590 can be coupled to the high-pass filter 586. The wall simulator 590 uses diffusion domain matrix (DFM) technology to generate time delays that simulate reflections from an actual stage. Simulating such a sound-reflecting environment can add liveliness, or reverb quality, to the music without introducing unwanted resonant peaks.
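The role of the time delay elements 514/534/552/572 feeding the mixer 578 is simply to re-align the band paths before they are summed, so that their differing processing latencies do not smear transients. The following is a minimal sketch with sample-granular delays; actual alignment may be finer-grained.

```python
import numpy as np

def align_and_mix(bands, delays, gains):
    """Apply a per-path alignment delay (in samples) and gain, then sum
    the paths, as the mixer 578 does after the delay elements."""
    n = max(len(x) + d for x, d in zip(bands, delays))
    out = np.zeros(n)
    for x, d, g in zip(bands, delays, gains):
        out[d : d + len(x)] += g * x
    return out
```

With the right delays, a transient that left two paths at different latencies sums coherently again instead of arriving as two separate clicks.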
Traditional DFM techniques use number-theoretic rules to produce non-harmonic, non-resonant wave reflections. For example, the quadratic residues described in Section 15.8 and the primitive roots described in Section 13.9 of M. R. Schroeder, Number Theory in Science and Communication (Springer-Verlag, Berlin, 2nd ed., 1986) can be applied here. However, those traditional techniques only provide reverberation that simulates the long-time reflections of a room. A primitive-root calculation is preferably used which, based on the methods taught by Schroeder but improved by applying diffusion domain matrix (DFM) techniques, provides the early reflections of the sound, that is, reflections within about 5 to 30 milliseconds of the direct sound. The wall simulator 590 can also help break up, re-shape, or remove the artificial or objectionable periodic features produced by strongly periodic processing. The DFM techniques used in the wall simulator 590 do not use regeneration, that is, feedback from the output to the input of the processing element. The control parameters of this processing stage include room size and distance from the wall. The output of the wall simulator 590 is passed to the room simulator 592. The room simulator 592 uses DFM techniques to generate natural room acoustics with time delays and resonances. Its DFM techniques are similar to those of the wall simulator 590, but use regeneration. The room simulator 592 can add reverb and decay to strongly compressed, dry music material, further masking subtle distortion caused by the codec. Other parameters of this processing stage include room size, room aspect ratio, and wet/dry mix. Another use of the room simulator 592 is to compensate for poor room acoustics in the listening environment of the listener.
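A sketch of the wall simulator's feedback-free early-reflection idea: tap delays are placed at prime-numbered sample offsets inside the 5-30 ms window, so no two reflections line up periodically and no resonant mode is created. The tap count and decay gains below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def primes_between(lo, hi):
    """Prime numbers in [lo, hi] via a small sieve."""
    sieve = np.ones(hi + 1, dtype=bool)
    sieve[:2] = False
    for i in range(2, int(hi ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = False
    p = np.flatnonzero(sieve)
    return p[p >= lo]

def early_reflections(x, fs, t_min=0.005, t_max=0.030, taps=6, gain=0.5):
    """Add `taps` decaying echoes at prime sample delays in [t_min, t_max]
    seconds; an FIR pattern with no feedback, like the wall simulator 590."""
    cand = primes_between(int(t_min * fs), int(t_max * fs))
    delays = cand[np.linspace(0, len(cand) - 1, taps).astype(int)]
    y = np.concatenate([x, np.zeros(delays[-1])])
    for k, d in enumerate(delays):
        y[d : d + len(x)] += gain * 0.8 ** k * x
    return y, delays
```

Feeding an impulse through this produces one reflection per tap and nothing else, which is the "no resonant modes" property the text describes.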
The same DFM techniques used to add natural room or stage acoustics to a dry source signal can also be used to de-emphasize resonances or colorations in the listener's room, and to reduce the perceived level of ambient room noise. For this purpose, the acoustics of the listener's room are obtained using a microphone placed near the position where the listener normally listens, functionally connected to the CPU, as shown in Figure 3. The DFM techniques are preferably used only in the wall simulator 590 and the room simulator 592, and of these only the room simulator 592 uses regenerative components. Various filters may be applied based on the characteristics of the user station or listening room, which can be measured and compensated for using the room simulator 592. One filter can compensate for the acoustics of the listening room, based on a transfer function R(ω) that has certain resonances. If most of the room's surfaces are soft, such as carpet, curtains, or cushioned furniture, the room transfer function R(ω) will likely roll off at high frequencies. If, however, the listening room has many hard surfaces, the high-frequency end of the room transfer function R(ω) will likely not roll off to the same extent. The initial step in room resonance compensation is to determine the acoustics of the listening room using the microphone 302 (see Figure 3). The room acoustics are determined by using the speakers 250 (see Figure 3) to produce a sound with a known spectrum, and then using the microphone to monitor the effect of the room acoustics on the sound produced by the speakers. The speakers 250 produce a sound such as white noise, and the sound has equal energy at each frequency.
The spectrum N1(ω) of the signal transduced by the microphone is then used to calculate the room transfer function R(ω) according to the formula R(ω) = N1(ω)/[N0(ω)·M(ω)], where the spectra N1(ω) and N0(ω) are both measured in decibels on the SPL-A scale and, as noted above, M(ω) is the transfer function of the microphone. Alternatively, if N0(ω) is a flat white-noise spectrum of level k, as in the preferred embodiment, then R(ω) = N1(ω)/[k·M(ω)]. The compensating room filter is then the inverse of the room's frequency response, or F(ω) = 1/R(ω), where F(ω) is the compensation filter for the listening room. It can be implemented in the enhancer, whether in the room simulator 592 or the master equalizer 580, or both. Another filter can be used to compensate for ambient noise. Ambient room noise compensation is obtained by raising specific frequency bands of the music above the corresponding bands of the ambient room noise. This enhancement improves the signal-to-noise ratio, so the clarity of the music is improved without resorting to increasing the overall volume. The noise compensation performs well when the noise spectrum is essentially unchanging. Along with the acoustics measurement, the microphone 302 (see FIG. 3) is used to obtain a measurement of the ambient noise in the listening room. The electro-acoustic transduction is described by a microphone transfer function M(ω). The transfer function describing the conversion of the original sound spectrum into the signal spectrum produced by the microphone is therefore given by: M(ω)·T(ω) = M(ω)·R(ω)·S(ω)·C(ω)·I(ω)·P(ω). The sound heard by the listener is most accurately monitored by placing the microphone 302 close to the listener. A filter is used to compensate for the ambient noise.
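Working in decibels, the division in R(ω) = N1(ω)/[N0(ω)·M(ω)] becomes subtraction and the inversion F(ω) = 1/R(ω) becomes negation, so the measure-and-compensate loop reduces to a few lines. The per-bin spectra below are synthetic; a real system would measure N1 through the microphone 302.

```python
import numpy as np

def room_response_db(n0_db, n1_db, m_db):
    """R = N1 / (N0 * M), expressed per frequency bin in dB."""
    return n1_db - (n0_db + m_db)

def compensation_db(r_db):
    """F = 1 / R in dB: the inverse filter that flattens the room."""
    return -r_db
```

Applying F on top of R yields 0 dB at every bin, i.e. a flat listening-room response.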
The spectrum of this compensating filter typically has the same general shape as the ambient noise spectrum. This filter can likewise be implemented in the enhancer, whether in the room simulator 592 or the master equalizer 580, or both. Further enhancement can be obtained by compensating for the environment in which the music was recorded, or for a simulated recording environment (which may differ from the environment in which the music was actually recorded). The user is given a selection of multiple recording environments. According to the preferred embodiment, the following six simulated recording environments are available for user selection: recording studio (A, B), hall (A, B), and stadium. For example, in a simulated recording studio environment, early reflections are enhanced. In a simulated hall environment there are short reverberation times, while a simulated stadium has considerably longer reverberation times. In a sense, the user becomes a "producer", in that the user simulates how the music was recorded. Alternatively, the simulated recording environment can be applied based solely on the actual environment in which the music was recorded, rather than on the user's preference. In that case the system corrects unwanted artifacts from the recording, and the downloaded or streamed file includes a tag, such as the ID3 tag of an MP3 file, identifying the appropriate recording-room acoustics. The output of the room simulator 592 is connected to the karaoke element 593. The karaoke element 593 receives the room-simulated stereo channels. The left channel signal is compared with the right channel signal, and sound components having equal energy on both sides of the soundstage, such as vocals, are removed to provide a karaoke effect.
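The core of the karaoke element 593 — removing content that appears with equal energy in both channels — reduces, in its simplest form, to taking the stereo difference signal. This is a bare sketch of the principle, not the patent's exact comparison logic.

```python
import numpy as np

def karaoke(left, right):
    """Cancel centered (equal-energy, in-phase) content such as a lead
    vocal; side content survives with its left/right placement intact."""
    side = (left - right) / 2.0
    return side, -side
```

Anything panned dead center cancels exactly, while material panned to one side passes through.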
This is preferably done in a manner similar to that of the 3D enhancer 595, discussed below, except that the karaoke element 593 does not re-introduce the original stereo signals. The output of the karaoke element 593 is connected to the wide element 594. The wide element 594 compares the left and right channels and then performs calculation and delay functions on the two channels to alter the perceived distance between them. This effect changes the perceived stereo separation of the music. Whereas other attempts to produce enhanced width result in loss of the low-frequency portion of the signal, the wide element 594 can produce the separation while leaving the low-frequency components essentially unaltered. The processing of this effect is integrated with standard PL-2 processing, a positional decoding standard developed and licensed by Dolby Laboratories of San Francisco, California. In particular, the karaoke element 593, the wide element 594, and the 3D enhancer 595 (discussed below) operate on a combination of the two channels to accomplish PL-2 decoding, with each element requiring interaction between the left and right channels. The output of the wide element 594 is connected to the 3D enhancer 595. The 3D enhancer 595 removes the "equal energy" (common-mode) signal content from the stereo signal (typically solo vocals and instruments), delays it, and then remixes it with the original signal using frequency-domain and time-domain functions. This provides a "widened" soundstage to the listener without dislocating the centered material. The output of the 3D enhancer 595 is connected to the level amplifier 596, which in turn is connected to the AI level control 597. The AI level control 597 circuit functions to reduce the sound level during a peak event and restore it after the peak event has passed.
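The extract-delay-remix behind the 3D enhancer 595 described above can be sketched as follows. The 12 ms delay and 0.4 mix are illustrative assumptions, and the real element also applies the frequency-domain shaping of Fig. 6, which is omitted here.

```python
import numpy as np

def enhance_3d(left, right, fs, delay_ms=12.0, mix=0.4):
    """Extract the common-mode (centered) content, delay it, and remix it
    anti-phase into the two channels to widen the perceived stage."""
    mid = (left + right) / 2.0
    d = int(fs * delay_ms / 1000.0)
    delayed = np.concatenate([np.zeros(d), mid[: len(mid) - d]])
    return left + mix * delayed, right - mix * delayed
```

Because the delayed component is added anti-phase, it contributes width (side energy) without shifting the position of the original centered sources.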
To avoid distortion while listening to or recording sound, a human engineer would turn down the volume by moving down the fader of the offending instrument or voice. By essentially simulating a human engineer, the AI level control 597 analyzes the digital stream for distortion and signal overload to identify a peak event, and quickly moves the sound level down. After the peak event has passed, it returns the volume to the initial volume setting, without the need for an "always on" sound compression circuit, which would undesirably result in the loss of dynamic edges and a flattened sound. The output of the AI level control 597 is connected to the master expander 598, which is used to selectively increase the dynamic range of the master stereo signal. The output of the master expander 598 is connected to an amplifier 599. The amplifier 599 controls the final output volume level of the system, allowing the listener to set his or her preferred volume level without worrying about overloading the speaker driver circuit or the speakers. This feature is accomplished by a program that detects speaker-overload peak sound levels by monitoring for distorted samples. According to the preferred embodiment, a fuzzy-logic tally of clipped samples is used to determine whether the volume level should be lowered. Alternatively, the program can look ahead at the music stream to predict the arrival of a speaker-overload peak sound level. If such a level is reached, or is predicted to be reached, the master gain level is automatically decreased using a non-linear attenuation-versus-time curve that simulates the attenuation a live engineer would apply. Since the master expander 598 is the final stage of the enhancement processing, the enhanced signal is provided to the channel output 504, which in turn is connected to the speaker driver circuit.
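A look-ahead limiter with a non-linear release models the level-control behavior described above: it ducks only when a peak event is coming, then eases back to the listener's set level instead of staying compressed. All constants are illustrative, and the patent's fuzzy-logic tally is replaced here by a simple peak test.

```python
import numpy as np

def level_control(x, fs, ceiling=0.9, lookahead_ms=5.0, release_ms=80.0):
    """Scan ahead for overload peaks; clamp the gain just enough to stay
    under `ceiling`, then relax exponentially back toward unity."""
    la = max(1, int(fs * lookahead_ms / 1000.0))
    alpha = np.exp(-1.0 / (fs * release_ms / 1000.0))
    gain = 1.0
    out = np.empty_like(x)
    for n in range(len(x)):
        peak = np.max(np.abs(x[n : n + la]))
        if peak > ceiling:
            gain = min(gain, ceiling / peak)   # duck before the peak lands
        out[n] = x[n] * gain
        gain = 1.0 - (1.0 - gain) * alpha      # non-linear return to unity
    return out
```

Because the gain is clamped while the peak is still inside the look-ahead window, no output sample ever exceeds the ceiling, yet material well after the event returns to its original level.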
The speaker driver circuit converts the processor's enhanced digital representation of the signal into a hardware analog signal and provides the necessary amplification and connection to the speakers. The sound-level decompression described here provides an expansion of the dynamic range of the music to help correct for compression of the sound signal that has occurred at any time since the recording of the original sound source. Typically, the recording and mixing of music includes sound-level compression of many of the tracks, to take advantage of the limited dynamic range of the recording medium. Similarly, some compression may be added after recording to reduce bandwidth for Internet broadcast purposes. The latter type of compression is essentially removed by the receiving codec, but it may be insufficiently corrected, or may require further expansion to improve the "liveness" or other subjective qualities of the music. Preferably, processing that uses dynamics with different time constants and expansion coefficients is employed. The various processing elements shown in Fig. 5 are controlled by a master control program. The program can bypass any process, and the parameters of each process can be specified. A "skin" is an interface that allows the user to control parameters and presets; that is, the skin is the visual and interactive part of the enhancement program displayed on the screen of the listener's personal computer. Fader controls are available so the listener can specify each parameter in the system, and radio buttons can be used to select groups of preset parameters. The enhancement parameters can be adjusted individually, or various presets can be selected. The system includes a "huge" control that can simultaneously control the parameters of the individual band processors.
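The "huge" control and the preset mechanism can be sketched as a parameter dictionary plus one macro knob that rescales the dynamics of every band processor at once. The parameter names and numbers below are invented for illustration; the patent does not publish its preset tables.

```python
import copy

BASE_PRESET = {
    "bass":   {"expand_ratio": 1.20},
    "mid":    {"expand_ratio": 1.10},
    "treble": {"expand_ratio": 1.15},
}

def apply_huge(preset, amount):
    """amount=0.0 leaves the recorded dynamic range untouched; higher
    values push every band's expansion ratio up together."""
    out = copy.deepcopy(preset)
    for params in out.values():
        params["expand_ratio"] = 1.0 + (params["expand_ratio"] - 1.0) * (1.0 + 2.0 * amount)
    return out
```

A listener-saved preset is then just such a dictionary, labeled and stored; selecting a built-in preset and moving a fader produces a new dictionary to save.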
When the "huge" parameter is at a low value, less dynamic processing occurs, and the sound-level dynamic range equals that of the music as recorded. When the "huge" parameter is at a higher value, the dynamics processing of each band is increased relative to the dynamic range of the recorded music. There are two types of preset parameter groups: listener-defined and built-in. Listeners can choose presets from their own previously labeled groups, or they can choose from a menu of built-in presets. The built-in presets are designed based on considerations of bandwidth, codec type, the listener's speakers, and music type. Once a built-in preset has been selected, the listener can adjust any individual parameter or group of parameters to customize it. The adjusted parameter group can then be labeled and archived as a new preset. For example, if a built-in preset is selected, the listener can in addition select a group of room compensation parameters to be applied along with the selected built-in preset. Fig. 6 is a block diagram illustrating a 3D enhancer according to a preferred embodiment. Like other components, this component has a left input 602 and a right input 604, and a left output 650 and a right output 652. One mixer 640 is associated with the left output 650, while the other mixer 642 is associated with the right output 652. The signal from the left input 602 is passed through a low-pass filter 606 and a high-pass filter 608. Similarly, the signal from the right input 604 is passed through a low-pass filter 610 and a high-pass filter 612. The outputs of the low-pass filters 606 and 610 are passed through amplifiers 622 and 628 respectively, and the outputs of those amplifiers are directed to mixers 640 and 642 respectively.
The outputs of the high-pass filters 608 and 612 are passed through amplifiers 624 and 626 respectively, and the outputs of those amplifiers are directed to mixers 640 and 642 respectively. The outputs of the high-pass filters 608 and 612 are also summed in an adder 632 and then directed to an amplifier 634. The output of the amplifier 634 is passed to the mixer 640, and also to the time delay element 636, whose output is in turn directed to the mixer 642. Fig. 7 is a block diagram illustrating a wide element according to a preferred embodiment. Like other components, this component has a left input 702 and a right input 704, and a left output 750 and a right output 752. One mixer 740 is associated with the left output 750, while the other mixer 742 is associated with the right output 752. The signal from the left input 702 is passed through a high-pass filter 706 and a low-pass filter 708. Similarly, the signal from the right input 704 is passed through a high-pass filter 710 and a low-pass filter 712. The outputs of the low-pass filters 708 and 712 are directed to the mixers 740 and 742 respectively. Likewise, the outputs of the high-pass filters 706 and 710 are passed through time delay elements 724 and 726 respectively, and the outputs of those elements are directed to the mixer 740 and the mixer 742 respectively. Preferably, the time delay provided by the time delay element 724 is greater than the time delay provided by the time delay element 726. For example, the time delay associated with element 724 may be 0.05 to 2.0 milliseconds, while the time delay associated with element 726 may be between 10 and 30 microseconds. FIG. 8 is a block diagram illustrating another embodiment of the enhancement processor according to the disclosed method/system. The system illustrated in Fig. 8 contains many of the same elements as described in Fig.
4 and operates in the same manner as described above. However, it should be noted that FIG. 8 contains the following additional components: a bass dynamics processor 902; time delay elements 905, 918, and 919; a wall simulator 909; an offset device 907; a waveform generator 915; a gain window threshold processor 917; and a voice "s" detection circuit 918. Also depicted in FIG. 8 are a speaker 921 (with companion amplifier 920) and a microphone 922. The bass dynamics processor 902 includes a special filter, combined with a variable time delay circuit and compressor and expander blocks, to produce an enhanced dynamic bass sound. The wall simulator 909 performs the same functions as those described above in relation to the foregoing figures. The waveform generator 915 is used to prevent Intel FPU "denormalization" operation during periods of silence. The offset device 907 is used to allow communication between the AI dynamic compressor 901 and the AI dynamic decompressor 913. It should also be noted that the AI attenuation and distortion detection device 916 can monitor the listener environment 923 and provide feedback, so that a suitable gain level can be applied to the output signal. This can be performed using a Fletcher-Munson look-up table. Although preferred embodiments are illustrated in the accompanying drawings and described in the foregoing detailed description, it should be understood that the invention is not limited to the disclosed embodiments, but allows such rearrangements, modifications, and substitutions as do not depart from the spirit of the invention proposed herein and fall within the scope of the appended claims and their equivalents. [Brief description of the drawings] Fig. 1 is a flowchart of an improved technique for enhancing compressed audio data according to a preferred embodiment.
Figure 2A is a block diagram illustrating the enhancement processing that occurs at the server side of the network according to a preferred embodiment. FIG. 2B is a block diagram illustrating the enhancement processing that occurs at the client side of the network according to a preferred embodiment. Figure 3 is a block diagram illustrating the enhancement processing that occurs at the client side of the network according to another preferred embodiment. FIG. 4 is a block diagram illustrating signal processing functions for enhancing sound signals according to a preferred embodiment. Fig. 5 is a block diagram of signal processing functions related to client-side enhancement of limited-bandwidth music according to a preferred embodiment. FIG. 6 is a block diagram illustrating signal processing functions for enhancing a sound signal according to another preferred embodiment. FIG. 7 is a block diagram illustrating signal processing functions for enhancing a sound signal according to another preferred embodiment. FIG. 8 is a block diagram illustrating signal processing functions for enhancing a sound signal according to another preferred embodiment.
[Explanation of reference numerals]
202 music source
204 transmitting sound codec
206 streaming server
210 server station
212 enhancement processing element
214 output connection
222 low- or medium-bandwidth connection; signal
230 user station
242 modem
244 personal computer; processor
246 receiving sound codec
248 speaker driver
250 speakers
252 enhancement processing element
300 user station
302 microphone
306 coupling
405 sound signal
410 AI dynamic compressor
412 signal line
415 AI dynamic decompressor
420 high-frequency artifact masking processor
425 clarity processor (mid-band)
427 mixer
430 3D/live enhancer
435 recording environment simulator
440 voice canceller
445 wide stereo enhancer
450 AI attenuation and distortion detection processor
455 processed sound signal
502 channel input
504 limiter
506a-d, 507a-d amplifiers
510 full bandwidth path
520 bass path
540 mid path
560 treble path
546 band-pass filter
522, 542, 562 bass, mid, and treble parametric equalizers
524, 544, 564, 582 high-shelf filters
526, 566, 584 low-pass filters
528, 568, 586 high-pass filters
530, 548, 570 expanders
512, 532, 550 compressors
514, 534, 552, 572 time delay elements
578 mixer
580 master equalizer
590 wall simulator
592 room simulator
593 karaoke element
594 wide element
595 3D enhancer
596 level amplifier
597 AI level control
598 master expander
599 amplifier
602 left input
604 right input
650 left output
652 right output
640, 642 mixers
606, 610 low-pass filters
608, 612 high-pass filters
622, 624, 626, 628, 634 amplifiers
632 adder
636 time delay element
702 left input
704 right input
750 left output
752 right output
740, 742 mixers
706, 710 high-pass filters
708, 712 low-pass filters
724, 726 time delay elements
901 AI dynamic compressor
902 bass dynamics processor
905, 918, 919 time delay elements
907 offset device
909 DFM wall simulator
913 AI dynamic decompressor
915 waveform generator
916 AI attenuation and distortion detection device
917 gain window threshold processor
918 voice "s" detection circuit
920 amplifier
921 speaker
922 microphone
923 listener environment

Claims (1)

200407027 Claims:

1. A method for enhancing transmitted audio data, comprising:
(a) encoding the audio data into a digital-format signal;
(b) enhancing the encoded audio signal by pre-emphasizing frequencies and dynamics expected to be lost or distorted;
(c) transmitting the enhanced audio signal to a client station;
(d) decoding the enhanced audio data after transmission to the client station; and
(e) processing the decoded audio signal to restore the frequencies and dynamics preserved by the pre-emphasis of frequencies and dynamics expected to be lost or distorted.

2. The method of claim 1, wherein the expected loss or distortion of the encoded audio signal is attributable in whole or in part to compression of the audio signal.

3. The method of claim 1, wherein the expected loss or distortion of the encoded audio signal is attributable in whole or in part to transmission of the audio signal.

4. The method of claim 1, further comprising compressing the enhanced audio signal before it is transmitted.

5. The method of claim 4, wherein the enhanced audio signal is decompressed after its transmission.

6. A method for enhancing a compressed or digitally stored audio signal, comprising:
(a) receiving a compressed audio signal;
(b) dividing the audio signal into discrete bands;
(c) processing one or more of the discrete bands through different processing paths;
(d) aggregating the processing paths to regenerate a standard signal in one or more channels; and
(e) performing additional post-processing on the aggregated signal to mask artifacts and response anomalies caused by the codec and equipment used.

7. A method for compensating for audio equipment operating in a poor acoustic environment, comprising:
(a) measuring the impulse response of the listener's environment in which the audio equipment is located;
(b) deriving a compensation process from the measured impulse response; and
(c) applying the compensation process during audio playback to compensate for deficiencies in the listener's environment and in the audio equipment.

8. The method of claim 7, wherein a microphone is used to measure the impulse response of the listener's environment.
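The band-split/process/aggregate sequence of claim 6 (steps b–d) can be illustrated with a minimal sketch. This is an assumption-laden stand-in, not the patented implementation: the patent does not specify FFT-bin masking for the band split, and a simple per-band gain here stands in for the dynamics processors the specification describes.

```python
import numpy as np

def split_into_bands(signal, rate, edges):
    """Split a signal into frequency bands via FFT-bin masking (claim 6, step b).
    `edges` is a list of band-edge frequencies in Hz, e.g. [0, 500, 4000].
    The returned time-domain bands sum back to the input (below Nyquist)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        bands.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return bands

def process_bands(bands, gains):
    """Run each band through its own processing path (step c) --
    here just a per-band gain as a placeholder for dynamics processing."""
    return [g * b for g, b in zip(gains, bands)]

def aggregate(bands):
    """Recombine the processed paths into a single channel (step d)."""
    return np.sum(bands, axis=0)

# Demo: a 100 Hz + 1 kHz test signal; boost the low band, cut the high band.
rate = 8000
t = np.arange(rate) / rate
x = np.sin(2 * np.pi * 100 * t) + np.sin(2 * np.pi * 1000 * t)
bands = split_into_bands(x, rate, [0, 500, 4000])
y = aggregate(process_bands(bands, gains=[2.0, 0.5]))
```

With unity gains the aggregated paths reproduce the input exactly, which is the "regenerate a standard signal" property step (d) asks for; non-unity gains then model independent per-band processing.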
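Claims 7–8 describe measuring the listening environment's impulse response and deriving a compensation process from it. A minimal sketch under stated assumptions: the room and equipment are modeled as a short FIR response (direct sound plus one echo), playback as circular convolution, and the compensation as regularized spectral inversion — the patent does not commit to any of these choices.

```python
import numpy as np

def circ_filter(signal, impulse_response, n):
    """Playback through the room/equipment, modeled as circular convolution."""
    return np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(impulse_response, n), n)

def derive_compensation(impulse_response, n, eps=1e-3):
    """Derive a compensation filter from the measured impulse response
    (claim 7, step b) by regularized spectral inversion; `eps` keeps the
    inverse bounded near spectral nulls."""
    H = np.fft.rfft(impulse_response, n)
    Hinv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.fft.irfft(Hinv, n)

n = 1024
room = np.zeros(64)
room[0], room[20] = 1.0, 0.4          # direct sound plus a single echo

comp = derive_compensation(room, n)   # step (b): compensation from measurement
rng = np.random.default_rng(0)
dry = rng.standard_normal(n)          # programme material
heard_raw = circ_filter(dry, room, n)                          # uncompensated
heard_comp = circ_filter(circ_filter(dry, comp, n), room, n)   # step (c): pre-filtered
```

Pre-filtering the signal with the derived compensation before it passes through the room leaves the listener hearing nearly the dry signal, whereas the uncompensated path carries the audible echo — the deficiency the claim compensates for.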
TW92115246A 2002-06-05 2003-06-05 Advanced technique for enhancing delivered sound TWI318531B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US38654102P 2002-06-05 2002-06-05

Publications (2)

Publication Number Publication Date
TW200407027A true TW200407027A (en) 2004-05-01
TWI318531B TWI318531B (en) 2009-12-11

Family

ID=45073504

Family Applications (1)

Application Number Title Priority Date Filing Date
TW92115246A TWI318531B (en) 2002-06-05 2003-06-05 Advanced technique for enhancing delivered sound

Country Status (1)

Country Link
TW (1) TWI318531B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI476603B (en) * 2009-01-09 2015-03-11 Lsi Corp Systems and methods for adaptive target search
TWI808670B (en) * 2022-03-07 2023-07-11 華碩電腦股份有限公司 Audio visualization method and system thereof

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI559296B (en) * 2015-05-26 2016-11-21 tian-ci Zhang How to handle tracks


Also Published As

Publication number Publication date
TWI318531B (en) 2009-12-11

Similar Documents

Publication Publication Date Title
JP4817658B2 (en) Acoustic virtual reality engine and new technology to improve delivered speech
US11503421B2 (en) Systems and methods for processing audio signals based on user device parameters
JP6574046B2 (en) Dynamic range control of encoded audio extension metadatabase
US10070245B2 (en) Method and apparatus for personalized audio virtualization
US20090182563A1 (en) System and a method of processing audio data, a program element and a computer-readable medium
US11611828B2 (en) Systems and methods for improving audio virtualization
JP2013541275A (en) Spatial audio encoding and playback of diffuse sound
MX2007010636A (en) Device and method for generating an encoded stereo signal of an audio piece or audio data stream.
US6865430B1 (en) Method and apparatus for the distribution and enhancement of digital compressed audio
US20200273481A1 (en) Method and electronic device
TW200407027A (en) Advanced technique for enhancing delivered sound
WO2007004397A1 (en) Acoustic signal processing apparatus, acoustic signal processing method, acoustic signal processing program, and computer readable recording medium
AU2003251403B2 (en) Acoustical virtual reality engine and advanced techniques for enhancing delivered sound
US20230143062A1 (en) Automatic level-dependent pitch correction of digital audio
US8086448B1 (en) Dynamic modification of a high-order perceptual attribute of an audio signal
WO2022126271A1 (en) Stereo headphone psychoacoustic sound localization system and method for reconstructing stereo psychoacoustic sound signals using same

Legal Events

Date Code Title Description
MK4A Expiration of patent term of an invention patent