TW201843675A - Apparatus and method for downmixing multichannel audio signals

Info

Publication number: TW201843675A
Application number: TW107105810A
Authority: TW
Inventors: 謝沛倫; 吳采頤
Original assignee: 美商無比的優聲音科技公司
Priority date: 2017-02-17
Filing date: 2018-02-21
Publication date: 2018-12-16
Also published as: WO2018151858A1; EP3583786A1; KR20190109726A; EP3583786A4; JP2020508590A; CN109644315A

Abstract

A method for processing a multi-channel input audio signal is performed at a computing device. The method includes the following steps: selecting, from the multi-channel input audio signal, a left input channel and a right input channel, wherein the left input channel and the right input channel correspond to a pair of spatially symmetrical signal sources; generating one or more cross-channel features from the left input channel and the right input channel; processing, in accordance with the cross-channel features, the left input channel and the right input channel to generate a left intermediate channel and a right intermediate channel; and combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

Description

Device and method for downmixing multi-channel audio signals

The present invention relates to audio signal processing, and in particular to a computer-implemented method, an apparatus, and computer-usable program code for downmixing multi-channel audio signals.

Surround sound is a technique for producing, transmitting, and playing back audio through multiple channels that surround the listener, usually implemented with multiple discrete channels. The two most common multi-channel (surround) configurations are 5.1 and 7.1 surround sound. A 5.1 configuration consists of a pair of front speakers (L and R), a center front channel (C), a pair of side speakers (Ls and Rs), and a low-frequency effects channel (LFE); the conventional channel order is L, C, R, Ls, Rs, LFE. A 7.1 configuration consists of a pair of front speakers (L and R), a center front channel (C), a pair of side surround speakers (Lss and Rss), a pair of rear surround speakers (Lrs and Rrs), and a low-frequency effects channel (LFE); the conventional channel order is L, C, R, Lss, Rss, Lrs, Rrs, LFE.

Downmixing converts audio content with a multi-channel configuration (for example, a multi-channel audio file) into audio content with fewer channels. For example, a 5.1 or 7.1 surround sound file can be downmixed for playback on a two-channel stereo system, and the stereo system can still give the listener a high-quality listening experience.

In conventional downmix methods, each channel of the multi-channel input is processed independently and separately by its own processor to produce a one- or two-channel output. The processing applied to each channel does not properly account for the meaningful information shared between the two speakers of a pair. As a result, the audio output of such methods is comparatively distorted, and the immersive listening experience may be degraded.

The object of the present invention is to provide an audio downmix processing pipeline that processes input channels in pairs, so that the downmixed output is more accurate while the spatial information of the original multi-channel audio stream is preserved. Although the numbers of input and output channels may be any valid numbers within the defined range, the following description uses 5.1-to-stereo and 7.1-to-stereo downmixing as examples.

One aspect of the present invention provides a method for processing a multi-channel input audio signal, performed at a computing device having one or more processors, memory, and a plurality of program modules stored in the memory and executed by the one or more processors. The method includes the following steps: selecting, from the multi-channel input audio signal, a left input channel and a right input channel, wherein the left input channel and the right input channel correspond to a pair of spatially symmetrical signal sources; generating one or more cross-channel features from the left input channel and the right input channel; processing the left input channel and the right input channel in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

Another aspect of the present invention provides a computing device that includes one or more processors, memory, and a plurality of program modules stored in the memory and executed by the one or more processors. When executed by the one or more processors, the program modules cause the computing device to perform the above method of processing a multi-channel input audio signal. Yet another aspect of the present invention provides a computer program product stored in a non-transitory computer-readable storage medium that is connected to a computing device having one or more processors. The computer program product includes a plurality of program modules that, when executed by the one or more processors, cause the computing device to perform the above method of processing a multi-channel audio signal.

100‧‧‧Data processing system

102‧‧‧Communication architecture

104‧‧‧Processor unit

106‧‧‧Memory

108‧‧‧Persistent storage

110‧‧‧Communication unit

112‧‧‧Input/output unit

114‧‧‧Display

116‧‧‧Speaker

118‧‧‧Computer-readable storage medium

120‧‧‧Program code/module

122‧‧‧Computer program product

222, 224‧‧‧Signal pair

232, 234‧‧‧PROC

240‧‧‧7.1 surround sound file

242, 244, 246‧‧‧Signal pair

252, 254, 256‧‧‧PROC

410‧‧‧Input signal pair

420‧‧‧PROC/signal pair processor

422‧‧‧M/S mixer

424‧‧‧Side signal

426‧‧‧Mid component (M)

428‧‧‧Equalizer (EQ)

430‧‧‧Dynamic range compressor (DRC)

432‧‧‧Crosstalk cancellation module (XTC)

434‧‧‧Width controller (WC)

436‧‧‧Dynamic range compressor (DRC)

440‧‧‧Output signal pair

450‧‧‧Band-pass filtered signal

452‧‧‧Residual signal

462, 464, 466‧‧‧Amplifier

472, 474‧‧‧Output signal pair

476‧‧‧Output signal

482, 484, 486‧‧‧Width controller (WC)

488‧‧‧Stereo output signal

500‧‧‧User interface (UI)

510‧‧‧Left panel

520, 530, 540‧‧‧Control area

602 to 630‧‧‧Steps

The accompanying drawings provide a further understanding of the embodiments of the present invention and form part of this specification. They illustrate the described embodiments and, together with the description, explain the working principles of the invention. Like reference numerals denote corresponding parts.

FIG. 1A is a block diagram illustrating a conventional downmix method applied to a 5.1-channel input signal, according to some embodiments.

FIG. 1B is a block diagram illustrating a conventional LoRo downmix method applied to a 5.1-channel input signal, according to some embodiments.

FIG. 1C is a block diagram illustrating a conventional downmix method that performs surround sound virtualization or spatialization on a 7.1-channel input signal, according to some embodiments.

FIG. 2 is a block diagram illustrating a data processing system configured to perform audio downmixing, according to an exemplary embodiment of the present invention.

FIGS. 3A and 3B are block diagrams illustrating audio downmix pipelines for processing multi-channel input signals, according to some embodiments.

FIG. 4A is a block diagram of a signal processing flow, according to some embodiments, that includes a PROC applied to an input signal pair.

FIG. 4B is a block diagram illustrating a signal processing flow applied to a 7.1 surround sound file, according to some embodiments.

FIG. 5 illustrates, according to some embodiments, the user interface of a software application or a plug-in component thereof, used to manage execution of the 7.1 surround sound file signal processing pipeline shown in FIG. 4B.

FIGS. 6A to 6C are flowcharts illustrating a method for downmixing a multi-channel audio signal, according to some embodiments.

Specific embodiments of the present invention are described in detail below, with examples depicted in the accompanying drawings. The description sets out several non-limiting specific details to aid understanding of the subject matter. It will be apparent to those of ordinary skill in the art that the invention may be practiced in various alternative ways without departing from the scope of the claims, and that the subject matter may be practiced without the specific details given here. For example, it will be apparent to those of ordinary skill in the art that the disclosed subject matter can be implemented on many kinds of radio communication systems, such as smartphones and tablet computers.

The following description should be read with reference to the drawings. The data processing environments shown in the block diagrams merely illustrate environments in which the exemplary embodiments may be implemented. The drawings are examples only and are not intended to assert or imply any limitation on the environments in which different embodiments may be implemented; many modifications to the described environments are possible.

FIG. 1A is a block diagram of a conventional downmix method applied to a 5.1 input signal. As shown in FIG. 1A, each input channel (i.e., L, C, R, Ls, Rs, LFE) is processed separately and sent to its own processor module (PROC), regardless of its relationship to the other channels. A processor may include one or more submodules (not shown), such as gain, time delay, low-pass filter, and/or other audio processing modules. The output that a processing module produces for its channel may include one or more channels, depending on the type of processing performed in the module. Finally, in this example, the outputs are summed (Σ) into two-channel audio (i.e., the L and R output channels).

FIG. 1B is a block diagram of a conventional left-only/right-only (LoRo) downmix method applied to a 5.1 input signal. Each input channel (i.e., L, C, R, Ls, Rs, LFE) passes through its own gain module. The gain is set according to the physical position at which the channel would be reproduced by a surround sound system. Although the surround channels may be attenuated more than the left and right channels, the relationship between the left-side and right-side channels is ignored. The left output channel is produced by summing all left-side channels and adding the attenuated C and LFE signals. The center channel is split between the two output channels because, in a stereo reproduction environment, there is no physical speaker on the center line. The LFE channel is likewise split between the two output channels. The right output channel is produced by summing all right-side channels and adding the attenuated C and LFE signals. In this example, each input channel is processed by its own processor, which consists of a simple gain module; the processor receives a one-channel input and produces a one- or two-channel output. Finally, all PROC outputs are summed according to the intended reproduction position of each input.
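The LoRo summation described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `loro_downmix` and all four gain values are assumptions chosen for the example, since the text states only that C and LFE are attenuated and split between the two outputs and that the surrounds may be attenuated more than the fronts.

```python
import math

# Hypothetical gains, not taken from the specification.
FRONT_GAIN = 1.0
CENTER_GAIN = 1.0 / math.sqrt(2.0)
SURROUND_GAIN = 1.0 / math.sqrt(2.0)
LFE_GAIN = 0.5


def loro_downmix(frame):
    """Downmix one 5.1 sample frame (dict keyed by channel name) to (Lo, Ro)."""
    # C and LFE have no left/right position, so the same attenuated
    # contribution is added to both output channels.
    shared = CENTER_GAIN * frame["C"] + LFE_GAIN * frame["LFE"]
    # Each side sums its own channels independently; nothing here looks at
    # the relationship between a left channel and its right counterpart.
    lo = FRONT_GAIN * frame["L"] + SURROUND_GAIN * frame["Ls"] + shared
    ro = FRONT_GAIN * frame["R"] + SURROUND_GAIN * frame["Rs"] + shared
    return lo, ro
```

Because every channel is scaled on its own before summation, no step in this sketch compares L with R or Ls with Rs, which is the limitation the pairwise approach is meant to address.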

FIG. 1C is a block diagram of a conventional downmix method that performs surround sound virtualization or spatialization on a 7.1 input signal. In this example, each channel of the input multi-channel audio (i.e., L, C, R, Lss, Rss, Lrs, Rrs, LFE) is, in addition to its individual gain processing, processed by a head-related transfer function (HRTF) that represents the intended position of the corresponding physical speaker, producing a two-channel output. For example, the left-channel input is processed by the HRTF that represents the left speaker of the surround sound system. The same procedure is applied to every other input channel. The two-channel outputs produced from the individual input channels are then summed to form the left-channel output and the right-channel output. In this example, each input channel is again processed by its own processor (including, for example, a gain module and an HRTF filter). The processor receives a one-channel input and produces a two-channel output, and all two-channel outputs are summed to form the final two-channel output.
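The per-channel virtualization described above can be sketched as follows. The sketch assumes that each HRTF is applied in the time domain as a pair of impulse responses (one per ear), which is one common way to realize such a filter; the names `convolve` and `virtualize` and the dictionary-based data layout are hypothetical and do not come from the specification.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def virtualize(channels, hrirs, gains):
    """Render each channel through its own gain and HRTF pair, then sum.

    channels: name -> list of samples
    hrirs:    name -> (left_ear_ir, right_ear_ir), assumed impulse responses
    gains:    name -> linear gain
    """
    length = max(
        len(channels[n]) + max(len(hrirs[n][0]), len(hrirs[n][1])) - 1
        for n in channels
    )
    left = [0.0] * length
    right = [0.0] * length
    for name, samples in channels.items():
        scaled = [gains[name] * x for x in samples]
        # One-channel input, two-channel output, with no reference to any
        # other input channel.
        for i, v in enumerate(convolve(scaled, hrirs[name][0])):
            left[i] += v
        for i, v in enumerate(convolve(scaled, hrirs[name][1])):
            right[i] += v
    return left, right
```

As in the LoRo case, each channel is processed in isolation, so the sketch makes no use of information shared between a left channel and its right counterpart.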

FIG. 2 is a block diagram of a data processing system 100 configured to perform audio downmix processing, according to an exemplary embodiment of the present invention. In this example, the data processing system 100 includes a communication architecture 102 that provides communication among a processor unit 104, a memory 106, a persistent storage 108, a communication unit 110, an input/output (I/O) unit 112, a display 114, and one or more speakers 116. Note that the speakers 116 may be built into the data processing system 100 or external to it. In some embodiments, the data processing system 100 may take the form of a notebook computer, a desktop computer, a tablet computer, a mobile phone (e.g., a smartphone), a multimedia player, a navigation device, an educational device (e.g., a children's learning toy), a game console system, an audio/video (AV) receiver, or a control device (e.g., a home or industrial controller).

The processor unit 104 executes software program instructions that can be loaded into the memory 106. Depending on the particular implementation, the processor unit 104 may be a set of one or more processors or a multi-core processor. Furthermore, the processor unit 104 may be implemented using one or more heterogeneous processor systems in which a main processor and secondary processors reside on a single chip. In another example, the processor unit 104 may be a symmetric multi-processor system containing multiple processors of the same type.

In these examples, the memory 106 may be a random access memory or any other suitable volatile or non-volatile storage device. The persistent storage 108 may take different forms depending on the particular implementation. For example, the persistent storage 108 may contain one or more components or devices, such as a hard drive, a flash memory, a rewritable optical disc, a rewritable magnetic tape, or a combination of these. The media used by the persistent storage 108 may also be removable; for example, a removable hard drive may be used for the persistent storage 108.

In these examples, the communication unit 110 provides communication with other data processing systems or devices and is a network interface card. The communication unit 110 may provide communication over physical links, wireless links, or both.

The input/output unit 112 allows data to be input from and output to other devices that may be connected to the data processing system 100. For example, the input/output unit 112 may provide a connection for user input through a keyboard and mouse. Furthermore, the input/output unit 112 may send output to a printer. The display 114 provides a mechanism for displaying information to the user, and the speakers 116 play sound to the user.

Instructions for the operating system and for applications or programs are located on the persistent storage 108. These instructions may be loaded into the memory 106 for execution by the processor unit 104. The processes described in the embodiments below may be performed by the processor unit 104 using computer-implemented instructions located in a memory such as the memory 106. These instructions are referred to as program code (or modules), computer-usable program code (or modules), or computer-readable program code (or modules), and can be read and executed by a processor in the processor unit 104. In the different embodiments, the program code (or modules) may be embodied on various physical or tangible computer-readable media, such as the memory 106 or the persistent storage 108.

The program code/module 120 is located in a functional form on a computer-readable storage medium 118, which is optionally removable, and may be loaded onto or transferred to the data processing system 100 for execution by the processor unit 104. In these examples, the program code/module 120 and the computer-readable storage medium 118 together form a computer program product 122. In one example, the computer-readable storage medium 118 may be in a tangible form, such as an optical or magnetic disc that is inserted or placed into a drive or other device that is part of the persistent storage 108, for transfer onto a storage device such as a hard drive that is part of the persistent storage 108. In a tangible form, the computer-readable storage medium 118 may also take the form of persistent storage connected to the data processing system 100, such as a hard drive, a thumb drive, or a flash memory. The tangible form of the computer-readable storage medium 118 is also referred to as a computer-recordable storage medium. In some examples, the computer-readable storage medium 118 may not be removable from the data processing system 100.

Alternatively, the program code/module 120 may be transferred from the computer-readable storage medium 118 to the data processing system 100 through a communication link to the communication unit 110 and/or through a connection to the input/output unit 112. In the illustrative examples, the communication link and/or the connection may be physical or wireless. The computer-readable medium may also take the form of non-tangible media, such as communication links or wireless transmissions that carry the program code/module.

The components illustrated for the data processing system 100 are not meant to impose architectural limitations on how different embodiments may be implemented. The exemplary embodiments may be implemented in a data processing system that includes components in addition to, or in place of, those illustrated for the data processing system 100. Other components shown in FIG. 2 may differ from those in the illustrated examples.

In one example, a storage device in the data processing system 100 is any hardware apparatus that can store data. The memory 106, the persistent storage 108, and the computer-readable storage medium 118 are all examples of tangible storage devices.

In another example, a bus system may be used to implement the communication architecture 102 and may consist of one or more buses, such as a system bus or an input/output bus. The bus system may be implemented using any suitable type of architecture that provides for data transfer between the different components or devices attached to it. In addition, a communication unit may include one or more devices used to transmit and receive data, such as a modem or a network adapter. Furthermore, a memory may be, for example, the memory 106 or a cache such as that found in an interface and memory controller hub that may be present in the communication architecture 102.

To address the problems of the conventional methods described in the background, various embodiments of the present invention are described below. They relate to systems and methods that downmix input channels in pairs in order to provide more accurate audio information and to preserve the spatial information of the original multi-channel audio stream. Unlike the conventional approach, the relationships between input channels are considered pairwise, and each signal pair is sent to its own processor. The signal information contained in a single signal pair is compared and analyzed to create a more solid sound image and a better spatial effect. For the pairing to be meaningful, at least one module in the processing chain must cross-reference the signal information within the signal pair. The two-channel output of each signal pair is summed with the single channels to produce a left-channel output and a right-channel output.

FIGS. 3A and 3B are block diagrams of audio downmix pipelines for processing multi-channel input signals, according to some embodiments of the present invention. The multi-channel input signal in FIG. 3A is a 5.1 surround sound file 210, which includes a left front channel (L), a center front channel (C), a right front channel (R), a left side channel (Ls), a right side channel (Rs), and a low-frequency effects channel (LFE). The multi-channel input signal in FIG. 3B is a 7.1 surround sound file 240, which includes a left front channel (L), a center front channel (C), a right front channel (R), a left side surround channel (Lss), a right side surround channel (Rss), a left rear surround channel (Lrs), a right rear surround channel (Rrs), and a low-frequency effects channel (LFE).

In some embodiments, the system selects one or more input signal pairs from the multi-channel input signal. In some embodiments, an input signal pair corresponds to two audio streams intended to be reproduced on speakers placed symmetrically on the two sides, so the pair comprises a pair of spatially symmetrical signal sources. In some embodiments, an input signal pair includes two channels located on opposite sides (i.e., left and right) at the same angle from the center line. For example, the front pair includes the left front (L) channel and the right front (R) channel, each at 30° from the center line on its own side. As another example, in a Dolby 7.1 surround setup, the rear pair includes the left rear surround (Lrs) channel and the right rear surround (Rrs) channel, each at 135° from the center line. Each selected input signal pair is then sent to its own processor (PROC) to produce an output audio signal pair.

In some embodiments of the present invention, as shown in FIG. 3A, the system selects, from the input channels of the 5.1 surround sound file, the left front channel (L) and the right front channel (R) as a signal pair 222, and the left side channel (Ls) and the right side channel (Rs) as a signal pair 224. In some embodiments, as shown in FIG. 3B, the system selects, from the input channels of the 7.1 surround sound file, the left front channel (L) and the right front channel (R) as a signal pair 242, the left side surround channel (Lss) and the right side surround channel (Rss) as a signal pair 244, and the left rear surround channel (Lrs) and the right rear surround channel (Rrs) as a signal pair 246. Channels located on the center line (e.g., the center front channel C) and omnidirectional channels (e.g., LFE) are each treated as single channels and are not paired with any other channel.
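The pair selection described above can be illustrated with a short Python sketch. The channel names and orders follow the 5.1 and 7.1 layouts given in this description; the helper `split_layout` and the lookup table `SYMMETRIC_PARTNER` are hypothetical names introduced only for this example.

```python
LAYOUT_5_1 = ("L", "C", "R", "Ls", "Rs", "LFE")
LAYOUT_7_1 = ("L", "C", "R", "Lss", "Rss", "Lrs", "Rrs", "LFE")

# Left channel name -> its spatially symmetrical right counterpart.
SYMMETRIC_PARTNER = {"L": "R", "Ls": "Rs", "Lss": "Rss", "Lrs": "Rrs"}


def split_layout(layout):
    """Return (pairs, singles) for a channel layout.

    pairs:   list of (left_name, right_name) tuples
    singles: channels with no symmetrical partner, such as C and LFE
    """
    pairs = [
        (name, SYMMETRIC_PARTNER[name])
        for name in layout
        if name in SYMMETRIC_PARTNER and SYMMETRIC_PARTNER[name] in layout
    ]
    paired = {name for pair in pairs for name in pair}
    singles = [name for name in layout if name not in paired]
    return pairs, singles
```

For the 5.1 layout this yields the pairs (L, R) and (Ls, Rs), matching signal pairs 222 and 224; for the 7.1 layout it yields (L, R), (Lss, Rss), and (Lrs, Rrs), matching signal pairs 242, 244, and 246. In both cases C and LFE remain single channels.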

In some embodiments, each signal pair is sent to its own processor. For example, in FIG. 3A the signal pairs 222 and 224 are sent to PROC 232 and PROC 234, respectively; in FIG. 3B the signal pairs 242, 244, and 246 are sent to PROC 252, PROC 254, and PROC 256, respectively. In this process, the signal information contained in a single signal pair is compared and analyzed to create a more solid sound image and a better spatial effect. For the pairing to be meaningful, at least one of the one or more modules in each PROC must cross-reference information between the two channels of the signal pair. The output of each PROC has two channels, and the two-channel output of each signal pair is summed (Σ) with the outputs of the single channels to produce an output signal that includes a left-channel output (L') and a right-channel output (R'), as shown in FIGS. 3A and 3B, respectively.
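The overall flow of FIGS. 3A and 3B, in which each pair goes through a two-in/two-out processor and the results are summed with the single channels, can be sketched as below. This is a minimal per-sample sketch under stated assumptions: `downmix`, `passthrough`, and the `single_gain` value applied to C and LFE are hypothetical, and a real PROC would cross-reference the two channels of its pair rather than pass them through.

```python
def passthrough(left, right):
    """Placeholder two-in/two-out processor; a real PROC would use
    cross-channel features of the pair."""
    return left, right


def downmix(frame, pairs, singles, pair_processor, single_gain=0.5):
    """Combine processed pairs and single channels into (L', R').

    frame:          channel name -> sample value
    pairs:          list of (left_name, right_name)
    singles:        list of unpaired channel names (e.g. C, LFE)
    pair_processor: callable taking (left, right) and returning (left, right)
    single_gain:    assumed gain for single channels, added to both outputs
    """
    out_left = 0.0
    out_right = 0.0
    for left_name, right_name in pairs:
        processed_left, processed_right = pair_processor(
            frame[left_name], frame[right_name]
        )
        out_left += processed_left
        out_right += processed_right
    for name in singles:
        out_left += single_gain * frame[name]
        out_right += single_gain * frame[name]
    return out_left, out_right
```

Passing the processor as a callable keeps the summation stage independent of what each PROC does internally, which mirrors the block structure of the figures.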

In some embodiments, the signal pair processor (PROC) may be any "two-in, two-out" audio signal processor. As noted above, the processor includes at least one module that incorporates cross-channel features from an input signal pair. In some embodiments, a signal pair processor (PROC) includes one or more modules that read signal information from an input signal pair. Based on the signal information of that input signal pair, the PROC then modifies the output stream on a pair-wise basis.

In some embodiments, a signal pair processor (PROC) is composed of a plurality of different components (or modules), including channel-dependent processing components and/or channel-independent processing components. In some embodiments, a channel-dependent processing component performs a "multi-channel in, multi-channel out" process, for example the two-in, two-out process of a signal pair processor. In some embodiments, a channel-dependent processing component generates multiple output channels, each of which is generated from more than one input channel. In some embodiments, a channel-dependent processing component uses information from the input signals and adjusts its processing based on the extracted cross-channel features. In some embodiments, the cross-channel features include a comparison of the levels of the input channels, the relationship between the spectral characteristics (e.g., magnitude and/or phase) of the left and right input channels, and/or the time difference and amplitude difference between the onsets of the left and right input channel signals. In some embodiments, the PROC includes one or more channel-dependent processing components, such as an M/S (mid/side) mixer and/or a width controller (WC). For example, the M/S mixer can use the sum and the difference of the left and right input signals to generate mid and side signals. In another example, the M/S mixer can generate the mid and side signals by comparing the overlapping regions of the input signal spectra.
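As an illustration of the sum-and-difference example, the following Python sketch shows the classic mid/side transform. It is a minimal sketch only: the 0.5 scaling and the function names `ms_encode` and `ms_decode` are assumptions, and it produces a single side signal, whereas M/S mixer 422 described later produces a mid component plus separate left-side and right-side components.

```python
def ms_encode(left, right):
    """Classic sum/difference mid/side transform (assumed 0.5 scaling)."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Inverse of ms_encode: recovers the original left/right samples."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right
```

Content that is identical in both channels ends up entirely in the mid signal, and content that differs between the channels ends up in the side signal. This is why the component is channel-dependent: every output sample depends on both inputs.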

In some embodiments, a channel-independent processing component performs a "multi-channel in, multi-channel out" process (including two-in, two-out) that processes each individual channel of the multi-channel input file separately. In some embodiments, a channel-independent processing component handles a multi-channel input file as if it had been split into the same number of mono signals: each mono signal is processed independently, and the separately processed mono signals are then all summed. In some embodiments, the PROC includes one or more channel-independent processing components, such as an equalizer (EQ) and/or a dynamic range compressor (DRC). For example, an equalizer (EQ) module receives each channel individually and generates the corresponding output channel without using any information from the other channels. The equalizer output is therefore the same as if each channel had been equalized separately using the same input parameters.

FIG. 4A is a block diagram illustrating the signal flow of applying a PROC 420 to an input signal pair 410, according to some embodiments of the present invention. In some embodiments, PROC 420 includes a plurality of modules (also called components) configured to process the input signal pair 410. In some embodiments, PROC 420 may be any signal pair processor, such as PROC 232, PROC 234, PROC 252, PROC 254, or PROC 256 shown in FIGS. 3A and 3B. Accordingly, signal pair processor 420 can be applied to any input signal pair, such as signal pair 222, 224, 242, 244, or 246 shown in FIGS. 3A and 3B. In some embodiments, signal pair processor 420 includes an M/S mixer 422, an equalizer (EQ) 428, a dynamic range compressor (DRC) 430, and a crosstalk cancellation (XTC) module 432, which are coupled to one another as shown in FIG. 4A. The output of PROC 420 is further processed by a width controller (WC) 434 and another dynamic range compressor (DRC) 436 to obtain the output signal pair 440, as shown in FIG. 4A.

In some embodiments, the data processing system 100 first sends the left and right input signals 410 to the M/S mixer 422. In some embodiments, the M/S mixer 422 is a mixing tool configured to generate three signal components from the input signal pair 410: two side components (S) 424, namely left-side and right-side, and one mid component (M) 426. The left-side component represents sound sources that appear only in the left channel, and the right-side component corresponds to sound sources that appear only in the right channel. The mid component represents sound sources that appear only in the phantom center of the sound stage, such as the main musical elements and dialogue.

In this way, the M/S mixer 422 separates out information that is useful for the various subsequent sound stage enhancement steps and minimizes unwanted distortion of the sound quality (e.g., coloration). This step also helps reduce the correlation between the left and right components. In some embodiments, the M/S mixer 422 analyzes the sound image and estimates the sound coming from the center, left, and right; it then splits the two input channels (i.e., the left and right signals 410) into a single-channel mid signal 426 and a two-channel side signal 424. For a more detailed description of the M/S mixer 422, see the PCT application entitled APPARATUS AND METHOD FOR SOUND STAGE ENHANCEMENT (No. PCT/US2015/057616), filed October 27, 2015, which is incorporated herein by reference in its entirety.

Next, the system sends the side signals 424 to the equalizer (EQ) 428 to adjust their frequency content. In some embodiments, the equalizer 428 that receives the two side signals 424 includes one or more multi-band equalizers that perform band-pass filtering on the two side signals. In some embodiments, the same multi-band equalizer is applied to each side signal. In some other embodiments, the multi-band equalizer applied to one side signal differs from the one applied to the other side signal. In either case, its function is to preserve the original timbre of the audio signal and to avoid ambiguous spatial cues in the two signals. In some embodiments, the equalizer 428 can also be used to select a target sound source based on spectral analysis of the two side components. In some embodiments, as shown in FIG. 4A, the equalizer 428 generates two output signals, 450 and 452, each of which is a two-channel audio signal. In some embodiments, the equalizer 428 applies a band-pass filter to each channel of the two-channel side signal 424 to obtain the band-pass-filtered signal 450.

In some embodiments, the equalizer 428 also generates a residual signal 452 based on the band difference between its input signal (i.e., the two-channel side signal 424) and its output signal (i.e., the two-channel band-pass-filtered signal 450). In some embodiments, the data processing system 100 generates a left-side residual component and a right-side residual component by subtracting the left-side component and the right-side component from the band-pass-filtered left-side component and the band-pass-filtered right-side component, respectively. In some embodiments, separate amplifiers are applied to the residual signal and to the signal produced by crosstalk cancellation, so that their gains can be adjusted before the two signals are combined. For a more detailed description of the equalizer 428, see the PCT application entitled APPARATUS AND METHOD FOR SOUND STAGE ENHANCEMENT (No. PCT/US2015/057616), filed October 27, 2015, which is incorporated herein by reference in its entirety.
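The residual path can be sketched in a few lines of Python. This is a rough illustration under stated assumptions: the sign convention (unfiltered minus band-passed) is chosen here so that the two paths add back to the original signal and may differ from the disclosed implementation, and the names `residual`, `mix_with_gains`, `gain_processed`, and `gain_residual` are hypothetical.

```python
def residual(side, filtered):
    """Difference between a side component and its band-passed version.

    Assumed sign: unfiltered minus filtered, so filtered + residual
    reconstructs the original side component.
    """
    return [s - f for s, f in zip(side, filtered)]


def mix_with_gains(processed, resid, gain_processed=1.0, gain_residual=1.0):
    """Recombine the processed band and the residual after per-path gains.

    The two gains stand in for the separate amplifiers mentioned in the
    text; their default values here are placeholders, not disclosed values.
    """
    return [gain_processed * p + gain_residual * r
            for p, r in zip(processed, resid)]
```

With both gains at 1.0 and no further processing of the filtered band, the mix returns the original side component, which shows that the split into a band-passed part and a residual loses no information.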

In some embodiments, the band-pass-filtered signal 450 is sent to the dynamic range compressor (DRC) 430. In some embodiments, the dynamic range compressor 430 includes a band-pass filter (different from the band-pass filter of the equalizer 428) that amplifies the two sound signals (i.e., the two-channel band-pass-filtered signal 450) within a predetermined frequency range, in order to maximize the sound stage enhancement achieved by the crosstalk cancellation (XTC) module 432. In some embodiments, a user (e.g., an audio engineer) can adjust the band-pass filter of the dynamic range compressor 430 to make a particular frequency band stand out, and thereby highlight specific sound events of the user's choice. For example, after the data processing system 100 equalizes the left-side and right-side components using the first band-pass filter of the equalizer 428, it removes a predetermined frequency band from the left-side and right-side components using the second band-pass filter of the dynamic range compressor 430. Representative band-pass filters that can be used in the equalizer module 428 and the dynamic range compressor module 430 include biquad filters and Butterworth filters. In some embodiments, after the data processing system 100 equalizes the left-side and right-side components using the first band-pass filter of the equalizer 428, it uses the dynamic range compressor 430 to perform a first dynamic range compression on the left-side and right-side components so as to highlight a predetermined frequency band relative to other frequencies. For a more detailed description of the dynamic range compressor 430, see the PCT application entitled APPARATUS AND METHOD FOR SOUND STAGE ENHANCEMENT (No. PCT/US2015/057616), filed October 27, 2015, which is incorporated herein by reference in its entirety.
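Because biquad filters are named as a representative choice, a minimal Python sketch of a biquad band-pass filter may help. The coefficient formulas below follow the widely used "Audio EQ Cookbook" band-pass form with unity peak gain; that choice, the parameter values, and the function names are assumptions made for illustration and are not taken from this disclosure.

```python
import math


def bandpass_coefficients(center_hz, sample_rate_hz, q):
    """Normalized biquad band-pass coefficients (unity gain at center_hz)."""
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)                    # feed-forward taps
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)    # feedback taps
    return b, a


def biquad(signal, b, a):
    """Filter a list of samples with a direct-form I biquad section."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in signal:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y0)
        x2, x1 = x1, x0   # shift the input history
        y2, y1 = y1, y0   # shift the output history
    return out
```

A filter of this form passes a tone at the center frequency essentially unchanged and rejects a constant (DC) input. In a channel-independent stage such as the equalizer, the same `biquad` call would simply be applied to the left-side and right-side components one after the other.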

In some embodiments, the signal output from the dynamic range compressor 430 is sent to the crosstalk cancellation (XTC) module 432, which performs crosstalk cancellation. Crosstalk is an inherent problem in stereo (i.e., two-channel) loudspeaker playback: it occurs when the sound from each loudspeaker reaches the ear on the opposite side, and it adds unwanted spectral coloration to the original signal. The remedy is a crosstalk cancellation (XTC) algorithm. One type of XTC algorithm uses generalized directional binaural transfer functions, such as head-related transfer functions (HRTFs) and/or binaural room impulse responses (BRIRs), to represent the angles of the two physical loudspeakers relative to the listener's position. Another type is a recursive crosstalk cancellation method, which does not require head-related transfer functions (HRTFs), binaural room impulse responses (BRIRs), or any other binaural transfer function. The basic operation can be expressed by the following formulas:

left[n] = left[n] - A_L * right[n - d_L]
right[n] = right[n] - A_R * left[n - d_R]

where A_L and A_R are the attenuation coefficients of the signals, and d_L and d_R are the delays, in samples, from each loudspeaker to the ear on the opposite side. In some embodiments, the crosstalk cancellation module 432 shown in FIG. 4A uses either the recursive crosstalk cancellation method or generalized directional binaural transfer functions. For a more detailed description of the crosstalk cancellation module 432, see U.S. patent application Ser. No. 14/569,490, entitled APPARATUS AND METHOD FOR SOUND STAGE ENHANCEMENT, filed December 12, 2014 (issued December 27, 2016 as U.S. Pat. No. 9,532,156), and U.S. patent application Ser. No. 15/349,822, entitled APPARATUS AND METHOD FOR SOUND STAGE ENHANCEMENT, filed November 11, 2016, both of which are incorporated herein by reference in their entirety.
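The recursive update for left[n] and right[n] can be written out directly. The Python sketch below is one possible reading of those formulas, assuming that "*" denotes multiplication and that both delays are at least one sample, so that each update reads values already modified at earlier sample indices. The function name and the numeric values used in any call are hypothetical.

```python
def recursive_xtc(left, right, a_l, a_r, d_l, d_r):
    """Recursive crosstalk cancellation on two equal-length sample lists.

    a_l, a_r: attenuation coefficients (A_L, A_R in the text).
    d_l, d_r: cross-path delays in samples (d_L, d_R in the text).
    """
    if d_l < 1 or d_r < 1:
        raise ValueError("delays must be at least one sample")
    out_l = list(left)
    out_r = list(right)
    for n in range(len(out_l)):
        # Each channel subtracts a delayed, attenuated copy of the
        # already-processed opposite channel, which makes the scheme recursive.
        if n - d_l >= 0:
            out_l[n] -= a_l * out_r[n - d_l]
        if n - d_r >= 0:
            out_r[n] -= a_r * out_l[n - d_r]
    return out_l, out_r
```

Feeding a single impulse into the left channel shows the recursion: a cancellation pulse appears in the right channel after the delay, and a smaller correction for that pulse then appears in the left channel, with each round scaled down by the attenuation coefficient.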

In some embodiments, as shown in FIG. 4A, the output of the crosstalk cancellation module 432 is sent to amplifier 462, the residual signal pair 452 is sent to amplifier 464, and the mid component (M) 426 is sent to amplifier 466. The three resulting signals are then sent to the width controller (WC) 434 to be processed and combined.

In some embodiments, the outputs of amplifiers 462, 464, and 466 are sent to the width controller 434 to adjust the sound stage width. In some embodiments, the width controller 434 uses the analyzed information of the input signal pair to control the sound stage width of the output audio signal. The sound stage width can range from 0° at its narrowest to a full 360° for completely immersive sound. In some embodiments, the cross-channel information of a signal pair (e.g., output signal pair 472 or 474) is analyzed and adjusted. In some embodiments, the user can specify the desired sound stage width on the width controller, as illustrated in FIG. 5 below. Based on the previously analyzed information, the specified sound stage width may affect the signal pair summing matrix of the width controller 434. In some embodiments, the sound stage width can be adjusted according to the following equation:

where the sound stage width parameter β satisfies -5 ≤ β ≤ 0. When β = 0, the resulting signal has the maximum sound stage width; when β = -5, the resulting signal approaches a mono signal.

In some embodiments, the output signal 476 of the width controller 434 is sent to a second dynamic range compressor (DRC) 436 to raise the overall output level of the sound signals in the audio mastering process. In some embodiments, the data processing system 100 uses the dynamic range compressor 436 to perform a second dynamic range compression on the left-side and right-side components so as to preserve the localization cues of the digital audio output signal.

As shown in FIG. 4A, the output of this pipeline is a stereo audio signal 440 comprising a left channel (L') and a right channel (R'). In some embodiments, this signal processing flow, including PROC 420, can be applied to a 5.1 surround sound file as shown in FIG. 3A. For example, PROC 232 and/or PROC 234 of FIG. 3A may be similar to PROC 420 illustrated in FIG. 4A.

FIG. 4B is a block diagram illustrating a signal processing flow applied to a 7.1 surround sound file, according to some embodiments of the present invention. Referring also to FIG. 3B, in some embodiments the input signals of the 7.1 surround sound file are grouped into separate signal pairs, namely L/R signal pair 242, Lss/Rss signal pair 244, and Lrs/Rrs signal pair 246. The L/R signal pair 242, the Lss/Rss signal pair 244, and the Lrs/Rrs signal pair 246 are then sent to PROC 252, PROC 254, and PROC 256, respectively. In some embodiments, each of PROC 252, PROC 254, and PROC 256 is similar to PROC 420 discussed with reference to FIG. 4A. In other embodiments, PROC 252, PROC 254, or PROC 256 may include one or more other modules (or components) in a different configuration. The signals output by PROC 252, PROC 254, and PROC 256 are sent to their respective width controllers 482, 484, and 486. Width controller 482, 484, or 486 may be similar to the width controller 434 discussed with reference to FIG. 4A. The outputs of width controllers 482, 484, and 486 are combined with the center signal C and the low-frequency effects signal LFE to produce the stereo audio output signal 488.

FIG. 5 illustrates a user interface (UI) 500 of a software application, or of a plug-in component thereof, according to some embodiments of the present invention. The UI is used to manage execution of the 7.1 surround sound signal processing pipeline shown in FIG. 4B. The pipeline may include a plurality of signal pair processors, such as PROC 420 illustrated in FIG. 4A. In this example there are three signal pairs: the front pair (L and R), the side pair (Lss and Rss), and the rear pair (Lrs and Rrs). The left panel 510 of the user interface 500 controls the gain of each input channel. Control area 520 controls the frequency content of the equalizer (EQ). Control area 530 controls the width setting of the width controller. As described with reference to FIG. 4B, each signal pair passes through a different signal pair processor PROC, and each PROC uses different parameters; control area 540 is therefore used to select the signal pair to be configured (e.g., the front, side, or rear pair) so that the parameters for its PROC processing can be entered.

As noted above, sound plays an important role in media and entertainment: it affects the audience emotionally and connects them to the story. Multi-channel surround sound was developed to give listeners a more immersive listening experience. Multi-channel audio formats use multiple audio tracks to reconstruct the sound on a corresponding multi-channel sound reproduction system. Downmixing can take place at either the production end or the reproduction end. At the production end, a mixing engineer usually starts with the largest number of channels and then downmixes to fewer channels. At the reproduction end, a multi-channel track can be downmixed to fewer channels to match the number of channels of the reproduction system. In both cases, the goal is to keep the use and placement of sound consistent with the original creative intent.

Conventional downmixing methods, as shown in FIGS. 1A to 1C, apply a processing flow to each audio track individually, without considering the relationships between tracks. The method and system proposed here treat the audio input on a pair-wise basis. First, the multi-channel tracks are grouped into pairs according to the physical positions of the reproduction system. Second, the relationship between the two channels of each signal pair is analyzed. Finally, the signal pair input is processed based on the analysis result. By taking the relationships within the signal pairs into account, the spatial information of the original multi-channel audio input is preserved more completely. Consequently, for the same audio input, the output produced by downmixing with the system and method described here is closer to the original multi-channel audio input than the output of a conventional downmix, and it creates a more accurate sound stage.

FIGS. 6A to 6C are flowcharts illustrating the downmixing of a multi-channel audio signal using the data processing system 100 of FIG. 2, according to some embodiments of the present invention. The data processing system 100 selects (step 602) a left input channel and a right input channel from the multi-channel audio input signal. In some embodiments, the left input channel and the right input channel correspond to a pair of spatially symmetric signal sources. In some embodiments, the multi-channel audio input signal is, for example, the 5.1 surround sound file 210 of FIG. 3A or the 7.1 surround sound file 240 of FIG. 3B. In some embodiments, the 5.1 surround sound input signal 210 includes a left front channel L and a right front channel R, which form signal pair 222, and a left side channel Ls and a right side channel Rs, which form signal pair 224. In some embodiments, the 7.1 surround sound input signal 240 includes a left front channel L and a right front channel R, which form signal pair 242; a left side surround channel Lss and a right side surround channel Rss, which form signal pair 244; and a left rear surround channel Lrs and a right rear surround channel Rrs, which form signal pair 246.

The data processing system 100 then generates (step 604) one or more cross-channel features from the left input channel and the right input channel of a selected signal pair. In some embodiments, the one or more cross-channel features include a comparison of the levels of the left and right input channels of the signal pair, the relationship between the spectral characteristics (e.g., magnitude and/or phase) of the left and right input channels, and/or the time difference and amplitude difference between the onsets of the left and right input channel signals.
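Two of the listed features, the level comparison and the onset time difference, can be sketched as follows. This is an illustrative Python sketch only: the RMS-based level measure, the fixed onset threshold, and all function names are assumptions, since the text does not specify how the features are computed.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def level_difference_db(left, right, eps=1e-12):
    """Left-minus-right level in decibels (eps avoids log of zero)."""
    return 20.0 * math.log10((rms(left) + eps) / (rms(right) + eps))


def onset_index(samples, threshold):
    """Index of the first sample whose magnitude reaches the threshold."""
    for n, v in enumerate(samples):
        if abs(v) >= threshold:
            return n
    return None


def onset_lag(left, right, threshold=0.1):
    """Samples by which the right onset trails the left onset, or None."""
    l_on = onset_index(left, threshold)
    r_on = onset_index(right, threshold)
    if l_on is None or r_on is None:
        return None
    return r_on - l_on
```

A positive level difference together with a positive lag would indicate a source that is louder and earlier in the left channel, which is the kind of pair-wise information a channel-dependent module could use to steer its processing.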

The data processing system 100 then processes (step 606) the left input channel and the right input channel of the selected signal pair in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel. In some embodiments, the left and right input channels of a signal pair are processed by a processor (PROC) as illustrated in FIGS. 3A, 3B, 4A, and 4B. In some embodiments, the PROC includes one or more modules, as shown in FIG. 4A.

Next, the data processing system 100 combines (step 608) each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel audio input signal to form a two-channel audio output signal. For example, as shown in FIG. 3A, the left and right intermediate channels produced by the respective PROCs (e.g., from the L/R signal pair or the Ls/Rs signal pair) are combined with the center channel C and/or the low-frequency effects channel (LFE) to produce the two-channel audio output signal L'/R'. Similarly, as shown in FIG. 3B, the left and right intermediate channels produced by the respective PROCs (e.g., from the L/R, Lss/Rss, or Lrs/Rrs signal pair) are combined with the center channel C and/or the low-frequency effects channel (LFE) to produce the two-channel audio output signal L'/R'.
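The combining step can be pictured as a plain sum, as in the following Python sketch. It assumes that every processed pair and every single channel has the same number of samples, and it adds each single channel equally to both outputs; the function name and the `single_gain` parameter are hypothetical, and no particular gain value is disclosed in the text.

```python
def downmix_to_stereo(processed_pairs, single_channels, single_gain=1.0):
    """Sum processed (left, right) pairs and unpaired channels into L'/R'.

    processed_pairs: list of (left, right) sample lists, one per PROC output.
    single_channels: list of sample lists such as C and LFE.
    single_gain: hypothetical gain applied to each unpaired channel.
    """
    length = len(processed_pairs[0][0])
    out_left = [0.0] * length
    out_right = [0.0] * length
    for left, right in processed_pairs:
        for n in range(length):
            out_left[n] += left[n]
            out_right[n] += right[n]
    for channel in single_channels:
        # An unpaired channel feeds both outputs identically.
        for n in range(length):
            out_left[n] += single_gain * channel[n]
            out_right[n] += single_gain * channel[n]
    return out_left, out_right
```

For a 5.1 input, `processed_pairs` would hold the two PROC outputs and `single_channels` would hold C and LFE; for a 7.1 input, a third processed pair is added.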

In acoustic engineering, the sound stage is usually defined as the region between the leftmost and the rightmost perceivable positions of the reproduced audio. In other words, the sound stage is the farthest extent over which a sound object can be perceived, and the sound stage width is the distance between its left and right boundaries. In ordinary stereo reproduction, the sound stage width is the distance between the two loudspeakers. In the present disclosure, the sound stage concept is applied separately to each pair of symmetric channels. For example, in some embodiments the data processing system 100 uses a width controller (e.g., width controller 434) to further adjust (step 610) a sound stage width associated with the left intermediate channel and the right intermediate channel before combining the left intermediate channel and the right intermediate channel with the third input channel. In some embodiments, the data processing system 100 receives (step 612) a user input that specifies the sound stage width of the two-channel audio output signal. The user input can be received through the user interface 500 shown in FIG. 5.

In some embodiments, step 606 of processing the left input channel and the right input channel further includes extracting (step 614) a mid signal component, a left-side signal component, and a right-side signal component from the left input channel and the right input channel. For example, as illustrated in FIG. 4A, the input signal pair 410 is processed by the M/S mixer 422 to produce a mid component 426 and left-side and right-side components S 424. In some embodiments, the data processing system 100 processes (step 616) the left-side component and the right-side component and then combines them with the mid component to generate the left intermediate channel and the right intermediate channel.

In some embodiments, step 606 of processing the left input channel and the right input channel further includes equalizing (step 618) the left-side component and the right-side component using a band-pass filter (e.g., by the equalizer module 428 of FIG. 4A) to obtain a band-pass-filtered left-side component and a band-pass-filtered right-side component (e.g., the equalizer output signal 450). In some embodiments, the equalization step further generates (step 620) a left-side residual component based on the difference between the left-side component and the band-pass-filtered left-side component, and a right-side residual component based on the difference between the right-side component and the band-pass-filtered right-side component, such as the left-side and right-side residual components 452 illustrated in FIG. 4A.

In some embodiments, after performing equalization on the left side component and the right side component, the data processing system 100 performs (step 622) a first dynamic range compression (e.g., by the dynamic range compressor 430 of FIG. 4A) on each of the left band-pass-filtered component and the right band-pass-filtered component (e.g., as generated by the equalizer 428 of FIG. 4A) to obtain a corresponding left compressed component and right compressed component.
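
Dynamic range compression as in step 622 can be sketched with a static, memoryless gain curve. The parameters of the compressor 430 are not given in this passage, so the threshold, the ratio, and the absence of attack and release smoothing below are assumptions for illustration; `compress` is a hypothetical helper name.

```python
import math


def compress(samples, threshold_db=-20.0, ratio=4.0):
    """Static downward compressor applied sample by sample.

    Levels above threshold_db are reduced so that the output level is
    threshold + (input level - threshold) / ratio. Practical compressors
    also smooth the gain over time (attack/release); that is omitted here.
    """
    out = []
    for s in samples:
        mag = abs(s)
        if mag == 0.0:
            out.append(0.0)
            continue
        level_db = 20.0 * math.log10(mag)
        if level_db > threshold_db:
            gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
        else:
            gain_db = 0.0
        out.append(s * 10.0 ** (gain_db / 20.0))
    return out


# A full-scale sample (0 dB) is pulled down by 15 dB; a quiet one is untouched.
compressed = compress([1.0, -1.0, 0.05, 0.0])
```

For a 0 dB input with a -20 dB threshold and a 4:1 ratio, the output level is -20 + 20 / 4 = -15 dB, while the -26 dB sample stays below the threshold and passes through unchanged.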

In some embodiments, after performing the first dynamic range compression, the data processing system 100 performs (step 624) crosstalk cancellation (e.g., by the crosstalk cancellation module 432 of FIG. 4A) on each of the left compressed component and the right compressed component (e.g., as generated by the dynamic range compressor 430 of FIG. 4A) to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.
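
The idea behind step 624 can be shown with a deliberately simplified canceller. This passage does not describe the internals of the crosstalk cancellation module 432, so the single delayed-and-attenuated subtraction below, along with the `delay` and `gain` values and the function name, is an assumption and not the patented design.

```python
def cancel_crosstalk(left, right, delay=3, gain=0.5):
    """First-order crosstalk cancellation sketch (illustrative).

    Each output subtracts a delayed, attenuated copy of the opposite
    channel, approximating the signal that would otherwise leak from the
    opposite loudspeaker to the ear. Inputs are assumed equal in length.
    """
    out_l, out_r = [], []
    for n in range(len(left)):
        opposite_r = right[n - delay] if n >= delay else 0.0
        opposite_l = left[n - delay] if n >= delay else 0.0
        out_l.append(left[n] - gain * opposite_r)
        out_r.append(right[n] - gain * opposite_l)
    return out_l, out_r


# An impulse on the left only: the left output is unchanged, and the right
# output carries the inverted, delayed cancellation signal.
impulse_l = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
silence_r = [0.0] * 6
xtc_l, xtc_r = cancel_crosstalk(impulse_l, silence_r)
```

Practical cancellers typically use recursive structures or head-related filters instead of a single tap; the sketch only shows the cross-feed direction and sign.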

In some embodiments, the data processing system 100 combines (step 626) the crosstalk-cancelled left side component and the crosstalk-cancelled right side component, the left residual component and the right residual component, and the center component to generate the left intermediate channel and the right intermediate channel. In some embodiments, the combining step further includes adjusting (step 628) a sound field width associated with the left intermediate channel and the right intermediate channel (e.g., by the width controller 434 of FIG. 4A) before combining them with the third input channel. For example, as shown in FIG. 4B, the left and right intermediate channels generated by the respective PROC are sent to the respective width controllers to adjust the sound field width. The adjusted signals are then combined with a third input channel (e.g., the C or LFE channel) to generate the stereo output signal 488.
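
Steps 626 and 628 and the final mix with the third input channel can be sketched as three small functions. The summation order follows the paragraph above; the width law (scaling the difference between the two channels), the third-channel gain of 0.7071 (a commonly used -3 dB downmix weight), and all function names are assumptions for illustration.

```python
def combine_components(xtc_l, xtc_r, res_l, res_r, center):
    """Step 626 sketch: sum processed side, residual, and center components."""
    left = [a + b + c for a, b, c in zip(xtc_l, res_l, center)]
    right = [a + b + c for a, b, c in zip(xtc_r, res_r, center)]
    return left, right


def adjust_width(left, right, width=1.0):
    """Step 628 sketch: width 0 gives mono, 1 leaves the pair unchanged,
    and values above 1 widen the image (assumed width law)."""
    out_l, out_r = [], []
    for l, r in zip(left, right):
        mid = 0.5 * (l + r)
        side = 0.5 * (l - r)
        out_l.append(mid + width * side)
        out_r.append(mid - width * side)
    return out_l, out_r


def add_third_channel(left, right, third, gain=0.7071):
    """Mix a third input channel (e.g., C or LFE) equally into both outputs."""
    out_l = [l + gain * t for l, t in zip(left, third)]
    out_r = [r + gain * t for r, t in zip(right, third)]
    return out_l, out_r


# Usage: build intermediate channels, collapse them to mono, add a center feed.
inter_l, inter_r = combine_components([0.5, 0.0], [0.0, 0.5],
                                      [0.25, 0.0], [0.0, 0.25],
                                      [0.25, 0.0])
mono_l, mono_r = adjust_width(inter_l, inter_r, width=0.0)
stereo_l, stereo_r = add_third_channel(mono_l, mono_r, [1.0, 1.0], gain=0.5)
```

Adjusting the width before the third channel is mixed in, as the paragraph describes, keeps the center or LFE content identical in both outputs regardless of the width setting.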

In some embodiments, after adjusting the sound field width, the data processing system 100 performs (step 630) a second dynamic range compression (e.g., by the dynamic range compressor 436 of FIG. 4A) to generate the left intermediate channel and the right intermediate channel.

Finally, it should be noted that the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, and the like.

Furthermore, the invention can take the form of a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. For the purposes of this description, a computer-usable or computer-readable medium can be any tangible apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.

The medium can be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer disk, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk, and an optical disk. Current examples of optical disks include read-only CD-ROM, read/write CD-R/W, and DVD.

A data processing system suitable for storing and/or executing program code includes at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories, which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution. In some embodiments, the data processing system is implemented as a semiconductor chip (e.g., a system-on-chip) that integrates all components of a computer or other electronic system onto a single chip substrate.

Input/output (I/O) devices (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.

Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems, remote printers, or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.

The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the invention to the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention through its various embodiments and to understand that the invention can be modified in various ways to suit the particular use contemplated.

The terminology used in the description of the embodiments herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the claims. As used in the description of the embodiments and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. The term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. It should be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It should also be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first port could be termed a second port, and, similarly, a second port could be termed a first port, without departing from the scope of the embodiments. The first port and the second port are both ports, but they are not the same port.

Many modifications and alternative implementations of the embodiments disclosed herein will come to mind to one skilled in the art having the benefit of the teachings presented in the foregoing description and the associated drawings. Therefore, it is to be understood that the scope of the claims is not limited to the specific embodiments disclosed, and that modifications and other implementations are intended to be included within the scope of the appended claims. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, so that others skilled in the art can make good use of the principles of the invention and its various embodiments, with such modifications as are suited to the particular use contemplated.

Claims

A computer-implemented method for processing a multi-channel input audio signal, comprising: at a computing device having one or more processors, memory, and a plurality of program modules stored in the memory and executed by the one or more processors: selecting, from the multi-channel input audio signal, a left input channel and a right input channel, wherein the left input channel and the right input channel correspond to a pair of spatially symmetrical signal sources; generating one or more cross-channel features from the left input channel and the right input channel; processing the left input channel and the right input channel in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

The computer-implemented method of claim 1, further comprising: before combining with the third input channel, adjusting a sound field width associated with the left intermediate channel and the right intermediate channel.

The computer-implemented method of claim 2, further comprising: receiving a user input indicating the sound field width of the two-channel output audio signal.

The computer-implemented method of claim 1, wherein the step of processing the left input channel and the right input channel further comprises: extracting a center component, a left side component, and a right side component from the left input channel and the right input channel; and processing the left side component and the right side component and then combining them with the center component to generate the left intermediate channel and the right intermediate channel.

The computer-implemented method of claim 4, wherein the step of processing the left side component and the right side component further comprises: performing equalization on the left side component and the right side component using a band-pass filter to obtain a left band-pass-filtered component and a right band-pass-filtered component; and generating a left residual component based on the difference between the left side component and the left band-pass-filtered component, and a right residual component based on the difference between the right side component and the right band-pass-filtered component.

The computer-implemented method of claim 5, further comprising: after performing equalization on the left side component and the right side component, performing a first dynamic range compression on each of the left band-pass-filtered component and the right band-pass-filtered component to obtain a corresponding left compressed component and right compressed component.

The computer-implemented method of claim 6, further comprising: after performing the first dynamic range compression, performing crosstalk cancellation on each of the left compressed component and the right compressed component to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.

The computer-implemented method of claim 7, further comprising: combining the crosstalk-cancelled left side component and the crosstalk-cancelled right side component, the left residual component and the right residual component, and the center component to generate the left intermediate channel and the right intermediate channel, wherein the combining step further comprises: before combining with the third input channel, adjusting a sound field width associated with the left intermediate channel and the right intermediate channel.

The computer-implemented method of claim 8, further comprising: after adjusting the sound field width, performing a second dynamic range compression to generate the left intermediate channel and the right intermediate channel.

The computer-implemented method of claim 1, wherein the left input channel is a left front channel and the right input channel is a right front channel.

The computer-implemented method of claim 1, wherein the left input channel is a left surround channel and the right input channel is a right surround channel.

The computer-implemented method of claim 1, wherein the left input channel is a left rear surround channel and the right input channel is a right rear surround channel.

The computer-implemented method of claim 1, wherein the third input channel is a center channel.

The computer-implemented method of claim 1, wherein the third input channel is a low-frequency effects channel.

A computing device for processing a multi-channel input audio signal, the computing device comprising: one or more processors; memory; and a plurality of program modules stored in the memory and executed by the one or more processors, wherein the plurality of program modules, when executed by the one or more processors, cause the computing device to perform a plurality of steps including: selecting, from the multi-channel input audio signal, a left input channel and a right input channel, wherein the left input channel and the right input channel correspond to a pair of spatially symmetrical signal sources; generating one or more cross-channel features from the left input channel and the right input channel; processing the left input channel and the right input channel in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

The computing device of claim 15, wherein the computing device further performs: before combining with the third input channel, adjusting a sound field width associated with the left intermediate channel and the right intermediate channel.

The computing device of claim 16, wherein the computing device further performs: receiving a user input indicating the sound field width of the two-channel output audio signal.

The computing device of claim 15, wherein the step of processing the left input channel and the right input channel further comprises: extracting a center component, a left side component, and a right side component from the left input channel and the right input channel; and processing the left side component and the right side component and then combining them with the center component to generate the left intermediate channel and the right intermediate channel.

The computing device of claim 18, wherein the step of processing the left side component and the right side component further comprises: performing equalization on the left side component and the right side component using a band-pass filter to obtain a left band-pass-filtered component and a right band-pass-filtered component; and generating a left residual component based on the difference between the left side component and the left band-pass-filtered component, and a right residual component based on the difference between the right side component and the right band-pass-filtered component.

The computing device of claim 19, wherein the computing device further performs: after performing equalization on the left side component and the right side component, performing a first dynamic range compression on each of the left band-pass-filtered component and the right band-pass-filtered component to obtain a corresponding left compressed component and right compressed component.

The computing device of claim 20, wherein the computing device further performs: after performing the first dynamic range compression, performing crosstalk cancellation on each of the left compressed component and the right compressed component to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.

The computing device of claim 21, wherein the computing device further performs: combining the crosstalk-cancelled left side component and the crosstalk-cancelled right side component, the left residual component and the right residual component, and the center component to generate the left intermediate channel and the right intermediate channel, wherein the combining step further comprises: before combining with the third input channel, adjusting a sound field width associated with the left intermediate channel and the right intermediate channel.

The computing device of claim 22, wherein the computing device further performs: after adjusting the sound field width, performing a second dynamic range compression to generate the left intermediate channel and the right intermediate channel.

The computing device of claim 15, wherein the left input channel is a left front channel and the right input channel is a right front channel.

The computing device of claim 15, wherein the left input channel is a left surround channel and the right input channel is a right surround channel.

The computing device of claim 15, wherein the left input channel is a left rear surround channel and the right input channel is a right rear surround channel.

The computing device of claim 15, wherein the third input channel is a center channel.

The computing device of claim 15, wherein the third input channel is a low-frequency effects channel.

A computer program product stored in a non-transitory computer-readable storage medium for use in conjunction with a computing device having one or more processors for processing an audio signal, the computer program product comprising a plurality of program modules that, when executed by the one or more processors, cause the computing device to perform a plurality of steps including: selecting, from a multi-channel input audio signal, a left input channel and a right input channel, wherein the left input channel and the right input channel correspond to a pair of spatially symmetrical signal sources; generating one or more cross-channel features from the left input channel and the right input channel; processing the left input channel and the right input channel in accordance with the cross-channel features to generate a left intermediate channel and a right intermediate channel; and combining each of the left intermediate channel and the right intermediate channel with a third input channel of the multi-channel input audio signal to form a two-channel output audio signal.

The computer program product of claim 29, wherein the computing device further performs: before combining with the third input channel, adjusting a sound field width associated with the left intermediate channel and the right intermediate channel.

The computer program product of claim 30, wherein the computing device further performs: receiving a user input indicating the sound field width of the two-channel output audio signal.

The computer program product of claim 29, wherein the step of processing the left input channel and the right input channel further comprises: extracting a center component, a left side component, and a right side component from the left input channel and the right input channel; and processing the left side component and the right side component and then combining them with the center component to generate the left intermediate channel and the right intermediate channel.

The computer program product of claim 32, wherein the step of processing the left side component and the right side component further comprises: performing equalization on the left side component and the right side component using a band-pass filter to obtain a left band-pass-filtered component and a right band-pass-filtered component; and generating a left residual component based on the difference between the left side component and the left band-pass-filtered component, and a right residual component based on the difference between the right side component and the right band-pass-filtered component.

The computer program product of claim 33, wherein the computing device further performs: after performing equalization on the left side component and the right side component, performing a first dynamic range compression on each of the left band-pass-filtered component and the right band-pass-filtered component to obtain a corresponding left compressed component and right compressed component.

The computer program product of claim 34, wherein the computing device further performs: after performing the first dynamic range compression, performing crosstalk cancellation on each of the left compressed component and the right compressed component to obtain a crosstalk-cancelled left side component and a crosstalk-cancelled right side component.

The computer program product of claim 35, wherein the computing device further performs: combining the crosstalk-cancelled left side component and the crosstalk-cancelled right side component, the left residual component and the right residual component, and the center component to generate the left intermediate channel and the right intermediate channel, wherein the combining step further comprises: before combining with the third input channel, adjusting a sound field width associated with the left intermediate channel and the right intermediate channel.

The computer program product of claim 36, wherein the computing device further performs: after adjusting the sound field width, performing a second dynamic range compression to generate the left intermediate channel and the right intermediate channel.

The computer program product of claim 29, wherein the left input channel is a left front channel and the right input channel is a right front channel.

The computer program product of claim 29, wherein the left input channel is a left surround channel and the right input channel is a right surround channel.

The computer program product of claim 29, wherein the left input channel is a left rear surround channel and the right input channel is a right rear surround channel.

The computer program product of claim 29, wherein the third input channel is a center channel.

The computer program product of claim 29, wherein the third input channel is a low-frequency effects channel.