TWI467979B

TWI467979B - Systems, methods, and apparatus for signal change detection

Info

Publication number: TWI467979B
Application number: TW96128125A
Authority: TW
Inventors: Vivek Rajendran; Ananthapadmanabhan A Kandhadai
Original assignee: Qualcomm Inc
Priority date: 2006-07-31
Filing date: 2007-07-31
Publication date: 2015-01-01
Also published as: CN101496095B; TW200818802A; CN101496095A

Description

System, method and device for signal change detection

This disclosure relates to signal processing.

Transmission of voice by digital techniques has become widespread, particularly in long-distance telephony, packet-switched telephony such as Voice over IP (VoIP), and digital radio telephony such as cellular telephony. This growth has created interest in reducing the amount of information used to transfer a voice communication over a transmission channel while maintaining the perceived quality of the reconstructed speech.

Devices that are configured to compress speech by extracting parameters that relate to a model of human speech generation are called "speech coders." A speech coder generally includes an encoder and a decoder. The encoder typically divides the incoming speech signal (a digital signal representing audio information) into segments of time called "frames," analyzes each frame to extract certain relevant parameters, and quantizes the parameters into a binary representation, such as a set of bits or a binary data packet. The data packets are transmitted over a transmission channel (i.e., a wired or wireless network connection) to a receiver that includes a decoder. The decoder receives and processes the data packets, dequantizes them to produce the parameters, and recreates the speech frames using the dequantized parameters.

In a typical conversation, each speaker is silent for about sixty percent of the time. A speech coder is usually configured to distinguish frames of the speech signal that contain speech ("active frames") from frames of the speech signal that contain only silence or background noise ("inactive frames"). Such a coder may be configured to use different coding modes and/or rates to encode active and inactive frames. For example, speech coders are typically configured to transmit encoded inactive frames (also called "silence descriptors," "silence descriptions," or SIDs) at a lower rate than encoded active frames.

At any time during a full-duplex telephonic communication, it may be expected that the input to at least one of the speech coders will be an inactive frame. It may be desirable for an encoder to transmit SIDs for fewer than all of the inactive frames. Such operation is also called discontinuous transmission (DTX). In one example, a speech coder performs DTX by transmitting one SID for each string of 32 consecutive inactive frames. The corresponding decoder applies the information in the SID to update a noise generation model that is used by a comfort noise generation algorithm to synthesize inactive frames.

A method of processing a speech signal according to one configuration includes generating a sequence of spectral tilt values that is based on a plurality of inactive frames of the speech signal. The method includes calculating a change between at least two values of the sequence of spectral tilt values and, for an inactive frame among the plurality of inactive frames, deciding whether to transmit a description of the frame. In this method, the decision whether to transmit a description of the frame is based on the calculated change.

A computer program product according to another configuration includes a computer-readable medium. The medium includes code for causing at least one computer to generate a sequence of spectral tilt values that is based on a plurality of inactive frames of a speech signal. The medium also includes code for causing at least one computer to calculate a change between at least two values of the sequence of spectral tilt values, and code for causing at least one computer to decide, for an inactive frame among the plurality of inactive frames and based on the calculated change, whether to transmit a description of the frame.

An apparatus for processing a speech signal according to a further configuration includes a sequence generator configured to generate a sequence of spectral tilt values that is based on a plurality of inactive frames of the speech signal. The apparatus includes a calculator configured to calculate a change between at least two values of the sequence of spectral tilt values, and a comparator configured to decide, for an inactive frame among the plurality of inactive frames and based on the calculated change, whether to transmit a description of the frame.

An apparatus for processing a speech signal according to yet another configuration includes means for generating a sequence of spectral tilt values that is based on a plurality of inactive frames of the speech signal. The apparatus includes means for calculating a change between at least two values of the sequence of spectral tilt values, and means for deciding, for an inactive frame among the plurality of inactive frames and based on the calculated change, whether to transmit a description of the frame.

Configurations described herein include systems, methods, and apparatus for detecting changes in a speech signal. For example, several configurations are disclosed for detecting a change during an inactive period of the signal and initiating an update to a description of the signal based on that detection. These configurations are typically intended for use in packet-switched networks (for example, wired and/or wireless networks arranged to carry voice transmissions according to protocols such as Voice over IP, or VoIP), although use in circuit-switched networks is also expressly contemplated and hereby disclosed.

Unless expressly limited by its context, the term "calculating" is used herein to indicate any of its ordinary meanings, such as computing, evaluating, smoothing, and selecting from a plurality of values. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or operations. The term "A is based on B" is used to indicate any of its ordinary meanings, including the cases (i) "A is based on at least B" and (ii) "A is equal to B" (if appropriate in the particular context).

An encoder that implements DTX may be configured to discard (or "blank") most of the inactive frames according to a blanking scheme. One example of a blanking scheme issues an update to the silence description at regular intervals (e.g., once every 16 or 32 consecutive inactive frames). Other blanking schemes (also called "smart blanking" schemes) are configured to issue an update to the silence description upon detecting a fluctuation in energy and/or spectral characteristics that may indicate a change in the background noise.
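The regular-interval blanking scheme can be sketched as follows. This is only an illustration under stated assumptions: the function name `periodic_sid_schedule`, the choice to issue an update on the first frame of each inactive run, and the resetting of the counter whenever an active frame occurs are all hypothetical and are not specified in this description.

```python
def periodic_sid_schedule(inactive_flags, interval=32):
    """Mark the inactive frames for which an SID update would be issued.

    inactive_flags: iterable of booleans, True for an inactive frame.
    interval: number of consecutive inactive frames covered by one update.
    Assumption: the first frame of each inactive run carries an update and
    the count restarts after any active frame.
    """
    decisions = []
    run_length = 0  # consecutive inactive frames seen so far in this run
    for inactive in inactive_flags:
        if not inactive:
            run_length = 0
            decisions.append(False)  # active frames never carry an SID
        else:
            decisions.append(run_length % interval == 0)
            run_length += 1
    return decisions
```

For a run of 70 inactive frames and the default interval of 32, this sketch marks frames 0, 32, and 64 and blanks the other 67 frames.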

A blanking scheme that relies solely on energy fluctuations may sometimes fail to detect a perceptually significant change in the background noise. In some cases, inactive frames that are perceptually different will have similar energy characteristics (typically encoded as a gain value). For example, although background noise in a street ("street noise") may have an energy distribution over time that is similar to that of background noise in a crowded space ("babble noise"), these two types of noise are usually perceived very differently. A blanking scheme that fails to distinguish perceptually different types of noise may give rise to audible artifacts at the decoder. Because active frames also include background noise, for example, an audible discontinuity may occur when the decoder switches from decoded active frames to comfort noise generated from an inappropriate SID.

It is desirable for a blanking scheme to detect perceptually significant changes in the background noise. For example, it may be desirable for a blanking scheme to detect a sudden change in one or more spectral characteristics (e.g., the spectral tilt) of the background noise. A method or apparatus as described herein may be used to implement such a blanking scheme. Alternatively, a method or apparatus as described herein may be used to supplement another blanking scheme. For example, a speech coder or method of speech coding may combine a method or apparatus as described herein with a blanking scheme as described in U.S. Patent Application Publication No. 2006/0171419 (Spindola et al., published Aug. 3, 2006), or with another blanking scheme that is configured to detect changes in frame energy and/or changes in spectral characteristics of the speech signal (such as differences between line spectral pair vectors).

Figure 1A shows a flowchart of a method M100 according to a general configuration. Based on a plurality of inactive frames of a speech signal, task T200 generates a sequence of spectral tilt values. Task T400 calculates a change within the sequence of spectral tilt values (e.g., a change between at least two values of the sequence). For an inactive frame of the speech signal, task T500 decides whether to transmit a description of the frame, where the decision is based on the calculated change. For example, the decision whether to transmit a description may be based on a relation between (A) a magnitude of the calculated change and (B) a threshold value.

In a typical implementation of method M100, each of the sequence of spectral tilt values is based on the spectral tilt of a corresponding inactive frame. The spectral tilt of a frame of a speech signal is a value that describes the distribution of energy within the frame over a frequency range. Typically the spectral tilt indicates the slope of the spectrum of the signal over the corresponding frame and may be positive or negative. The act of producing the next value of the spectral tilt sequence is also called "updating" the sequence.

The values of the sequence of spectral tilt values are typically arranged in chronological order, such that successive values of the sequence correspond to segments of the signal that are successive in time. A sequence of spectral tilt values arranged in this manner may be said to represent a contour (i.e., a spectral tilt contour) that describes how the slope of the energy spectrum of the speech signal changes over time.

Task T200 may be implemented to generate the sequence of spectral tilt values in any of several different ways. For example, task T200 may be configured to receive such a sequence from a storage element or array (e.g., a semiconductor memory unit or array), from another task of a larger process (such as a method of speech coding), or from an element of an apparatus such as a speech coder. Alternatively, task T200 may be configured to calculate such a sequence as described herein.

Task T200 may be configured to output the received or calculated sequence (also denoted herein as x) as the generated sequence of spectral tilt values. Alternatively, task T200 may be configured to generate a sequence of spectral tilt values y by performing one or more other operations on this sequence x. Such other operations may include selecting another sequence from among the values of sequence x: for example, selecting every n-th value (where n is an integer greater than one) and/or selecting only those values that correspond to inactive frames. Such other operations may also include smoothing the received, calculated, or selected sequence as described herein.
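Selection of every n-th value of sequence x might look like the following minimal sketch. The helper name `every_nth` and the choice to start the selection at the first value of the sequence are assumptions made for illustration only.

```python
def every_nth(values, n):
    """Return every n-th value of a sequence, starting with the first.

    Assumption: selection begins at index 0; the description does not say
    which phase of the decimation is used.
    """
    if n <= 1:
        raise ValueError("n must be an integer greater than 1")
    return values[::n]
```

With n equal to 3, a seven-value sequence is reduced to its first, fourth, and seventh values.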

The duration in time of each segment (also called a "fragment" or "frame") of the speech signal is typically selected to be short enough that the spectral envelope of the signal may be expected to remain relatively stationary. For example, one typical frame length is 20 milliseconds, which corresponds to 160 samples at a sampling rate of 8 kilohertz (kHz), although any frame length or sampling rate deemed suitable for the particular application may be used. In some applications the frames are nonoverlapping, while in other applications an overlapping frame scheme is used. For example, it is common for a speech coder to use an overlapping frame scheme at the encoder and a nonoverlapping frame scheme at the decoder.
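The relation between frame duration, sampling rate, and frame length in samples can be checked with a short sketch. The function names are hypothetical, and the framing shown is the nonoverlapping case only; any trailing partial frame is simply dropped, which is an assumption of this example.

```python
def frame_length_samples(frame_ms, sample_rate_hz):
    """Number of samples in one frame of the given duration."""
    return frame_ms * sample_rate_hz // 1000


def split_into_frames(samples, frame_len):
    """Split a sample sequence into nonoverlapping frames.

    Assumption: a trailing partial frame is discarded.
    """
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]
```

For a 20-millisecond frame at 8 kHz, `frame_length_samples(20, 8000)` gives 160, matching the figure stated above.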

In a typical application, an array of logic gates is configured to perform one, more than one, or even all of the various tasks of method M100. For example, the task or tasks may be implemented as machine-executable code to be executed by a programmable array such as a processor. The tasks of method M100 may also be performed by more than one such array. In these or other implementations, the tasks may be performed within a device for wireless communications, such as a cellular telephone or other device having such communications capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (e.g., using one or more protocols such as VoIP). For example, such a device may include RF circuitry configured to transmit encoded active frames and SIDs. Method M100 may also be implemented as machine-readable code embodied in a computer program product (e.g., one or more data storage media such as disks, flash or other nonvolatile memory cards, semiconductor memory chips, etc.).

In a typical application of method M100, task T400 iterates over the sequence of spectral tilt values generated by task T200 to calculate a series of changes based on successive pairs of spectral tilt values, and task T500 iterates over the series of changes to perform a series of transmission decisions. In general, task T200 executes as an ongoing process, and tasks T400 and T500 iterate in a serial or parallel fashion, such that a spectral tilt value and a corresponding calculated change and transmission indication are produced for each inactive frame of the speech signal (possibly after an initialization period of one or more inactive frames, for example). It is also possible to implement method M100 such that task T200 generates a spectral tilt value less often than for every inactive frame (e.g., for every second or third frame), such that task T400 executes as often as task T200 or less often (e.g., for every second or third iteration of task T200), and/or such that task T500 executes as often as task T400 or less often (e.g., for every second or third iteration of task T400).
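The per-frame iteration of tasks T200, T400, and T500 can be pictured with the following sketch, which processes one spectral tilt value per inactive frame. It is not the claimed method itself: the class name `TiltChangeDetector`, the use of first-order smoothing with gain 0.2 and a threshold of 0.2 (example values given elsewhere in this description), the strict "greater than" comparison, and the treatment of the first inactive frame as a one-frame initialization period are all assumptions made for illustration.

```python
class TiltChangeDetector:
    """Streaming sketch: smooth the tilt, difference it, compare to a threshold."""

    def __init__(self, gain=0.2, threshold=0.2):
        self.gain = gain            # smoothing gain (assumed example value)
        self.threshold = threshold  # decision threshold (assumed example value)
        self.previous = None        # previous smoothed tilt value

    def process(self, tilt):
        """Take the tilt of one inactive frame; return the transmission decision."""
        if self.previous is None:
            # Assumption: the first frame only initializes the smoother.
            self.previous = tilt
            return False
        smoothed = self.gain * tilt + (1.0 - self.gain) * self.previous
        change = smoothed - self.previous
        self.previous = smoothed
        return abs(change) > self.threshold
```

Feeding the tilt values -0.9, -0.9, 0.9, 0.9, 0.9, 0.9 to one detector yields the decisions False, False, True, True, True, False: the abrupt change in tilt triggers updates until the smoothed contour settles.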

Figure 1B shows a block diagram of an apparatus A100 according to a general configuration. Sequence generator 120 is configured to generate a sequence of spectral tilt values that is based on a plurality of inactive frames of a speech signal. For example, sequence generator 120 may be configured to perform an implementation of task T200 as disclosed herein. Calculator 140 is configured to calculate a change between at least two values of the sequence of spectral tilt values. For example, calculator 140 may be configured to perform an implementation of task T400 as disclosed herein. Comparator 150 is configured to decide whether to transmit a description of an inactive segment of the speech signal, where the decision is based on the calculated change (e.g., based on a relation between (A) a magnitude of the calculated change and (B) a threshold value). For example, comparator 150 may be configured to perform an implementation of task T500 as disclosed herein. In a typical application, an implementation of apparatus A100 is arranged to process a sequence of spectral tilt values and to produce a series of transmission decisions based on that sequence.

The various elements of apparatus A100 may be implemented in any combination of hardware, software, and/or firmware that is deemed suitable for the intended application. For example, any of these elements may be implemented as one or more arrays of logic gates. Any two or more, or even all, of these elements may be implemented within the same array or arrays. Such an array or arrays may be implemented within one or more chips (for example, within a chipset including two or more chips). Any of the various elements of apparatus A100 may also be implemented as one or more computers (e.g., arrays programmed to execute one or more sets or sequences of instructions, also called "processors"), and any two or more, or even all, of these elements may be implemented within the same such computer or computers. The various elements of apparatus A100 may be included within a device for wireless communications, such as a cellular telephone or other device having such communications capability. Such a device may be configured to communicate with circuit-switched and/or packet-switched networks (e.g., using one or more protocols such as VoIP). For example, such a device may include a speech coder configured to transmit SIDs according to the results of the corresponding transmission decisions, and/or RF circuitry configured to transmit encoded active frames and SIDs.

One example of a parameter whose value may be used to indicate the spectral tilt of a frame is the first reflection coefficient k₀, and other such parameters are described below. Task T200 may be arranged to receive the sequence of spectral tilt values from another task of a larger process, such as a method of speech coding. Alternatively, task T200 may be implemented to include a task T210 that is configured to calculate such values, as described below. Likewise, sequence generator 120 may be arranged to receive the sequence of spectral tilt values from another element of a larger apparatus, such as a speech coder or communications device. Alternatively, sequence generator 120 may be implemented to include a calculator 128 that is configured to calculate such values, as described below.
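As a rough illustration of how a per-frame tilt value could be obtained, the sketch below estimates a first reflection coefficient as the lag-one autocorrelation of the frame normalized by its energy. This is an assumption made for illustration, not the calculation performed by task T210 or calculator 128, which is described elsewhere. The function name is hypothetical, and sign conventions for reflection coefficients differ between texts (some define the coefficient with the opposite sign).

```python
def first_reflection_coefficient(frame):
    """Estimate a first reflection coefficient for one frame of samples.

    Assumption: computed as r[1] / r[0], the normalized lag-one
    autocorrelation; an all-zero frame returns 0.0.
    """
    energy = sum(s * s for s in frame)  # autocorrelation at lag 0
    if energy == 0.0:
        return 0.0
    lag_one = sum(frame[i] * frame[i - 1] for i in range(1, len(frame)))
    return lag_one / energy
```

Under this convention, a frame dominated by low frequencies gives a value near +1 and a frame dominated by high frequencies gives a value near -1, so the value can be positive or negative as the description of spectral tilt requires.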

Task T200 may be implemented to include a task T300 that smooths the sequence of spectral tilt values. A typical implementation of task T300 is configured to filter the sequence of spectral tilt values according to an autoregressive model, such as an infinite impulse response (IIR) filter. One particular example of task T300 performs the following first-order IIR filtering operation, which calculates each value of the smoothed sequence y as a weighted average of the current value of the input sequence of spectral tilt values x and the previous value of the smoothed sequence y:

y[n] = a·x[n] + (1 - a)·y[n - 1],    (1)

where n denotes the sequence index. Depending on the desired degree of smoothing, gain factor a may have any value from 0 to 1. Typically, gain factor a has a value not greater than 0.6. For example, gain factor a may have a value in the range of from 0.1 (or from 0.15) to 0.4 (or to 0.5). In one particular example, sequence x is a series of values of the first reflection coefficient k₀, and gain factor a has the value 0.2 (zero point two). Figure 1C shows a flowchart of an implementation M101 of method M100 in which task T200 is implemented as task T300. Figure 1D shows a block diagram of an implementation A101 of apparatus A100 in which sequence generator 120 is implemented as a smoother 130 configured to perform an implementation of task T300.
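The first-order IIR smoothing of expression (1) can be sketched as below. The function name `smooth_tilt` and the initial state `y_init` (the value assumed for y before the first input) are hypothetical; the description does not state how the filter is initialized.

```python
def smooth_tilt(x, a=0.2, y_init=0.0):
    """Apply y[n] = a*x[n] + (1 - a)*y[n-1] to a sequence of tilt values.

    a: gain factor between 0 and 1 (0.2 in the particular example).
    y_init: assumed value of y[-1]; not specified in the description.
    """
    if not 0.0 <= a <= 1.0:
        raise ValueError("gain factor a must lie between 0 and 1")
    smoothed = []
    previous = y_init
    for value in x:
        previous = a * value + (1.0 - a) * previous
        smoothed.append(previous)
    return smoothed
```

Smaller values of a weight the history more heavily, so the contour reacts more slowly to a change in the input; with a equal to 1 the output simply follows the input.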

Figure 2 shows a block diagram of an example of an implementation 132 of smoother 130. Smoother 132 includes a first multiplier arranged to apply a gain factor G10 to the current value x[n] of the input sequence of spectral tilt values, a second multiplier arranged to apply a gain factor G20 to the previous value y[n - 1] of the smoothed sequence of spectral tilt values as obtained from delay element D, and an adder arranged to output y[n] as the sum of the two products. It may be desirable (e.g., for stability) for gain factor G10 to have the value a as described above with reference to task T300 and for gain factor G20 to have the value (1 - a). In one particular example, sequence x is a series of values of the first reflection coefficient k₀, gain factor G10 has the value 0.2 (zero point two), and gain factor G20 has the value 0.8 (zero point eight). As noted above, smoother 132 may be implemented in any combination of hardware, software, and/or firmware that is deemed suitable for the intended application.

Alternatively or additionally, task T300 may be configured to calculate values of the smoothed sequence of spectral tilt values y by performing one or more other averaging, integrating, and/or lowpass filtering operations on the sequence of spectral tilt values x (or on a result of performing a smoothing operation on sequence x). For example, in one alternative implementation of method M100, task T300 is configured to filter sequence x according to a moving average model, such as a finite impulse response (FIR) filter. In another alternative implementation of method M100, task T300 is configured to filter sequence x according to an autoregressive moving average (ARMA) model. Similarly, smoother 130 may be implemented as an integrator or other lowpass filter (such as an FIR or ARMA filter) that is configured to produce a smoothed value based on two or more input values.
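One simple moving-average (FIR) alternative is an equal-weight average over the most recent values, sketched below. The function name, the default window of four values, and the use of a shorter window at the start of the sequence are assumptions of this example; the description does not specify any particular FIR coefficients.

```python
def moving_average(x, taps=4):
    """Smooth a sequence with an equal-weight average of the last `taps` values.

    Assumption: at the start of the sequence the average is taken over the
    values available so far.
    """
    if taps < 1:
        raise ValueError("taps must be at least 1")
    smoothed = []
    for n in range(len(x)):
        window = x[max(0, n - taps + 1):n + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

Unlike the first-order IIR smoother, the output of this filter depends only on a finite number of past inputs, so a transient in the input is forgotten completely after `taps` frames.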

Method M100 is typically implemented such that each value of the sequence of spectral tilt values x that is smoothed in task T300 corresponds to one of a plurality of successive frames of the speech signal. Similarly, apparatus A100 is typically implemented such that each value of the sequence x that is smoothed by smoother 130 corresponds to one of a plurality of successive frames of the speech signal. Note that these successive frames need not be consecutive, as described in more detail below.

A speech signal will typically contain active frames as well as inactive frames. However, the energy distribution during an active frame is likely to be due primarily to factors other than the background noise, so that energy distribution values from active frames are unlikely to provide reliable information about changes in the background noise. It may therefore be desirable for the sequence of spectral tilt values x to include only values that correspond to inactive frames. In such a case, the values of sequence x may correspond to successive (inactive) frames of the speech signal that are not consecutive.

To illustrate this principle, Figure 3 shows an example in which each circle represents one of a series of consecutive frames of a speech signal over time. The circles that represent inactive frames are each labeled with the index number of the corresponding value in the sequence of spectral tilt values x. In this example, values 74 and 75 are consecutive in the sequence. Although the inactive frames that correspond to values 74 and 75 are successive in the speech signal, they are separated by a block of active frames and thus are not consecutive to one another.
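The selection of tilt values for inactive frames only can be sketched as follows. The function name and the representation of voice activity as one boolean per frame are assumptions for illustration; in practice the activity indication would come from a speech coder or a voice activity detector.

```python
def select_inactive_tilts(tilts, is_active):
    """Keep only the tilt values whose frames are marked inactive.

    tilts: one spectral tilt value per consecutive frame.
    is_active: one boolean per frame, True for an active frame (assumed
    to be supplied by a voice activity indication).
    """
    if len(tilts) != len(is_active):
        raise ValueError("tilts and is_active must have the same length")
    return [tilt for tilt, active in zip(tilts, is_active) if not active]
```

In the resulting sequence, two values may be adjacent even though a block of active frames lies between the frames they came from, which is the situation Figure 3 describes for values 74 and 75.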

Method M100 may be arranged such that task T300 receives only spectral tilt values of sequence x that correspond to inactive frames. Alternatively, task T300 may be implemented to select, from among a sequence of spectral tilt values that correspond to consecutive frames, only those values that correspond to inactive frames. For example, such an implementation of task T300 may be configured to select the spectral tilt values that correspond to inactive frames (and/or to reject the values that correspond to active frames) based on a voice activity indication received from a speech coder, from a method of speech coding, or from a voice activity detection task T100 as described below.

Likewise, apparatus A100 may be arranged such that smoother 130 receives only spectral tilt values of sequence x that correspond to inactive frames. Alternatively, smoother 130 may be implemented to select, from among a sequence of spectral tilt values that correspond to consecutive frames, only those values that correspond to inactive frames. For example, such an implementation of smoother 130 may be configured to select the spectral tilt values that correspond to inactive frames (and/or to reject the values that correspond to active frames) based on a voice activity indication received from a speech coder, from a method of speech coding, or from a voice activity detector 110 as described below.

Task T400 calculates a change between at least two values of the sequence of spectral tilt values generated by task T200. For example, task T400 may be configured to calculate a difference (also called a "delta") between consecutive values of the smoothed sequence y according to an expression such as the following:

z[n] = y[n] - b·y[n - 1],    (2)

where z denotes the output and b denotes a gain factor. Figure 4 shows an implementation 142 of calculator 140 that may be used to perform the particular case of this example of task T400 in which b is equal to one (i.e., according to the first-order FIR highpass filtering operation z[n] = y[n] - y[n - 1]). Other implementations of calculator 140 and/or task T400 may be configured to apply this filtering operation using a different value of b. For example, the value of b may be selected according to a desired frequency response. For a case in which task T200 is configured to generate sequence x, such an implementation of task T400 or calculator 142 may be arranged to calculate the difference according to an expression such as z[n] = x[n] - x[n - 1]. As noted above, calculator 142 may be implemented in any combination of hardware, software, and/or firmware that is deemed suitable for the intended application.
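The difference operation of expression (2) can be sketched as below. The function name is hypothetical, and the choice to produce no output for the first value of the sequence (which has no predecessor) is an assumption of this example.

```python
def tilt_deltas(y, b=1.0):
    """Compute z[n] = y[n] - b*y[n-1] for n = 1 .. len(y) - 1.

    b: gain factor; b = 1 gives the plain first difference.
    Assumption: no output is produced for the first value of y.
    """
    return [y[n] - b * y[n - 1] for n in range(1, len(y))]
```

With b equal to 1, a constant sequence yields deltas of zero, so only a change in the tilt contour produces a nonzero output.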

Alternatively or additionally, task T400 may be configured to perform one or more other differentiating operations on the produced sequence of spectral tilt values, such as a different highpass filtering operation (e.g., applying a first-order IIR highpass filter to the produced sequence), or to otherwise calculate a distance or other change between values of the produced sequence. Similarly, calculator 140 may be implemented as a differentiator, a difference calculator, or another highpass IIR or FIR filter that is configured to calculate a difference or other distance or change between two or more of its input values.

The change calculated by task T400 may be used to indicate a rate of change of the produced sequence of spectral tilt values. For example, the magnitude of z[n] as described above may be used to indicate how much the spectral tilt contour of the background noise has changed from one inactive frame to the next. Task T400 is typically arranged to iteratively calculate a series of distances whose magnitudes indicate the rate of change of the smoothed contour over the respective frame periods.

Task T500 decides whether to transmit a description of an inactive segment of the speech signal, where the decision is based on the corresponding change calculated by task T400. For example, task T500 may be configured to decide whether to transmit the description by comparing the magnitude of the calculated change to a threshold value T. Such an embodiment of task T500 may be configured to set a binary flag according to the result of this comparison, for example p[n] = 1 if |z[n]| > T and p[n] = 0 otherwise, where the value of flag p[n] indicates the result of the transmission decision. In this case, a p[n] value of one or logical TRUE is a positive transmission indication (i.e., a transmission indication having a positive state, a transmission enable indication, an indication of a decision to transmit), which indicates that an update to the silence description should be transmitted for the current frame; and a p[n] value of zero or logical FALSE is a negative transmission indication (i.e., a transmission indication having a negative state, a transmission disable indication, an indication of a decision not to transmit), which indicates that an update to the silence description should not be transmitted for the current frame. In one example, the threshold value T has the value 0.2. A lower threshold value may be used to provide greater sensitivity to variations in the produced sequence of spectral tilt values, while a higher threshold value may be used to provide greater rejection of transients in that sequence.
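The threshold comparison can be sketched in a few lines of Python. This is only an illustration: the function name `transmission_indication`, the strict "greater than" comparison, and the sample values are assumptions made for the example, while the default threshold of 0.2 is the example value given in the text.

```python
def transmission_indication(z, threshold=0.2):
    """Return True (positive indication) when |z| exceeds the threshold.

    A strict comparison is assumed here; a "not less than" comparison
    would be an equally valid reading of the text.
    """
    return abs(z) > threshold


if __name__ == "__main__":
    changes = [0.05, -0.3, 0.2, 0.45]   # made-up values of z[n]
    print([transmission_indication(z) for z in changes])
    # A lower threshold reacts to smaller changes in the tilt sequence.
    print([transmission_indication(z, threshold=0.1) for z in changes])
```

Because the comparison uses the magnitude of the change, a drop in spectral tilt triggers an update just as a rise does.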

Those skilled in the art will recognize that, in an alternative embodiment of method M100, task T400 may calculate the change as a magnitude according to an expression such as z[n] = |y[n] − b·y[n−1]|, and task T500 may be configured to set the binary flag according to the result of a comparison such as p[n] = 1 if z[n] > T and p[n] = 0 otherwise.

Method M100 may also be implemented to include a different variation of task T500, such as an embodiment that compares the threshold value to an average magnitude of two or more of the calculated changes (e.g., the average magnitude of the calculated changes for the current and previous frames).

FIG. 5 shows a block diagram of an embodiment 152 of comparator 150 that may be used to perform an embodiment of task T500. In this example, comparator 152 is configured to perform the transmission decision by calculating the magnitude of the calculated change and comparing that magnitude to a threshold value T10. In one particular example, threshold value T10 has the value 0.2. FIG. 6 shows a block diagram of another embodiment 154 of comparator 150 that may be used to perform an embodiment of task T500. In this example, comparator 154 is configured to compare the signed value of the calculated change to a positive threshold value T10 and to a negative threshold value T20, respectively, and to issue a positive transmission indication if the calculated change is greater than (alternatively, not less than) threshold value T10 or less than (alternatively, not greater than) threshold value T20. In one example, threshold value T20 is the negative of threshold value T10, so that comparators 152 and 154 produce the same result. However, comparator 154 may also be implemented such that threshold value T20 has a magnitude different from that of threshold value T10, if desired.
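The relationship between the two comparator structures can be sketched as follows. The function names mirror the reference numerals only for readability; they, the strict inequalities, and the sample values are assumptions made for this illustration.

```python
def comparator_152(z, t10=0.2):
    """Magnitude comparison: positive indication when |z| > T10."""
    return abs(z) > t10


def comparator_154(z, t10=0.2, t20=-0.2):
    """Signed comparison against a positive and a negative threshold."""
    return z > t10 or z < t20


if __name__ == "__main__":
    samples = [-0.5, -0.2, -0.15, 0.0, 0.15, 0.2, 0.5]   # made-up changes
    # With T20 = -T10 the two structures agree on every input.
    print(all(comparator_152(z) == comparator_154(z) for z in samples))
    # Asymmetric thresholds make the decision more sensitive to decreases.
    print(comparator_154(-0.15, t10=0.2, t20=-0.1))
```

Using separate thresholds is what allows the signed form to treat rises and falls in spectral tilt differently.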

Another embodiment of comparator 150 is arranged to receive the calculated change from calculator 140 as a magnitude and to compare this magnitude to threshold value T10. As noted above, such embodiments of comparator 150 (including comparators 152 and 154) may be implemented in any combination of hardware, software, and/or firmware deemed suitable for the intended application. FIG. 7A shows a block diagram of an embodiment A102 of apparatus A100 that is configured to perform various operations as described above on an input signal x[n] to produce a corresponding transmission indication.

FIG. 8A shows an example of a source code listing for a set of instructions that may be executed by an array of programmable logic elements or another state machine (e.g., a computer or processor) to perform an embodiment of method M101 that includes embodiments of tasks T300, T400, and T500. In this example, the variable k0 holds the spectral tilt value x[n] of the current frame, the variable y_current initially holds the most recent value of the smoothed sequence of spectral tilt values y, and the flag p holds the state of the transmission indication. Part 1 performs task T300 by calculating the current value of the smoothed sequence y according to expression (1) above, using a value of 0.2 for the gain factor a. Part 2 performs task T400 by calculating the change between the current value and the most recent value of the smoothed sequence y according to expression (2) above, using a value of one for the gain factor b. Part 3 performs task T500 by setting the flag p according to the result of a comparison between the calculated change and a threshold value of 0.2. In a typical application, the set of instructions is executed iteratively (e.g., once for each inactive frame), such that the initial value of the variable y_current for each iteration is the final value of y_current calculated during the previous iteration.
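The three parts of such a listing can be sketched in Python as below. This is not the listing of FIG. 8A, which is not reproduced in this text. In particular, expression (1) does not appear in this passage, so the sketch assumes a first-order recursive smoother of the form y[n] = a·y[n−1] + (1 − a)·x[n]; if expression (1) weights the two terms the other way around, the weights in Part 1 should be swapped. The function name `process_inactive_frame` is hypothetical.

```python
def process_inactive_frame(k0, y_current, a=0.2, b=1.0, threshold=0.2):
    """One iteration over an inactive frame.

    k0        -- spectral tilt value x[n] of the current frame
    y_current -- most recent smoothed value y[n-1] on entry
    Returns the updated smoothed value and the transmission flag p.
    """
    y_previous = y_current
    # Part 1 (task T300): smoothing. ASSUMED form of expression (1).
    y_current = a * y_previous + (1.0 - a) * k0
    # Part 2 (task T400): change between current and previous smoothed values.
    z = y_current - b * y_previous
    # Part 3 (task T500): compare the magnitude of the change to the threshold.
    p = abs(z) > threshold
    return y_current, p


if __name__ == "__main__":
    y = 0.5                                   # made-up previous smoothed value
    for tilt in [0.5, 0.6, 0.9]:              # made-up tilt values
        y, p = process_inactive_frame(tilt, y)
        print(round(y, 4), p)
```

Returning the updated smoothed value lets the caller carry it into the next iteration, which mirrors the way y_current persists between executions of the instruction set.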

As described above, task T300 may be configured to calculate the current value of the smoothed sequence y based on one or more past values of the sequence of spectral tilt values x and/or one or more past values of the smoothed sequence y. For the initial value of the smoothed sequence y, however, past values of the sequence x and/or of the smoothed sequence y may not exist. If task T300 uses an arbitrary value or a zero value in place of a past value to calculate a value of the smoothed sequence y, the result may cause task T400 to output a calculated change that is inappropriately large, which in turn may cause task T500 to output a positive transmission indication even when the spectral tilt contour is actually constant.

It may be desirable to initialize one or more variables (e.g., data storage locations) that are configured to hold past values of the sequence x and/or of the smoothed sequence y. This initialization may be performed before task T300 is first executed and/or within task T300. For example, one or more such variables may be initialized to the current value of the sequence x. In one particular example, the variable configured to store the past value of the smoothed sequence (y[n−1] in expression (1) above) is initialized to the current value of the input sequence (x[n] in expression (1) above). For a different example in which task T400 is arranged to calculate the change based on the values x[n] and x[n−1], the variable configured to store the past value x[n−1] of the input sequence is initialized to the current value x[n] of the input sequence. Alternatively or additionally, method M100 may be configured to avoid outputting a positive transmission indication for the first few inactive frames (e.g., by forcing task T500 to output a transmission indication having a negative state for those frames). In this case, task T200 (which may include task T300) may be configured to use an arbitrary value or a zero value for each of the one or more past values, rather than initializing those variables as described herein.

FIG. 8B shows another example of a source code listing for a set of instructions that may be executed by an array of programmable logic elements or another state machine (e.g., a processor) to perform an embodiment of method M101 that includes an embodiment T310 of task T300 and embodiments of tasks T400 and T500. In this example, task T310 includes an initialization operation that uses the variable Y_VALID to indicate whether the set of instructions has been called before and, therefore, whether the value stored in the variable y_current is valid. In this case, the calling routine (e.g., a larger program, such as a speech encoding method) would be configured to initialize the value of Y_VALID to FALSE before calling the set of instructions. If the set of instructions determines that the value of Y_VALID is FALSE (i.e., that the set of instructions is being executed for the first time), then the variable y_current is initialized to the current value of the variable k0.
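The effect of the Y_VALID initialization can be illustrated with a small stateful sketch. As before, this is not the listing of FIG. 8B: the class name `TiltChangeDetector`, the `initialize_on_first_call` switch, and the assumed smoother form y[n] = a·y[n−1] + (1 − a)·x[n] (expression (1) is not reproduced in this passage) are all introduced here for illustration only.

```python
class TiltChangeDetector:
    """Smoothing, differencing, and thresholding with first-call initialization."""

    def __init__(self, a=0.2, b=1.0, threshold=0.2, initialize_on_first_call=True):
        self.a = a
        self.b = b
        self.threshold = threshold
        # Plays the role of Y_VALID. When initialization is disabled, the
        # zero start value is (wrongly) treated as a valid past value.
        self.y_valid = not initialize_on_first_call
        self.y_current = 0.0

    def step(self, k0):
        """Process the tilt value k0 of one inactive frame; return the flag p."""
        if not self.y_valid:
            self.y_current = k0      # seed the past value with the current input
            self.y_valid = True
        y_previous = self.y_current
        # ASSUMED form of expression (1).
        self.y_current = self.a * y_previous + (1.0 - self.a) * k0
        z = self.y_current - self.b * y_previous
        return abs(z) > self.threshold


if __name__ == "__main__":
    constant_tilt = [0.5, 0.5, 0.5]            # made-up, perfectly flat contour
    seeded = TiltChangeDetector()
    unseeded = TiltChangeDetector(initialize_on_first_call=False)
    print([seeded.step(k) for k in constant_tilt])
    print([unseeded.step(k) for k in constant_tilt])
```

Under these assumptions the unseeded detector reports a change on the first frame of a constant contour, which is the spurious positive indication that the initialization is meant to avoid.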

A silence description (SID) typically includes a description of a spectral envelope of a frame and/or a description of an energy envelope of a frame. These descriptions may be derived from the current inactive frame and/or from one or more previous inactive frames. An SID may also be called by other names, such as "silence description update," "silence descriptor," "silence insertion descriptor," "comfort noise descriptor frame," and "comfort noise parameters." In the particular example of the Enhanced Variable Rate Codec (EVRC) as described in the document 3GPP2 C.S0014-C version 1.0, "Enhanced Variable Rate Codec, Speech Service Options 3, 68, and 70 for Wideband Spread Spectrum Digital Systems," SIDs are encoded at eighth rate (16 bits per frame) using a noise-excited linear prediction (NELP) coding mode, while active frames are encoded at full rate (171 bits per frame), half rate (80 bits per frame), or quarter rate (40 bits per frame) using a code-excited linear prediction (CELP), prototype pitch period (PPP), or NELP coding mode.

A spectral envelope description generally includes a set of coding parameters, such as filter coefficients, reflection coefficients, line spectral frequencies (LSFs), line spectral pairs (LSPs), immittance spectral frequencies (ISFs), immittance spectral pairs (ISPs), cepstral coefficients, or log area ratios. The set of coding parameters, which may be arranged as one or more vectors, is typically quantized as one or more indices into corresponding lookup tables or "codebooks."

Typical lengths for a spectral envelope description within an SID currently range from 8 to 28 bits. In the particular example of the EVRC as described in 3GPP2 C.S0014-C version 1.0, cited above, each 16-bit SID includes a four-bit index LSPIDX1 into a codebook for the low-frequency information of the spectral envelope and a four-bit index LSPIDX2 into a codebook for the high-frequency information of the spectral envelope. In the particular example of the Adaptive Multi-Rate (AMR) speech codec as described in the document ETSI TS 126 092 V6.0.0 (European Telecommunications Standards Institute (ETSI), Sophia Antipolis Cedex, FR, December 2004), each 35-bit SID includes an index of length 8 or 9 bits for each of three LSF subvectors. In the particular example of the AMR Wideband speech codec as described in the document ETSI TS 126 192 V6.0.0 (ETSI, December 2004), each 35-bit SID includes an index of length 5 or 6 bits for each of five ISF subvectors.

An energy envelope description may include a gain value to be applied to the frame (also called a "gain frame"). Alternatively or additionally, an energy envelope description may include several gain values to be applied to each of several subframes of the frame (collectively called a "gain profile"). The gain frame and/or gain profile is typically quantized as one or more indices into corresponding codebooks, although in some cases an algorithm may be used to quantize and/or dequantize the gain frame and/or gain profile without a codebook. Typical lengths for an energy envelope description within an SID currently range from 5 to 8 bits. In the particular example of the EVRC as described in 3GPP2 C.S0014-C version 1.0, cited above, each 16-bit SID includes an 8-bit energy index FGIDX. In the particular examples of the AMR speech codec as described in ETSI TS 126 092 V6.0.0, cited above, and of the AMR Wideband speech codec as described in ETSI TS 126 192 V6.0.0, cited above, each 35-bit SID includes a 6-bit energy index.

Method M100 or apparatus A100 may be used as a blanking mechanism to support discontinuous transmission (DTX). For example, a program that includes method M100, or a device that includes apparatus A100, may be configured to perform transmission of an SID only when the state of the transmission indication produced by task T500 is positive. Other blanking mechanisms may also be used to support DTX. One such example is a method or apparatus that issues a positive SID transmission indication whenever the number of consecutive inactive frames since the most recent SID transmission reaches (alternatively, exceeds) a threshold value DTX_MAX. Typical values of DTX_MAX include 16 and 32. Another example of a blanking mechanism issues a positive SID transmission indication whenever the number of consecutive inactive frames since the most recent active frame reaches (alternatively, exceeds) a threshold value.

Other blanking mechanisms that may be used to support DTX include mechanisms configured to issue a positive SID transmission indication upon detecting a change in the energy and/or spectral envelope description of the speech signal. For example, such a mechanism may be configured to issue a positive SID transmission indication, which indicates a decision to transmit a description of the current inactive frame, upon detecting that a distance between the spectral envelope description of the frame (e.g., an LSF, LSP, ISF, or ISP vector) and the spectral envelope description of the last transmitted SID exceeds (alternatively, is not less than) a threshold value. It may be desirable to filter (e.g., smooth) the spectral envelope descriptions before calculating the distance. One variation of this mechanism is configured to issue a positive SID transmission indication when it also detects that a distance between the energy envelope description of the current inactive frame and the energy envelope description of the last transmitted SID exceeds (alternatively, is not less than) a threshold value. Another variation is configured to issue a positive SID transmission indication when it detects that either of these conditions is satisfied. Still other blanking mechanisms that may be used include mechanisms configured to issue a positive SID transmission indication according to a comparison between a threshold value and a value such as an average absolute value of the frame or an energy value of the frame (e.g., a sum of squared samples), where that value may be filtered and/or weighted.

Another example of a blanking mechanism that may be used to support DTX is configured to issue a positive SID transmission indication upon detecting that an Itakura distance between the last transmitted SID and the current inactive frame exceeds (alternatively, is not less than) a threshold value. One variation of this mechanism is configured to issue a positive SID transmission indication upon detecting that an Itakura distance between (A) the last transmitted SID and (B) an average of the current inactive frame and previous inactive frames exceeds (alternatively, is not less than) a threshold value. The Itakura distance is a measure of spectral change based on autocorrelation and residual energy values, and a description of this mechanism may be found in ITU-T Recommendation G.729 Annex B (International Telecommunication Union, Geneva, CH, October 1996).

An embodiment of method M100 or apparatus A100 may be combined with one or more other blanking mechanisms, such as one or more of those described above. For example, a device that includes or performs such an embodiment may be configured to transmit an SID for a frame when any of its blanking mechanisms issues a positive SID transmission indication for that frame. FIG. 7B shows an embodiment of such an example, in which a logical OR operation is used to combine several different transmission indications into a composite transmission indication.
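One way to picture such a combination is to OR a change-based indication with a counter-based indication of the DTX_MAX type described above. The Python sketch below is a hypothetical arrangement, not the structure of FIG. 7B: the function name `composite_sid_schedule` is invented, the counter is assumed to restart after any SID transmission, and the "reaches" form of the counter comparison is chosen arbitrarily.

```python
def composite_sid_schedule(tilt_flags, dtx_max=16):
    """Combine per-frame change indications with a DTX_MAX-style counter.

    tilt_flags -- one boolean per consecutive inactive frame, e.g. the
                  output of a spectral-tilt change detector
    Returns one boolean per frame: True where an SID would be transmitted.
    """
    decisions = []
    frames_since_sid = 0
    for tilt_flag in tilt_flags:
        frames_since_sid += 1
        timer_flag = frames_since_sid >= dtx_max   # counter-based indication
        send = tilt_flag or timer_flag             # logical OR of indications
        if send:
            frames_since_sid = 0                   # counter restarts after an SID
        decisions.append(send)
    return decisions


if __name__ == "__main__":
    flags = [False, False, True, False, False, False, False, False]
    print(composite_sid_schedule(flags, dtx_max=4))
```

The counter guarantees a periodic refresh of the silence description even when the change detector stays quiet, while the change detector allows an earlier update when the background noise shifts.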

As noted above, an SID may be derived from one or more inactive frames. For example, it may be desirable for a device that includes apparatus A100, or a program that includes method M100, to calculate and transmit an SID that represents an average of several encoded inactive frames, rather than to transmit an SID based on a single encoded inactive frame. Such an average may be calculated using an FIR or IIR filtering operation and/or by using a statistical method such as median filtering, which may include discarding outliers or replacing them with a median value. For example, the device or program may be configured to calculate the SID by statistically smoothing the energy and spectral envelope descriptions of the current frame with those of one or more previous inactive frames, such that the resulting SID contains the gain and frequency values that have occurred most frequently in the recent past.
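A median across recent inactive frames is one simple way to realize the statistical smoothing mentioned above. The sketch below takes an element-wise median over parameter vectors; the function name `median_description` and the numeric values are hypothetical, and a real encoder would operate on its own parameter representation (e.g., LSF vectors and gain values).

```python
from statistics import median


def median_description(descriptions):
    """Element-wise median over parameter vectors of recent inactive frames.

    descriptions -- list of equal-length parameter lists, one per frame
    """
    return [median(column) for column in zip(*descriptions)]


if __name__ == "__main__":
    # Made-up [gain, spectral parameter] pairs; the second frame is an outlier.
    recent_frames = [
        [1.0, 10.0],
        [2.0, 50.0],
        [3.0, 12.0],
    ]
    print(median_description(recent_frames))
```

Unlike a plain mean, the median is not pulled toward the outlying frame, so a single transient has little effect on the transmitted description.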

The number of frames over which the average is calculated may be fixed, or it may vary according to, for example, a measure of stationarity. One example of such a measure is a distance (e.g., an Itakura distance) between spectral averages obtained over two different sets of frames. In one such example, as described in G.729 Annex B, cited above, averages are calculated over six past frames (including the current frame) and over two past frames. If the distance between these two averages exceeds (alternatively, is not less than) a threshold value, then the SID includes the spectral description averaged over two frames (e.g., the signal is assumed to be locally nonstationary). Otherwise, the SID includes the spectral description averaged over six frames (e.g., the signal is assumed to be locally stationary). In the particular example of the AMR Wideband codec as described in ETSI TS 126 192 V6.0.0, cited above, the SID includes a dithering indication whose state is set according to a sum of spectral distances between the current frame and seven previous frames, or according to a distance between the energy of the current frame and an average energy value of past frames.

Method M100 may be implemented such that task T200 receives the sequence of spectral tilt values from another process, such as a speech encoding process. For example, a device or system that is configured to perform an embodiment of method M100 is typically also configured to perform a speech encoding method on the speech signal. The speech encoding method may include a linear prediction coding (LPC) analysis, which calculates a set of coefficients that models a sample of the speech signal at time t as a linear combination of samples of the speech signal at times before t. An LPC analysis performed by a speech encoder of a communications device (e.g., a cellular telephone) typically has an order of 4, 6, 8, 10, 12, 16, 20, 24, 28, or 32. For a case in which separate LPC analyses are performed on different frequency bands of the speech signal, task T200 may be arranged to receive a sequence of spectral tilt values that is based on an analysis of a low-frequency band (e.g., including frequencies below 1 kHz) or of a mid-frequency band (e.g., including at least the frequencies between 1 kHz and 2 kHz).

Task T200 may be arranged to receive the sequence of spectral tilt values as a sequence of reflection coefficients, such as a sequence of first or second reflection coefficients. The range of configurations disclosed herein includes methods that comprise a combination of method M100 with a speech encoding method (e.g., as shown in FIG. 9), as well as speech encoding methods that include method M100.

Apparatus A100 may be implemented such that sequence generator 120 receives the sequence of spectral tilt values from another apparatus, such as a speech encoder. For example, a device or system that includes an embodiment of apparatus A100 will typically also include a speech encoder, which may be configured to perform an LPC analysis on the speech signal. In this case, sequence generator 120 may be arranged to receive the sequence of spectral tilt values as a sequence of reflection coefficients. The range of configurations disclosed herein includes apparatus that comprise a combination of apparatus A100 with a speech encoder (e.g., as shown in FIG. 10), as well as speech encoders that include apparatus A100.

Alternatively, task T200 may be implemented to include a task T210 that calculates the sequence of spectral tilt values based on a plurality of inactive frames of the speech signal. Task T210 may be configured to evaluate the spectral tilt of the signal for each of a series of frames, for example according to one or more of several different techniques as described below. FIG. 11A shows a flowchart of an embodiment M200 of method M100 that includes such an embodiment T202 of task T200. Task T210 may also be arranged to provide the calculated sequence of spectral tilt values to other tasks of a larger process, such as a speech encoding method. Method M100 may also be implemented such that task T200 is implemented as task T210.

FIG. 11B shows a block diagram of an embodiment A200 of apparatus A100 that includes an embodiment 122 of sequence generator 120. Sequence generator 122 includes a calculator 128 that is configured to calculate the sequence of spectral tilt values based on a plurality of inactive frames of the speech signal. For example, calculator 128 may be configured to perform an embodiment of task T210 as disclosed herein. Like the other elements of apparatus A200, calculator 128 may be implemented in any combination of hardware, software, and/or firmware deemed suitable for the intended application. Calculator 128 may also be arranged to provide the calculated sequence of spectral tilt values to other tasks of a larger apparatus, such as a speech encoder. Apparatus A100 may also be implemented such that sequence generator 120 is implemented as calculator 128.

A typical embodiment of task T210 is configured to calculate the spectral tilt as the first reflection coefficient of the corresponding frame of the speech signal. The first reflection coefficient of a frame (commonly denoted k₀) may be calculated as the ratio R(1)/R(0) (i.e., the normalized first autocorrelation value of the frame), which, for sample values in the range of −1 to +1, has a scalar value between −1 and +1. In this expression, R(1) denotes the first autocorrelation coefficient of the frame (i.e., the value of the autocorrelation function of the frame at a lag of one sample), and R(0) denotes the zeroth autocorrelation coefficient of the frame (i.e., the value of the autocorrelation function of the frame at zero lag).
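As a rough illustration, the ratio R(1)/R(0) can be computed directly from the samples of a frame. In the Python sketch below the function name `first_reflection_coefficient`, the unwindowed autocorrelation sums, the choice to return 0.0 for an all-zero frame, and the tiny example frames are all assumptions made for the example.

```python
def first_reflection_coefficient(frame):
    """Return R(1)/R(0) for one frame of samples.

    Returns 0.0 for an all-zero frame (a choice made for this sketch,
    to avoid dividing by zero).
    """
    r0 = sum(s * s for s in frame)                                    # zero lag
    r1 = sum(frame[n] * frame[n - 1] for n in range(1, len(frame)))   # lag one
    return r1 / r0 if r0 > 0.0 else 0.0


if __name__ == "__main__":
    slowly_varying = [1.0, 1.0, 1.0, 1.0]       # energy at low frequencies
    rapidly_varying = [1.0, -1.0, 1.0, -1.0]    # energy at high frequencies
    print(first_reflection_coefficient(slowly_varying))
    print(first_reflection_coefficient(rapidly_varying))
```

A value near +1 indicates a frame dominated by low-frequency energy and a value near −1 indicates one dominated by high-frequency energy, which is why this single number is a convenient measure of spectral tilt.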

In other embodiments, task T210 is configured to calculate the spectral tilt as the second reflection coefficient of the corresponding frame of the speech signal. The second reflection coefficient of a frame (commonly denoted k₁) may be calculated as k₁ = (R(2)·R(0) − R(1)²) / (R(0)² − R(1)²), where R(2) denotes the second autocorrelation coefficient of the frame (i.e., the value of the autocorrelation function of the frame at a lag of two samples). Task T210 may also be implemented to calculate one or more reflection coefficients (e.g., the first and/or second reflection coefficients) of the corresponding frame based on one or more other parameters, such as one or more LPC filter coefficients.
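For reference, the second step of the Levinson-Durbin recursion gives a closed form for the second reflection coefficient in terms of R(0), R(1), and R(2). The sketch below assumes the sign convention under which the first coefficient equals R(1)/R(0); the function name `second_reflection_coefficient` and the zero returned for a degenerate denominator are choices made for this illustration.

```python
def second_reflection_coefficient(r0, r1, r2):
    """Second reflection coefficient from autocorrelation values.

    Uses k1 = (R(2)*R(0) - R(1)^2) / (R(0)^2 - R(1)^2), which follows from
    the Levinson-Durbin recursion when k0 = R(1)/R(0).
    Returns 0.0 if the denominator is zero (a choice made for this sketch).
    """
    denominator = r0 * r0 - r1 * r1
    if denominator == 0.0:
        return 0.0
    return (r2 * r0 - r1 * r1) / denominator


if __name__ == "__main__":
    # Autocorrelation of a first-order process with coefficient 0.5:
    # all of the predictability is captured by the first coefficient.
    print(second_reflection_coefficient(1.0, 0.5, 0.25))
    # Made-up values for which the second coefficient is nonzero.
    print(second_reflection_coefficient(2.0, 1.0, 0.0))
```

Other sign conventions for reflection coefficients are in use, so the sign of the result should be checked against the convention of the codec at hand.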

The range of embodiments of task T210 is not limited to those that calculate the spectral tilt as a reflection coefficient. Alternatively or additionally, task T210 may be configured to perform one or more other spectral evaluation techniques to calculate the spectral tilt of one or more frames. Such techniques may include calculating the spectral tilt of each frame as a ratio between the energy of a high-frequency band and the energy of a low-frequency band. This calculation may include performing a frequency transform, such as a discrete Fourier transform (DFT), on the segment. Such techniques may also include calculating the spectral tilt as the number of zero crossings within each segment. In this case, a higher number of zero crossings may be taken to indicate a larger amount of high-frequency energy.
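Both alternative measures can be sketched with the Python standard library alone. The function names, the treatment of a zero sample as non-negative, the naive O(N²) DFT, and the `split_bin` parameter that separates the low band from the high band are all assumptions introduced for this illustration.

```python
import cmath
import math


def zero_crossings(frame):
    """Count sign changes between consecutive samples (zero counts as non-negative)."""
    return sum(
        1 for n in range(1, len(frame))
        if (frame[n - 1] >= 0) != (frame[n] >= 0)
    )


def band_energy_ratio(frame, split_bin):
    """Ratio of high-band to low-band energy using a naive DFT.

    Bins below split_bin form the low band; bins from split_bin up to
    N/2 form the high band. Returns infinity if the low band is empty.
    """
    n_samples = len(frame)
    energies = []
    for k in range(n_samples // 2 + 1):
        coefficient = sum(
            frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_samples)
            for n in range(n_samples)
        )
        energies.append(abs(coefficient) ** 2)
    low = sum(energies[:split_bin])
    high = sum(energies[split_bin:])
    return high / low if low > 0.0 else float("inf")


if __name__ == "__main__":
    low_tone = [math.cos(2 * math.pi * 1 * n / 16) for n in range(16)]
    high_tone = [math.cos(2 * math.pi * 7 * n / 16) for n in range(16)]
    print(zero_crossings(low_tone), zero_crossings(high_tone))
    print(band_energy_ratio(low_tone, 4) < band_energy_ratio(high_tone, 4))
```

The zero-crossing count is far cheaper than a transform, which may matter when the measure has to be evaluated for every inactive frame.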

In calculating the sequence of spectral tilt values, task T210 may be configured to perform calculations based on values of the autocorrelation function, such as calculating one or more reflection coefficients as described above. The autocorrelation method of calculating LPC model parameters (such as filter or reflection coefficients) involves performing a series of iterations to solve an equation that includes a Toeplitz matrix. In some embodiments, task T210 is configured to perform the autocorrelation method according to any of the well-known Levinson and/or Durbin recursive algorithms for solving such an equation. Such an algorithm typically calculates reflection coefficients (also called partial correlation (PARCOR) coefficients, negative PARCOR coefficients, or Schur-Szego parameters) as intermediate values in the process of producing a set of LPC filter coefficients.
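The following sketch of a Levinson-Durbin recursion shows how the reflection coefficients appear as intermediate values while the LPC filter coefficients are being built. It assumes the sign convention in which the first reflection coefficient equals R(1)/R(0); the function name and return layout are illustrative choices, not taken from the source.

```python
def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations for autocorrelation values r[0..order].

    Returns (a, k, err): predictor coefficients a[0..order-1] (for lags 1..order),
    reflection coefficients k[0..order-1], and the final prediction error energy.
    """
    a = [0.0] * (order + 1)  # a[1..order] hold the predictor; a[0] is unused
    k = []
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            # Assumption: stop refining once no prediction error remains.
            k.append(0.0)
            continue
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        ki = acc / err  # reflection coefficient for this stage
        new_a = a[:]
        new_a[i] = ki
        for j in range(1, i):
            new_a[j] = a[j] - ki * a[i - j]
        a = new_a
        err *= 1.0 - ki * ki
        k.append(ki)
    return a[1:], k, err
```

An embodiment that needs only the spectral tilt could run this recursion to order one or two and read the tilt from `k[0]` or `k[1]`.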

In other embodiments, task T210 is configured to perform a series of iterations that calculates one or more reflection coefficients rather than a set of filter coefficients. For example, task T210 may be configured to use an embodiment of the Leroux-Gueguen algorithm to obtain one or more reflection coefficients. Alternatively, task T210 may be configured to use an embodiment of another well-known iterative method to obtain one or more reflection coefficients from the autocorrelation values, such as the Schur recursive algorithm (which may be configured for efficient parallel computation) or the Burg recursive algorithm.

Task T210 may be configured to calculate one or more values of the autocorrelation function of the corresponding frame of the speech signal. For example, task T210 may be configured to evaluate the autocorrelation function of the frame for a particular lag value m (where m is an integer not less than zero) according to an expression such as

$$R(m) = \sum_{n=0}^{N-m-1} s[n]\, s[n+m],$$

where s[n] denotes the n-th sample of the frame and N denotes the number of samples in the frame. Alternatively, task T210 may be configured to receive the values of the autocorrelation function (e.g., from a speech encoder, or from a speech encoding method or other process).
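A direct evaluation of the frame autocorrelation at a lag m might look like the sketch below. It assumes the usual short-time form, in which products s[n]·s[n+m] are summed over the N − m sample pairs that lie inside the frame; the function name is hypothetical.

```python
def autocorrelation(frame, m):
    """Return R(m) for one frame, summing only over pairs inside the frame."""
    n_samples = len(frame)
    return sum(frame[n] * frame[n + m] for n in range(n_samples - m))
```

For a lag equal to or greater than the frame length the sum is empty and the function returns zero.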

A speech encoder or speech encoding method may be configured to use values of the autocorrelation function in an encoding operation, such as calculating parameters of an LPC model (e.g., filter and/or reflection coefficients). It may be desirable for such an encoder or method to perform one or more preprocessing operations on the autocorrelation values. For example, the autocorrelation values R(m) may be spectrally smoothed by performing an operation such as the following: [expression not reproduced in this text]

In this context, task T210 may be configured to perform spectral smoothing or another preprocessing operation on the autocorrelation values, and/or to calculate the value of the spectral tilt parameter using autocorrelation values that have been spectrally smoothed or otherwise preprocessed.

Before the autocorrelation function is applied to the speech signal (e.g., by task T210, or by a speech encoder or speech encoding method), it may be desirable to apply a window function w[n] to the signal. For example, it may be desirable to zero the speech signal outside the frame to which the autocorrelation function is currently being applied. In some cases, the window function w[n] is rectangular or triangular. It may be desirable to use a tapered window function that has low sample weights at each end of the window, which may help to reduce the effect of components outside the window. For example, a raised cosine window may be desirable, such as the following Hamming window function:

$$w[n] = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 0, & \text{otherwise,} \end{cases}$$

where N is the number of samples in the frame.

Other tapered windows that may be used include the Hanning, Blackman, Kaiser, and Bartlett windows. The windowed frame s_w[n] may be calculated according to an expression such as

$$s_w[n] = s[n]\, w[n], \qquad 0 \le n \le N-1.$$

The window function need not be symmetric, such that one half of the window may be weighted differently than the other half. A hybrid window may also be used, such as a Hamming-cosine window or a window whose two halves are different windows (e.g., two Hamming windows of different sizes). One or more other preprocessing operations, such as perceptual weighting, may be performed on the sample values and/or windowed values (e.g., by task T210 or by a speech encoder or speech encoding method) before they are used to evaluate the autocorrelation function.

The window function w[n] may be configured to include samples of the current frame as well as samples from one or more neighboring frames. In some cases, the window includes samples from the current frame and from the adjacent previous and subsequent frames (e.g., a 5-20-5 window that includes the 5 milliseconds immediately before and after a 20-millisecond frame). In other cases, the window includes samples only from the current frame and the adjacent previous frame (e.g., a 10-20 window that includes the current 20-millisecond frame and the last 10 milliseconds of the previous frame).

For a case in which a window function is applied to the speech signal (e.g., by task T210 or by a speech encoder or speech encoding method), the autocorrelation function of the frame may be calculated according to an expression such as

$$R(m) = \sum_{n=0}^{N-m-1} s_w[n]\, s_w[n+m].$$
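Windowing followed by autocorrelation can be sketched as below. The Hamming coefficients 0.54 and 0.46 are the conventional ones and are assumed here, as is the short-time autocorrelation sum; the function names are hypothetical.

```python
import math


def hamming_window(n_samples):
    """Return Hamming weights w[0..n_samples-1] (conventional 0.54/0.46 form)."""
    if n_samples == 1:
        return [1.0]  # avoid dividing by zero for a one-sample frame
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def windowed_autocorrelation(frame, max_lag):
    """Return [R(0), ..., R(max_lag)] of the Hamming-windowed frame."""
    weights = hamming_window(len(frame))
    s_w = [s * w for s, w in zip(frame, weights)]
    n_samples = len(s_w)
    return [sum(s_w[n] * s_w[n + m] for n in range(n_samples - m))
            for m in range(max_lag + 1)]
```

The resulting values could be passed to a recursion that produces reflection coefficients, so that the spectral tilt is computed from the windowed frame rather than from the raw samples.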

As noted above, it may be desirable for task T300 or smoother 130 to smooth a sequence that includes only values corresponding to inactive frames. In this case, method M100 or apparatus A100 may be arranged to receive (e.g., from a speech encoder or speech encoding method) an indication of the level of voice activity in a frame. For example, such an indication (also called a "voice activity indication") may take the form of a binary variable or flag whose state indicates whether the corresponding frame is active or inactive.

The voice activity indication may be used to control the operation of smoothing task T300. For example, the voice activity indication may be used to allow smoothed spectral tilt values to be produced from corresponding inactive frames, and/or to prevent smoothed spectral tilt values from being produced from corresponding active frames. In one such example, a computer or processor is configured to control task T300 to smooth a spectral tilt value only when the voice activity indication indicates that the corresponding frame is inactive. Alternatively, task T300 may include deciding, according to the value of the corresponding voice activity indication, whether to produce a smoothed spectral tilt value or whether to accept or discard a spectral tilt value. FIG. 12A shows a flowchart of an embodiment M110 of method M101 that includes such an embodiment T320 of task T300.

The voice activity indication may be used to control the operation of calculation task T210. For example, the voice activity indication may be used to allow spectral tilt values to be produced for inactive frames, and/or to prevent spectral tilt values from being produced for active frames. In one such example, a processor is configured to control task T210 to calculate a spectral tilt only when the voice activity indication indicates that the current frame is inactive. Alternatively, task T210 may be configured to decide, according to the value of the corresponding voice activity indication, whether to produce a spectral tilt for a given frame, or may be configured to control its input (e.g., whether to accept or discard the frame) and/or its output (e.g., whether to issue a spectral tilt value). FIG. 12B shows a flowchart of an embodiment M210 of method M200 that includes an embodiment T204 of task T202, where task T204 includes such an embodiment T220 of task T210.

As an alternative to receiving a voice activity indication, method M100 may be implemented to include a task T100 that is configured to indicate whether a frame is active or inactive. For example, task T100 may be configured to calculate a voice activity indication (VAI) as described above. FIG. 12C shows a flowchart of an embodiment M120 of method M101 that includes task T100, and FIG. 12D shows a flowchart of an embodiment M220 of method M200 that includes task T100. Task T100 may be configured to classify a frame as active or inactive based on one or more factors such as full-band energy, low-band energy, high-band energy, spectral parameters (e.g., one or more LSFs and/or reflection coefficients), periodicity, and zero-crossing rate. For example, such classification may include comparing the value of such a factor to a fixed or adaptive threshold, and/or calculating the magnitude of a change in the value of such a factor (e.g., the magnitude of the difference between two values, or between a value and a moving average) and comparing that magnitude to a fixed or adaptive threshold.

Task T100 may be configured to evaluate the energy of the current frame in each of a low band and a high band, and to indicate that the frame is inactive if the energy in each band is less than (alternatively, not greater than) a respective threshold. These thresholds may be fixed or adaptive. For example, each threshold may be based on the desired encoding rate. One example of a pair of adaptive thresholds is described in section 4.7 of C.S0014-C v.1.0, cited above. In that example, the threshold for each band is based on an anchor operating point (as derived from the desired average data rate), an estimate of the background noise level in that band for the previous frame, and the signal-to-noise ratio in that band for the previous frame.

A transition from active speech to inactive speech typically occurs over a period of several frames, and the first few inactive frames after a transition from active speech may include voicing remnants in addition to background noise. Voicing remnants may cause these post-transition inactive frames to have spectral tilts that differ from those of the background noise, and such differences may corrupt the sequence of spectral tilt values produced by task T200 and lead to unnecessary SID transmissions.

As noted above, it may be desirable for task T200 to produce values of sequence x that are based only on inactive frames. Likewise, it may be desirable for task T300 to produce values of smoothed sequence y that are based only on one or more spectral tilt values from inactive frames. It may also be desirable for an embodiment of method M100 to avoid updating the spectral tilt contour with spectral tilt values from one or more post-transition frames. Such a restriction may help to reduce the likelihood that decision task T500 will produce a false positive transmission indication.

Task T200 may be configured to produce one or more values of the sequence of spectral tilt values according to the distance in time between the corresponding inactive frame and the previous active frame. For example, such an embodiment of task T200 or task T300 may be configured to delay or suspend the start of spectral tilt contour updates for one or more inactive frames after a transition from active speech. FIGS. 13A and 13B illustrate examples of such a transition and of the effect of such a delay or suspension, respectively. FIG. 13A shows a sharp change in the amplitude of the smoothed spectral tilt contour that is caused by voicing remnants in the post-transition frames. Such a change may lead to an inappropriate positive SID transmission decision. In this particular example, the spectral tilt parameter is the first reflection coefficient k₀, such that the voicing remnants cause a sharp rise in the amplitude of the smoothed contour, although with a different spectral tilt parameter the voicing remnants may instead cause a sharp drop. By comparison, FIG. 13B shows an example in which a delay (also called a "hangover") is applied to disable updating of the smoothed contour during the post-transition frames. In this case, the sharp rise seen in FIG. 13A does not occur. In one particular example, a hangover of five frames is used after a transition from active speech to inactive speech.

FIG. 14 shows one example of a source code listing for a set of instructions that may be executed by an array of programmable logic elements or another state machine (e.g., a processor) to perform an embodiment of method M100 that includes an embodiment T312 of task T310 and embodiments of tasks T400 and T500. In this example, task T312 reads a variable FRAME_ACTIVE that stores the current state of the voice activity indication. If the value of FRAME_ACTIVE is TRUE (indicating that the current frame is active), then the hangover count is stored to variable hangover_1 and the instruction set terminates. In this particular example the hangover count is 5, although any other positive integer value may be used. Once the value of FRAME_ACTIVE becomes FALSE (indicating that the current frame is inactive), each subsequent iteration of the instruction set decrements the value of hangover_1 and terminates early, until the value of hangover_1 reaches zero. In this example, tasks T400 and T500 are implemented using instructions as described above with reference to FIG. 8B.
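The hangover counting described for task T312 can be sketched as follows. This is not the listing of FIG. 14, which is not reproduced in this text; the class and method names are hypothetical, and only the count of 5 and the reset-then-decrement behavior come from the description.

```python
HANGOVER_FRAMES = 5  # hangover count from the particular example in the text


class ContourUpdateGate:
    """Decide per frame whether the smoothed contour may be updated."""

    def __init__(self, hangover_frames=HANGOVER_FRAMES):
        self.hangover_frames = hangover_frames
        self.hangover = hangover_frames

    def allow_update(self, frame_active):
        if frame_active:
            # Active frame: reload the hangover count and skip the update.
            self.hangover = self.hangover_frames
            return False
        if self.hangover > 0:
            # Post-transition inactive frame: count down and skip the update.
            self.hangover -= 1
            return False
        return True  # hangover expired: contour updates resume
```

With a count of 5, the first five inactive frames after an active frame leave the contour untouched, and updating resumes on the sixth.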

Examples of method M100 and apparatus A100 include embodiments that are configured to control updating of the spectral tilt contour according to the state of an update control signal. Such a signal may be based on a voice activity indication as described above. The variable FRAME_ACTIVE shown in FIG. 14 is one example of an update control signal (specifically, an update disable signal). Hangover logic circuit 50 may be used to calculate an update control signal by delaying active-to-inactive transitions in the voice activity indication. FIG. 15 shows an embodiment 52 of hangover logic circuit 50 that is configured to produce an update control signal (specifically, an update enable signal). In this figure, the state of the voice activity indication is low for inactive frames and high for active frames, a tapped delay line having three delay elements is used to implement a hangover of three frames, and a logical NOR operation is used to combine the current and delayed voice activity indications. In other examples, the state of the voice activity indication may be high for inactive frames and low for active frames, in which case a logical AND operation may be used to combine the current and delayed indications. Other examples of this circuit may use a tapped delay line with any number of delay elements, according to the desired hangover duration. Alternatively, hangover logic circuit 50 may be implemented to use a delay counter that counts down (or up) from the active-to-inactive transition, and/or to calculate an update disable signal rather than an update enable signal.
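The delay-line arrangement described for FIG. 15 can be modeled in software as below. This is a behavioral sketch only: the class name is hypothetical, the voice activity indication is taken as True for active frames, and starting with a history of active frames is an assumption.

```python
from collections import deque


class HangoverLogic:
    """Model of a delay line whose taps are NOR-ed with the current input."""

    def __init__(self, taps=3):
        # Assumption: start as if the preceding frames were active.
        self.delayed = deque([True] * taps, maxlen=taps)

    def update_enable(self, voice_active):
        # NOR of the current indication and every delayed indication.
        enable = not (voice_active or any(self.delayed))
        self.delayed.appendleft(voice_active)  # shift the delay line
        return enable
```

With three delay elements, the enable signal goes high on the fourth consecutive inactive frame, which corresponds to a hangover of three frames.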

Sequence generator 120 may be configured to produce one or more values of the sequence of spectral tilt values according to the distance in time between the corresponding inactive frame and the previous active frame. For example, sequence generator 120 or smoother 130 may be configured to suspend the start of spectral tilt contour updates after an active-to-inactive transition according to the desired hangover. Such an embodiment of sequence generator 120 or smoother 130 may be configured to include an embodiment of hangover logic circuit 50 as described above. FIG. 16A shows one such embodiment 134 of smoother 132. In this example, a selector (e.g., a multiplexer) switches the input of the smoother between the current value of the sequence (i.e., x[n]) and the previous value of the smoothed spectral tilt contour (i.e., y[n−1]) according to the state of the update control signal. Alternatively, an embodiment of smoother 110 may be configured to store the current value x[n] when the update control signal is high, and to use this stored value as input when the update control signal is low.

FIG. 16B shows another embodiment 136 of smoother 132 that includes an embodiment of hangover logic circuit 50 as described above. This example includes two selectors (e.g., multiplexers) that are configured to output different gain factors according to the state of the update control signal. The first selector outputs the gain factor to be applied to x[n]: gain factor F10 when the state of the update control signal is high, and gain factor F12 when it is low. The second selector outputs the gain factor to be applied to y[n−1]: gain factor F20 when the state of the update control signal is high, and gain factor F22 when it is low. In one example, gain factors F10 and F12 have the values 0.2 and 0, respectively, and gain factors F20 and F22 have the values 0.8 and 1.0, respectively.
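The effect of the two selectors can be illustrated with a one-step update function. It assumes that the smoother is a first-order recursion that sums the two weighted terms, y[n] = g_x·x[n] + g_y·y[n−1], which the text implies but does not state; the function name is hypothetical.

```python
def smooth_step(x_n, y_prev, update_enabled):
    """Return y[n] given x[n], y[n-1], and the update control state."""
    gain_x = 0.2 if update_enabled else 0.0  # F10 when high, F12 when low
    gain_y = 0.8 if update_enabled else 1.0  # F20 when high, F22 when low
    # Assumption: the smoother output is the sum of the two weighted terms.
    return gain_x * x_n + gain_y * y_prev
```

When the control signal is low, the gains 0 and 1.0 make the output equal to y[n−1], so the contour is held rather than updated.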

Another embodiment of smoother 136 may be configured to select among more than two values for each gain factor, so that the smoother's transition from suspension to normal operation is more gradual. For example, instead of hangover logic that produces a two-state control signal, such a smoother may include an embodiment of hangover logic circuit 50 that is configured to produce a control signal having more than two states. Such an embodiment of hangover logic circuit 50 may be configured to produce an update control signal that passes through c states in response to an active-to-inactive transition, where c is an integer greater than two. In this case, the two selectors of smoother 136 may be configured such that, in response to the transition and over a series of c frames, the gain factor applied to x[n] passes through c values from a minimum to a maximum (e.g., from 0.0 to 0.2), while the gain factor applied to y[n−1] passes through c values from a maximum to a minimum (e.g., from 1.0 to 0.8).

A coding gain metric describes a relation between the energy of the signal as received by the speech encoder (or speech encoding method) and the energy of the corresponding coding error. Typically, a speech encoder or speech encoding method encodes active frames more efficiently than inactive frames, such that the coding gain metric is higher for active frames than for inactive frames. One example of a coding gain metric for a frame is the ratio of the initial signal energy E_in (e.g., the energy of the windowed frame) to the energy E_err of the coding residual. In such cases, the energy of each signal is typically calculated as the sum of the magnitudes of the samples. Another common coding gain metric for LPC analysis is the prediction gain, which may be calculated as the reciprocal of the product of (1 − k_i²) for all i ≤ j (alternatively, for all i with 1 ≤ i ≤ j), where j is the order of the LPC analysis and k_i denotes the i-th reflection coefficient.
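The prediction gain can be computed directly from a list of reflection coefficients, as in this sketch. It takes the per-stage factor to be (1 − k_i²), the usual Levinson-Durbin error update; the function names, the decibel conversion as 10·log10, and the treatment of a non-positive product are assumptions.

```python
import math


def prediction_gain(reflection_coeffs):
    """Return 1 / prod(1 - k_i^2) over the supplied reflection coefficients."""
    product = 1.0
    for k in reflection_coeffs:
        product *= 1.0 - k * k
    if product <= 0.0:
        # Assumption: a perfectly predictable frame is reported as infinite gain.
        return math.inf
    return 1.0 / product


def prediction_gain_db(reflection_coeffs):
    """Return the prediction gain on a decibel scale (assumed 10*log10)."""
    return 10.0 * math.log10(prediction_gain(reflection_coeffs))
```

A coefficient of zero leaves the gain unchanged, so adding stages that predict nothing does not alter the metric.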

The degree of coding gain achieved by a speech encoder or speech encoding method typically varies from frame to frame as the statistics of the signal change. During a series of inactive frames, however, the signal may be expected to be relatively stationary, such that its statistics do not change significantly. The value G_c of the coding gain metric may therefore be expected to remain relatively constant even during a perceptually significant change in the background noise.

A large change in the value G_c of the coding gain metric may indicate that the speech signal has changed due to a factor other than a change in the background noise. One factor that may cause such a change in G_c is voice activity that falls below the detection threshold of the encoder's voice activity detector. In such a case, a large change may also occur in the spectral tilt values, leading task T500 to make a positive SID transmission decision even though the background noise has not changed significantly.

It may be desirable to implement method M100 to account for changes in spectral tilt that are associated with changes in the value G_c of the coding gain metric. For example, an embodiment T230 of task T200 or an embodiment T330 of task T300 may be configured to enable or disable contour updates based on the magnitude of a change in the value G_c of the coding gain metric.

In some cases, the coding gain metric may be calculated in terms of the coding error, as in an expression such as

$$G_c = \frac{E_{err}}{E_{in}}.$$

Likewise, the prediction gain may be calculated in terms of the prediction error, as in an expression such as

$$\prod_{i} \left(1 - k_i^2\right)$$

for all i ≤ j (alternatively, for all i with 1 ≤ i ≤ j).

The coding gain metric may also be calculated according to other expressions, which may include, for example, the product $\prod_{i} (1 - k_i^2)$ for all i ≤ j (alternatively, for all i with 1 ≤ i ≤ j), or the ratio between E_in and E_err, as a factor or term.

The coding gain metric may be expressed on a linear scale or in another domain, such as on a logarithmic scale. Examples of such expressions include the following: [expressions not reproduced in this text]

The coding gain metric is typically evaluated for each frame, but it may also be evaluated less frequently (e.g., for every second or third frame) and/or over a longer interval (e.g., over a pair of frames or over three frames).

In a typical arrangement, task T230 or task T330 is configured to disable updating of the produced spectral tilt contour when the value G_c changes from one inactive frame to the next by more than (alternatively, not less than) a threshold amount. In one particular example, task T330 is configured to disable updating of the smoothed contour when the value of the prediction gain changes by more than 0.72 dB from the previous inactive frame to the current inactive frame. An embodiment of task T230 or task T330 may be configured to apply a hangover that extends this disabling to one or more subsequent frames. Another embodiment of task T230 or task T330 may also be configured to apply a hangover after a transition from active speech, as described above (e.g., with reference to FIGS. 13A to 16B).
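The gain-change test can be sketched as a single comparison. The 0.72 dB threshold is the value from the particular example; the function name, the use of `None` for a missing previous value, and the choice to allow the update in that case are assumptions.

```python
def contour_update_allowed(gain_db, prev_gain_db, threshold_db=0.72):
    """Return False when the prediction gain moved by more than the threshold."""
    if prev_gain_db is None:
        # Assumption: with no previous inactive frame there is nothing to compare.
        return True
    return abs(gain_db - prev_gain_db) <= threshold_db
```

The returned flag could feed the same update control path as the hangover logic, so that either condition suspends updating of the smoothed contour.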

It may be desirable to implement apparatus A100 to account for changes in the spectral tilt contour that are associated with changes in the value G_c of a coding gain metric (such as one of the examples above). For example, apparatus A100 may be implemented to include a control signal generator 60 that is configured to produce an update control signal whose state is based on the magnitude of a change in the prediction gain. FIG. 17A shows a block diagram of an example 62 of control signal generator 60. Control signal generator 60 may also be implemented to apply a hangover, as in the example of control signal generator 64 shown in FIG. 17B. In one particular example, the value of threshold T30 is 0.72 dB. An embodiment of smoother 134 or 136 may include an embodiment of control signal generator 60 instead of, or in addition to, circuitry that is configured to delay active-to-inactive transitions in the voice activity indication. For example, such an embodiment may include a control signal generator 66 as shown in FIG. 18, which combines the operations of hangover logic circuit 62 and control signal generator 64.

An embodiment of method M100 may be configured to control the generation of the SID transmission indication according to a change in the value of the coding gain metric. For example, an embodiment of method M100 may include an embodiment of task T400 that is configured to output a distance of zero when the value of the coding gain metric (e.g., the prediction gain) changes from one inactive frame to the next by more than (alternatively, not less than) a threshold amount. Additionally or in the alternative, an embodiment of method M100 may include an embodiment of task T500 that is configured to enable or disable generation of a positive SID transmission indication according to the magnitude of a change in the prediction gain. One such embodiment T510 of task T500 is configured to disable generation of a positive SID transmission indication unless the prediction gain has changed by less than (alternatively, not more than) a threshold from the previous inactive frame to the current inactive frame. In one such particular example, the threshold is 0.65 dB. Such control of the generation of the transmission indication may be performed in addition to, or as an alternative to, control of updating of the spectral tilt contour.

An embodiment of apparatus A100 may be configured to control the generation of the SID transmission indication according to a change in the value G_c of the coding gain metric. FIG. 19A shows a block diagram of an example 72 of transmission indication control circuit 70 that is configured to gate a positive SID transmission indication according to a relation between a threshold T40 and the magnitude of the change in prediction gain. In one particular example, the value of threshold T40 is 0.65 dB. FIG. 19B shows a block diagram of an embodiment 156 of comparator 152 that includes transmission indication control circuit 72.
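Gating of the SID transmission indication can be illustrated as a logical AND between the comparator's decision and a test on the prediction gain change. This is a behavioral sketch rather than the circuit of FIG. 19A; the function and parameter names are hypothetical, and the 0.65 dB default is the value given for the particular example.

```python
def gate_sid_indication(sid_requested, gain_change_db, threshold_db=0.65):
    """Pass a positive SID indication only if the gain change is small."""
    # A positive indication survives only when the magnitude of the
    # prediction gain change stays below the threshold.
    return sid_requested and abs(gain_change_db) < threshold_db
```

A negative indication is never turned into a positive one; the gate can only suppress a transmission.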

Embodiments of apparatus A100 may be configured to control generation of both the update control signal and the SID transmission indication based on changes in the value G_c of a coding gain metric. FIG. 20 shows a block diagram of an example 82 of control circuit 80 that is configured to perform these operations. This circuit may be arranged to receive the SID transmission indication from comparator 150 and to provide the update control signal to smoother 130. This circuit may also be implemented within smoother 130 or comparator 150. For example, in smoother 134 or 136, control circuit 82 may be arranged to replace delay logic circuit 52 and to gate the SID transmission indication from comparator 150 according to the predicted gain. In another example, control circuit 82 may be arranged within comparator 152 to gate the SID transmission indication according to the predicted gain and also to provide the update control signal to smoother 130.

FIG. 21 shows an example of a source code listing for a set of instructions that may be executed by an array of programmable logic elements or other state machine (e.g., a processor) to perform an embodiment of method M100 that includes task T312, embodiment T332 of task T330, embodiment T510 of task T500, and an embodiment of task T400. In this example, the state of the variable FRAME_ACTIVE indicates whether the current frame is active or inactive, the state of the variable Y_VALID indicates whether the set of instructions has been called before (and thus whether the value stored in the variable y_current is valid), and the value of the variable Gc indicates the predicted gain of the current frame.

If the set of instructions determines that the value of Y_VALID is FALSE (i.e., that the set of instructions is being executed for the first time), the variable Gc_current is initialized to the current value of the variable Gc. The absolute difference between the current and past values of Gc is stored to the variable Gc_diff, and if this difference is greater than the threshold value, a delay of two frames is applied. In part 3, the flag p is set only if the value of Gc_diff is less than the threshold value.
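The listing of FIG. 21 is not reproduced here, so the following Python sketch is only one plausible reading of the behavior just described. The variable names Y_VALID, Gc, Gc_current, Gc_diff, and p follow the text. The class, the default threshold of 0.65 dB, and the interpretation of the two-frame delay as a countdown that suspends updating are assumptions.

```python
class GainChangeTracker:
    """Hypothetical per-frame state for tracking changes in predicted gain."""

    def __init__(self, threshold_db=0.65, delay_frames=2):
        self.threshold_db = threshold_db   # assumed default, in dB
        self.delay_frames = delay_frames   # two-frame delay from the text
        self.y_valid = False               # Y_VALID: called before?
        self.gc_current = 0.0              # Gc_current: past value of Gc
        self.delay_count = 0               # assumed countdown for the delay

    def process_inactive_frame(self, gc):
        """Return (update_enabled, p) for one inactive frame."""
        if not self.y_valid:
            # First execution: initialize the past value to the current one.
            self.gc_current = gc
            self.y_valid = True
        gc_diff = abs(gc - self.gc_current)    # Gc_diff
        self.gc_current = gc
        if gc_diff > self.threshold_db:
            # Large gain change: start (or restart) the delay.
            self.delay_count = self.delay_frames
        update_enabled = self.delay_count == 0
        if self.delay_count > 0:
            self.delay_count -= 1
        p = gc_diff < self.threshold_db        # flag p
        return update_enabled, p
```

Under these assumptions, a jump in Gc larger than the threshold clears p for that frame and suspends updating for that frame and the next one.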

The particular examples of logical embodiments described herein are set forth to explain this disclosure rather than to limit it, and those skilled in the art will readily appreciate that alternative logical embodiments are included within the scope of this disclosure. For example, selection logic that is implemented in one context as an AND gate arranged to produce an active-high signal only when all of its inputs are high may be implemented in another context as an OR gate arranged to produce an active-low signal only when all of its inputs are low. A count-down from a first value to a second value may also be implemented as a count-up from the second value to the first value, and vice versa. A positive or TRUE indication may be represented by a binary high value in one context and by a binary low value in another. It is contemplated and hereby disclosed that these and other equivalents of implementation are also included within the scope of this disclosure.
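The equivalence between the active-high AND gate and the active-low OR gate is an instance of De Morgan's law, and it can be checked exhaustively with a short Python sketch. The function names are hypothetical; booleans stand in for logic levels, with True meaning a high level.

```python
from itertools import product


def and_gate(levels):
    """Output is high only when all inputs are high."""
    return all(levels)


def or_gate(levels):
    """Output is low only when all inputs are low."""
    return any(levels)


def equivalent_for_all_inputs(n_inputs=3):
    """Compare the two conventions for every input combination."""
    for asserted in product([False, True], repeat=n_inputs):
        # Active-high convention: an asserted signal is a high level.
        high_out = and_gate(asserted)
        # Active-low convention: an asserted signal is a low level.
        low_out = or_gate([not a for a in asserted])
        # The outputs agree once each is read in its own convention.
        if high_out != (not low_out):
            return False
    return True
```

In both conventions the output is asserted exactly when every input is asserted, which is why the two gates are interchangeable implementations of the same selection logic.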

In the examples above, it is assumed that the sequence of spectral tilt values includes a value for each of a series of consecutive inactive frames. However, it is also contemplated that method M100 and apparatus A100 may be implemented such that the sequence of spectral tilt values includes fewer than one value for each of a series of consecutive inactive frames. For example, the sequence may include values for every other frame (or every third frame, and so on) in the series. Such a sequence may be obtained by ignoring the intermediate frames or discarding the values from those frames, or by averaging the values of each pair (or group of three, and so on) of frames. Alternatively or additionally, these principles may be applied to other sequences, such as a sequence of coding gain metric values.
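Both ways of obtaining a sparser sequence, by skipping frames or by averaging groups of frames, can be sketched in a few lines of Python. The function names and the choice to drop an incomplete trailing group are assumptions made for illustration.

```python
def decimate(values, step=2):
    """Keep one value per `step` frames, discarding the frames in between."""
    return list(values[::step])


def average_groups(values, group=2):
    """Average each consecutive group of `group` frame values.

    An incomplete trailing group is dropped (an assumption of this sketch).
    """
    return [
        sum(values[i:i + group]) / group
        for i in range(0, len(values) - group + 1, group)
    ]
```

For example, `decimate([1, 2, 3, 4, 5])` keeps the values of the first, third, and fifth frames, and `average_groups([1.0, 3.0, 5.0, 7.0])` yields one value per pair of frames. The same helpers could be applied to a sequence of coding gain metric values.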

Those skilled in the art will understand that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, and symbols that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof. Although the signal from which the sequence of spectral tilt values is derived is called a "speech signal", it is also contemplated and hereby disclosed that this signal may carry music or other non-speech information content during active frames.

Elements of the various embodiments of apparatus A100 as described herein may be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or gates. One or more elements of the various embodiments of apparatus A100 as described herein may also be implemented in whole or in part as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, digital signal processors, field-programmable gate arrays (FPGAs), application-specific standard products (ASSPs), and application-specific integrated circuits (ASICs).

One or more elements of an embodiment of apparatus A100 may be used to perform tasks or execute other sets of instructions that are not directly related to an operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded. One or more elements of an embodiment of apparatus A100 may also have structure in common (e.g., a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times). In one such example, smoother 130, calculator 140, and comparator 150 are implemented as sets of instructions arranged to execute on the same processor. In another such example, sequence generator 120, and even a speech encoder (which may include apparatus A100), is implemented as one or more sets of instructions arranged to execute on that processor.

The foregoing presentation of the described configurations is provided to enable any person skilled in the art to make or use the methods and other structures disclosed herein. The flowcharts and other structures shown and described herein are examples only, and other variants of these structures are also within the scope of this disclosure. Various modifications to these configurations are possible, and the general principles presented herein may be applied to other configurations as well.

The configurations described herein may be implemented in part or in whole as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a microprocessor or other digital signal processing unit. The data storage medium may be an array of storage elements such as semiconductor memory (which may include, without limitation, dynamic or static RAM (random-access memory), ROM (read-only memory), and/or flash RAM); ferroelectric, magnetoresistive, ovonic, polymeric, or phase-change memory; or a disk medium such as a magnetic or optical disk. The term "software" should be understood to include source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by an array of logic elements, and any combination of such examples.

The methods disclosed herein may also be tangibly embodied (for example, in one or more of the data storage media listed above) as one or more sets of instructions readable and/or executable by a machine that includes an array of logic elements (e.g., a processor, microprocessor, microcontroller, or other finite state machine). Thus, the present disclosure is not intended to be limited to the configurations shown above but rather is to be accorded the widest scope consistent with the principles and novel features disclosed in any fashion herein, including in the appended claims as filed, which form a part of the original disclosure.

Those skilled in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and operations described in connection with the configurations disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. Such logical blocks, modules, circuits, and operations may be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.

The tasks of the methods and algorithms described herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An illustrative storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal. In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.

50, 52: Delay logic circuits

62, 64, 66: Control signal generators

72: Transmission indication control circuit

82: Control circuit

120, 122: Sequence generators

128, 140, 142: Calculators

130, 132, 134, 136: Smoothers

150, 152, 154, 156: Comparators

A100, A101, A102, A200: Apparatus

G10, G20, F10, F12, F20, F22: Gain factors

T10, T20, T30, T40: Threshold values

FIG. 1A shows a flowchart of a method M100 according to a configuration.

FIG. 1B shows a block diagram of an apparatus A100 according to a configuration.

FIG. 1C shows a flowchart of an embodiment M101 of method M100.

FIG. 1D shows a block diagram of an embodiment A101 of apparatus A100.

FIG. 2 shows a block diagram of an embodiment 132 of smoother 130.

FIG. 3 shows an illustrative example in which each circle represents one of a series of consecutive frames of a speech signal over time.

FIG. 4 shows a block diagram of an embodiment 142 of calculator 140.

FIG. 5 shows a block diagram of an embodiment 152 of comparator 150.

FIG. 6 shows a block diagram of an embodiment 154 of comparator 150.

FIG. 7A shows a block diagram of an embodiment A102 of apparatus A100.

FIG. 7B shows an example of combining several different transmission indications into a composite transmission indication.

FIG. 8A shows a source code listing for a set of instructions that may be executed to perform an embodiment of method M100.

FIG. 8B shows a source code listing for a set of instructions that may be executed to perform another embodiment of method M100.

FIG. 9 shows a flowchart of a method that includes a combination of method M101 and a speech encoding method.

FIG. 10 shows a block diagram of an apparatus that includes a combination of apparatus A101 and a speech encoder.

FIG. 11A shows a flowchart of an embodiment M200 of method M100.

FIG. 11B shows a block diagram of an embodiment A200 of apparatus A100.

FIG. 12A shows a flowchart of an embodiment M110 of method M101.

FIG. 12B shows a flowchart of an embodiment M210 of method M200.

FIG. 12C shows a flowchart of an embodiment M120 of method M101.

FIG. 12D shows a flowchart of an embodiment M220 of method M200.

FIGS. 13A and 13B show examples of smoothed spectral tilt profiles with and without application of a delay, respectively.

FIG. 14 shows a source code listing for a set of instructions that may be executed to perform another embodiment of method M100.

FIG. 15 shows a block diagram of an example of a delay logic circuit.

FIG. 16A shows a block diagram of an embodiment 134 of smoother 132.

FIG. 16B shows a block diagram of an embodiment 136 of smoother 132.

FIG. 17A shows a block diagram of an example 62 of control signal generator 60 that is configured to generate an update control signal based on predicted gain.

FIG. 17B shows a block diagram of an example 64 of control signal generator 62 that is configured to apply a delay.

FIG. 18 shows a block diagram of an embodiment 66 of control signal generator 64 that also includes delay logic circuit 52.

FIG. 19A shows a block diagram of an example 72 of transmission indication control circuit 70.

FIG. 19B shows a block diagram of an embodiment 156 of comparator 152.

FIG. 20 shows a block diagram of an example 82 of control circuit 80 that is configured to generate an update control signal and to gate a SID transmission indication.

FIG. 21 shows a source code listing for a set of instructions that may be executed to perform another embodiment of method M100.

(No description of reference numerals)

Claims

A method of processing a speech signal, the method comprising: generating a sequence of spectral tilt values that is based on a plurality of inactive frames of the speech signal, wherein the sequence of spectral tilt values comprises a sequence of reflection coefficients, wherein each of the spectral tilt values is based on at least one reflection coefficient of a corresponding inactive frame of the speech signal, the at least one reflection coefficient comprising at least one of a first reflection coefficient of the corresponding inactive frame or a second reflection coefficient of the corresponding inactive frame; calculating a change between at least two of the spectral tilt values based on the reflection coefficients; and, for an inactive frame among the plurality of inactive frames, determining whether to transmit a description of the frame, wherein said determining whether to transmit a description of the frame is based on the calculated change.

The method of processing a speech signal according to claim 1, wherein said generating the sequence of spectral tilt values comprises smoothing another sequence of spectral tilt values to generate the sequence of spectral tilt values, wherein each of the spectral tilt values of the other sequence indicates a spectral tilt of a corresponding one of the plurality of inactive frames.

The method of processing a speech signal according to claim 1, wherein each of a plurality of the spectral tilt values is based on another spectral tilt value in the sequence of spectral tilt values.

The method of processing a speech signal according to claim 1, wherein each of a plurality of the spectral tilt values is based on (A) a spectral tilt of a corresponding one of the plurality of inactive frames and (B) another spectral tilt value in the sequence of spectral tilt values.

The method of processing a speech signal according to claim 1, wherein the calculated change is based on a difference between consecutive values in the sequence of spectral tilt values.

The method of processing a speech signal according to claim 1, wherein said calculating the change comprises calculating a distance between adjacent values in the sequence of spectral tilt values.

The method of processing a speech signal according to claim 1, wherein said determining whether to transmit a description of the frame comprises comparing the calculated change to a threshold value.

The method of processing a speech signal according to claim 1, wherein a result of said determining whether to transmit a description of the frame is based on a relationship between (A) a magnitude of the calculated change and (B) a threshold value.

The method of processing a speech signal according to claim 1, wherein the method comprises transmitting a silence description if a result of said determining whether to transmit a description of the frame is a decision to transmit the description of the frame, the silence description including at least one of a spectral envelope description and an energy envelope description.

The method of processing a speech signal according to claim 9, wherein the method comprises calculating the silence description based on at least one of (A) a spectral envelope description of each of a plurality of inactive frames and (B) an energy envelope description of each of a plurality of inactive frames.

The method of processing a speech signal according to claim 1, wherein said determining whether to transmit a description of the frame is based on at least one of the following: (A) a vector describing a spectral envelope of the frame, (B) a residual energy of the frame, (C) a time distance to a most recent transmission of a description of an inactive frame, (D) a time distance to a most recent active frame, (E) an energy envelope of the frame, (F) an average absolute value of the frame, and (G) an energy value of the frame.

The method of processing a speech signal according to claim 11, wherein the method comprises transmitting a silence description if a result of said determining whether to transmit a description of the frame is a decision to transmit the description of the frame, the silence description including at least one of a spectral envelope description and an energy envelope description.

The method of processing a speech signal according to claim 1, wherein said determining whether to transmit a description of the frame comprises determining not to transmit the description of the frame in response to detecting that a change in a coding gain metric exceeds a threshold value.

The method of processing a speech signal according to claim 13, wherein each value of the coding gain metric is based on values of a plurality of reflection coefficients of a corresponding inactive frame of the speech signal.

The method of processing a speech signal according to claim 1, wherein the method comprises, for each of a plurality of the spectral tilt values in the sequence of spectral tilt values, calculating a change between the spectral tilt value and at least one other spectral tilt value in the sequence of spectral tilt values, and wherein the method comprises, for each of another plurality of inactive frames of the speech signal, determining whether to transmit a description of the frame, and wherein, for each of the other plurality of inactive frames, a result of said determining whether to transmit a description of the frame is based on at least one of the calculated changes.

The method of processing a speech signal according to claim 15, wherein, for at least some of the inactive frames in the other plurality of inactive frames, a result of said determining whether to transmit a description of the frame is a decision not to transmit the description of the frame.

The method of processing a speech signal according to claim 15, wherein, for each of the other plurality of inactive frames, said determining whether to transmit a description of the frame comprises determining not to transmit the description of the frame in response to detecting that a change in a coding gain metric exceeds a threshold value.

The method of processing a speech signal according to claim 17, wherein, for each of the other plurality of inactive frames, the change in the coding gain metric is based on (A) a value of the coding gain metric for a first inactive frame of the speech signal that precedes the frame and (B) a value of the coding gain metric for a second inactive frame of the speech signal that precedes the frame and is different from the first inactive frame.

The method of processing a speech signal according to claim 1, wherein said generating the sequence of spectral tilt values comprises, for at least some inactive frames of the plurality of inactive frames, generating a corresponding spectral tilt value of the sequence of spectral tilt values according to a time distance between the inactive frame and a previous active frame of the speech signal.

The method of processing a speech signal according to claim 19, wherein said generating a corresponding spectral tilt value of the sequence of spectral tilt values comprises setting the spectral tilt value to a previous spectral tilt value of the sequence of spectral tilt values when the time distance between the inactive frame and the previous active frame of the speech signal is less than a threshold value.

The method of processing a speech signal according to claim 1, wherein said generating the sequence of spectral tilt values comprises, for at least some inactive frames of the plurality of inactive frames, calculating a corresponding spectral tilt value of the sequence of spectral tilt values according to a coding gain metric of the inactive frame.

The method of processing a speech signal according to claim 1, wherein said generating the sequence of spectral tilt values comprises, for at least one spectral tilt value of the sequence of spectral tilt values, setting the spectral tilt value to a previous spectral tilt value of the sequence of spectral tilt values in response to detecting that a change in a coding gain metric exceeds a threshold value.

The method of processing a speech signal according to claim 1, further comprising: combining a plurality of transmission indications into a composite transmission indication, wherein each transmission indication is generated by a different blanking algorithm; and determining, based on the composite transmission indication, whether to transmit a description of an inactive frame.

A non-transitory computer-readable medium comprising instructions for causing at least one computer to: generate a sequence of spectral tilt values that is based on a plurality of inactive frames of a speech signal, wherein the sequence of spectral tilt values comprises a sequence of reflection coefficients, wherein each of the spectral tilt values is based on at least one reflection coefficient of a corresponding inactive frame of the speech signal, the at least one reflection coefficient comprising at least one of a first reflection coefficient of the corresponding inactive frame or a second reflection coefficient of the corresponding inactive frame; calculate a change between at least two of the spectral tilt values based on the reflection coefficients; and, for an inactive frame among the plurality of inactive frames, determine, based on the calculated change, whether to transmit a description of the frame.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to generate the sequence of spectral tilt values are configured to cause the at least one computer to generate each of a plurality of the spectral tilt values based on another spectral tilt value in the sequence of spectral tilt values.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to calculate the change are configured to cause the at least one computer to calculate the change based on a difference between consecutive values in the sequence of spectral tilt values.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to determine whether to transmit the description of the frame are configured to cause the at least one computer to determine whether to transmit the description of the frame based on a relationship between (A) a magnitude of the calculated change and (B) a threshold value.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to determine whether to transmit the description of the frame comprise instructions for causing the at least one computer to determine not to transmit the description of the frame in response to a change in a coding gain metric that exceeds a threshold value.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to calculate the change are configured to cause the at least one computer to calculate, for each of a plurality of the spectral tilt values in the sequence of spectral tilt values, a change between the spectral tilt value and at least one other spectral tilt value in the sequence of spectral tilt values, and wherein the instructions for causing the at least one computer to determine whether to transmit the description of the frame are configured to cause the at least one computer to determine, for each of another plurality of inactive frames of the speech signal, whether to transmit a description of the frame, and wherein the instructions for causing the at least one computer to determine whether to transmit the description of the frame are configured such that, for each of the other plurality of inactive frames, the determination of whether to transmit the description of the frame is based on at least one of the calculated changes.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to generate the sequence of spectral tilt values comprise instructions for causing the at least one computer to generate, for at least some inactive frames of the plurality of inactive frames, a corresponding spectral tilt value of the sequence of spectral tilt values according to a time distance between the inactive frame and a previous active frame of the speech signal.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to generate the sequence of spectral tilt values are configured to cause the at least one computer, for at least one spectral tilt value of the sequence of spectral tilt values, to set the spectral tilt value to a previous spectral tilt value in the sequence of spectral tilt values in response to detecting that a change in a coding gain metric exceeds a threshold value.

The computer-readable medium of claim 24, wherein the instructions for causing the at least one computer to generate the sequence of spectral tilt values are configured to cause the at least one computer to smooth another sequence of spectral tilt values to generate the sequence of spectral tilt values, wherein each of the spectral tilt values of the other sequence indicates a spectral tilt of a corresponding one of the plurality of inactive frames.

An apparatus for processing a speech signal, the apparatus comprising: a sequence generator configured to generate a sequence of spectral tilt values of a plurality of inactive frames based on the speech signal, wherein the sequence of spectral tilt values comprises a sequence of reflection coefficients, wherein each of the spectral tilt values is based on at least one reflection coefficient of the inactive frame corresponding to one of the voice signals, the at least one reflection coefficient including one of the corresponding inactive frames a reflection coefficient or at least one of a second reflection coefficient of the corresponding inactive frame; a calculator configured to calculate a value between the at least two values of the spectral tilt values based on the reflection coefficient And a comparator configured to determine whether to transmit a description of the frame based on the one of the plurality of inactive frames and based on the calculated change.

The apparatus for processing a speech signal of claim 33, wherein the comparator is configured to determine whether to transmit the description of the frame based on a relationship between (A) a magnitude of the calculated change and (B) a threshold value.

The apparatus for processing a speech signal of claim 33, wherein the apparatus comprises a wireless communication device that includes the sequence generator, the calculator, and the comparator, and wherein the device is configured to transmit a silence description in response to a decision by the comparator to transmit the description of the frame, the silence description including at least one of a spectral envelope description and an energy envelope description.

The apparatus for processing a speech signal of claim 33, wherein the comparator is configured to determine not to transmit the description of the frame in response to a change in a coding gain metric exceeding a threshold.

The apparatus for processing a speech signal of claim 33, wherein the calculator is configured to calculate, for each of a plurality of spectral tilt values in the sequence of spectral tilt values, a change between the spectral tilt value and at least one other spectral tilt value in the sequence of spectral tilt values, wherein the comparator is configured to decide, for each of a plurality of other inactive frames of the speech signal, whether to transmit a description of the frame, and wherein the comparator is configured such that, for each of the plurality of other inactive frames, the decision whether to transmit the description of the frame is based on at least one of the calculated changes.

The apparatus for processing a speech signal of claim 33, wherein the sequence generator is configured to generate, for at least some inactive frames of the plurality of inactive frames, a corresponding one of the sequence of spectral tilt values according to a time distance between the inactive frame and a previous active frame of the speech signal.

The apparatus for processing a speech signal of claim 33, wherein the sequence generator is configured, for at least one of the sequence of spectral tilt values, to set the spectral tilt value to a previous spectral tilt value of the sequence of spectral tilt values in response to detecting that a change in a coding gain metric exceeds a threshold.

The apparatus for processing a speech signal of claim 33, wherein the sequence generator is configured to generate the sequence of spectral tilt values by smoothing another sequence of spectral tilt values, wherein each of the spectral tilt values of the another sequence indicates a spectral tilt of one of the plurality of inactive frames.

An apparatus for processing a speech signal, the apparatus comprising: means for generating, based on a plurality of inactive frames of the speech signal, a sequence of spectral tilt values, wherein the sequence of spectral tilt values comprises a sequence of reflection coefficients, wherein each of the spectral tilt values is based on at least one reflection coefficient of a corresponding inactive frame of the speech signal, the at least one reflection coefficient including at least one of a first reflection coefficient of the corresponding inactive frame or a second reflection coefficient of the corresponding inactive frame; means for calculating, based on the reflection coefficient, a change between at least two values of the spectral tilt values; and means for determining, for an inactive frame of the plurality of inactive frames and based on the calculated change, whether to transmit a description of the frame.

The apparatus for processing a speech signal of claim 41, wherein the apparatus includes means for transmitting a silence description in response to a decision by the means for determining to transmit the description of the frame, the silence description including at least one of a spectral envelope description and an energy envelope description.

The apparatus for processing a speech signal of claim 41, wherein the means for generating the sequence of spectral tilt values is configured to generate, for at least some inactive frames of the plurality of inactive frames, a corresponding one of the sequence of spectral tilt values according to a time distance between the inactive frame and a previous active frame of the speech signal.

The apparatus for processing a speech signal of claim 41, wherein the means for generating the sequence of spectral tilt values is configured, for at least one of the sequence of spectral tilt values, to set the spectral tilt value to a previous spectral tilt value of the sequence of spectral tilt values in response to detecting that a change in a coding gain metric exceeds a threshold.

The apparatus for processing a speech signal of claim 41, wherein the means for generating the sequence of spectral tilt values is configured to generate the sequence of spectral tilt values by smoothing another sequence of spectral tilt values, wherein each of the spectral tilt values of the another sequence indicates a spectral tilt of one of the plurality of inactive frames.

A method for processing a speech signal, the method comprising: generating, by a sequence generator of a computer and based on a plurality of inactive frames of the speech signal, a sequence of spectral tilt values, wherein the sequence of spectral tilt values comprises a sequence of reflection coefficients, wherein each of the spectral tilt values is based on at least one reflection coefficient of a corresponding inactive frame of the speech signal, the at least one reflection coefficient including at least one of a first reflection coefficient of the corresponding inactive frame or a second reflection coefficient of the corresponding inactive frame; calculating, by a calculator of the computer and based on the reflection coefficient, a change between at least two values of the spectral tilt values; and determining, by a comparator of the computer and for an inactive frame of the plurality of inactive frames, whether to transmit a description of the frame, wherein the determining whether to transmit the description of the frame is based on the calculated change, and wherein the generating the sequence of spectral tilt values comprises generating, for at least some inactive frames of the plurality of inactive frames, a corresponding one of the sequence of spectral tilt values according to a time distance between the inactive frame and a previous active frame of the speech signal.
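
For illustration only, the generate, calculate, and compare steps recited in the claims above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the use of the normalized lag-1 autocorrelation as the first reflection coefficient (sign conventions differ between texts), the smoothing factor `alpha`, the threshold of 0.2, and the choice to always describe the first inactive frame are all hypothetical.

```python
from typing import List, Sequence


def first_reflection_coefficient(frame: Sequence[float]) -> float:
    """Normalized lag-1 autocorrelation of one frame.

    Assumption: used here as the spectral tilt value of the frame.
    Some texts define the first reflection coefficient with the opposite sign.
    """
    r0 = sum(x * x for x in frame)
    if r0 == 0.0:
        return 0.0  # all-zero frame: no tilt information
    r1 = sum(a * b for a, b in zip(frame, frame[1:]))
    return r1 / r0


def smooth(values: Sequence[float], alpha: float = 0.5) -> List[float]:
    """First-order recursive smoothing of a sequence of tilt values.

    alpha is a hypothetical smoothing factor; larger values weight the past more.
    """
    out: List[float] = []
    prev = None
    for v in values:
        prev = v if prev is None else alpha * prev + (1.0 - alpha) * v
        out.append(prev)
    return out


def transmit_decisions(tilts: Sequence[float], threshold: float = 0.2) -> List[bool]:
    """Decide per inactive frame whether to transmit a description.

    The change is taken between each tilt value and the previous one, and
    its magnitude is compared with a threshold. Assumption: the first frame
    is always described because no earlier value exists.
    """
    decisions: List[bool] = []
    for i, tilt in enumerate(tilts):
        if i == 0:
            decisions.append(True)
        else:
            change = tilt - tilts[i - 1]
            decisions.append(abs(change) > threshold)
    return decisions


def decide_for_frames(frames: Sequence[Sequence[float]]) -> List[bool]:
    """Run the three steps (generate, calculate, compare) over inactive frames."""
    raw = [first_reflection_coefficient(f) for f in frames]
    return transmit_decisions(smooth(raw))
```

In this sketch, a run of inactive frames whose smoothed tilt stays nearly constant yields no further transmissions after the first frame, while a jump in tilt larger than the threshold triggers a new description.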