TW454166B

TW454166B - Sub-band plus mute speech coding system

Info

Publication number: TW454166B
Application number: TW84111336A
Authority: TW
Inventors: Tz-Jing Chiue
Original assignee: Utron Technology Inc
Priority date: 1995-10-24
Filing date: 1995-10-24
Publication date: 2001-09-11

Abstract

A sub-band coding system for storing and reproducing speech divides the speech signal into sub-bands. In each sub-band, signal below a threshold level is treated as mute, and signal above the threshold level is treated as non-mute. Upon modulation (e.g. by adaptive differential pulse-code modulation), the mute signal is converted into mute data representing the start time and duration of each mute period, while the non-mute signal is converted into non-mute data representing signal levels. During replay, the mute data and the non-mute data are combined before conversion into the respective sub-band analog signals. These sub-band analog signals are then summed to reproduce the original voice.

Description

The invention relates to storing and reproducing sound signals by combining sub-band coding with muting. A real-time speech signal usually contains silent periods, which typically occupy about 20% to 40% of the total time. Such a signal can be converted into a digital signal to facilitate digital transmission and reproduction. Conventional signal processing methods include pulse code modulation (PCM), delta modulation, and adaptive differential pulse code modulation (ADPCM). With these modulation methods, the speech signal must be sampled at discrete time intervals and quantized into many levels.

The advantage of these conventional techniques is that their control circuits are quite simple; the drawback is that they require a large amount of memory. How to reduce the required memory without sacrificing sound quality therefore remains an active research topic.

Wan et al., in U.S. Patent No. 4,701,937, propose a system for storing and reproducing fixed-pattern signals, in which the silent periods of the signal are not stored but regenerated. The signal of the silent period is modulated by pulse code modulation. During the silent periods the decoder of the storage memory is not activated, but during the other periods the storage memory is activated, so a large storage capacity is still required. The storage capacity required for both kinds of period needs to be reduced.

Brief description of the invention

An object of the present invention is to reduce the total memory capacity required by a sound storage and reproduction system.

This object is achieved by adding mute periods to each sub-band of a sub-band coding system. A threshold level (critical potential) is set first. Signal below the threshold level is converted into mute data, expressed by the start time and duration of the mute period, while signal above the threshold level is converted into non-mute data expressed as signal levels. Both mute data and non-mute data are stored in memory.
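The separation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`BandRecord`), the function name `split_mute`, and the per-sample comparison against the threshold are all hypothetical and are not specified in the patent.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BandRecord:
    """Stored form of one sub-band (hypothetical layout, not from the patent)."""
    length: int                                                  # total number of samples
    mutes: List[Tuple[int, int]] = field(default_factory=list)  # (start, duration) pairs
    levels: List[int] = field(default_factory=list)             # non-mute samples, in order


def split_mute(samples: List[int], threshold: int) -> BandRecord:
    """Separate samples into mute runs and non-mute signal levels."""
    rec = BandRecord(length=len(samples))
    run_start = None  # index where the current mute run began, if any
    for i, s in enumerate(samples):
        if abs(s) < threshold:
            # Below the threshold: open a mute run if one is not already open.
            if run_start is None:
                run_start = i
        else:
            # At or above the threshold: close any open mute run, keep the level.
            if run_start is not None:
                rec.mutes.append((run_start, i - run_start))
                run_start = None
            rec.levels.append(s)
    if run_start is not None:  # the signal ended inside a mute run
        rec.mutes.append((run_start, len(samples) - run_start))
    return rec


rec = split_mute([0, 1, -1, 40, -35, 0, 0, 2, 50, 0], threshold=5)
print(rec.mutes)   # [(0, 3), (5, 3), (9, 1)]
print(rec.levels)  # [40, -35, 50]
```

Only three of the ten input samples are kept as levels; the rest collapse into three (start, duration) pairs. A practical detector would probably also require a minimum run length, since a per-sample test like this one flags every zero crossing of a loud signal as mute.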
Since the signal level during a mute period is zero, there is no need to store that level. Sub-band coding combined with not storing the signal level of the mute periods therefore saves more memory than high/low sub-band coding alone.

Brief description of the drawings

Figure 1(a) illustrates a typical time-domain waveform of a sound signal.
Figure 1(b) illustrates the spectrum of the signal of Figure 1(a).
Figure 2 illustrates the sub-band and mute system of the present invention.
Figure 3 illustrates the encoder required when the invention is implemented with adaptive differential pulse code modulation.
Figure 4 illustrates the decoder required when the invention is implemented with adaptive differential pulse code modulation.

Detailed description of the embodiment

The principle of sub-band coding is that the spectrum of a sound is not uniformly distributed over frequency. In general, the signal in the mid-frequency range is larger than in the high-frequency range, so when a speech signal is converted into a digital signal, the high-frequency range can be represented with fewer bits than the mid-frequency range. This principle can be explained mathematically. Suppose the mid-frequency signal has bandwidth BW1 and bit count Bit1, and the high-frequency signal has bandwidth BW2 and bit count Bit2. Then the required memory Ms, expressed as a data rate, is:

$$\mathrm{Ms} = \mathrm{BW1} \times \mathrm{Bit1} + \mathrm{BW2} \times \mathrm{Bit2} \tag{1}$$

where Bit2 < Bit1. If sub-band coding is not used, the memory M required by the system is:

$$\mathrm{M} = \mathrm{BW1} \times \mathrm{Bit1} + \mathrm{BW2} \times \mathrm{Bit1} \tag{2}$$

This is because the worst-case bit count Bit1 must then be used for both the mid-frequency range and the high-frequency range. With sub-band coding, the required memory Ms is smaller than M because Bit2 < Bit1.

In a two-band sub-band coding system, silent (mute) periods appear in both the mid-frequency band and the high-frequency band, and these mute periods occupy a considerable amount of memory.

U.S. Patent No. 4,701,937 noted that when mute periods occur, the zero-signal data of those periods need not be stored, which saves memory. In the present invention, speech is divided into two or more frequency bands, for example a high-frequency band and a low-frequency band. Each band has a threshold level; signal below that level is called "sub-band coding mute". During sub-band coding mute periods, the zero signal is not stored in memory.

Figure 2 illustrates the basic embodiment. First, a low-pass filter LP1 and a high-pass filter HP1 divide the sound into a low-frequency range and a high-frequency range. The low-frequency signal from LP1 and the high-frequency signal from HP1 are then examined by the threshold mute detectors MD0 and MD1, respectively.
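As a rough illustration of the two-band front end of Figure 2, the sketch below splits a signal into a low band and a complementary high band and applies a per-band threshold test. The patent does not give the designs of LP1, HP1, MD0 or MD1; the one-pole low-pass filter, the coefficient `alpha`, and the function names are assumptions made only for this example.

```python
from typing import List, Tuple


def split_two_bands(x: List[float], alpha: float = 0.2) -> Tuple[List[float], List[float]]:
    """Split x into a low band (one-pole low-pass) and the complementary high band."""
    low: List[float] = []
    high: List[float] = []
    state = 0.0
    for s in x:
        # One-pole low-pass: y[n] = y[n-1] + alpha * (x[n] - y[n-1])
        state += alpha * (s - state)
        low.append(state)
        # High band is the remainder, so low + high equals x up to rounding.
        high.append(s - state)
    return low, high


def band_is_mute(band: List[float], threshold: float) -> List[bool]:
    """Per-sample threshold test standing in for a mute detector such as MD0 or MD1."""
    return [abs(s) < threshold for s in band]


x = [0.0, 1.0, 0.0, -1.0, 0.5]
low, high = split_two_bands(x)
print(band_is_mute(low, 0.1))
print(band_is_mute(high, 0.1))
```

Because the high band is defined as the remainder after low-pass filtering, summing the two bands recovers the input, which mirrors the final summation step of the system. Each band can have its own threshold, so a band may be mute while the other is active.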
After detection by MD0 and MD1, the signals pass to a memory management unit MMU, which determines that signal above the threshold level should be stored in the memory RAM.

In the present invention there are at least two sub-bands, namely a high-frequency band and a low-frequency band, and each is further divided into mute data and non-mute data. Figure 1(a) illustrates the time-domain waveform of a sound signal, and Figure 1(b) illustrates the corresponding spectrum. As Figure 1(a) shows, there are mute periods with no signal, marked "A", and periods in which the signal is very small, marked "B". In the present invention, if the signal in a "B" period is below a certain threshold level, it is treated as mute. The corresponding spectral response is narrower than the mute response shown as blank gaps in Figure 1(b). During these mute periods the signal level is zero, and the zero signals need not be converted into digital data and stored in memory. The digital data for these mute periods, called "mute data", is represented only by the start time and duration of the mute period.

The required memory capacity M3 can be calculated as:

$$\mathrm{M3} = \mathrm{BW0} \times \mathrm{Bit0} + \mathrm{BW1}' \times \mathrm{Bit1} + \mathrm{BW2} \times \mathrm{Bit2} \tag{3}$$

where BW0 and Bit0 are the bandwidth and bit count of the mute periods, and BW1' and Bit1 are the bandwidth and bit count of the mid-frequency band. Since the bit count of the mute periods is zero,

$$\mathrm{M3} = \mathrm{BW1}' \times \mathrm{Bit1} + \mathrm{BW2} \times \mathrm{Bit2} \tag{4}$$

M3 in equation (4) is smaller than Ms in equation (1) because the bandwidth BW1' is smaller than BW1. The total memory required by the sub-band and mute system of the present invention is therefore smaller than that of a conventional high/low sub-band coding system, and also smaller than that of the system of Wan et al., which does not store silent periods.
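The memory estimates in equations (1), (2) and (4) can be compared with a short numerical check. All figures below are hypothetical and chosen only to make the arithmetic easy to follow. The sketch also assumes that the mute share BW0 is taken out of the mid band, so that BW1' = BW1 - BW0; the patent states only that BW1' is smaller than BW1.

```python
def memory_rate(bands):
    """Sum of bandwidth * bit count over (bandwidth, bits) pairs, as in equations (1)-(4)."""
    return sum(bw * bits for bw, bits in bands)


# Hypothetical figures, not taken from the patent.
BW1, BIT1 = 3000, 4   # mid-frequency band: bandwidth and bit count
BW2, BIT2 = 1000, 2   # high-frequency band: bandwidth and bit count
MUTE_PERCENT = 30     # inside the 20% to 40% range quoted for silent periods

# Equation (2): no sub-band coding, worst-case Bit1 used for both bands.
M = memory_rate([(BW1, BIT1), (BW2, BIT1)])

# Equation (1): sub-band coding, fewer bits for the high band.
Ms = memory_rate([(BW1, BIT1), (BW2, BIT2)])

# Equations (3) and (4): mute share stored with Bit0 = 0 (assumed BW1' = BW1 - BW0).
BW0 = BW1 * MUTE_PERCENT // 100
BW1_PRIME = BW1 - BW0
M3 = memory_rate([(BW0, 0), (BW1_PRIME, BIT1), (BW2, BIT2)])

print(M, Ms, M3)  # 16000 14000 10400
```

With these numbers the ordering M3 < Ms < M claimed in the text holds: sub-band coding alone saves 2000 units, and dropping the mute share saves a further 3600.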
Figures 3 and 4 illustrate the encoding and decoding schemes, respectively, using adaptive differential pulse code modulation (ADPCM). This scheme differs from Figure 1 in that both the high-frequency sub-band and the low-frequency sub-band include mute data and non-mute data.

In the encoding process of Figure 3, the input signal is divided into N frequency bands by band-pass filters BP0, BP1, ..., BPN. The filtered signals are modulated by modulators M0, M1, ..., MN and thereby converted into digital signals. The digital signals are examined by mute detectors MD0, MD1, ..., MDN, which eliminate signal below the threshold level. Signal above the threshold level is encoded by the ADPCM encoders ADPCM0, ADPCM1, ..., ADPCMN, respectively, into non-mute digital data. The ADPCM-coded data is stored in the memory element, but the zero-level data of the mute periods is not stored; for signal below the threshold level, only information such as the start time and duration is stored, as mute data.

In the decoder of Figure 4, the data of the different sub-bands stored in the memory RAM is sent to separate ADPCM decoders DECODE0, DECODE1, ..., DECODEN. The decoder outputs are sent to the respective mute generators MG0, MG1, ..., MGN, which insert the zero-level data of the mute periods. The inserted mute data is defined by the start time and duration of each mute period. The data, now including the mute periods, is then sent to demodulators DM0, DM1, ..., DMN, which convert it into sub-band signals; these are passed through band-pass filters BP0, BP1, ..., BPN. Finally, the sub-band signals are mixed in a synthesis unit S to produce the output sound signal.
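The mute generator and synthesis steps of the decoder can be sketched as follows. This is a simplified illustration, not the circuit of Figure 4: ADPCM decoding, demodulation and band-pass filtering are omitted, the non-mute levels are assumed to be already decoded, and the names `mute_generator` and `synthesize` and the (start, duration) tuple layout are hypothetical.

```python
from typing import List, Tuple

Mute = Tuple[int, int]  # (start index, duration in samples)


def mute_generator(length: int, mutes: List[Mute], levels: List[int]) -> List[int]:
    """Rebuild one sub-band: zeros inside mute periods, decoded levels elsewhere."""
    is_mute = [False] * length
    for start, duration in mutes:
        for i in range(start, start + duration):
            is_mute[i] = True
    out = [0] * length
    remaining = iter(levels)
    for i in range(length):
        if not is_mute[i]:
            # Non-mute positions are filled with the stored levels, in order.
            out[i] = next(remaining)
    return out


def synthesize(bands: List[List[int]]) -> List[int]:
    """Sum the reconstructed sub-bands sample by sample (role of synthesis unit S)."""
    return [sum(samples) for samples in zip(*bands)]


low = mute_generator(6, [(0, 2), (5, 1)], [10, -8, 6])
high = mute_generator(6, [(0, 3)], [1, -1, 2])
print(low)                      # [0, 0, 10, -8, 6, 0]
print(high)                     # [0, 0, 0, 1, -1, 2]
print(synthesize([low, high]))  # [0, 0, 10, -7, 5, 2]
```

Each band carries its own list of mute periods, so the two bands in this example are silent over different spans and are only aligned again when they are summed.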
The above analysis uses a two-sub-band architecture as an illustrative example, but those skilled in the art will understand that it also applies to architectures with more than two sub-bands. Although the encoder of Figure 3 and the decoder of Figure 4 are both ADPCM systems, other modulation systems are equally applicable. The foregoing examples are given for illustration only and are not intended to limit the scope of the applicant's rights. Any application or adaptation of the techniques of the present invention to equivalent techniques of other forms, whether the resulting product performs better or slightly worse, falls within the scope of rights claimed by the inventor.

Claims

1. A speech encoding and decoding system using both sub-bands and muting, comprising:
   more than one frequency filtering device for filtering a sound signal into different frequency sub-bands;
   a modulation device that modulates the sound signal in each sub-band into digital data;
   a separating device that, in at least one of the sub-bands, separates the digital data into the following two kinds of data:
   (a) mute data: data for which the signal is below a threshold level, represented by the start time and duration of the mute period; and
   (b) non-mute data: data for which the signal is above the threshold level; and
   a storage device for storing the mute data and the non-mute data in a memory element.

2. The speech encoding and decoding system of claim 1, wherein the separating device is a mute detector.

3. The speech encoding and decoding system of claim 1, further comprising an encoding device that encodes the mute data and the non-mute data for storage in the memory element.

4. The speech encoding and decoding system of claim 3, wherein the encoding device is an adaptive differential pulse code modulator (ADPCM).

5. The speech encoding and decoding system of claim 1, wherein the memory element is a random access memory.

6. A speech encoding and decoding system using both sub-bands and muting, capable of reproducing a sound signal from digital data, the digital data representing the sound signal in more than one sub-band and having (a) mute data, for which the signal is below a threshold level, expressed by the start time and duration of the mute period, and (b) non-mute data, for which the signal is above the threshold level, the system comprising:
   a regeneration device that, in at least one sub-band, regenerates the digital data from the mute data and the non-mute data;
   a demodulation device that demodulates the digital data into a sub-band sound signal;
   a filtering device that filters the sub-band sound signal in the sub-band; and
   a synthesizing device that mixes all the sub-band sound signals to provide a sound output.

7. A speech encoding and decoding system using both sub-bands and muting, capable of reproducing a sound signal from digital data coded by adaptive differential pulse code modulation (ADPCM), the digital data representing the sound signal in more than one sub-band and having (a) mute data, for which the signal is below a threshold level, expressed by the start time and duration of the mute period, and (b) non-mute data, for which the signal is above the threshold level, the system comprising:
   an ADPCM decoder that reproduces the non-mute data for the sub-band;
   a mute generator that reproduces the mute data in the sub-band;
   a demodulation device that, in the sub-band, demodulates the digital data from the ADPCM decoder and the mute generator into a sound signal;
   a band-pass filter that filters the sub-band sound signal; and
   a synthesis amplifier that mixes the sub-band sound signals.

8. An encoding method for a speech encoding and decoding system using both sub-bands and muting, comprising the steps of:
   filtering: filtering a sound signal into at least two sub-bands;
   conversion: converting, in at least one sub-band, the sound signal into a digital signal containing the following two kinds of data:
   (a) mute data, for which the sound signal is below a threshold level, expressed as the start time and duration of the mute period; and
   (b) non-mute data, for which the sound signal is above the threshold level; and
   storage: storing the mute data and the non-mute data in a memory.

9. The encoding method of claim 8, wherein the data stored in the memory is encoded in adaptive differential pulse code modulation (ADPCM) form.

10. A decoding method for a speech encoding and decoding system using both sub-bands and muting, capable of decoding digital data of a sound signal encoded with sub-bands and muting, the digital data having (a) mute data, for which the sound signal is below a threshold level, expressed by the start time and duration of the mute period, and (b) non-mute data, for which the sound signal is above the threshold level, the method comprising the steps of:
   regeneration: in at least one of the sub-bands, regenerating the digital data from the mute data and the non-mute data;
   demodulation: demodulating the digital data into a sub-band sound signal;
   filtering: filtering the sub-band sound signal; and
   mixing: mixing all the sub-band sound signals into the sound signal.