TW201816774A

TW201816774A - Audio processing method and non-transitory computer readable medium

Info

Publication number: TW201816774A
Application number: TW105134132A
Authority: TW
Inventors: 李敬祥
Original assignee: 盛微先進科技股份有限公司
Priority date: 2016-10-21
Filing date: 2016-10-21
Publication date: 2018-05-01
Also published as: TWI602173B

Abstract

An audio processing method is disclosed. The audio processing method includes: splitting an audio file into a plurality of audio sections; analyzing a first lowest energy value within a frequency spectrum waveform of a first audio section of the audio sections; comparing the first lowest energy value with a predetermined energy value and assigning the higher one as a first noise floor; generating a first processed audio section according to the first noise floor and the first audio section; compressing the first processed audio section to generate a compressed audio section; and sending the compressed audio section to an audio playback apparatus.

Description

Audio processing method and non-transitory computer readable medium

The present disclosure relates to an audio processing method, and more particularly to an audio processing method and a non-transitory computer readable medium for compressing an audio file.

Conventionally, sending an audio file to an audio playback device over a low-bandwidth wireless protocol such as Bluetooth requires a lossy compression format such as MP3 to greatly reduce the amount of data. Lossy compression, however, can cause severe loss of the low-frequency and high-frequency content of the audio file, or flatten its originally rich variations in frequency and volume, which greatly degrades audio quality.

In addition, common compression techniques usually involve heavy computation, such as converting the audio file between the time domain and the frequency domain. Small playback devices such as Bluetooth headsets and Bluetooth speakers, however, usually have only low-performance microprocessors, so decompressing an audio file takes them a long time and the audio cannot be played in real time.

One aspect of the present disclosure provides an audio processing method. The method first splits an audio file into a plurality of audio segments. It then analyzes a first lowest energy value in the spectral waveform of a first audio segment among the audio segments. The first lowest energy value is compared with a predetermined energy value, and the higher of the two is used as a first noise floor. A first processed audio segment is generated according to the first noise floor and the first audio segment. The first processed audio segment is compressed to produce a compressed audio segment, and the compressed audio segment is sent to an audio playback device.

Another aspect of the present disclosure provides a non-transitory computer readable medium storing a plurality of instructions. When executed by a processing unit, the instructions split an audio file into a plurality of audio segments and process one of the audio segments by the following steps: analyzing the lowest energy value in the spectral waveform of that audio segment; comparing the lowest energy value with a predetermined energy value and using the higher of the two as a noise floor; generating a processed audio segment according to the noise floor and that audio segment; compressing the processed audio segment to produce a compressed audio segment; and sending the compressed audio segment to an audio playback device.

Yet another aspect of the present disclosure provides a non-transitory computer readable medium storing a plurality of instructions for restoring a compressed audio segment in a compressed audio file. When executed by a processing unit, the instructions decompress the compressed audio segment to obtain a decompressed audio segment, and multiply each sample value in the decompressed audio segment by a discard value. The discard value is related to the original noise floor of the original audio segment that corresponds to the compressed audio segment.

With the teachings of this disclosure, audio files can be sent over a low-bandwidth transmission protocol. Because the audio file is processed with a lossless compression format that does not involve, for example, conversion between the time domain and the frequency domain, an audio playback device with only a low-performance processor can still decompress the audio file quickly enough for real-time playback.

100, 400, 500‧‧‧audio processing methods

S102, S104, S106, S108, S109‧‧‧steps

S110, S111, S112, S114, S115‧‧‧steps

S116, S118, S119, S120‧‧‧steps

L11, L12, L13, L14‧‧‧energy values

FIG. 1 is a flow chart of an audio processing method according to an embodiment of the disclosure.

FIGS. 2A to 2C are spectral waveform diagrams according to an embodiment of the disclosure.

FIGS. 3A to 3C are time-domain waveform diagrams according to an embodiment of the disclosure.

FIG. 4 is a flow chart of an audio processing method according to an embodiment of the disclosure.

FIG. 5 is a flow chart of an audio processing method according to an embodiment of the disclosure.

FIG. 6 is a function curve diagram according to an embodiment of the disclosure.

Embodiments are described in detail below with reference to the accompanying drawings. The specific embodiments described only illustrate the invention and are not intended to limit it, and the description of structural operations is not intended to limit their order of execution. Any structure obtained by recombining the elements that yields a device with equivalent effect falls within the scope of this disclosure. In addition, the drawings are schematic only and are not drawn to scale.

Unless otherwise noted, the terms used throughout the specification and claims have their ordinary meaning in this field, in the context of this disclosure, and in the specific context in which each is used. Certain terms used to describe the disclosure are discussed below or elsewhere in this specification to give those skilled in the art additional guidance on the description of the disclosure.

FIG. 1 is a flow chart of an audio processing method 100 according to an embodiment of the disclosure. The audio processing method 100 compresses an audio file and sends the compressed audio file to a playback device for playback. Preferably, when the audio file is large, the audio processing method 100 splits it into multiple audio segments and processes each segment individually. The audio file may be split by any rule, for example by duration, number of samples, and/or file size. The audio processing method 100 processes the audio segments in the chronological order of the audio content. The segments may have the same or different durations, numbers of samples, and/or file sizes; the present disclosure is not limited in this respect.
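The splitting rule is left open (duration, sample count, and/or file size). A minimal sketch, assuming fixed-length segments by sample count; `split_into_segments` and `segment_len` are illustrative names, not from the source:

```python
from typing import List, Sequence


def split_into_segments(samples: Sequence[int], segment_len: int) -> List[List[int]]:
    """Split samples into consecutive segments of segment_len samples each.

    Segments keep chronological order; the last one may be shorter.
    """
    if segment_len <= 0:
        raise ValueError("segment_len must be positive")
    return [list(samples[i:i + segment_len]) for i in range(0, len(samples), segment_len)]
```

At 96 kHz, for example, segment_len = 9600 would give 100 ms segments; that value is only an illustration.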

The audio processing method 100 includes steps S102 to S120. Steps S102 to S114 are executed, for example, by a device with higher processing capability such as a computer, and steps S116 to S120 are executed, for example, by a device with lower processing capability such as a Bluetooth device. Processing capability here refers to parameters such as the processor clock rate, processor performance, floating-point capability, bit width, and memory capacity. Devices with higher processing capability may include audio systems, smartphones, tablets, and portable music players. Devices with lower processing capability may include Bluetooth headsets and Bluetooth speakers.

The first of the audio segments in the audio file is processed first, through steps S102 to S120. Once the audio processing method 100 has finished with the first audio segment, the second audio segment is processed through steps S102 to S120, and then the next one, and so on. In other words, each audio segment passes through steps S102 to S120 in sequence until the entire audio file has been processed. Steps S102 to S110 are pre-processing steps performed before an audio segment is compressed. For simplicity, the following description uses only the first and second audio segments as examples.

In step S102, the first audio segment is converted from time-domain data into frequency-domain data (a spectrum), for example by a Fast Fourier Transform (FFT) or a similar algorithm. Here the data consist of sampling points in the time domain or frequency domain and their corresponding sample values. FIG. 2A shows the result of the conversion, the spectral waveform of the first audio segment according to an embodiment of the disclosure. In FIG. 2A, the horizontal axis is frequency (Hz) and the vertical axis is volume/energy (dB).
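Steps S102 and S104 can be sketched as follows. The sketch assumes mono integer samples and a dB scale referenced to a full-scale 24-bit sine, since the source does not state the dB reference. For brevity it uses a naive O(N²) discrete Fourier transform; a practical implementation would use an FFT such as `numpy.fft.rfft`. The function names are hypothetical:

```python
import cmath
import math
from typing import List, Sequence, Tuple


def spectrum_db(samples: Sequence[float], sample_rate: float,
                full_scale: float = 8388608.0) -> Tuple[List[float], List[float]]:
    """Step S102 sketch: (frequencies, levels in dB) for DFT bins 0..N/2.

    Assumption: 0 dB means a full-scale sine (|X[k]| = N/2 * full_scale at its
    bin, which holds for bins other than DC and Nyquist).
    """
    n = len(samples)
    freqs, levels = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        mag = abs(acc) / (n / 2 * full_scale)
        levels.append(20 * math.log10(max(mag, 1e-12)))  # clamp avoids log10(0)
        freqs.append(k * sample_rate / n)
    return freqs, levels


def lowest_energy_db(levels: Sequence[float]) -> float:
    """Step S104 sketch: the lowest level anywhere in the spectrum."""
    return min(levels)
```

A full-scale 6 kHz tone sampled at 96 kHz over 64 samples lands exactly in bin 4 at about 0 dB. Every other bin then sits at the clamp level, so `lowest_energy_db` reports a very low value.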

Next, in step S104, the lowest energy value in the spectral waveform of the first audio segment is analyzed. The purpose of this step is to estimate how much data is occupied by unnecessary system noise. An audio output usually contains noise inherent to the system at every point in time; this system noise is generally called the noise floor. The noise floor is unwanted noise that degrades the signal-to-noise ratio (SNR), which in turn is tied to audio quality. The noise floor is especially noticeable during silent passages, and it also limits the dynamic range of the audio (the ratio between the loudest and the quietest levels). Removing the data occupied by system noise therefore reduces the file size and makes the subsequent compression more effective, and it also improves audio quality by raising the SNR.

In step S104, the lowest energy value found is taken as the first lowest energy value. In the spectrum of the embodiment of FIG. 2A, the first lowest energy value is at energy value L11, about -130 dB. In general, the high-frequency content of a piece of audio usually has lower energy. Note that the average human ear perceives frequencies from roughly 20 Hz to 20 kHz, and perception above 15 kHz is already weak. For pop music records and some other audio files, the record production company therefore removes the higher-frequency content (for example, above 15 kHz) in advance to reduce the file size, as shown in FIG. 2B. FIG. 2B illustrates the spectral waveform of an embodiment of the disclosure in which audio content above 15 kHz has been removed. In other words, the audio above 15 kHz carries no useful information, only useless information (noise). In FIG. 2B, the horizontal axis is frequency (Hz) and the vertical axis is volume/energy (dB).

In the embodiment of FIG. 2B, the first lowest energy value found in step S104 lies at about 45 kHz and corresponds to energy value L12 (-120 dB) in the figure. In fact, however, the audio segment of FIG. 2B contains no valid audio content above 15 kHz (the record production company removed it before release), so the portion from 15 kHz to 45 kHz is also data occupied by unnecessary system noise. Therefore, in step S106 of the audio processing method 100, the first lowest energy value found in step S104 is compared with a preset energy value, and the higher of the two is used as the first noise floor. In this disclosure, all data below the energy value of the first noise floor is regarded as noise. That is, if the lowest energy value found in step S104 is lower than the preset energy value, the preset energy value is used as the noise floor; if the lowest energy value is higher than the preset energy value, the lowest energy value is used as the noise floor.

In the embodiment of FIG. 2B, the preset energy value corresponds to energy value L13 (for example, -85 dB). The preset energy value may also be set by the user; the present disclosure is not limited in this respect. In this example the preset energy value (-85 dB) is higher than the lowest energy value (-120 dB), so the preset energy value of -85 dB is used as the first noise floor, and all data with energy below -85 dB is regarded as noise.

The preset energy value of -85 dB corresponds to the 15 kHz point in FIG. 2B. With the preset energy value in place, the portion from 15 kHz to 45 kHz (the frequency of the lowest energy value) is therefore also classified as noise, instead of being mistakenly retained and limiting the subsequent compression. In short, step S106 yields a noise floor that better matches the noise, that is, the unnecessary data, actually present in the audio.

Conversely, if the measured lowest energy value is higher than the preset energy value, the measured lowest energy value is used as the first noise floor. FIG. 2C illustrates a spectral waveform according to an embodiment of the disclosure. In the spectrum of the audio segment in FIG. 2C, the lowest energy value L14 is about -78 dB, higher than the preset energy value (-85 dB), so the lowest energy value L14 is used as the first noise floor. With the measured lowest energy value as the noise floor, the portion below it is classified as noise. The noise floor thus floats with the lowest energy value of the audio content instead of being fixed at the preset energy value.
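Step S106 amounts to taking the larger of two levels. Below is a minimal sketch that uses this embodiment's -85 dB preset as a default; the function name is hypothetical. It is exercised with the FIG. 2B and FIG. 2C values:

```python
def select_noise_floor(lowest_db: float, preset_db: float = -85.0) -> float:
    """Step S106 sketch: the higher of the measured lowest level and the preset level."""
    return max(lowest_db, preset_db)


fig_2b_floor = select_noise_floor(-120.0)  # preset wins: -85.0
fig_2c_floor = select_noise_floor(-78.0)   # measured level wins: -78.0
```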

Next, in step S108, a first discard value is generated from the data in the time-domain waveform of the first audio segment whose energy is below the first noise floor. The first discard value is then applied to the first audio segment to generate a first processed audio segment. Specifically, step S108 applies a root mean square (RMS) operation to the sample values of those sampling points in the time-domain waveform of the first audio segment whose energy is below the first noise floor. This gives a time-domain amplitude, which is used as the first discard value. Then, in step S110, each initial sample value in the first audio segment is divided by the first discard value and rounded down to an integer, producing the first processed audio segment. The rounding down can be performed, for example, with the floor function.
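Steps S108 and S110 might look like the sketch below. The source does not say how a dB noise floor taken from the spectrum maps to individual time-domain samples. This sketch assumes the linear threshold full_scale × 10^(floor_dB / 20) and selects the samples whose magnitude falls below it. The fallback for a segment with no such samples, and all names, are hypothetical:

```python
import math
from typing import List, Sequence

FULL_SCALE_24BIT = 8388608  # 2**23; hypothetical name for the 24-bit full-scale magnitude


def discard_value(samples: Sequence[int], noise_floor_db: float,
                  full_scale: float = FULL_SCALE_24BIT) -> float:
    """Step S108 sketch: RMS of the samples whose magnitude is below the noise floor.

    Assumption: the dB floor maps to the linear threshold full_scale * 10**(dB / 20).
    """
    threshold = full_scale * 10 ** (noise_floor_db / 20)
    quiet = [x for x in samples if abs(x) < threshold]
    if not quiet:
        return 1.0  # hypothetical fallback: nothing below the floor, so discard nothing
    rms = math.sqrt(sum(x * x for x in quiet) / len(quiet))
    return max(rms, 1.0)  # never divide by less than one level (also avoids 0)


def quantize(samples: Sequence[int], discard: float) -> List[int]:
    """Step S110 sketch: divide each sample by the discard value and round down (floor)."""
    return [math.floor(x / discard) for x in samples]
```

With the values from the text, `quantize([8388607, 900], 1000)` gives `[8388, 0]`. Note that floor rounds negative values toward minus infinity (-900 becomes -1), which is what "rounded down" means for signed samples.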

Suppose the first audio segment is 24-bit/96 kHz audio. The 24-bit format provides 8,388,608 intensity levels on each side of zero (2²⁴ = 16,777,216 in total), covering the interval -8388608 to -1 and the interval 0 to 8388607; other value intervals may also be configured. The following example uses the interval 0 to 8388607.

Suppose one sampling point of the first audio segment has, in the time domain, the initial sample value 8388607, the largest value representable in 24-bit format, and suppose the first discard value is 1000. In step S110, 8388607 is divided by 1000 to give 8388.607, which the floor function rounds down to the new sample value 8388. That is, after step S110, the sampling point whose initial sample value in the first audio segment was 8388607 has the sample value 8388 in the first processed audio segment.

Thus 24-bit/96 kHz audio originally uses 24 bits to store each sampling point. After the pre-compression steps S102 to S110, the largest initial sample value maps to a new largest sample value of 8388 (between 2¹³ and 2¹⁴), so each sampling point can be stored in only 15 bits, which greatly improves the subsequent compression. Note that the conventional approach handles the noise floor in whole bits. For example, when the first discard value is 1000, which lies between 2⁹ and 2¹⁰, at most 2⁹ (= 512) can be discarded, leaving 1000 - 512 = 488 of the discard value unused. In other words, the conventional approach may still retain part of the unnecessary noise, which makes the subsequent compression less effective.
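The arithmetic that compares division by the discard value with the conventional power-of-two (bit-shift) approach can be checked directly. Below is a small sketch with hypothetical names, using the example values 8388607 and 1000:

```python
def compare_division_and_shift(max_sample: int, discard: int) -> dict:
    """Compare dividing by the discard value with shifting by whole bits."""
    shift = discard.bit_length() - 1   # largest power of two not exceeding discard
    return {
        "shift_bits": shift,
        "power_of_two": 1 << shift,
        "unused_discard": discard - (1 << shift),
        "max_code_division": max_sample // discard,  # steps S108 to S110
        "max_code_shift": max_sample >> shift,       # conventional bit-based approach
    }


result = compare_division_and_shift(8388607, 1000)
```

For these values the shift discards 9 bits (512), which leaves 488 of the discard value unused and produces a largest code of 16383, compared with 8388 for the division. Both maxima happen to need 14 magnitude bits. The division, however, leaves roughly half as many distinct levels, and a lossless coder may be able to exploit that.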

Continuing the embodiment, when the sample value of a sampling point is below the first discard value, the new sample value becomes 0. For example, suppose a sampling point of the first audio segment has the time-domain sample value 900, below the assumed first discard value of 1000. In step S110, 900 is divided by 1000 to give 0.9, which the floor function rounds down to the new sample value 0. That is, every initial sample value in the first audio segment that is below the first discard value becomes 0 in the first processed audio segment after step S110.

Next, in step S112, the first processed audio segment is compressed to produce a compressed audio segment. Because the pre-processing in steps S102 to S110 has already greatly reduced the size of the first audio segment, step S112 can compress the first processed audio segment in a lossless format instead of relying on a lossy format to reach a high compression ratio. In this embodiment, the lossless format is, for example, the Free Lossless Audio Codec (FLAC). With FLAC, sampling points holding the lowest sample value (for example, 0) in the first processed audio segment are discarded first to improve compression, and they are restored after decompression to recover the original sampling rate. If the first audio segment were compressed directly without the pre-processing of steps S102 to S110, FLAC would achieve a compression ratio (compressed size divided by original size) of about 70% to 80%. With the pre-processing, the compression ratio can reach about 15% to 20%.
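FLAC is not part of the Python standard library. The sketch below therefore uses zlib purely as a stand-in lossless compressor, to illustrate why the pre-processing helps: dividing by a discard value removes the noisy low-order detail that a lossless coder would otherwise have to store. The test signal and the discard value 448 (borrowed from the FIG. 3B example) are assumptions, and zlib ratios say nothing about the FLAC figures quoted above:

```python
import random
import zlib
from typing import Sequence


def pack_24bit(samples: Sequence[int]) -> bytes:
    """Pack signed integers as 3-byte little-endian words (24-bit PCM layout)."""
    return b"".join(int(x).to_bytes(3, "little", signed=True) for x in samples)


def compressed_ratio(samples: Sequence[int]) -> float:
    """Compressed size / original size, with zlib standing in for FLAC."""
    raw = pack_24bit(samples)
    return len(zlib.compress(raw, 9)) / len(raw)


rng = random.Random(0)
# Hypothetical test signal: a repeating ramp plus low-level noise of about +/-500.
signal = [((t * 37) % 200000) - 100000 + rng.randint(-500, 500) for t in range(20000)]
processed = [x // 448 for x in signal]  # step S110 with an assumed discard value of 448
```

Comparing `compressed_ratio(signal)` with `compressed_ratio(processed)` shows the processed version compressing to a much smaller fraction of its size, even though both are stored in the same 3-byte layout.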

After the first processed audio segment has been compressed into a compressed audio segment, step S114 sends the compressed audio segment, for example over Bluetooth, to an audio playback device with low power consumption and low processing capability, such as a Bluetooth headset or Bluetooth speaker. In step S116, the audio playback device decompresses the received compressed audio segment. Because the compressed audio segment was produced by lossless compression (for example, FLAC), decompression only needs to restore the sampling points of the lowest sample value that were removed during compression (that is, to recover the first processed audio segment). No additional complex and heavy computation, such as an inverse Fast Fourier Transform, is needed.

After decompression, step S118 multiplies the sample value of each sampling point of the restored first processed audio segment by the first discard value to recover the original audio format (for example, 24-bit). Then, in step S120, the restored audio is played immediately. Audio processed by the audio processing method 100 can therefore be decompressed and restored quickly on the audio playback device for real-time playback.
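On the player side, step S118 is a single multiplication per sample. Below is a minimal sketch. It assumes the discard value is an integer delivered with each segment, since the excerpt does not say how the player obtains it. The name `restore` is hypothetical:

```python
from typing import List, Sequence


def restore(codes: Sequence[int], discard: int) -> List[int]:
    """Step S118 sketch: multiply each decompressed sample by the discard value."""
    return [q * discard for q in codes]


original = [8388607, 900, -900, 0]
codes = [x // 1000 for x in original]  # step S110 with discard value 1000: [8388, 0, -1, 0]
restored = restore(codes, 1000)        # [8388000, 0, -1000, 0]
```

Because step S110 rounds down, each restored sample lies at or below the original and within one discard value of it.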

Continuing the embodiment, once the audio processing method 100 has finished with the first audio segment, the second audio segment is processed in the same way. Step S102 first converts the time-domain data of the second audio segment into a spectrum. Step S104 analyzes a second lowest energy value in the spectral waveform of the second audio segment. Step S106 compares the second lowest energy value with the preset energy value and uses the higher of the two as a second noise floor. In step S108, an RMS operation is applied to the sample values of those sampling points in the time-domain waveform of the second audio segment whose energy is below the second noise floor. The resulting time-domain amplitude is used as a second discard value. In step S110 the second audio segment is processed with the second discard value to generate a second processed audio segment.

Next, step S112 compresses the second processed audio segment and step S114 sends the compressed audio to the playback device. Steps S116 and S118 then decompress and restore it, and finally step S120 plays the audio in real time.
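Each segment gets its own noise floor and discard value, so the player needs the matching discard value for every segment. The minimal sketch of the per-segment loop below rests on these assumptions:

- The discard value travels together with each segment's samples; the excerpt does not specify this.
- Steps S104 to S108 are left to a caller-supplied function.
- Compression and transmission (S112 to S116) are omitted.
- All names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Segment = List[int]
Encoded = Tuple[int, List[int]]  # (discard value, quantized samples)


def encode_segments(segments: Sequence[Segment],
                    discard_for: Callable[[Segment], int]) -> List[Encoded]:
    """Process segments in chronological order, each with its own discard value."""
    out: List[Encoded] = []
    for seg in segments:
        d = max(1, discard_for(seg))             # per-segment S104 to S108, supplied by caller
        out.append((d, [x // d for x in seg]))   # S110: floor division
    return out


def decode_segments(encoded: Sequence[Encoded]) -> List[int]:
    """Player side (S118): rescale each segment with the discard value sent with it."""
    samples: List[int] = []
    for d, codes in encoded:
        samples.extend(q * d for q in codes)
    return samples
```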

In one embodiment, time-domain waveforms of an audio segment processed by the audio processing method 100 are shown in FIGS. 3A to 3C, where the horizontal axis is time (t) and the vertical axis is the intensity level, that is, the sample value. FIG. 3A shows the original time-domain waveform of an audio segment according to an embodiment of the disclosure. FIG. 3B shows the time-domain waveform of the processed audio segment that the pre-processing of steps S102 to S110 produces from the audio segment of FIG. 3A; in this example the discard value calculated in step S108 is assumed to be 448. FIG. 3C shows the time-domain waveform of the processed audio segment of FIG. 3B after compression in step S112, transmission in step S114, and decompression and restoration in steps S116 to S118. A comparison of FIGS. 3A and 3C shows no significant distortion in the audio segment processed by the audio processing method 100.

In an embodiment of the disclosure, the audio processing method may further include steps S109 and S115, as shown in FIG. 4, a flow chart of an audio processing method 400 according to an embodiment of the disclosure. The audio processing method 400 includes steps S102, S104, S106, S108, S109, S110, S112, S114, S115, S116, S118, and S120. Steps S102 to S108, S110 to S114, and S116 to S120 are the same as in the audio processing method 100 and are not described again here; see the related paragraphs above. After step S108 generates the first discard value, step S109 multiplies the first discard value by an adjustment coefficient. The user can set the adjustment coefficient to control the quality of the audio file produced by the subsequent processing steps.

Specifically, if the user judges that the audio file does not need very high quality, the user can raise the first discard value so that more data is discarded. This reduces the audio file size and further improves the subsequent compression. For example, if the first discard value is 1000 and the adjustment coefficient is 16, step S109 multiplies 1000 by 16 to obtain a new, larger discard value of 16000. Then, in step S110, each initial sample value in the first audio segment is divided by the new discard value and rounded down with the floor function to produce the first processed audio segment. Step S112 then compresses the first processed audio segment into a compressed audio segment, which step S114 sends to the audio playback device.

In step S115, the transmission bandwidth used to send the compressed audio segment is calculated. If the transmission bandwidth exceeds a preset value, the adjustment coefficient for the next audio segment (the second audio segment) is increased. In general, stable Bluetooth data transmission requires a bandwidth of roughly 1 to 1.5 Mbps or lower; in this embodiment the preset value is 660 kbps. When the bandwidth for sending the compressed audio segment exceeds the preset value, the adjustment coefficient of the second audio segment is raised automatically, which increases the discard value and thereby the compression. With the higher adjustment coefficient, the transmission bandwidth of subsequent compressed audio segments can meet the condition for stable transmission (below 660 kbps).

It should be understood that when the transmission bandwidth is well below the preset value, the adjustment coefficient of the second audio segment may also be lowered so that more of the available bandwidth is used. The adjustment coefficient may be an integer, a non-integer, or even a function; the present disclosure does not limit this. In one embodiment, the system or the user may also build an adjustment coefficient table in advance. The table contains a plurality of different adjustment coefficients, so in step S115, when the transmission bandwidth is greater than, or well below, the preset value, audio processing method 400 can automatically select a larger or smaller adjustment coefficient from the table to process the next audio segment.
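The adjustment coefficient table can be sketched as a simple ordered list with a cursor that moves up when the link is over budget and down when it is well under budget. The table contents and the 50% "well below" threshold are hypothetical; the text leaves both unspecified.

```python
COEFFICIENT_TABLE = [1, 2, 4, 8, 16, 32, 64]  # hypothetical pre-built table
PRESET_KBPS = 660.0                           # preset value from the text
FAR_BELOW_RATIO = 0.5                         # hypothetical meaning of "well below"

def select_coefficient_index(index: int, bitrate_kbps: float) -> int:
    """Step S115 (sketch): pick a larger or smaller table entry for the next segment."""
    if bitrate_kbps > PRESET_KBPS:
        return min(index + 1, len(COEFFICIENT_TABLE) - 1)   # compress harder
    if bitrate_kbps < FAR_BELOW_RATIO * PRESET_KBPS:
        return max(index - 1, 0)                            # spend spare bandwidth
    return index                                            # within budget: keep

idx = select_coefficient_index(4, 800.0)   # 16 -> 32
```

Clamping at both ends keeps the cursor inside the table when the bandwidth stays high or low for many consecutive segments.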

In another embodiment of the present disclosure, the audio processing method may further include steps S111 and S119, as shown in FIG. 5, a flowchart of audio processing method 500 according to an embodiment of the present disclosure. Audio processing method 500 includes steps S102, S104, S106, S108, S111, S112, S114, S116, S119, and S120. Steps S102 through S108, S112 through S116, and S120 are the same as in audio processing method 100; see the related paragraphs above, which are not repeated here. In step S111, the first discard value generated in step S108 is dynamically adjusted according to the magnitude of each initial sample value in the first audio segment to produce the processed audio segment; that is, each sample is adjusted according to its own corresponding discard value. Specifically, the first discard value and each initial sample value of the first audio segment are transformed by a nonlinear companding method, which correspondingly adjusts each initial sample value and produces a new sample value.

In one embodiment, the nonlinear companding method may be, for example, μ-law (mu-law) encoding. In μ-law encoding, the range of initial sample values is mapped onto the interval from −1 to 1; that is, each sample value is divided by the maximum (full-scale) value. The μ-law function is given by Equation 1:

$$\mathrm{mu}(x) = \operatorname{sign}(x)\,\frac{\ln\left(1+\mu\lvert x\rvert\right)}{\ln\left(1+\mu\right)}, \qquad -1 \le x \le 1 \tag{1}$$

Here, x is the normalized sample value, μ is the discard value, and sign(x) is the sign function: sign(x) = 1 when x > 0, sign(x) = 0 when x = 0, and sign(x) = −1 when x < 0. Because mu(x) lies between −1 and 1, the computed value must be scaled to the bit depth of the converted audio format, i.e. multiplied by 2^(N−1) for an N-bit format (2⁷ = 128 for 8 bits, as in the example below), to obtain the actual corresponding sample value. The relationship between the μ-law encoding function mu(x) and the sample value x is shown in FIG. 6, a graph of the μ-law encoding function according to an embodiment of the present disclosure, in which the horizontal axis is x and the vertical axis is mu(x).

For example, suppose the first audio segment is in 16-bit/44.1 kHz format and the discard value μ is 255. After step S111, each sample of the first audio segment is converted to 8 bits. For a sample with value 33, μ-law encoding gives mu(33/32768) = 0.0412; since the processed segment is 8-bit, 0.0412 is multiplied by 2⁷ (= 128) and the floor function is applied, giving 5. In other words, after μ-law encoding, the sample with value 33 maps to the value 5 in the 8-bit format. Likewise, for another sample with value 32178, μ-law encoding gives mu(32178/32768) = 0.9967; multiplying 0.9967 by 128 and applying the floor function gives 127. That is, the sample with value 32178 maps to the value 127 in the 8-bit format.
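The worked example above can be reproduced with a short Python sketch of Equation 1 followed by scaling and flooring. The function names and default parameters are illustrative assumptions; normalization by 2¹⁵ and scaling by 2⁷ follow the 16-bit to 8-bit example in the text.

```python
import math

def mu_law(x: float, mu: float) -> float:
    """Equation 1: compand a sample already normalized to [-1, 1]."""
    sign = (x > 0) - (x < 0)                  # sign(x) as defined in the text
    return sign * math.log1p(mu * abs(x)) / math.log1p(mu)

def encode_sample(sample: int, mu: float = 255.0,
                  in_bits: int = 16, out_bits: int = 8) -> int:
    """Step S111 (sketch): 16-bit sample -> 8-bit companded value, floored."""
    x = sample / 2 ** (in_bits - 1)           # normalize by 32768
    return math.floor(mu_law(x, mu) * 2 ** (out_bits - 1))   # scale by 128

codes = [encode_sample(33), encode_sample(32178)]   # the two samples from the text
```

Applying the floor literally also to negative samples makes the mapping slightly asymmetric (for example, −33 maps to −6); the text only discusses positive samples.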

Because the discard value is applied through μ-law encoding, even low-amplitude samples are retained and the dynamic range of the audio segment is preserved, so the audio quality does not suffer much from the noise processing. It should be understood that audio processing method 500 may use other nonlinear companding techniques depending on the application; μ-law encoding is described here only as a preferred embodiment and is not intended to limit the present disclosure.

After step S111 is completed, step S112 compresses the segment and step S114 sends the compressed audio segment to the audio playback device. In step S116, the audio playback device decompresses the compressed audio segment to recover the processed audio segment. Then, in step S119, inverse μ-law processing is performed to restore the audio segment to its original audio format. The inverse μ-law function is given by Equation 2:

$$\mathrm{mu\_inverse}(y) = \operatorname{sign}(y)\,\frac{\left(1+\mu\right)^{\lvert y\rvert}-1}{\mu}, \qquad -1 \le y \le 1 \tag{2}$$

Take as an example the sample with value 33, which μ-law encoding maps to the value 5 in the 8-bit format. Substituting 5 into the inverse μ-law function gives mu_inverse(5/128) = 0.00094846. Because the original first audio segment is 16-bit, 0.00094846 is multiplied by 2¹⁵ (= 32768) and the fractional part is rounded up unconditionally, giving 32, an error of only about 3% relative to the original sample value 33. This unconditional rounding up can be performed, for example, with a ceiling function. Similarly, for the sample with value 32178, which maps to the value 127 in the 8-bit format, substituting 127 into the inverse μ-law function gives mu_inverse(127/128) = 0.9574. Multiplying 0.9574 by 2¹⁵ and applying the ceiling function gives 31373, an error of only about 2.5% relative to the original sample value 32178.
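The restoration in step S119 can be sketched with Equation 2, followed by scaling by 2¹⁵ and a ceiling, as described above. The function names and default parameters are illustrative assumptions.

```python
import math

def mu_law_inverse(y: float, mu: float) -> float:
    """Equation 2: expand a companded value in [-1, 1] back to the linear scale."""
    sign = (y > 0) - (y < 0)
    return sign * ((1 + mu) ** abs(y) - 1) / mu

def decode_sample(code: int, mu: float = 255.0,
                  code_bits: int = 8, out_bits: int = 16) -> int:
    """Step S119 (sketch): 8-bit code -> 16-bit sample, rounded up with ceil."""
    y = code / 2 ** (code_bits - 1)                          # normalize by 128
    return math.ceil(mu_law_inverse(y, mu) * 2 ** (out_bits - 1))   # scale by 32768

restored_small = decode_sample(5)     # original sample was 33
restored_large = decode_sample(127)   # original sample was 32178
```

At full floating-point precision, the second value comes out as 31374 rather than the 31373 quoted above, because the text rounds the intermediate result to 0.9574 before scaling; the relative error is about 2.5% either way.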

In one embodiment of the present disclosure, the steps of audio processing methods 100, 400, and 500 may also be integrated or executed in a different order. For example, a single audio processing method may integrate steps S109 and S115 of audio processing method 400 with steps S111 and S119 of audio processing method 500. Specifically, the first discard value may be multiplied by the adjustment coefficient in step S109 of method 400 to generate a new discard value; the first audio segment and the new discard value are then substituted into step S111 of method 500 to generate the first processed audio segment by nonlinear companding. After compression and transmission, step S115 calculates the transmission bandwidth used to send the compressed audio segment, to determine whether the adjustment coefficient of the next audio segment needs to be increased.
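One way to chain the integrated steps is sketched below. The new discard value from S109 serves as μ in the companding of S111, and S115 decides the coefficient for the next segment. The doubling step and all function names are hypothetical. The 16-bit input and 8-bit output follow the earlier example.

```python
import math

PRESET_KBPS = 660.0  # preset bandwidth from the S115 embodiment

def mu_law(x: float, mu: float) -> float:
    """Equation 1 applied to a sample normalized to [-1, 1]."""
    sign = (x > 0) - (x < 0)
    return sign * math.log1p(mu * abs(x)) / math.log1p(mu)

def process_segment(samples: list[int], discard_value: float,
                    coefficient: float, out_bits: int = 8) -> list[int]:
    """S109 then S111 (sketch): scale the discard value, then use it as mu."""
    mu = discard_value * coefficient                        # S109
    scale = 2 ** (out_bits - 1)
    return [math.floor(mu_law(s / 32768, mu) * scale) for s in samples]  # S111

def next_coefficient(coefficient: float, bitrate_kbps: float, step: float = 2) -> float:
    """S115 (sketch): enlarge the coefficient for the next segment if over budget."""
    return coefficient * step if bitrate_kbps > PRESET_KBPS else coefficient

codes = process_segment([33, 32178], discard_value=255, coefficient=1)
```

With a coefficient of 1 this reduces to the plain μ = 255 example; compression (S112) and transmission (S114) would sit between `process_segment` and `next_coefficient`.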

In one implementation of the present disclosure, the above audio processing method can be implemented by a non-transitory computer readable medium storing a plurality of program code instructions. When executed by a processing unit, the instructions perform steps S102, S104, S106, S108, S109, S110, S111, S112, S114, and S115 of audio processing methods 100, 400, and 500, or an integrated method combining these steps. The non-transitory computer readable medium may be a computer, a mobile phone, or a standalone audio encoder, and the processing unit may be a processor or a system chip.
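The encoder-side instructions can be sketched roughly as follows. This is a minimal illustration under loudly stated assumptions, not the disclosed implementation. The noise floor is passed in rather than derived from the spectral analysis. "Energy below the noise floor" is interpreted as |sample| < noise floor. The discard value is the root mean square of those quiet samples, as recited in the claims. `zlib` stands in for the unspecified lossless compressor of step S112. All function names are hypothetical.

```python
import math
import struct
import zlib

def split_segments(samples: list[int], segment_len: int) -> list[list[int]]:
    """Split the audio into fixed-length segments (segment length is hypothetical)."""
    return [samples[i:i + segment_len] for i in range(0, len(samples), segment_len)]

def discard_value(segment: list[int], noise_floor: float) -> float:
    """S108 (sketch): RMS of the samples whose magnitude is below the noise floor.

    Assumption: the energy comparison is done as |sample| < noise_floor.
    """
    quiet = [s for s in segment if abs(s) < noise_floor]
    if not quiet:
        return 1.0  # hypothetical fallback: nothing to discard
    rms = math.sqrt(sum(s * s for s in quiet) / len(quiet))
    return max(1.0, rms)  # never divide by less than 1 (also avoids division by 0)

def encode_segment(segment: list[int], noise_floor: float) -> tuple[float, bytes]:
    """S110 + S112 (sketch): divide by the discard value, floor, compress losslessly."""
    d = discard_value(segment, noise_floor)
    processed = [math.floor(s / d) for s in segment]
    payload = struct.pack(f"<{len(processed)}i", *processed)  # little-endian int32
    return d, zlib.compress(payload)

d, blob = encode_segment([3, -4, 100, 1000], noise_floor=5)
```

In this sketch the discard value is returned alongside the compressed bytes, because the decoder needs it to rescale the samples.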

In another implementation of the present disclosure, another non-transitory computer readable medium also stores a plurality of program code instructions. When executed by a processing unit, the instructions perform steps S116, S118, S119, and S120 of audio processing methods 100, 400, and 500. This other non-transitory computer readable medium may be an audio playback device such as a Bluetooth/wireless headset, a speaker, or a stereo system, or a standalone audio decoder, and the processing unit may be a microprocessor or a system chip.
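The playback-side restoration recited in the last claim (decompress, then multiply each sample by the discard value) can be sketched as below. This sketch rests on three assumptions: the processed samples were packed as little-endian 32-bit integers, they were compressed with `zlib`, and the discard value reaches the decoder alongside each segment. None of these details is specified in the text, and the function name is hypothetical.

```python
import struct
import zlib

def decode_segment(blob: bytes, discard_value: float) -> list[int]:
    """Decompress one segment (S116), then multiply back by the discard value."""
    raw = zlib.decompress(blob)                      # lossless, so exact recovery
    count = len(raw) // 4                            # 4 bytes per int32 sample
    processed = struct.unpack(f"<{count}i", raw)
    return [round(v * discard_value) for v in processed]

blob = zlib.compress(struct.pack("<3i", 0, -2, 28))
restored = decode_segment(blob, 4.0)
```

Only multiplications are needed after decompression, which suits the low-power microprocessors in headsets and speakers mentioned in the background section.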

With the teachings of the present disclosure, even an audio file in the high-resolution 24-bit/96 kHz format can, after compression, be transmitted over a low-bandwidth transmission standard such as Bluetooth and played back quickly and in real time on the audio playback device.

Although embodiments of the present invention have been disclosed above, they are not intended to limit the present invention. Anyone skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention; therefore, the scope of protection of the present invention shall be defined by the appended claims.

Claims

An audio processing method, comprising: dividing an audio file into a plurality of audio segments, and processing a first audio segment of the audio segments by the following steps: analyzing a first lowest energy value in a spectral waveform of the first audio segment; comparing the first lowest energy value with a predetermined energy value, and using the higher of the two as a first noise floor; generating a first processed audio segment according to the first noise floor and the first audio segment; compressing the first processed audio segment to generate a compressed audio segment; and transmitting the compressed audio segment to an audio playback device.

The audio processing method of claim 1, wherein the step of generating the first processed audio segment further comprises: performing a root mean square operation on the sample values of at least one sampling point, in a time domain waveform of the first audio segment, whose energy value is lower than the first noise floor, to generate a first discard value; and dividing each initial sample value in the first audio segment by the first discard value to generate the first processed audio segment.

The audio processing method of claim 1, wherein the step of generating the first processed audio segment further comprises: performing a root mean square operation on the sample values of at least one sampling point, in a time domain waveform of the first audio segment, whose energy value is lower than the first noise floor, to generate a first discard value; and correspondingly adjusting each initial sample value in the first audio segment according to the first discard value and that initial sample value.

The audio processing method of claim 1, further comprising: analyzing a second lowest energy value in a spectral waveform of a second audio segment, wherein the second audio segment is sent after the first audio segment; comparing the second lowest energy value with the predetermined energy value, and using the higher of the two as a second noise floor; performing a root mean square operation on the sample values of at least one sampling point, in a time domain waveform of the second audio segment, whose energy value is lower than the second noise floor, to generate a second discard value; and adjusting the second discard value of the second audio segment when the bit rate of the compressed audio segment sent to the audio playback device is greater than a preset value.

The audio processing method of claim 4, further comprising: when the bit rate of the compressed audio segment sent to the audio playback device is greater than the preset value, multiplying the second discard value by an adjustment coefficient; and adjusting a plurality of initial sample values of the second audio segment according to the product of the second discard value and the adjustment coefficient, to generate a second processed audio segment.

The audio processing method of claim 1, wherein the audio playback device is a Bluetooth device, and the compressed audio segment is transmitted to the audio playback device via Bluetooth.

The audio processing method of claim 1, wherein the step of compressing the first processed audio segment uses lossless (distortionless) compression.

A non-transitory computer readable medium storing a plurality of instructions which, when executed by a processing unit, perform: dividing an audio file into a plurality of audio segments, and processing one audio segment of the audio segments by the following steps: analyzing a lowest energy value in a spectral waveform of the one audio segment; comparing the lowest energy value with a predetermined energy value, and using the higher of the two as a noise floor; generating a processed audio segment according to the noise floor and the one audio segment; compressing the processed audio segment to generate a compressed audio segment; and transmitting the compressed audio segment to an audio playback device.

The non-transitory computer readable medium of claim 8, wherein the step of generating the processed audio segment further comprises: performing a root mean square operation on the sample values of at least one sampling point, in a time domain waveform of the one audio segment, whose energy value is lower than the noise floor, to generate a discard value; and dividing each initial sample value in the one audio segment by the discard value to generate the processed audio segment.

A non-transitory computer readable medium storing a plurality of instructions for restoring a compressed audio segment of a compressed audio file, the instructions, when executed by a processing unit, performing: decompressing the compressed audio segment to obtain a decompressed audio segment; and multiplying each sample value in the decompressed audio segment by a discard value, wherein the discard value is related to an original noise floor of an original audio segment corresponding to the compressed audio segment.