TWI789577B - Method and system for recovering audio information - Google Patents


Info

Publication number
TWI789577B
Authority
TW
Taiwan
Prior art keywords
audio
receiving end
frame
amplitude
module
Prior art date
Application number
TW109111346A
Other languages
Chinese (zh)
Other versions
TW202139032A (en)
Inventor
李敬祥
Original Assignee
同響科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 同響科技股份有限公司
Priority to TW109111346A
Publication of TW202139032A
Application granted
Publication of TWI789577B

Abstract

A method and a system for recovering audio information are provided. The method includes the steps of: transmitting audio packets sequentially from a transmitter to a receiver; having the receiver perform a Fourier transform on the audio packets of the frames preceding the frame whose audio packet is lost or delayed; having the receiver calculate a threshold from the audio packet of the immediately preceding frame; performing linear prediction on the frequency bins whose amplitudes exceed the threshold so as to extrapolate the lost or delayed audio packet; and performing an inverse Fourier transform on the extrapolated frequency bins to recover the lost or delayed audio packet.

Description

Audio data reconstruction method and system

The present invention relates to audio, and in particular to an audio data reconstruction method and system.

Digital audio data is usually encoded frame by frame and transmitted over a wired or wireless network to a receiving end, where it is decoded and played back. During transmission, interference or network congestion can cause audio packets to be lost or to arrive late, so that the receiving end's buffer runs dry and playback is interrupted. A simple remedy is to enlarge the buffer and request retransmission of lost packets before the buffer is exhausted, but this increases playback latency and is unsuitable for applications that require low latency.

Another approach is to interpolate from the audio data of the intact packets before and after the lost packet, or to extrapolate from the audio data of the intact packets before the lost packet, and to substitute the computed audio data for the lost packet. Playback then continues without interruption and without enlarging the buffer.

Various methods for reconstructing lost audio data have been proposed. The audio data is first converted from the time domain to the frequency domain and the sound is represented with a sinusoidal model; the sinusoids of the intact packets before and after the lost packet are interpolated to obtain the frequency, amplitude and phase of the lost packet's sinusoids, and a final frequency-to-time-domain conversion yields the reconstructed audio data.

The technical problem addressed by the present invention is to provide, in view of the deficiencies of the prior art, an audio data reconstruction method that includes the following steps: dividing audio data into a plurality of audio packets at a transmitting end and sending them sequentially to a receiving end; at the receiving end, performing a fast Fourier transform on the audio packets of the frames preceding the frame whose audio packet is lost or delayed, to convert those audio packets from the time domain to the frequency domain; at the receiving end, calculating a threshold from the audio packet of the frame immediately preceding the frame whose audio packet is lost or delayed; at the receiving end, comparing the amplitude of each frequency bin of each frame preceding that frame against the threshold for screening; at the receiving end, performing linear prediction on the amplitude and phase of each screened-in frequency bin whose amplitude exceeds the threshold, so as to extrapolate the lost or delayed audio packet; and at the receiving end, performing an inverse fast Fourier transform on the extrapolated frequency bins to convert the audio packet obtained by extrapolation from the frequency domain back to the time domain.

In one aspect, the audio data reconstruction method further includes the following steps: at the receiving end, calculating the amplitude and phase of the extrapolated audio packet; at the receiving end, filling noise into the real and imaginary parts of the frequency bins of the extrapolated audio packet whose amplitudes do not exceed the threshold; and at the receiving end, performing an inverse fast Fourier transform to convert the audio packet obtained by extrapolation and noise filling from the frequency domain back to the time domain.

In one aspect, the audio data reconstruction method further includes the following steps: at the receiving end, calculating the amplitude of each frequency bin of the previous frame after the fast Fourier transform; at the receiving end, summing the amplitudes of all frequency bins of the previous frame to obtain a total amplitude; and at the receiving end, calculating the threshold from the total amplitude.

In one aspect, the audio data reconstruction method further includes the following step: at the receiving end, dividing the total amplitude by a signal-to-noise ratio to obtain the threshold.

In addition, the present invention provides an audio reconstruction system including a transmitting end and a receiving end. The transmitting end is configured to divide audio data into a plurality of audio packets and send them sequentially. The receiving end includes an audio receiving module, an audio conversion module, an audio screening module and an extrapolation module. The audio receiving module is connected to the transmitting end and is configured to receive, in sequence, the audio packets sent by the transmitting end. The audio conversion module is connected to the audio receiving module and is configured to perform a fast Fourier transform on the audio packets of the frames preceding the frame whose audio packet is lost or delayed, so as to convert those audio packets from the time domain to the frequency domain. The audio screening module is connected to the audio conversion module and is configured to calculate, after the fast Fourier transform, a threshold from the audio packet of the frame immediately preceding the frame whose audio packet is lost or delayed, and to compare the amplitude of each frequency bin of each preceding frame against the threshold for screening. The extrapolation module is connected to the audio screening module and the audio conversion module, and performs linear prediction on the screened-in amplitudes to extrapolate the lost or delayed audio packet. The audio conversion module then performs an inverse fast Fourier transform on the extrapolated frequency bins to convert the audio packet obtained by extrapolation from the frequency domain back to the time domain.

In one aspect, the receiving end further includes a phase calculation module connected to the audio screening module and configured to calculate the amplitude and phase of the extrapolated audio packet.

In one aspect, the receiving end further includes a noise filling module connected to the phase calculation module, the extrapolation module and the audio conversion module, and configured to fill noise into the real and imaginary parts of the frequency bins of the extrapolated audio packet whose amplitudes do not exceed the threshold. The audio conversion module then performs an inverse fast Fourier transform to convert the audio packet obtained by extrapolation and noise filling from the frequency domain back to the time domain.

In one aspect, the audio screening module is configured to calculate the amplitude of each frequency bin of the previous frame after the fast Fourier transform, to sum the amplitudes of all frequency bins of the previous frame to obtain a total amplitude, and to calculate the threshold from the total amplitude.

In one aspect, the audio screening module is configured to divide the total amplitude by a signal-to-noise ratio to obtain the threshold.

As described above, the present invention provides an audio data reconstruction method and system that reconstruct lost or late sound waves, with the following main advantages: 1. Computation is performed only at the receiving end; the transmitting end requires no special coding or computation. 2. The computation operates on plain PCM data and is independent of the audio compression method. 3. The computation is simple and light, making it suitable for devices with low power and low computing capability. 4. No intact packet after the lost or late packet is needed; the intact packets before it suffice, which makes the method suitable for low-latency playback devices.

For a further understanding of the features and technical content of the present invention, reference is made to the following detailed description and drawings. The drawings are provided for reference and illustration only and are not intended to limit the present invention.

The embodiments of the present invention are described below by way of specific examples, and those skilled in the art will appreciate the advantages and effects of the present invention from the disclosure of this specification. The present invention can be practiced or applied in other different embodiments, and the details in this specification may be modified and changed in various ways based on different viewpoints and applications without departing from the concept of the present invention. The drawings of the present invention are simplified schematic illustrations and are not drawn to scale. The following embodiments describe the related technical content of the present invention in further detail, but the disclosure is not intended to limit the scope of protection of the present invention. The term "or" as used herein may, depending on the actual situation, include any one of the associated listed items or any combination of more of them.

[First Embodiment]

Please refer to FIG. 1, which is a flowchart of the steps of the audio data reconstruction method according to the first embodiment of the present invention. The audio data reconstruction method of this embodiment may include steps S101 to S107 shown in FIG. 1, described in detail below.

In step S101, the transmitting end divides the audio data into a plurality of audio packets and sends them sequentially to the receiving end. During this transmission, an audio packet may be lost or arrive late. When this happens, the receiving end performs a fast Fourier transform (FFT) on the audio packets of the frames preceding the frame whose audio packet is lost or delayed, to convert those audio packets from the time domain to the frequency domain.
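The patent does not prescribe a particular FFT implementation; as an illustration only, a minimal Python/NumPy sketch of this step follows, assuming each received frame is already available as a fixed-length array of PCM samples (the frame length of 1024 and all names are assumptions, not part of the claimed method):

```python
import numpy as np

FRAME_SIZE = 1024  # assumed FFT size / samples per frame

def spectra_of_previous_frames(prev_frames):
    """Convert the PCM frames preceding the lost/late frame to the frequency domain.

    prev_frames: list of 1-D numpy arrays, each holding one frame of PCM samples,
    ordered from oldest to newest (the last entry is the frame right before the gap).
    """
    # One FFT per preceding frame; rfft keeps only the non-redundant bins.
    return [np.fft.rfft(frame, n=FRAME_SIZE) for frame in prev_frames]
```

The last entry of the returned list corresponds to the frame immediately before the gap, which is the frame used to derive the threshold in step S103.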

In step S103, the receiving end calculates a threshold from the audio packet of the frame immediately preceding the frame whose audio packet is lost or delayed, and compares the amplitude of each frequency bin (FFT bin) of each of the preceding frames against the threshold, so as to screen out the amplitudes above the threshold, i.e. the high-energy bins.

In step S105, the receiving end performs linear prediction on the amplitudes of all the screened-in frequency bins to extrapolate the lost or delayed audio packet.
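The patent calls for linear prediction of the screened-in amplitudes without fixing its order; the sketch below assumes the simplest case, a first-order linear extrapolation from the two most recent frames, and should be read as one possible realization rather than the claimed method itself:

```python
import numpy as np

def extrapolate_amplitudes(mag_prev2, mag_prev1, keep_mask):
    """Linearly predict the amplitude of each screened-in bin for the missing frame.

    mag_prev2, mag_prev1: bin amplitudes of frames N-1 and N (arrays of equal length).
    keep_mask: boolean array, True for bins whose amplitude exceeded the threshold.
    """
    predicted = np.zeros_like(mag_prev1)
    # Continue the amplitude trend of the last two frames, clipped to non-negative values.
    trend = mag_prev1 - mag_prev2
    predicted[keep_mask] = np.maximum(mag_prev1[keep_mask] + trend[keep_mask], 0.0)
    return predicted
```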

In step S107, the receiving end performs an inverse fast Fourier transform to convert the audio packet obtained by extrapolation from the frequency domain back to the time domain, yielding a pulse code modulation (PCM) audio packet.

[Second Embodiment]

Please refer to FIG. 2 to FIG. 5. FIG. 2 is a flowchart of the steps of the audio data reconstruction method according to the second embodiment of the present invention; FIG. 3 is a schematic diagram of calculating the threshold from the frame preceding the frame whose audio packet is lost or delayed according to the second embodiment; FIG. 4 is the spectrum of the frame preceding the frame whose audio packet is lost or delayed, after the fast Fourier transform, according to the second embodiment; and FIG. 5 is the spectrum of the reconstructed frame according to the second embodiment.

While the transmitting end is sequentially sending the audio packets of a series of frames to the receiving end, interference such as environmental factors or network congestion may cause some audio packets, for example the audio packet of the last frame, to be lost or to arrive late, so that the receiving end either does not receive the audio packet of the last frame at all (a lost audio packet) or does not receive it within a predetermined time (a delayed audio packet). As a result, playback breaks after the receiving end has received and played the preceding frames in sequence; for example, as shown in FIG. 3, after the receiving end has received and played frames FR1~FR7 in sequence, a gap Gap appears. To solve this problem, the audio data reconstruction method of this embodiment may include steps S201~S211 shown in FIG. 2, described in detail below.

In step S201, while the transmitting end is sequentially sending the audio packets to the receiving end, the receiving end performs a fast Fourier transform (FFT) on the audio packets of the frames preceding the frame whose audio packet is lost or delayed (for example the frame FR7 shown in FIG. 3), to convert those audio packets from the time domain to the frequency domain. FIG. 4 shows the spectrum waveform WAVE1 of the frame preceding the frame whose audio packet is lost or delayed, after the fast Fourier transform.

In step S203, the receiving end calculates a threshold from the audio packet of the frame (for example the frame FR7 shown in FIG. 3) immediately preceding the frame whose audio packet is lost or delayed (for example the gap Gap shown in FIG. 3). The receiving end then compares the amplitude of each frequency bin of each of the preceding frames (for example each of the frames FR1~FR7 shown in FIG. 3) against the threshold (for example the threshold TH shown in FIG. 4, e.g. one derived from a signal-to-noise ratio), so as to screen out the frequency amplitudes above the threshold, for example all the amplitudes above the threshold TH shown in FIG. 4; only the frequency bins that exceed the threshold TH require amplitude linear prediction and phase calculation.

In step S205, the receiving end performs linear prediction on all the screened-in amplitudes to extrapolate the lost or delayed audio packet; FIG. 5 shows the spectrum waveform WAVE2 of the reconstructed frame.

Specifically, the receiving end calculates, for screening, the amplitude of each frequency bin of the FFT of the previous frame (for example the frame FR7 before the gap Gap shown in FIG. 3), expressed by the following equation:

magnitude = √(real² + image²);

where magnitude is the amplitude of the frequency bin, real is its real part, and image is its imaginary part.

Next, the receiving end sums the amplitudes of all frequency bins of the previous frame to obtain a total amplitude, expressed by the following equation: TM = M1 + M2 + M3 + … + Mn, and calculates the threshold from this total amplitude; here TM is the total amplitude, M1 to Mn are the amplitudes of the frequency bins, and n is the number of frequency amplitudes used to calculate the threshold, n = FFT size / 2, where FFT size is the number of frequency values over which the fast Fourier transform operates.

For example, the receiving end divides the total amplitude by a signal-to-noise ratio to obtain the threshold, expressed by the following equation:

S = TM / L,

where S is the threshold, TM is the total amplitude, and L is the signal-to-noise ratio, which may be any appropriate value, for example 1000. After the receiving end has screened out the high-energy frequency bins, the remaining low-energy bins are treated as noise and are not used as a basis for extrapolating the lost or delayed audio packet in the subsequent steps.
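Combining the magnitude, total-amplitude and threshold equations above, a hedged NumPy sketch of the screening step might look as follows (the bin layout of numpy's rfft and the default L = 1000, taken from the example value given above, are assumptions):

```python
import numpy as np

def screen_bins(spectrum_prev, snr=1000.0):
    """Compute the threshold from the previous frame and screen out low-energy bins.

    spectrum_prev: complex FFT bins of the frame immediately before the gap.
    Returns (threshold, keep_mask), keep_mask marking bins whose amplitude exceeds the threshold.
    """
    magnitude = np.abs(spectrum_prev)   # magnitude = sqrt(real^2 + image^2)
    total_magnitude = magnitude.sum()   # TM = M1 + M2 + ... + Mn
    threshold = total_magnitude / snr   # S = TM / L
    keep_mask = magnitude > threshold
    return threshold, keep_mask
```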

In step S207, the receiving end calculates the phase of the extrapolated audio packet, for example by the following equation:

Phase = arctan(image / real);

where Phase is the phase, image is the imaginary part of the frequency bin, and real is its real part.

For example, the receiving end calculates the phase of the frame (frame N) preceding the extrapolated frame (i.e. the frame whose audio packet is lost, frame N+1), calculates the phase difference between the phase of frame N and the phase of the frame before it (frame N-1), and finally adds the calculated phase difference to the phase of frame N to obtain the phase of the extrapolated frame, expressed by the following equation: Phase[N+1] = Phase[N] + (Phase[N] - Phase[N-1]); where Phase[N+1] is the phase of the extrapolated frame, whose audio packet is the (N+1)-th of the audio packets sent by the transmitting end, N being any appropriate integer greater than 1, Phase[N] is the phase of the frame preceding the extrapolated frame, and Phase[N-1] is the phase of the frame before that.
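A small sketch of this phase extrapolation, assuming the spectra of frames N-1 and N are available as complex arrays; np.angle returns the per-bin phase (the four-quadrant arctangent of the imaginary and real parts), which matches the Phase equation above:

```python
import numpy as np

def extrapolate_phase(spectrum_n_minus_1, spectrum_n):
    """Extrapolate the per-bin phase of the missing frame N+1.

    Implements Phase[N+1] = Phase[N] + (Phase[N] - Phase[N-1]), bin by bin.
    """
    phase_prev = np.angle(spectrum_n_minus_1)   # Phase[N-1]
    phase_last = np.angle(spectrum_n)           # Phase[N]
    return phase_last + (phase_last - phase_prev)
```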

The real and imaginary parts of each frequency bin of the audio packet's sound wave can be expressed by the following equations:

real = magnitude × cos(Phase)
image = magnitude × sin(Phase)

where real is the real part of the frequency bin, image is its imaginary part, magnitude is its amplitude, and Phase is its phase.
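These two equations simply rebuild each complex bin from the extrapolated magnitude and phase; as a sketch under the same assumptions as the earlier ones:

```python
import numpy as np

def bins_from_amplitude_and_phase(magnitude, phase):
    """real = magnitude * cos(Phase), image = magnitude * sin(Phase), combined as a complex bin."""
    return magnitude * np.cos(phase) + 1j * magnitude * np.sin(phase)
```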

In step S209, in the sound wave of the extrapolated frame's audio packet, the real and imaginary parts of the frequency bins whose amplitudes do not exceed the threshold are filled with noise. Specifically, in the sound wave of the extrapolated frame, the real and imaginary parts of the bins not exceeding the threshold are filled with noise values smaller than the threshold.
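The patent only requires the filled values to be smaller than the threshold; the uniform noise source and the 1% scale in the sketch below are illustrative assumptions:

```python
import numpy as np

def fill_noise(extrapolated_bins, keep_mask, threshold, rng=None):
    """Fill the real and imaginary parts of the non-screened bins with low-level noise."""
    rng = np.random.default_rng() if rng is None else rng
    bins = extrapolated_bins.copy()
    n_fill = np.count_nonzero(~keep_mask)
    # Noise values are kept well below the threshold, as described above.
    noise = rng.uniform(-0.01, 0.01, size=(n_fill, 2)) * threshold
    bins[~keep_mask] = noise[:, 0] + 1j * noise[:, 1]
    return bins
```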

In step S211, the receiving end performs an inverse fast Fourier transform to convert the audio packet obtained by extrapolation from the frequency domain back to the time domain, yielding a PCM audio packet.
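A one-line sketch of this final step, with the frame length carried over from the earlier sketches as an assumption:

```python
import numpy as np

def bins_to_pcm(reconstructed_bins, frame_size=1024):
    """Inverse FFT of the extrapolated, noise-filled spectrum back to one PCM frame."""
    return np.fft.irfft(reconstructed_bins, n=frame_size)
```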

It should be understood that the present invention is not limited to the examples of the embodiments herein; the order and content of the steps of the methods described herein may be adjusted as appropriate to actual needs, steps may be added or removed, and, if necessary, one or more of the steps exemplified herein may be repeated.

[Third Embodiment]

Please refer to FIG. 6 and FIG. 7. FIG. 6 is a block diagram of the audio data reconstruction system according to the third embodiment of the present invention, and FIG. 7 is a schematic diagram of a mobile phone sending audio data to an earphone that adopts the audio data reconstruction system of the third embodiment.

The audio data reconstruction system of this embodiment may include a transmitting end TX and a receiving end RX as shown in FIG. 6. The receiving end RX may include an audio receiving module 10, an audio conversion module 20, an audio screening module 30 and an extrapolation module 40, which can be used to execute steps S101 to S107, S201 to S205 and S211 described above. For example, as shown in FIG. 7, the transmitting end TX may be a mobile phone and the receiving end RX may be an earphone; this is merely an example, and the present invention is not limited thereto.

The audio receiving module 10 is connected to the transmitting end TX. The transmitting end TX cuts the audio data AU into a plurality of audio packets and sends them to the receiving end RX. The audio receiving module 10 of the receiving end RX receives the audio packets sent by the transmitting end TX in sequence over a wired or wireless link (for example, but not limited to, Bluetooth wireless transmission).

While the transmitting end TX keeps sending audio packets sequentially to the audio receiving module 10 of the receiving end RX, when the audio conversion module 20 determines that an audio packet has been lost or delayed, it performs a fast Fourier transform on the audio packets of the frames preceding the frame whose audio packet is lost or delayed, to convert those audio packets from the time domain to the frequency domain.

The audio screening module 30 is connected to the audio conversion module 20. After the fast Fourier transform, the audio screening module 30 calculates the threshold from the coefficients or parameters of the audio packet of the frame immediately preceding the frame whose audio packet is lost or delayed. The audio screening module 30 then compares the amplitude of each frequency bin of each preceding frame against the threshold for screening.

The extrapolation module 40 is connected to the audio screening module 30 and the audio conversion module 20, and is configured to perform linear prediction on the amplitudes of the screened-in frequency bins to extrapolate the lost or delayed audio packet. Finally, the audio conversion module 20 performs an inverse fast Fourier transform to convert the audio packet obtained by extrapolation from the frequency domain back to the time domain.

If required, the receiving end RX may further include a phase calculation module 50 and a noise filling module 60, which can be used to execute steps S207 and S209 described above, respectively. The phase calculation module 50 is connected to the audio screening module 30. The noise filling module 60 is connected to the phase calculation module 50, the extrapolation module 40 and the audio conversion module 20.

After the extrapolation module 40 has extrapolated the lost or delayed audio packet, the phase calculation module 50 calculates the phases of the frequency bins of the extrapolated audio packet. Then, in the extrapolated audio packet, the noise filling module 60 fills noise into the real and imaginary parts of the frequency bins whose amplitudes do not exceed the threshold.

Finally, the audio conversion module 20 performs an inverse fast Fourier transform to convert the audio packet obtained by extrapolation and noise filling from the frequency domain back to the time domain. After the audio playback module (not shown) of the receiving end RX has played the audio packets that arrived on time, it can then play this extrapolated, noise-filled audio packet in place of the lost or delayed one.
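As a non-authoritative illustration of how the modules of FIG. 6 could cooperate in software, the sketch below strings the operations of modules 10 to 60 together in a single class; the class name, the purely sequential call order, the two-frame history and the noise scale are all assumptions rather than part of the claimed system:

```python
import numpy as np

class ReceiverRX:
    """Minimal sketch of the receiving end RX wiring modules 10-60 together."""

    def __init__(self, frame_size=1024, snr=1000.0):
        self.frame_size = frame_size
        self.snr = snr
        self.frames = []   # audio receiving module 10: PCM frames received so far

    def receive(self, pcm_frame):
        self.frames.append(np.asarray(pcm_frame, dtype=float))

    def conceal_missing_frame(self):
        """Build a substitute PCM frame when the next packet is lost or late (needs >= 2 frames)."""
        # audio conversion module 20: FFT of the two frames preceding the gap
        spec_prev = np.fft.rfft(self.frames[-2], n=self.frame_size)   # frame N-1
        spec_last = np.fft.rfft(self.frames[-1], n=self.frame_size)   # frame N
        # audio screening module 30: threshold from frame N, keep the high-energy bins
        mag_prev, mag_last = np.abs(spec_prev), np.abs(spec_last)
        threshold = mag_last.sum() / self.snr
        keep = mag_last > threshold
        # extrapolation module 40: linear prediction of the screened-in amplitudes
        mag_pred = np.maximum(mag_last + (mag_last - mag_prev), 0.0)
        # phase calculation module 50: Phase[N+1] = Phase[N] + (Phase[N] - Phase[N-1])
        phase_pred = 2.0 * np.angle(spec_last) - np.angle(spec_prev)
        bins = mag_pred * np.exp(1j * phase_pred)
        # noise filling module 60: low-level noise in the bins below the threshold
        rng = np.random.default_rng()
        n_fill = np.count_nonzero(~keep)
        noise = rng.uniform(-0.01, 0.01, size=(n_fill, 2)) * threshold
        bins[~keep] = noise[:, 0] + 1j * noise[:, 1]
        # audio conversion module 20 again: inverse FFT back to the time domain
        return np.fft.irfft(bins, n=self.frame_size)
```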

Please refer to FIG. 8 and FIG. 9. FIG. 8 is a waveform with a playback gap, and FIG. 9 is the waveform after the lost or late sound wave has been reconstructed by the audio data reconstruction system and method of the embodiments of the present invention.

While the transmitting end is sending audio packets sequentially to the receiving end, after the receiving end has received the sound wave W1 of one audio packet as shown in FIG. 8, environmental interference or other factors may cause the next audio packet to be lost or late while the buffer is about to run dry. In this case, the audio data reconstruction operation must be started to compensate the audio and prevent playback from breaking up after the sound wave W1, as in the gap GDP shown in FIG. 8.

Therefore, the audio data reconstruction system and method of the above embodiments of the present invention are adopted. When the receiving end RX has received the sound wave W1 but does not subsequently receive the next sound wave, it can execute steps S201 to S211 described above in sequence, so as to extrapolate the sound wave W2 shown in FIG. 9 from the sound wave W1 and reconstruct it after the sound wave W1. In this way, after playing the sound wave W1, the receiving end RX can play the reconstructed sound wave W2 and then the received sound wave W3, avoiding any break during playback.

For example, the complete sound wave W1 shown in FIG. 8, received by the receiving end RX from the transmitting end TX within the predetermined time, may comprise the sound waves of the frames FR1~FR7 shown in FIG. 3, and the gap GDP shown in FIG. 8 may correspond to the gap Gap shown in FIG. 3.

[Advantageous Effects of the Embodiments]

One of the advantageous effects of the present invention is that the audio data reconstruction method and system provided herein reconstruct lost or late sound waves with the following main advantages: 1. Computation is performed only at the receiving end; the transmitting end requires no special coding or computation. 2. The computation operates on plain PCM data and is independent of the audio compression method. 3. The computation is simple and light, making it suitable for devices with low power and low computing capability. 4. No intact packet after the lost or late packet is needed; the intact packets before it suffice, which makes the method suitable for low-latency playback devices.

The content disclosed above represents only preferred feasible embodiments of the present invention and does not thereby limit the scope of the claims; all equivalent technical changes made using the specification and drawings of the present invention fall within the scope of the claims of the present invention.

S101~S107, S201~S211: steps
FR1~FR7: frames
Gap, GDP: gap (playback dropout) regions
TH: threshold
TX: transmitting end
AU: audio data
RX: receiving end
10: audio receiving module
20: audio conversion module
30: audio screening module
40: extrapolation module
50: phase calculation module
60: noise filling module
W1, W2, W3: sound waves
WAVE1, WAVE2: spectrum waveforms

FIG. 1 is a flowchart of the steps of the audio data reconstruction method according to the first embodiment of the present invention.

FIG. 2 is a flowchart of the steps of the audio data reconstruction method according to the second embodiment of the present invention.

FIG. 3 is a schematic diagram of calculating the threshold from the frame preceding the frame whose audio packet is lost or delayed, according to the second embodiment of the present invention.

FIG. 4 is the spectrum of the frame preceding the frame whose audio packet is lost or delayed, after the fast Fourier transform, according to the second embodiment of the present invention.

FIG. 5 is the spectrum of the reconstructed frame according to the second embodiment of the present invention.

FIG. 6 is a block diagram of the audio data reconstruction system according to the third embodiment of the present invention.

FIG. 7 is a schematic diagram of a mobile phone sending audio data to an earphone that adopts the audio data reconstruction system of the third embodiment.

FIG. 8 is a waveform with a playback gap.

FIG. 9 is the waveform of audio data after lost or late sound waves have been reconstructed by the audio data reconstruction system and method of an embodiment of the present invention.

S201~S211: steps

Claims (9)

1. An audio data reconstruction method, comprising the following steps: dividing audio data into a plurality of audio packets at a transmitting end and sending them sequentially to a receiving end; using the receiving end to perform a fast Fourier transform on the audio packets of the frames preceding the frame whose audio packet is lost or delayed, so as to convert the audio packets from the time domain to the frequency domain; using the receiving end to calculate a threshold from the audio packet of the frame immediately preceding the frame whose audio packet is lost or delayed; using the receiving end to compare the amplitude of each frequency bin of each frame preceding the frame whose audio packet is lost or delayed against the threshold for screening; using the receiving end to perform linear prediction on all the screened-in amplitudes so as to extrapolate the lost or delayed audio packet; and using the receiving end to perform an inverse fast Fourier transform on the extrapolated frequency bins so as to convert the audio packet obtained by extrapolation from the frequency domain back to the time domain.

2. The audio data reconstruction method of claim 1, further comprising the following steps: using the receiving end to calculate the amplitude and phase of the extrapolated audio packet; using the receiving end to fill noise into the real and imaginary parts of the frequency bins of the extrapolated audio packet whose amplitudes do not exceed the threshold; and using the receiving end to perform an inverse fast Fourier transform so as to convert the audio packet obtained by extrapolation and noise filling from the frequency domain back to the time domain.

3. The audio data reconstruction method of claim 1, further comprising the following steps: using the receiving end to calculate the amplitude of each frequency bin of the previous frame after the fast Fourier transform; using the receiving end to sum the amplitudes of all frequency bins of the previous frame to obtain a total amplitude; and using the receiving end to calculate the threshold from the total amplitude.

4. The audio data reconstruction method of claim 3, further comprising the following step: using the receiving end to divide the total amplitude by a signal-to-noise ratio to obtain the threshold.
5. An audio reconstruction system, comprising: a transmitting end configured to divide audio data into a plurality of audio packets and send the audio packets sequentially; and a receiving end comprising: an audio receiving module connected to the transmitting end and configured to receive, in sequence, the audio packets sent by the transmitting end; an audio conversion module connected to the audio receiving module and configured to perform a fast Fourier transform on the audio packets of the frames preceding the frame whose audio packet is lost or delayed, so as to convert the audio packets from the time domain to the frequency domain; an audio screening module connected to the audio conversion module and configured to calculate, after the fast Fourier transform, a threshold from the audio packet of the frame immediately preceding the frame whose audio packet is lost or delayed, and to compare the amplitude of each frequency bin of each frame preceding that frame against the threshold for screening; and an extrapolation module connected to the audio screening module and the audio conversion module and configured to perform linear prediction on all the screened-in amplitudes so as to extrapolate the lost or delayed audio packet, the audio conversion module then performing an inverse fast Fourier transform on the extrapolated frequency bins so as to convert the audio packet obtained by extrapolation from the frequency domain back to the time domain.

6. The audio reconstruction system of claim 5, wherein the receiving end further comprises a phase calculation module connected to the audio screening module and configured to calculate the amplitude and phase of the extrapolated audio packet.

7. The audio reconstruction system of claim 6, wherein the receiving end further comprises a noise filling module connected to the phase calculation module, the extrapolation module and the audio conversion module, and configured to fill noise into the real and imaginary parts of the frequency bins of the extrapolated audio packet whose amplitudes do not exceed the threshold; wherein the audio conversion module performs an inverse fast Fourier transform so as to convert the audio packet obtained by extrapolation and noise filling from the frequency domain back to the time domain.
8. The audio reconstruction system of claim 5, wherein the audio screening module is configured to calculate the amplitude of each frequency bin of the previous frame after the fast Fourier transform, to sum the amplitudes of all frequency bins of the previous frame to obtain a total amplitude, and to calculate the threshold from the total amplitude.

9. The audio reconstruction system of claim 8, wherein the audio screening module is configured to divide the total amplitude by a signal-to-noise ratio to obtain the threshold.
TW109111346A 2020-04-01 2020-04-01 Method and system for recovering audio information TWI789577B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109111346A TWI789577B (en) 2020-04-01 2020-04-01 Method and system for recovering audio information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW109111346A TWI789577B (en) 2020-04-01 2020-04-01 Method and system for recovering audio information

Publications (2)

Publication Number Publication Date
TW202139032A TW202139032A (en) 2021-10-16
TWI789577B true TWI789577B (en) 2023-01-11

Family

ID=79601177

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109111346A TWI789577B (en) 2020-04-01 2020-04-01 Method and system for recovering audio information

Country Status (1)

Country Link
TW (1) TWI789577B (en)


Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1957398B (en) * 2004-02-18 2011-09-21 沃伊斯亚吉公司 Methods and devices for low-frequency emphasis during audio compression based on acelp/tcx
TWI451404B (en) * 2006-08-01 2014-09-01 Dts Inc Neural network filtering techniques for compensating linear and non-linear distortion of an audio transducer
TWI585747B (en) * 2011-10-21 2017-06-01 三星電子股份有限公司 Frame error concealment method and apparatus, and audio decoding method and apparatus
CN103493130A (en) * 2012-01-20 2014-01-01 弗兰霍菲尔运输应用研究公司 Apparatus and method for audio encoding and decoding employing sinusoidal substitution
US9305567B2 (en) * 2012-04-23 2016-04-05 Qualcomm Incorporated Systems and methods for audio signal processing
CN103517185A (en) * 2012-06-26 2014-01-15 鹦鹉股份有限公司 Method for suppressing noise in an acoustic signal for a multi-microphone audio device operating in a noisy environment
WO2014021890A1 (en) * 2012-08-01 2014-02-06 Dolby Laboratories Licensing Corporation Percentile filtering of noise reduction gains
TWI536369B (en) * 2013-01-29 2016-06-01 弗勞恩霍夫爾協會 Low-frequency emphasis for lpc-based coding in frequency domain
CN105431903A (en) * 2013-06-21 2016-03-23 弗朗霍夫应用科学研究促进协会 Audio decoding with reconstruction of corrupted or not received frames using tcx ltp
TW201508739A (en) * 2013-06-21 2015-03-01 Fraunhofer Ges Forschung Apparatus and method realizing improved concepts for TCX LTP
CN110265044A (en) * 2013-06-21 2019-09-20 弗朗霍夫应用科学研究促进协会 Improve the device and method of signal fadeout in not same area in error concealment procedure
CN110299147A (en) * 2013-06-21 2019-10-01 弗朗霍夫应用科学研究促进协会 For the device and method of improvement signal fadeout of the suitching type audio coding system in error concealment procedure
CN107949321A (en) * 2015-04-29 2018-04-20 美国亚德诺半导体公司 Time domain interference removes and improved heart rate measurement tracking mechanism
CN110265043A (en) * 2019-06-03 2019-09-20 同响科技股份有限公司 Adaptively damage or lossless message compression and decompression calculation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Kauppinen, I., & Roth, K., "Audio signal extrapolation – theory and applications," Proc. DAFx 2002, pp. 105-110, September 2002. https://www.dafx.de/papers/DAFX02_Kauppinen_Roth_signal_extrapolation.pdf *

Also Published As

Publication number Publication date
TW202139032A (en) 2021-10-16

Similar Documents

Publication Publication Date Title
JP5357904B2 (en) Audio packet loss compensation by transform interpolation
KR100956522B1 (en) Frame erasure concealment in voice communications
JP5684367B2 (en) Timing mismatch synchronization by data deletion
JP5647571B2 (en) Full-band expandable audio codec
US9245529B2 (en) Adaptive encoding of a digital signal with one or more missing values
CN108076239B (en) Method for improving IP telephone echo
ES2706512T3 (en) Hiding frame errors
TWI789577B (en) Method and system for recovering audio information
WO2020135610A1 (en) Audio data recovery method and apparatus and bluetooth device
CN113539278B (en) Audio data reconstruction method and system
Tatlas et al. An Error–Concealment Technique for Wireless Digital Audio Delivery
US11121721B2 (en) Method of error concealment, and associated device
JP2004023191A (en) Signal encoding method and signal decoding method, signal encoder and signal decoder, and signal encoding program and signal decoding program
WO2022267754A1 (en) Speech coding method and apparatus, speech decoding method and apparatus, computer device, and storage medium
Tatlas et al. Perceptually-optimized error concealment for audio over WLANs
Rao An architecture for adaptable wireless networks
WO2020135611A1 (en) Audio packet loss concealment method, device, and bluetooth receiver
Ajorloo et al. Cirols: Codec independent recovery of lost speech packets
Floros et al. Stochastic packet reconstruction for subjectively improved audio delivery over WLANs
Rao An architecture for adaptive wireless networks
BR112017001088B1 (en) PACKET TRANSMISSION ERROR RECOVERY DEVICE AND METHOD BASED ON REDUNDANCY
US20090052352A1 (en) Method for providing a sound or the like, apparatus for transmitting a sound or the like, and apparatus for receiving a sound or the like
BR112017000791B1 (en) PACKET TRANSMISSION ERROR RECOVERY SYSTEM AND METHOD BASED ON REDUNDANCY