CN103258552B

CN103258552B - How to adjust playback speed

Info

Publication number: CN103258552B
Application number: CN201210038338.7A
Authority: CN
Inventors: 陈亘志; 陈昭宇
Original assignee: Ali Corp
Current assignee: Ali Corp
Priority date: 2012-02-20
Filing date: 2012-02-20
Publication date: 2015-12-16
Anticipated expiration: 2032-02-20
Also published as: CN103258552A

Abstract

The invention provides a method for adjusting playback speed. It uses the frequency-related data that is analyzed during perceptual audio decoding to decide whether to discard or copy part of the audio data, so that the playback speed is changed within the decoding process itself. As a result, the invention does not need a large number of registers to hold the audio data.

Description

How to adjust playback speed

Technical field

The present invention relates to a media processing method and device, and in particular to a method and device for adjusting media playback speed.

Background art

When listening to compressed audio files such as MP3, WMA, or AAC (MPEG-1 Audio Layer 3, Windows Media Audio, Advanced Audio Coding) on a multimedia playback platform, a user may speed up playback to find a desired passage, or slow down playback to listen closely to the details of a passage (expansion). To keep playback quality from degrading badly when the speed changes, time-scale modification (TSM) is widely used in the industry. Traditional time-domain TSM methods, such as overlap-add (OLA) and synchronized overlap-add (synchronized OLA), divide the input audio signal into many segments, overlap each pair of temporally adjacent segments, and apply fade-out/fade-in weighting to the overlapped region. However, such methods require a large number of registers to hold the segment signals.

In addition, some existing TSM methods use the short-time discrete Fourier transform (ST-DFT) to convert the input audio signal from the time domain to the frequency domain for analysis, but they run into phase distortion when the signal is converted back to the time domain after the analysis.

U.S. Patent Publication No. 20050010397 discloses a TSM method based on the short-time Fourier transform. It selects specific spectral bands of the audio data according to variations in the frequency response of human auditory perception; these spectral bands follow the Bark scale of a model of human auditory perception and are used for phase locking. A spectral peak is marked in each spectral band. The spectral peak and the spectral lines near to or far from it receive different phase processing, so when the audio data must later be converted back to the time domain for signal window reconstruction, phase distortion easily arises and degrades playback quality.

Summary of the invention

Therefore, the present invention mainly provides a method and device for adjusting playback speed that do not require a large number of registers.

The present invention discloses a method for adjusting playback speed, comprising: a perceptual audio decoding device receiving audio data; the perceptual audio decoding device performing frequency analysis on a first audio frame of the audio data; obtaining first frequency-domain analysis data from the frequency analysis; receiving a speed adjustment signal; when the speed adjustment signal indicates speeding up playback of the audio data, determining, according to the first frequency-domain analysis data, whether to discard the first audio frame; when the speed adjustment signal indicates slowing down playback of the audio data, determining, according to the first frequency-domain analysis data, whether to copy the first audio frame; when the first audio frame is determined to be discardable, the perceptual audio decoding device discarding at least part of the data of the first audio frame; and when the first audio frame is determined to be copyable, the perceptual audio decoding device copying at least part of the data of the first audio frame.

The present invention further discloses a method for adjusting playback speed, comprising: a perceptual audio decoding device receiving audio data that includes a plurality of audio frames; the perceptual audio decoding device performing frequency analysis on the plurality of audio frames; receiving a speed adjustment signal; when the speed adjustment signal indicates speeding up playback of the audio data to N/(N-M) times, executing, for each of N consecutive audio frames among the plurality of audio frames, an adjustment determination procedure that determines whether the audio frame being processed can be discarded, where N and M are positive integers; when the adjustment determination procedure determines that M of the N consecutive audio frames can be discarded, the perceptual audio decoding device discarding at least part of the data of those M audio frames; when the speed adjustment signal indicates slowing down playback of the audio data to N/(N+M) times, executing, for each of N consecutive audio frames among the plurality of audio frames, an adjustment determination procedure that determines whether the audio frame being processed can be copied; and when the adjustment determination procedure determines that M of the N consecutive audio frames can be copied, the perceptual audio decoding device copying at least part of the data of those M audio frames. The adjustment determination procedure comprises: obtaining first frequency-domain analysis data that corresponds to a first audio frame being processed and results from the frequency analysis; when the speed adjustment signal indicates speeding up playback of the audio data, determining, according to the first frequency-domain analysis data, whether to discard at least part of the data of the first audio frame; and when the speed adjustment signal indicates slowing down playback of the audio data, determining, according to the first frequency-domain analysis data, whether to copy at least part of the data of the first audio frame.

The present invention further discloses a method for speeding up playback, comprising: a perceptual audio decoding device receiving audio data; the perceptual audio decoding device performing frequency analysis on a first audio frame of the audio data; obtaining first frequency-domain analysis data from the frequency analysis; receiving a speed-up adjustment signal; determining, according to the first frequency-domain analysis data, whether to discard the first audio frame; and when the first audio frame is determined to be discardable, the perceptual audio decoding device discarding at least part of the data of the first audio frame according to a playback speed indicated by the speed-up adjustment signal.

The present invention further discloses a method for slowing down playback, comprising: a perceptual audio decoding device receiving audio data; the perceptual audio decoding device performing frequency analysis on a first audio frame of the audio data; obtaining first frequency-domain analysis data from the frequency analysis; receiving a slow-down adjustment signal; determining, according to the first frequency-domain analysis data, whether to copy the first audio frame; and when the first audio frame is determined to be copyable, the perceptual audio decoding device copying at least part of the data of the first audio frame according to a playback speed indicated by the slow-down adjustment signal.

The method and device for adjusting playback speed provided by the present invention do not require a large number of registers.

Brief description of the drawings

FIG. 1 is a flowchart of a process according to an embodiment of the present invention.

FIG. 2 is a flowchart of a process according to an embodiment of the present invention.

FIG. 3 is a flowchart of a process according to an embodiment of the present invention.

FIG. 4A and FIG. 4B are flowcharts of a process according to an embodiment of the present invention.

FIG. 5 is a flowchart of a process according to an embodiment of the present invention.

FIG. 6 is a flowchart of a forced copy/discard process according to an embodiment of the present invention.

FIG. 7 is a schematic block diagram of a speed adjustment device according to an embodiment of the present invention.

Reference numerals:

10, 20, 30, 40, 60: processes

50: speed adjustment device

500: audio reading device

510: processor unit

520: storage unit

530: input unit

540: output unit

522: program code

100, 102, 104, 106, 108, 110, 112, 114, 116, 118, 200, 202, 204, 206, 208, 210, 212, 300, 302, 304, 306, 308, 310, 312, 314, 316, 318, 320, 322, 324, 400, 402, 404, 406, 408, 410, 412, 414, 416, 418, S910, S920, S930, S940, S950, 602, 604, 606, 608, 610, 612, 614, 616, 618, 620, 622, 624, 626: steps

Detailed description

FIG. 1 is a flowchart of speed adjustment, according to an embodiment of the present invention, that changes the speed of sound playback without changing its pitch. Referring to FIG. 1, this embodiment applies to multimedia playback devices such as televisions, set-top boxes, digital video disc players, and MP3 players. It determines the number of audio frames of audio data read during playback according to the buffer capacity available in the playback device, and determines the content of the audio data to be played according to the distribution of the summed differences of those audio frames, thereby providing a better playback result.

First, audio data divided into a plurality of audio frames is obtained (step S910). The audio data may be the audio data of a television program or a multimedia file, and each audio frame in the audio data contains a plurality of frequency components.

After the audio data of an audio frame is obtained, frequency-domain analysis of the audio frame can be performed (step S920). The frequency-domain analysis computes the frequency components. One approach uses the fast Fourier transform (FFT) to obtain the complex frequency-domain value at each frequency bin, so that an audio frame is divided into a plurality of FFT bins, and then computes the energy of each FFT bin as the energy of the corresponding frequency component. Another approach uses a filter bank to divide an audio frame into a plurality of sub-bands and computes the energy of each sub-band as the energy of the corresponding frequency component.
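The FFT-based variant of this step can be sketched as follows. This is only an illustration, not the patent's implementation: the function names are hypothetical, a naive DFT stands in for a real FFT routine so that the sketch needs nothing beyond the standard library, and the grouping of bins into equal-width bands is an assumption.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform; returns one complex value per bin."""
    n = len(samples)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        for k in range(n)
    ]

def bin_energies(samples):
    """Energy (squared magnitude) of each non-redundant frequency bin."""
    spectrum = dft(samples)
    half = len(samples) // 2 + 1  # for real input, bins above n/2 mirror those below
    return [abs(c) ** 2 for c in spectrum[:half]]

def band_energies(energies, num_bands):
    """Group consecutive bins into up to num_bands bands and sum the energy in each."""
    size = math.ceil(len(energies) / num_bands)
    return [sum(energies[i:i + size]) for i in range(0, len(energies), size)]

# A 16-sample frame holding a pure tone that completes 2 cycles per frame.
frame = [math.sin(2 * math.pi * 2 * t / 16) for t in range(16)]
energies = bin_energies(frame)
peak_bin = max(range(len(energies)), key=energies.__getitem__)
bands = band_energies(energies, 3)
```

In a real decoder the spectral values would typically come from the bitstream rather than from a fresh transform, so a routine like `dft` would not be needed at all.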

After receiving a change-playback-speed command from the user, the playback device determines whether the user wants fast or slow playback (step S930). It then plays the audio data, dynamically adjusting the proportion of audio frames played according to the playback speed multiple A and the plurality of audio frames, where A is a positive number.

When the user wants fast playback, the playback device deletes the audio frames that meet the discard condition, according to an adjustment determination process whose principle is detailed below, to achieve fast playback (step S950). For example, given audio frames 1, 2, 3 with frame 2 deleted, frames 1, 3 are played. Conversely, when the user wants slow playback, the playback device copies the audio frames that meet the copy condition, according to the adjustment determination process, to achieve slow playback (step S940). For example, given audio frames 1, 2, 3 with frame 2 repeated, frames 1, 2, 2, 3 are played. In practice, the playback speed multiple A may be fractional, for example 1.75 or 0.75.
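The frame 1, 2, 3 example can be reproduced with a few lines of Python. The function name `adjust_frames` and the use of a set of frame positions are illustrative assumptions; which frames are selected is decided by the adjustment determination process and is simply given as input here.

```python
def adjust_frames(frames, selected, speed_up):
    """Drop (speed_up=True) or play twice (speed_up=False) the frames whose
    positions are listed in `selected`; all other frames pass through once."""
    out = []
    for index, frame in enumerate(frames):
        if index not in selected:
            out.append(frame)
        elif not speed_up:
            out.extend([frame, frame])  # copied frame is played twice
        # when speeding up, a selected frame is skipped and never played
    return out

fast = adjust_frames([1, 2, 3], {1}, speed_up=True)
slow = adjust_frames([1, 2, 3], {1}, speed_up=False)
```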

For example, when the playback device performs 2x fast playback, it can discard B/2 of every B audio frames, namely those that meet the discard condition according to the determination process, so that the audio frames with larger changes are played and the user can still hear the important information in the audio data during fast playback. On the other hand, when the playback device performs slow playback at 0.66x, it can repeat once each of the B/2 audio frames, out of every B, that meet the copy condition according to the determination process, so that the audio frames with smaller changes are replayed and the user hears extended audio content with unchanged pitch during slow playback. With this method, the playback device can copy and discard audio frame data using its existing available buffer capacity without affecting normal playback of the audio data. In other words, the playback device saves buffer registers while preserving most of the detail of the sound, giving the user a listening experience suited to quick browsing and focused playback.
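The arithmetic behind these figures follows from the N/(N-M) and N/(N+M) ratios stated in the summary: N frames of source audio are played in the time of N-M frames when M are discarded, or N+M frames when M are copied once. The helper below is a hypothetical sketch of that calculation using exact fractions; it shows that copying B/2 of B frames gives exactly 2/3, which the text rounds to 0.66.

```python
from fractions import Fraction

def speed_factor(n, m, speed_up):
    """Playback speed multiple when m of every n frames are discarded
    (speed_up=True) or copied once (speed_up=False)."""
    if n <= 0 or m < 0 or m > n or (speed_up and m == n):
        raise ValueError("need 0 <= m <= n, n > 0, and at least one frame left to play")
    played = n - m if speed_up else n + m
    return Fraction(n, played)

double_speed = speed_factor(8, 4, speed_up=True)    # B = 8, discard B/2
two_thirds = speed_factor(8, 4, speed_up=False)     # B = 8, copy B/2
```

Under the same formula, the fractional multiples mentioned earlier correspond to small integer pairs: 1.75 is N = 7, M = 3 when discarding, and 0.75 is N = 3, M = 1 when copying.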

It is worth noting that the above way of separating frequency components and computing their energy is only one embodiment of the present invention. Those skilled in the art may, as actual needs require, change the FFT length or the number of filter-bank sub-bands, or use the wavelet transform, the discrete cosine transform (DCT), or other techniques to separate frequency components and compute their energy; this embodiment does not limit the scope.

Compressed audio data, such as MPEG, AC3, DTS, WMA, and AAC, is already divided into audio frames at compression time, and is compressed only after the frequency components of each audio frame have been computed. Therefore, when playing compressed audio data in these formats, the playback device only needs to decompress the received compressed audio data to obtain the audio data already divided into audio frames together with all the frequency components of each audio frame, and can compute the energy of these frequency components directly.

Please refer to FIG. 2, FIG. 3, FIG. 4A, and FIG. 4B. FIG. 2 is a flowchart of speed adjustment according to an embodiment of the present invention, and FIG. 3, FIG. 4A, and FIG. 4B are flowcharts of adjustment determination according to another embodiment of the present invention. The speed adjustment process 10 can be implemented on a perceptual audio decoding device and works with the adjustment determination process 20 to adjust audio playback speed within the perceptual audio decoding procedure. The speed adjustment process 10 includes the following steps:

Step 100: Start.

Step 102: Receive an audio frame of audio data.

Step 104: Perform entropy decoding of the audio frame.

Step 106: Perform inverse quantization of the audio frame.

Step 108: Perform frequency analysis of the audio frame according to an auditory perception module, and execute the adjustment determination process 20.

Step 110: According to the determination result output by the adjustment determination process 20, determine whether to discard the audio frame. If yes, go to step 118; if no, go to step 112.

Step 112: Perform the inverse modified discrete cosine transform (IMDCT) of the audio frame according to the window type of the audio frame.

Step 114: According to the determination result of the adjustment determination process 20, determine whether to copy the audio frame. If yes, go to step 116; if no, go to step 118.

Step 116: Copy the audio frame, preset the next determination result to "do not copy", and go to step 112.

Step 118: If a next audio frame exists, receive it and go to step 104.
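The control flow of steps 102 to 118 can be sketched as a simple loop. This is a hedged outline only: `decode_spectrum`, `decide`, and `imdct` are hypothetical callables standing in for entropy decoding plus inverse quantization, the adjustment determination process 20, and the inverse transform, and the string verdicts are an assumed encoding of the determination result.

```python
def run_speed_adjustment(frames, decide, decode_spectrum, imdct):
    """Outline of speed adjustment process 10. `decide` returns 'discard',
    'copy', or 'keep' for each frame's frequency-domain data."""
    output = []
    for frame in frames:                     # steps 102 and 118: take frames one by one
        spectrum = decode_spectrum(frame)    # steps 104-106: entropy decode, dequantize
        verdict = decide(spectrum)           # step 108: run determination process 20
        if verdict == "discard":             # step 110: discarded frames skip the transform
            continue
        output.append(imdct(spectrum))       # step 112: back to the time domain
        if verdict == "copy":                # steps 114 and 116: transform once more,
            output.append(imdct(spectrum))   # then treat the frame as "do not copy"
    return output

# Trivial stand-ins so the control flow can be observed on labelled frames.
identity = lambda value: value
slowed = run_speed_adjustment(
    ["a", "b", "c"], lambda s: "copy" if s == "b" else "keep", identity, identity
)
sped_up = run_speed_adjustment(
    ["a", "b", "c"], lambda s: "discard" if s == "b" else "keep", identity, identity
)
```

Note that a discarded frame never reaches the inverse transform, so only the frame currently being decoded has to be held at any time.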

As described above, the speed adjustment process 10 performs perceptual audio decoding on each audio frame of the audio data in turn; the audio data may be in a compressed format such as MP3, WMA, or AAC. First, each audio frame is entropy decoded, for example by Huffman decoding. Next, the audio frame is dequantized, which may include decoding the scale factors that the encoder used for quantization. After dequantization, the adjustment determination process 20 determines, according to the data from the frequency-domain analysis (hereinafter, frequency-domain analysis data) and a speed adjustment signal indication, whether the audio frame should be copied, discarded, or neither, and generates a corresponding determination result. When the speed adjustment signal indicates speeding up playback of the audio data, the adjustment determination process 20 determines from the frequency-domain analysis data whether to discard the audio frame; when the speed adjustment signal indicates slowing down playback, it determines from the frequency-domain analysis data whether to copy the audio frame. The speed adjustment signal indication can be generated when the user changes the playback speed through the audio playback system. The detailed operation of the adjustment determination process 20 is described below.

According to the determination result, the perceptual audio decoding device first determines whether the audio frame should be discarded. If so, it moves on to decode the next audio frame as described in step 118, so that this frame's data is not played during playback of the audio data, which speeds up playback. Conversely, if the audio frame does not need to be discarded, the perceptual audio decoding device performs the inverse modified discrete cosine transform (IMDCT) and synthesis of the audio frame according to its window type. This is an inverse time-frequency transform that converts the frequency-domain data of the audio frame (which may be included in the frequency-domain analysis data) into time-domain data in units of long or short windows. After one IMDCT completes, the perceptual audio decoding device determines whether the audio frame should be copied. If so, it presets the next determination result to "no", meaning that no copy is made at the next determination, and the copied audio frame also undergoes the IMDCT. As a result, this frame's data is played twice during playback of the audio data, which slows down playback. Because the determination result has been set so that this audio frame is not copied at the next determination, the speed adjustment process 10 then moves on to decode the next audio frame. Thus, according to the determination results of the adjustment determination process 20, the speed adjustment process 10 can discard or copy each audio frame of the audio data to speed up or slow down playback.

Short-window data that is not discarded still goes through the inverse modified discrete cosine transform and windowing of step 112, and is played after perceptual audio decoding completes.

Note that in the present invention the audio frame is the smallest unit of data discarding and copying, and each audio format has its own relative proportion of long and short windows. For example, in a format A, the length of one long window is treated as one audio frame, and the length of a long window may equal that of 4 short windows or of a combination of several short windows, so 4 short windows or several short windows are treated as one audio frame. As another example, in a format B, an audio frame depends on how its long and short windows are matched. In perceptual audio coding, audio frame data composed of long windows represents a relatively stationary stretch of signal, whereas audio frame data composed of short windows represents a stretch of signal that changes more sharply. Therefore, when adjusting playback speed, copying or discarding only data that belongs to long windows is less likely to affect playback quality.

Therefore, the frequency-domain analysis data described above may include a window type indicator, which indicates whether the window type used for the inverse modified discrete cosine transform of the audio frame is a long window or a short window. In this case, the adjustment determination process 20 of FIG. 3 includes the following steps:

Step 200: Receive a speed adjustment signal indication.

Step 202: Obtain frequency-domain analysis data that includes a window type indicator of the audio frame.

Step 204: Determine whether the window type indicator indicates that the audio frame is of the long window type. If yes, go to step 208; if no, go to step 206.

Step 206: Generate a "do not discard/do not copy" determination result.

Step 208: Determine whether the speed adjustment signal indicates speeding up playback of the audio data. If yes, go to step 210; if no, go to step 212.

Step 210: Generate a "discard" determination result.

Step 212: Generate a "copy" determination result.
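Steps 200 to 212 reduce to a two-level branch, sketched below. The function name and the string values used for the window type and the result are illustrative assumptions; a real decoder would read the window type from the bitstream's own field.

```python
def decide_by_window(window_type, speed_up):
    """Window-type-only determination (FIG. 3). Returns 'discard', 'copy', or 'keep'."""
    if window_type != "long":                   # step 204 -> step 206
        return "keep"
    return "discard" if speed_up else "copy"    # step 208 -> step 210 or 212

verdicts = [
    decide_by_window("long", speed_up=True),
    decide_by_window("long", speed_up=False),
    decide_by_window("short", speed_up=True),
]
```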

The adjustment determination process 20 of FIG. 3 mainly uses the window type of the audio frame as the criterion for whether the frame should be discarded or copied. As described above, when the speed adjustment signal indicates speeding up playback of the audio data and the window type indicator indicates that the audio frame is a long window, the adjustment determination process 20 determines that the frame can be discarded. When the speed adjustment signal indicates slowing down playback of the audio data and the window type indicator indicates a long window, the adjustment determination process 20 determines that the frame can be copied. In other words, when the window type indicator indicates that the audio frame is of any other window type (such as a short window or a long-to-short transition window), the adjustment determination process 20 tells the speed adjustment process 10 that the audio frame needs neither discarding nor copying.

Besides the window type indicator, the frequency-domain analysis data described above may also include spectral line data of the audio frame. The adjustment determination process 20 of FIG. 4A and FIG. 4B uses both the window type and the spectral line data of the audio frame as the criteria for whether the frame should be discarded or copied, and includes the following steps:

Step 300: Receive a speed adjustment signal indication.

Step 302: Obtain frequency-domain analysis data of the audio frame, which includes a window type indicator and spectral line data.

Step 304: Determine whether the window type indicator indicates that the audio frame is of the long window type. If yes, go to step 308; if no, go to step 306.

Step 306: Generate a "do not discard/do not copy" determination result.

Step 308: Divide the spectral line data into a plurality of frequency band units, and compute the energy sum Pcurr of the plurality of frequency band units.

Step 310: Obtain the energy sum Pprev of the previous audio frame over the same plurality of frequency band units.

Step 312: Compute the energy sum difference Pdiff = Pprev - Pcurr.

Step 314: Determine whether |Pdiff| < THa. If yes, go to step 316; if no, go to step 306.

Step 316: Determine whether Pdiff > THb. If yes, go to step 318; if no, go to step 306.

Step 318: Determine whether Pprev < THc and Pcurr < THc. If yes, go to step 320; if no, go to step 306.

Step 320: Determine whether the speed adjustment signal indicates speeding up playback of the audio data. If yes, go to step 322; if no, go to step 324.

Step 322: Generate a "discard" determination result.

Step 324: Generate a "copy" determination result.
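Steps 300 to 324 can be sketched as a chain of early exits. The source gives no numeric values for THa, THb, or THc, so they are passed in as parameters, and the numbers in the usage lines are made up purely to exercise each branch. The function name and result strings are likewise assumptions.

```python
def decide_by_energy(window_type, band_energies_curr, p_prev, speed_up,
                     th_a, th_b, th_c):
    """Window-type and energy-based determination (FIG. 4A/4B).
    Returns 'discard', 'copy', or 'keep'."""
    if window_type != "long":                        # step 304 -> 306
        return "keep"
    p_curr = sum(band_energies_curr)                 # step 308: energy sum Pcurr
    p_diff = p_prev - p_curr                         # step 312: Pdiff = Pprev - Pcurr
    if not abs(p_diff) < th_a:                       # step 314
        return "keep"
    if not p_diff > th_b:                            # step 316
        return "keep"
    if not (p_prev < th_c and p_curr < th_c):        # step 318
        return "keep"
    return "discard" if speed_up else "copy"         # steps 320-324

# Illustrative numbers only: Pprev = 5.0, Pcurr = 3.0, so Pdiff = 2.0.
all_pass = decide_by_energy("long", [1.0, 2.0], 5.0, True, th_a=3.0, th_b=1.0, th_c=10.0)
fails_316 = decide_by_energy("long", [1.0, 2.0], 5.0, True, th_a=3.0, th_b=2.5, th_c=10.0)
```

The only state carried between frames in this sketch is the single number Pprev, which is consistent with the stated goal of avoiding large buffers.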

As the above shows, for audio frames whose window type is anything other than the long window, the adjustment judgment process 20 of FIG. 4A and FIG. 4B likewise indicates to the speed adjustment process 10 that the audio frame needs neither discarding nor copying. In the adjustment judgment process 20, the division of the spectral line data into band units may vary with system requirements. For example, the spectral line data may be divided directly into contiguous band units that together cover the entire frequency range of the audio frame, in which case the energy sums Pcurr and Pprev are the total energies of the current audio frame and the previous audio frame, respectively. Alternatively, the spectral line data may be divided, according to signal flatness, into band units classified as tone-like or noise-like. The division into band units and the energy calculation may follow the approach used for frequency components, and the detailed operation is not repeated here. In addition, the adjustment judgment process 20 defines the thresholds THa, THb and THc, which the system assigns according to the characteristics of tone-like signals, the post-masking effect in auditory perception, and silence signals, respectively; these characteristics should be well known to those skilled in the art and are not described here. The adjustment judgment process 20 therefore indicates that an audio frame needs to be discarded or copied only when all of the following conditions are met: (i) the absolute value of the energy sum difference Pdiff is less than the threshold THa, which relates to the energy sum difference of tone-like signals; (ii) the energy sum difference Pdiff is greater than the threshold THb, which relates to the post-masking effect; (iii) the energy sum Pprev is less than the threshold THc, which relates to silence signals, and the energy sum Pcurr is also less than THc. If any of these conditions is not met, the adjustment judgment process 20 indicates to the speed adjustment process 10 that the audio frame needs neither discarding nor copying. When conditions (i), (ii) and (iii) are all met, the adjustment judgment process 20 indicates to the speed adjustment process 10 that the audio frame needs to be discarded or copied, according to whether the speed adjustment signal indicates speeding up or slowing down.
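The judgment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the names `Thresholds`, `band_energy_sum` and `judge_frame` are hypothetical, energy is assumed to be the sum of squared spectral lines, and Pdiff is taken as Pprev minus Pcurr, following the definition given in claim 3. The `require_all` flag switches between requiring all of conditions (i) to (iii) and accepting any one of them, since the description allows other combinations of the criteria.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    th_a: float  # THa: bound on |Pdiff| (tone-like signals)
    th_b: float  # THb: bound on Pdiff (post-masking)
    th_c: float  # THc: bound on Pprev and Pcurr (silence)


def band_energy_sum(spectral_lines, bands):
    """Sum the energy of the selected band units.

    bands is a list of (start, stop) index pairs into spectral_lines.
    Energy as a sum of squares is an assumption of this sketch.
    """
    return sum(x * x for start, stop in bands for x in spectral_lines[start:stop])


def judge_frame(is_long_window, p_curr, p_prev, th, speed_up, require_all=True):
    """Return "discard", "copy" or "keep" for one audio frame."""
    if not is_long_window:
        return "keep"  # only long-window frames are candidates
    p_diff = p_prev - p_curr  # assumed sign: previous minus current
    conditions = (
        abs(p_diff) < th.th_a,                    # condition (i)
        p_diff > th.th_b,                         # condition (ii)
        p_prev < th.th_c and p_curr < th.th_c,    # condition (iii)
    )
    satisfied = all(conditions) if require_all else any(conditions)
    if not satisfied:
        return "keep"
    return "discard" if speed_up else "copy"
```

With `require_all=False` the function behaves like the looser variants the description mentions, where a single criterion is enough to mark a long-window frame for discarding or copying.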

In the adjustment judgment process 20 of FIG. 4A and FIG. 4B, those skilled in the art may use any one of the four criteria (the long/short window check and conditions (i) to (iii)), or any combination of them, to judge whether discarding or copying is needed; all four need not be satisfied. For example, an audio frame may be discarded or copied as soon as it is judged to use a long window. The auditory perception decoding steps mentioned above, such as entropy decoding, inverse quantization, frequency analysis of audio frames and the inverse modified discrete cosine transform (IMDCT), should be well known to those skilled in the art. The present invention mainly uses the frequency analysis information already available in auditory perception decoding as the basis for copying or discarding audio frames, and therefore avoids the phase distortion that would otherwise arise during signal reconstruction later in the decoding. In addition, the present invention can judge in real time which audio frames to copy or discard, so adjusting the playback speed does not require a large number of registers to store the data of preceding and following audio frames, which saves production cost.

Please refer to FIG. 5, which is a flowchart of a speed adjustment process 40 according to an embodiment of the present invention. The speed adjustment process 40 may be implemented on an auditory perception decoding device. It uses the speed adjustment process 10 to discard, copy or normally process each audio frame, thereby adjusting the playback speed of the audio data to the speed the user expects. It includes the following steps:

Step 400: Receive audio data, which is input as a sequence of consecutive audio frames.

Step 402: Receive and evaluate a speed adjustment signal. When the speed adjustment signal indicates speeding up playback of the audio data to (N/(N-M)) times, execute step 404; when the speed adjustment signal indicates slowing down playback of the audio data to (N/(N+M)) times, execute step 412.

Step 404: Using the speed adjustment process 10, judge whether each audio frame of N consecutive audio frames among the plurality of audio frames can be discarded.

Step 406: Determine whether M of the N consecutive audio frames can be discarded. If so, go to step 408; if not, go to step 420.

Step 408: The auditory perception decoding device discards at least part of the data of the M audio frames.

Step 410: Obtain the next group of N consecutive audio frames of the audio data, and go to step 404.

Step 412: Using the speed adjustment process 10, judge whether each audio frame of N consecutive audio frames among the plurality of audio frames can be copied.

Step 414: Determine whether M of the N consecutive audio frames can be copied. If so, go to step 416; if not, go to step 424.

Step 416: The auditory perception decoding device copies at least part of the data of the M audio frames.

Step 418: Obtain the next group of N consecutive audio frames of the audio data, and go to step 412.

Step 420: Determine whether K groups of N consecutive audio frames have been processed and fewer than K×M audio frames in total among those K groups can be discarded. If so, go to step 422; if not, go to step 410.

Step 422: Discard all or part of the data of the subsequent audio frames.

Step 424: Determine whether K groups of N consecutive audio frames have been processed and fewer than K×M audio frames in total among those K groups can be copied. If so, go to step 426; if not, go to step 418.

Step 426: Copy all or part of the data of the subsequent audio frames.
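The per-group part of the steps above (steps 404 to 408 for speeding up, steps 412 to 416 for slowing down) can be sketched as follows. The function name `adjust_group` is hypothetical, and the sketch makes several simplifying assumptions the text does not fix: whole frames are discarded or duplicated rather than part of their data, the first M qualifying frames in the group are chosen, and a duplicate is placed immediately after its original.

```python
def adjust_group(frames, can_adjust, m, speed_up):
    """Discard or duplicate M qualifying frames in one group of N frames.

    frames:     the N decoded frames of the group, in order.
    can_adjust: one boolean per frame, the result of the per-frame judgment.
    Returns (output_frames, adjusted). adjusted is False when fewer than M
    frames qualify, in which case the group is passed through unchanged.
    """
    picked = [i for i, ok in enumerate(can_adjust) if ok][:m]
    if len(picked) < m:
        return list(frames), False  # "no" branch of step 406 / 414
    chosen = set(picked)
    output = []
    for i, frame in enumerate(frames):
        if i not in chosen:
            output.append(frame)            # processed normally
        elif not speed_up:
            output.extend([frame, frame])   # step 416: copy the frame
        # speed_up and chosen: step 408, the frame is dropped
    return output, True
```

For a group of six frames with M = 2, speeding up leaves four frames and slowing down yields eight, which matches the N-M and N+M frame counts implied by the speed ratios in step 402.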

According to the speed adjustment process 40, when the speed adjustment signal indicates speeding up playback to (N/(N-M)) times, each audio frame in each group of N consecutive audio frames is judged by the speed adjustment process 10 as to whether it can be discarded. When M of the N consecutive audio frames are judged to be discardable, the auditory perception decoding device may discard at least part of the data of those M audio frames, for example all of the M audio frames or only their long-window data. Similarly, when the speed adjustment signal indicates slowing down playback to (N/(N+M)) times, each audio frame is judged by the speed adjustment process 10 as to whether it can be copied. When M of the N consecutive audio frames are judged to be copyable, the auditory perception decoding device may copy at least part of the data of those M audio frames. In other words, for every N audio frames the speed adjustment process 40 copies or discards the data (or long-window data) of M audio frames, which yields the playback speed of N/(N±M) times that the user expects.
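The N/(N±M) ratios follow from counting frames: discarding M of every N frames leaves N-M frames to play in place of N, and copying M of every N frames leaves N+M. A small sketch of that arithmetic is given below. The helper names `playback_speed` and `frames_for_speed` are hypothetical, and choosing the smallest N and M for a target speed is an assumption of this example, not something the text prescribes.

```python
from fractions import Fraction


def playback_speed(n, m, speed_up):
    """Speed factor when M of every N frames are discarded or copied."""
    if n <= 0 or m <= 0:
        raise ValueError("N and M must be positive integers")
    if speed_up and m >= n:
        raise ValueError("cannot discard M >= N frames per group")
    return Fraction(n, n - m) if speed_up else Fraction(n, n + m)


def frames_for_speed(speed):
    """Smallest (N, M) giving the requested speed factor.

    A speed above 1 is read as N/(N-M); a speed below 1 as N/(N+M).
    """
    speed = Fraction(speed)
    if speed <= 0 or speed == 1:
        raise ValueError("speed must be positive and different from 1")
    p, q = speed.numerator, speed.denominator
    return (p, p - q) if speed > 1 else (p, q - p)
```

For example, a target of 5/4 speed gives N = 5 and M = 1 (discard one frame in every five), and a target of 2/3 speed gives N = 2 and M = 1 (copy one frame in every two).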

In addition, the speed adjustment process 40 may encounter several consecutive groups of N audio frames in which fewer than M audio frames can be discarded or copied. For this case, as shown in steps 420 to 422 and 424 to 426, this embodiment may set a maximum group count K. When K consecutive groups of N audio frames do not contain K×M audio frames that can be discarded or copied, the auditory perception decoding device starts to forcibly discard or copy all or part of the data of the subsequent audio frames, so that the playback speed can reach the speed the user expects. For example, when only (K×M-L) audio frames in K consecutive groups of N audio frames meet the conditions for discarding or copying, the auditory perception decoding device discards or copies all or part of the data of the next L audio frames it receives, to maintain the playback speed at (N/(N-M)) times or (N/(N+M)) times.

In the embodiment of the present invention, judging whether K consecutive groups of N audio frames contain K×M audio frames that can be discarded or copied does not require each group of N audio frames to contain M such frames; a total of K×M is sufficient. For example, with K set to 2, if the first group of N audio frames has (M-1) audio frames that can be discarded or copied, the second group of N audio frames needs to have (M+1) such frames for the playback speed to reach the speed the user expects.

Please refer to FIG. 6, which is a schematic block diagram of a process 60 according to an embodiment of the present invention. The process 60 implements the forced discarding or copying of audio frames described above and includes the following steps:

Step 602: Receive and evaluate a speed adjustment signal. When the speed adjustment signal indicates speeding up playback of the audio data to (N/(N-M)) times, execute step 604; when the speed adjustment signal indicates slowing down playback of the audio data to (N/(N+M)) times, execute step 616.

Step 604: Set a parameter i = 1.

Step 606: Determine the number of audio frames that can be discarded among the newly received N audio frames.

Step 608: Determine whether the total number of discardable audio frames, Ndiscard, has reached i×M. If not, go to step 610; if so, go to step 614.

Step 610: i = i+1.

Step 612: Determine whether i is greater than a threshold K. If not, go to step 606; if so, go to step 614.

Step 614: Execute a reset procedure, which includes setting i = 1 and either discarding the Ndiscard audio frames and forcibly discarding (K×M-Ndiscard) audio frames among subsequently received audio frames, or discarding only the Ndiscard audio frames without forcibly discarding (K×M-Ndiscard) audio frames among subsequently received audio frames.

Step 616: Set i = 1.

Step 618: Determine the number of audio frames that can be copied among the newly received N audio frames.

Step 620: Determine whether the total number of copyable audio frames, Ncopy, has reached i×M. If not, go to step 622; if so, go to step 626.

Step 622: i = i+1.

Step 624: Determine whether i is greater than the threshold K. If not, go to step 618; if so, go to step 626.

Step 626: Execute a reset procedure, which includes setting i = 1 and either copying the Ncopy audio frames and forcibly copying (K×M-Ncopy) audio frames among subsequently received audio frames, or copying only the Ncopy audio frames without forcibly copying (K×M-Ncopy) audio frames among subsequently received audio frames.

According to the process 60, in order not to degrade audio playback quality excessively, the embodiment of the present invention may refrain from forcibly copying or discarding the shortfall relative to K×M, that is, (K×M-Ndiscard) or (K×M-Ncopy) audio frames, among subsequently received audio frames. The remaining operating principles of the process 60 have been disclosed above and are not repeated here.
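The counting loop of the process 60 can be replayed with the sketch below, which covers both the discard branch (steps 604 to 614) and the copy branch (steps 616 to 626). It is one possible reading, not a definitive implementation: the name `run_process_60` is hypothetical, "has i×M" in steps 608 and 620 is read as "at least i×M", and a forced adjustment is assumed to apply only when the threshold K has been exceeded with a shortfall. Counts left over when the input ends before a reset are ignored.

```python
def run_process_60(counts, m, k, force=True):
    """Replay the counting loop of process 60.

    counts: number of frames judged discardable (or copyable) in each newly
            received group of N frames, in arrival order.
    force:  whether the reset step forces the shortfall on later frames.
    Returns one record per reset (step 614 / 626).
    """
    resets = []
    i, total, groups = 1, 0, 0            # step 604 / 616
    for count in counts:
        groups += 1
        total += count                    # step 606 / 618
        enough = total >= i * m           # step 608 / 620
        if not enough:
            i += 1                        # step 610 / 622
            if i <= k:                    # step 612 / 624: keep accumulating
                continue
        # Step 614 / 626: reset. A shortfall exists only when K was exceeded.
        shortfall = 0 if enough else k * m - total
        resets.append({
            "groups": groups,
            "found": total,
            "forced": shortfall if force else 0,
        })
        i, total, groups = 1, 0, 0
    return resets
```

With K = 2 and M = 3, groups containing 2 and then 4 qualifying frames reach K×M = 6 without any forced adjustment, which is the (M-1), (M+1) case described earlier. Groups containing 1 and then 2 qualifying frames leave a shortfall of 3 frames, which is forced on later frames only when `force` is true.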

Please refer to FIG. 7, which is a schematic block diagram of a speed adjustment device 50 according to an embodiment of the present invention. The speed adjustment device 50 includes an audio reading device 500, a processor unit 510, a storage unit 520, an input unit 530 and an output unit 540. The audio reading device 500 may be a CD/DVD player or a network card device; it obtains audio data AU_DATA and passes it through the storage unit 520 to the processor unit 510 for processing. The input unit 530 may be a keyboard, a mouse, a voice input or any other device through which the user can interact with the speed adjustment device 50; it generates a speed adjustment signal PLR_ADJ for the processor unit 510 according to the user's input. The storage unit 520 may be a non-volatile memory that stores program code 522, which, when executed by the processor unit 510, implements any of the foregoing processes (such as the adjustment judgment process 20 or the speed adjustment process 40) or a combination of them. The output unit 540 may be a speaker that plays the audio data processed by the processor unit 510. For example, when the user speeds up playback through the input unit 530, the processor unit 510 may, according to the corresponding speed adjustment signal PLR_ADJ, use the speed adjustment process 40 to discard audio frames of the audio data being played and send the audio data with those frames removed to the output unit 540 for playback, so that the user perceives the audio as accelerated. Because the speed adjustment device 50 mainly serves to implement any of the foregoing processes (such as the adjustment judgment process 20 or the speed adjustment process 40) or a combination of them, please refer to the foregoing description for its main operating principles.

In summary, the embodiment of the present invention uses frequency-related data of the audio data analyzed during auditory perception decoding (such as window type and spectral line data) to judge whether to discard or copy part of the audio data, so that the playback speed is changed during the decoding process itself. As a result, the present invention does not need a large number of registers to store audio data.

The above describes only preferred embodiments of the present invention; all equivalent changes and modifications made according to the claims of the present invention shall fall within the scope of the present invention.

Claims

1. A method for adjusting playback speed, the method comprising:

an auditory perception decoding device receiving audio data;

the auditory perception decoding device performing frequency analysis on a first audio frame of the audio data;

obtaining first frequency-domain analysis data from the frequency analysis; and

receiving a speed adjustment signal and determining whether the playback speed is to be adjusted by speeding up or by slowing down;

the method being characterized by comprising:

when the speed adjustment signal indicates speeding up playback of the audio data, judging, according to the first frequency-domain analysis data, whether to discard the first audio frame;

when the first audio frame is judged to be discardable, the auditory perception decoding device discarding the first audio frame; and

when the speed adjustment signal indicates slowing down playback of the audio data, judging, according to the first frequency-domain analysis data, whether to copy the first audio frame;

when the first audio frame is judged to be copyable, the auditory perception decoding device copying the first audio frame;

wherein the first frequency-domain analysis data comprise a window type index indicating the window type used for the first audio frame in a frequency-domain-to-time-domain conversion in the auditory perception decoding device;

wherein judging, according to the first frequency-domain analysis data, whether to discard the first audio frame when the speed adjustment signal indicates speeding up playback of the audio data comprises: when the speed adjustment signal indicates speeding up playback of the audio data and the window type index indicates that the first audio frame belongs to the long window type, judging that the first audio frame can be discarded; and wherein judging, according to the first frequency-domain analysis data, whether to copy the first audio frame when the speed adjustment signal indicates slowing down playback of the audio data comprises: when the speed adjustment signal indicates slowing down playback of the audio data and the window type index indicates that the first audio frame belongs to the long window type, judging that the first audio frame can be copied.

2. The method of claim 1, characterized in that the first frequency-domain analysis data further comprise spectral line data of the first audio frame.

3. The method of claim 2, characterized in that judging, according to the first frequency-domain analysis data, whether to discard the first audio frame when the speed adjustment signal indicates speeding up playback of the audio data comprises:

when the window type index indicates that the first audio frame belongs to the long window type, dividing the spectral line data into a plurality of band units;

calculating a first energy sum of the plurality of band units;

obtaining a second energy sum, corresponding to the plurality of band units, of a second audio frame of the audio data, the second audio frame being the audio frame processed by the auditory perception decoding device immediately before the first audio frame;

calculating an energy sum difference, where the energy sum difference = the second energy sum - the first energy sum;

when at least one of the following three conditions is met, judging that the first audio frame, or the data in the first audio frame belonging to the long window type, can be discarded: the absolute value of the energy sum difference is less than a first threshold related to the energy sum difference of tone-like signals; the energy sum difference is greater than a second threshold related to post-masking in auditory perception; the second energy sum is less than a third threshold related to silence signals and the first energy sum is less than the third threshold; and

judging, according to the first frequency-domain analysis data, whether to copy the first audio frame when the speed adjustment signal indicates slowing down playback of the audio data comprises:

when the window type index indicates that the first audio frame belongs to the long window type, dividing the spectral line data into the plurality of band units;

calculating the first energy sum;

obtaining the second energy sum;

calculating the energy sum difference;

when at least one of the following three conditions is met, judging that the first audio frame, or the data in the first audio frame belonging to the long window type, can be copied: the absolute value of the energy sum difference is less than a first threshold related to the energy sum difference of tone-like signals; the energy sum difference is greater than a second threshold related to post-masking in auditory perception; the second energy sum is less than a third threshold related to silence signals and the first energy sum is less than the third threshold.

4. The method of claim 3, characterized in that dividing the spectral line data into the plurality of band units when the window type index indicates that the first audio frame belongs to the long window type comprises: when the window type index indicates that the first audio frame belongs to the long window type, dividing the spectral line data, according to the flatness of the spectral line data, into the plurality of band units classified as tone-like or noise-like.

5. A method for adjusting playback speed, the method comprising:

an auditory perception decoding device receiving audio data, the audio data comprising a plurality of audio frames;

the auditory perception decoding device performing frequency analysis on the plurality of audio frames; and

receiving a speed adjustment signal;

the method being characterized by comprising:

when the speed adjustment signal indicates speeding up playback of the audio data to (N/(N-M)) times, performing, on each audio frame of N consecutive audio frames among the plurality of audio frames, an adjustment judgment procedure for judging whether the audio frame being processed can be discarded, where N and M are positive integers;

when the adjustment judgment procedure judges that M of the N consecutive audio frames can be discarded, the auditory perception decoding device discarding at least part of the data of the M audio frames;

when the speed adjustment signal indicates slowing down playback of the audio data to (N/(N+M)) times, performing, on each audio frame of N consecutive audio frames among the plurality of audio frames, an adjustment judgment procedure for judging whether the audio frame being processed can be copied; and

when the adjustment judgment procedure judges that M of the N consecutive audio frames can be copied, the auditory perception decoding device copying at least part of the data of the M audio frames;

wherein the adjustment judgment procedure comprises:

obtaining first frequency-domain analysis data, from the frequency analysis, corresponding to a first audio frame being processed;

when the speed adjustment signal indicates speeding up playback of the audio data, judging, according to the first frequency-domain analysis data, whether to discard at least part of the data of the first audio frame; and

when the speed adjustment signal indicates slowing down playback of the audio data, judging, according to the first frequency-domain analysis data, whether to copy at least part of the data of the first audio frame;

wherein the first frequency-domain analysis data comprise a window type index indicating the window type used for the first audio frame in a frequency-domain-to-time-domain conversion in the auditory perception decoding device;

wherein judging, according to the first frequency-domain analysis data, whether to discard the first audio frame when the speed adjustment signal indicates speeding up playback of the audio data comprises: when the speed adjustment signal indicates speeding up playback of the audio data and the window type index indicates that the first audio frame belongs to the long window type, judging that the first audio frame, or the data in the first audio frame belonging to the long window type, can be discarded; and wherein judging, according to the first frequency-domain analysis data, whether to copy the first audio frame when the speed adjustment signal indicates slowing down playback of the audio data comprises: when the speed adjustment signal indicates slowing down playback of the audio data and the window type index indicates that the first audio frame belongs to the long window type, judging that the first audio frame, or the data in the first audio frame belonging to the long window type, can be copied.

6. The method of claim 5, characterized in that the first frequency-domain analysis data further comprise spectral line data of the first audio frame.

7. The method of claim 6, characterized in that judging, according to the first frequency-domain analysis data, whether to discard the first audio frame when the speed adjustment signal indicates speeding up playback of the audio data comprises:

calculating a first energy sum of the plurality of band units;

calculating an energy sum difference, where the energy sum difference = the first energy sum - the second energy sum;

when at least one of the following three conditions is met, judging that the first audio frame, or the data in the first audio frame belonging to the long window type, can be discarded: the absolute value of the energy sum difference is less than a first threshold related to the energy sum difference of tone-like signals; the energy sum difference is greater than a second threshold related to post-masking in auditory perception; the second energy sum is less than a third threshold related to silence signals and the first energy sum is less than the third threshold; and

calculating the first energy sum;

obtaining the second energy sum;

calculating the energy sum difference;

8. The method of claim 7, characterized in that dividing the spectral line data into the plurality of band units when the window type index indicates that the first audio frame belongs to the long window type comprises: when the window type index indicates that the first audio frame belongs to the long window type, dividing the spectral line data into the plurality of band units according to the flatness of the spectral line data, each band unit being classified into a tone-like category or a noise-like category.

9. The method of claim 5, characterized in that the method for adjusting playback speed further comprises:

when the adjustment judgment procedure judges that K groups of N consecutive audio frames do not contain K×M audio frames that can be discarded, the auditory perception decoding device discarding at least part of the data of at least one audio frame following the K groups of N consecutive audio frames, where K is a positive integer; and

when the adjustment judgment procedure judges that K groups of N consecutive audio frames do not contain K×M audio frames that can be copied, the auditory perception decoding device copying at least part of the data of at least one audio frame following the K groups of N consecutive audio frames.

10. A method for speeding up playback, the method comprising:

an auditory perception decoding device receiving audio data; and

receiving a speed-up adjustment signal;

the method being characterized by comprising:

judging, according to the first frequency-domain analysis data, whether to discard the first audio frame; and

when the first audio frame is judged to be discardable, the auditory perception decoding device discarding at least part of the data of the first audio frame according to a playback speed indicated by the speed-up adjustment signal;

wherein the first frequency-domain analysis data comprise a window type index indicating the window type used for the first audio frame in a frequency-domain-to-time-domain conversion in the auditory perception decoding device;

wherein judging, according to the first frequency-domain analysis data, whether to discard the first audio frame comprises: when the window type index indicates that the first audio frame belongs to the long window type, judging that the first audio frame can be discarded.

11. The method of claim 10, characterized in that the first frequency-domain analysis data further comprise spectral line data of the first audio frame.

12. The method of claim 11, characterized in that judging, according to the first frequency-domain analysis data, whether to discard the first audio frame comprises:

calculating a first energy sum of the plurality of band units;

calculating an energy sum difference, where the energy sum difference = the second energy sum - the first energy sum; and

when at least one of the following three conditions is met, judging that the first audio frame, or the data in the first audio frame belonging to the long window type, can be discarded: the absolute value of the energy sum difference is less than a first threshold related to the energy sum difference of tone-like signals; the energy sum difference is greater than a second threshold related to post-masking in auditory perception; the second energy sum is less than a third threshold related to silence signals and the first energy sum is less than the third threshold.

13. The method as claimed in claim 12, characterized in that, when the window type index indicates that the first audio frame belongs to a long window type, dividing the spectral line data into the plurality of band units comprises: dividing the spectral line data, according to the flatness of the spectral line data, into the plurality of band units, each classified as a tone-like signal or as noise-like.
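One way to picture the flatness-based classification is sketched below. It assumes the common spectral flatness measure (geometric mean divided by arithmetic mean of the power spectrum), fixed-width bands, and a single threshold; the claim does not specify any of these, and the names `classify_bands`, `band_size`, and `threshold` are hypothetical.

```python
import math

def spectral_flatness(power):
    """Geometric mean / arithmetic mean of a power spectrum (near 0 = tonal, near 1 = flat)."""
    eps = 1e-12  # guards against log(0) and division by zero
    log_mean = sum(math.log(p + eps) for p in power) / len(power)
    arith_mean = sum(power) / len(power) + eps
    return math.exp(log_mean) / arith_mean

def classify_bands(spectral_lines, band_size=4, threshold=0.5):
    """Split spectral lines into fixed-width bands and label each one.

    Returns a list of (start_index, label, band_energy) tuples.
    Band width and threshold are illustrative assumptions.
    """
    bands = []
    for start in range(0, len(spectral_lines), band_size):
        power = [x * x for x in spectral_lines[start:start + band_size]]
        flat = spectral_flatness(power)
        label = "tone-like" if flat < threshold else "noise-like"
        bands.append((start, label, sum(power)))
    return bands
```

Summing the per-band energies returned here would give one energy sum per frame, which is the kind of quantity the threshold comparisons operate on.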

14. A method for slowing down playback speed, characterized in that the method for slowing down playback speed comprises:

An auditory perception decoding device receiving audio data;

Receiving a slow-down adjustment signal;

The method is characterized by comprising:

Judging, according to the first frequency-domain analysis data, whether to copy the first audio frame; and

When the first audio frame is judged to be copyable, the auditory perception decoding device copies at least part of the data of the first audio frame according to a playback speed indicated by the slow-down adjustment signal;

Judging whether to copy the first audio frame according to the first frequency-domain analysis data comprises: when the window type index indicates that the first audio frame belongs to a long window type, judging that the first audio frame can be copied.
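The copy rule for slow-down can be sketched in the same spirit. The `Frame` type, the window index values, and the pacing rule that adds roughly `1/speed - 1` extra frames are assumptions made for this example, and the sketch duplicates each frame at most once, so it only covers speeds between 0.5 and 1.

```python
from dataclasses import dataclass, field

# Hypothetical window type index values; a real codec defines its own.
LONG_WINDOW = 0
SHORT_WINDOW = 1

@dataclass
class Frame:
    window_type: int
    samples: list = field(default_factory=list)

def can_copy(frame):
    """A frame is copyable only if its window type index marks a long window."""
    return frame.window_type == LONG_WINDOW

def slow_down(frames, speed):
    """Duplicate copyable frames so that about 1/speed - 1 extra frames are added.

    The pacing rule is an assumption; the claims only state that the
    copying follows the playback speed indicated by the adjustment signal.
    """
    target_extra = 1.0 / speed - 1.0
    out = []
    copies = 0
    for seen, frame in enumerate(frames, start=1):
        out.append(frame)
        if can_copy(frame) and copies + 1 <= target_extra * seen:
            out.append(frame)  # play the same frame a second time
            copies += 1
    return out
```

As with discarding, short-window frames are left alone, so transient passages are never repeated.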

15. The method as claimed in claim 14, characterized in that the first frequency-domain analysis data further comprise spectral line data of the first audio frame.

16. The method as claimed in claim 15, characterized in that judging whether to discard the first audio frame according to the first frequency-domain analysis data comprises:

Calculating a first energy sum of the plurality of band units;

17. The method as claimed in claim 16, characterized in that, when the window type index indicates that the first audio frame belongs to a long window type, dividing the spectral line data into the plurality of band units comprises: dividing the spectral line data, according to the flatness of the spectral line data, into the plurality of band units, each classified as a tone-like signal or as noise-like.