CN105245496A

CN105245496A - Audio data play method and device

Info

Publication number: CN105245496A
Application number: CN201510536538.9A
Authority: CN
Inventors: 林成保
Original assignee: All Kinds Of Fruits Garden Guangzhou Network Technology Co Ltd
Current assignee: Bigo Technology Pte Ltd
Priority date: 2015-08-26
Filing date: 2015-08-26
Publication date: 2016-01-13
Anticipated expiration: 2035-08-26
Also published as: CN105245496B

Abstract

The invention discloses a method and apparatus for playing audio data, and belongs to the field of Internet technology. The method comprises: during a voice call, determining the amount of to-be-played audio data stored in a jitter buffer; if the amount of audio data is lower than a preset first threshold, lengthening the audio frames in the to-be-played audio data; if the amount of audio data is higher than a preset second threshold, which is greater than the first threshold, shortening the audio frames in the to-be-played audio data; and playing the processed to-be-played audio data in playback order. The method and apparatus can thereby prevent playback gaps and dropped words.

Description

Method and apparatus for playing audio data

Technical field

The present invention relates to the field of Internet technology, and in particular to a method and apparatus for playing audio data.

Background art

With the development of Internet and communication technology, voice calls based on VoIP (Voice over Internet Protocol) technology, which exchanges voice packets, are increasingly popular with users.

A voice call using VoIP technology typically works as follows: of the two terminals in the call, either terminal sends compressed voice packets (each of which may contain multiple frames of audio data); the peer terminal receives each voice packet, decompresses it, stores the result in a jitter buffer, and plays the audio frames in the jitter buffer one after another.

In the course of realizing the present invention, the inventor found that the prior art has at least the following problem:

With the call method above, when the network is unstable, the receiving terminal may receive no voice packets from the sending terminal for a long time, leaving the jitter buffer without audio data; or it may receive a large number of voice packets in an instant, so that the audio data overflows the jitter buffer and is lost. Either case causes playback gaps or dropped words.

Summary of the invention

To solve the problem in the prior art, embodiments of the present invention provide a method and apparatus for playing audio data. The technical solution is as follows:

In a first aspect, a method for playing audio data is provided, the method comprising:

during a voice call, detecting the amount of to-be-played audio data stored in a jitter buffer;

if the amount of audio data is lower than a preset first threshold, performing duration-extension processing on the audio frames in the to-be-played audio data; if the amount of audio data is higher than a preset second threshold, performing duration-shortening processing on the audio frames in the to-be-played audio data, wherein the first threshold is less than the second threshold;

playing the processed to-be-played audio data in playback order.

Optionally, the method further comprises:

obtaining the pitch period of each audio frame in the to-be-played audio data;

wherein performing duration-extension processing on the audio frames in the to-be-played audio data if the amount of audio data is lower than the preset first threshold, and performing duration-shortening processing on the audio frames in the to-be-played audio data if the amount of audio data is higher than the preset second threshold, comprises:

if the amount of audio data is lower than the preset first threshold, extending each audio frame in the to-be-played audio data by one corresponding pitch period; if the amount of audio data is higher than the preset second threshold, shortening each audio frame in the to-be-played audio data by one corresponding pitch period.

Optionally, extending each audio frame in the to-be-played audio data by one corresponding pitch period if the amount of audio data is lower than the preset first threshold, and shortening each audio frame in the to-be-played audio data by one corresponding pitch period if the amount of audio data is higher than the preset second threshold, comprises:

if the amount of audio data is lower than the preset first threshold, merging, in each audio frame in the to-be-played audio data, the data of the first pitch period and the second pitch period into the data of one pitch period, and inserting the merged data between the first pitch period and the second pitch period;

if the amount of audio data is higher than the preset second threshold, merging, in each audio frame in the to-be-played audio data, the data of the first pitch period and the second pitch period into the data of one pitch period, and replacing the data of the first pitch period and the second pitch period with the merged data.

Optionally, the method further comprises:

if the amount of audio data is lower than the preset first threshold, determining, according to a preset extension duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame; and extending each audio frame in the to-be-played audio data by its corresponding unit processing duration;

if the amount of audio data is higher than the preset second threshold, determining, according to a preset shortening duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame; and shortening each audio frame in the to-be-played audio data by its corresponding unit processing duration.

Optionally, extending each audio frame in the to-be-played audio data by its corresponding unit processing duration comprises:

merging, in each audio frame in the to-be-played audio data, the data of the first unit processing duration and the second unit processing duration into the data of one unit processing duration, and inserting the merged data between the first unit processing duration and the second unit processing duration;

and shortening each audio frame in the to-be-played audio data by its corresponding unit processing duration comprises:

merging, in each audio frame in the to-be-played audio data, the data of the first unit processing duration and the second unit processing duration into the data of one unit processing duration, and replacing the data of the first unit processing duration and the second unit processing duration with the merged data.

Optionally, obtaining the pitch period of each audio frame in the to-be-played audio data comprises:

if the audio frames in the to-be-played audio data record a pitch period, obtaining the pitch period of each audio frame from that audio frame; if the audio frames in the to-be-played audio data do not record a pitch period, determining the pitch period of each audio frame based on a pitch period search algorithm and each decoded audio frame.

In a second aspect, an apparatus for playing audio data is provided, the apparatus comprising:

a detection module, configured to detect, during a voice call, the amount of to-be-played audio data stored in a jitter buffer;

a processing module, configured to perform duration-extension processing on the audio frames in the to-be-played audio data if the amount of audio data is lower than a preset first threshold, and to perform duration-shortening processing on the audio frames in the to-be-played audio data if the amount of audio data is higher than a preset second threshold, wherein the first threshold is less than the second threshold;

a playing module, configured to play the processed to-be-played audio data in playback order.

Optionally, the apparatus further comprises an acquisition module, configured to:

The processing module is configured to:

Optionally, the processing module comprises:

a first processing submodule, configured to, if the amount of audio data is lower than the preset first threshold, merge, in each audio frame in the to-be-played audio data, the data of the first pitch period and the second pitch period into the data of one pitch period, and insert the merged data between the first pitch period and the second pitch period;

a second processing submodule, configured to, if the amount of audio data is higher than the preset second threshold, merge, in each audio frame in the to-be-played audio data, the data of the first pitch period and the second pitch period into the data of one pitch period, and replace the data of the first pitch period and the second pitch period with the merged data.

Optionally, the acquisition module is configured to:

The first processing submodule is configured to:

The second processing submodule is configured to:

Optionally, the first processing submodule is configured to:

The second processing submodule is configured to:

Optionally, the acquisition module is configured to:

The technical solution provided by the embodiments of the present invention has the following beneficial effects:

In the embodiments of the present invention, during a voice call, the amount of to-be-played audio data stored in the jitter buffer is detected. If the amount of audio data is lower than the preset first threshold, duration-extension processing is performed on the audio frames in the to-be-played audio data; if the amount of audio data is higher than the preset second threshold, duration-shortening processing is performed on the audio frames in the to-be-played audio data, wherein the first threshold is less than the second threshold. The processed to-be-played audio data is then played in playback order. In this way, when the jitter buffer holds little data, the audio data in it is played more slowly, which, on an unstable network, leaves more time for new audio data to arrive in the buffer. When the jitter buffer holds a lot of data, the audio data in it is played as quickly as possible, keeping as much space free in the jitter buffer as possible so that a large amount of audio data received in an instant can be stored and does not overflow the buffer. Playback gaps and dropped words can thereby be prevented.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for describing the embodiments are briefly introduced below. The drawings described below are evidently only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of a method for playing audio data provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of processing according to the amount of data in the jitter buffer, provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of duration-extension processing provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of duration-shortening processing provided by an embodiment of the present invention;

Fig. 5 is a structural diagram of an apparatus for playing audio data provided by an embodiment of the present invention;

Fig. 6 is a structural diagram of an apparatus for playing audio data provided by an embodiment of the present invention;

Fig. 7 is a structural diagram of an apparatus for playing audio data provided by an embodiment of the present invention;

Fig. 8 is a structural diagram of a terminal provided by an embodiment of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

Embodiment one

An embodiment of the present invention provides a method for playing audio data. As shown in Fig. 1, the processing flow of the method may comprise the following steps:

Step 101: during a voice call, detect the amount of to-be-played audio data stored in the jitter buffer.

Step 102: if the amount of audio data is lower than a preset first threshold, perform duration-extension processing on the audio frames in the to-be-played audio data; if the amount of audio data is higher than a preset second threshold, perform duration-shortening processing on the audio frames in the to-be-played audio data, wherein the first threshold is less than the second threshold.

Step 103: play the processed to-be-played audio data in playback order.

Embodiment two

An embodiment of the present invention provides a method for playing audio data, and the method is executed by a terminal. The terminal may be a desktop terminal, or a mobile terminal such as a mobile phone or tablet computer. The terminal may be provided with a processor, a memory, a transceiver, and a loudspeaker. The processor may be used to detect the amount of to-be-played audio data in the jitter buffer and to process the to-be-played audio data according to the detection result; the memory may be used to store the data required and generated in the processing described below; the transceiver may be used to receive and send data; and the loudspeaker may be used to play the processed to-be-played audio data. A decoder may also be provided to decode the encoded audio frames that are received. The terminal may further be provided with a microphone and an encoder: the microphone may be used to capture the user's voice signal during a voice call, and the encoder may be used to encode the voice signal captured by the terminal.

The processing flow shown in Fig. 1 is described in detail below with reference to specific implementations. The content may be as follows:

Here, the jitter buffer may be used to store the to-be-played audio data received by the terminal.

In implementation, in a voice call based on the exchange of voice packets, after the terminal at one end of the call (which may be called the sending terminal) sends a voice packet, the terminal at the other end can receive it. A voice packet may contain multiple frames of audio data. The receiving terminal unpacks the received voice packet and stores the audio data it contains in the jitter buffer.

A voice packet sent by the sending terminal may carry its sending time and its sequence number. Each time the terminal receives a voice packet from the sending terminal, it unpacks the packet to obtain the audio frames it contains and its sequence number, and may then judge whether the sequence number of the currently received packet is adjacent to that of the previously received packet. If it is adjacent, the audio frames in the packet may be stored in the jitter buffer. If other sequence numbers lie between the sequence number of the currently received packet and that of the previously received packet, the terminal may wait for a certain time before storing the audio frames of the current packet in the jitter buffer. If, during the wait, the terminal receives a packet whose sequence number is adjacent to that of the previously received packet, the audio frames in that packet may be stored in the jitter buffer first, followed by the audio frames of the currently received packet. If the terminal receives such a packet only after the waiting time has elapsed, its audio frames may still be stored in the jitter buffer, at a position adjacent to the audio data of the previously received packet. That is, the to-be-played audio data in the jitter buffer is stored in playback order; it may also be stored in the order of the times at which the sending terminal generated the audio data.
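The ordering rule just described can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the class `JitterBuffer`, its method names, and the idea that the caller invokes `on_wait_expired()` when the waiting time runs out are all hypothetical and do not come from the patent, and playback (removal of frames) is not modeled.

```python
import bisect


class JitterBuffer:
    """Hypothetical sketch: keeps audio frames in sequence (playback) order."""

    def __init__(self):
        self.stored = []      # (seq, frames) pairs, kept in sequence order
        self.held = {}        # packets held back while an earlier one is missing
        self.last_seq = None  # sequence number of the newest stored packet

    def receive(self, seq, frames):
        if self.last_seq is not None and seq <= self.last_seq:
            # Late packet arriving after the wait expired: place it next to
            # the data of its predecessor, i.e. at its position in sequence order.
            bisect.insort(self.stored, (seq, frames))
            return
        self.held[seq] = frames
        self._release_contiguous()

    def _release_contiguous(self):
        # Store every held packet whose sequence number is adjacent to the last one.
        while self.held:
            nxt = min(self.held) if self.last_seq is None else self.last_seq + 1
            if nxt not in self.held:
                break
            self.stored.append((nxt, self.held.pop(nxt)))
            self.last_seq = nxt

    def on_wait_expired(self):
        # The wait for the missing packet is over: store what was held back.
        for seq in sorted(self.held):
            self.stored.append((seq, self.held.pop(seq)))
            self.last_seq = seq

    def frames_in_order(self):
        return [frame for _, frames in self.stored for frame in frames]
```

For example, receiving packets 1, 3, 2 in that order yields the frames of 1, 2, 3 in sequence, because packet 3 is held back until packet 2 arrives.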

A detection period for checking the amount of data in the jitter buffer may be preset. During a voice call, the terminal may periodically detect, according to the preset detection period, the amount of to-be-played audio data stored in the jitter buffer.

Step 102: if the amount of audio data is lower than a preset first threshold, perform duration-extension processing on the audio frames in the to-be-played audio data; if the amount of audio data is higher than a preset second threshold, perform duration-shortening processing on the audio frames in the to-be-played audio data.

Here, the first threshold is less than the second threshold.

In implementation, during a voice call, after the to-be-played audio data has been stored in the jitter buffer, each audio frame in it may be decoded in turn in storage order, yielding the decoded audio frame and the codec parameters corresponding to each audio frame.

Two thresholds characterizing the amount of data in the jitter buffer may be preset, called the first threshold and the second threshold, where the first threshold is less than the second threshold. As shown in Fig. 2, if the terminal detects that the amount of audio data stored in the jitter buffer is lower than the preset first threshold, it may extend the duration of the to-be-played audio frames stored in the jitter buffer (these may be decoded audio frames), that is, slow down their playback. In this way, when the network is unstable and the terminal may receive no voice packets for a long period, playback gaps can be effectively prevented. If the terminal detects that the amount of audio data stored in the jitter buffer is higher than the preset second threshold, it may shorten the duration of the to-be-played audio frames stored in the jitter buffer (these may be decoded audio frames), that is, speed up their playback. In this way, when the network is unstable and the terminal receives a large number of voice packets in an instant, the audio data can be effectively prevented from overflowing the jitter buffer and causing playback stutter. If the terminal detects that the amount of audio data stored in the jitter buffer lies between the preset first threshold and the preset second threshold, it need not process the duration of the to-be-played audio frames at all.
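The two-threshold decision can be sketched as a small Python function. The names `Action` and `choose_action` are hypothetical, and measuring the buffered amount in milliseconds is an assumption made for illustration; the text only speaks of an amount of data.

```python
from enum import Enum


class Action(Enum):
    EXTEND = "extend"    # buffer is running low: slow playback down
    SHORTEN = "shorten"  # buffer is filling up: speed playback up
    NONE = "none"        # buffer level is between the thresholds


def choose_action(buffered_ms, first_threshold_ms, second_threshold_ms):
    """Pick the duration processing for the current jitter-buffer level."""
    if first_threshold_ms >= second_threshold_ms:
        raise ValueError("the first threshold must be less than the second")
    if buffered_ms < first_threshold_ms:
        return Action.EXTEND
    if buffered_ms > second_threshold_ms:
        return Action.SHORTEN
    return Action.NONE
```

A terminal following the periodic detection described above would call such a function once per detection period and apply the returned action to the frames about to be played.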

Optionally, after each audio frame in the to-be-played audio data has been decoded in turn to obtain the decoded audio frames, the pitch period of each audio frame may be obtained. Accordingly, the processing may be as follows: obtain the pitch period of each audio frame in the to-be-played audio data.

Here, the pitch period is the reciprocal of the frequency of vocal-cord vibration (called the fundamental frequency). Speech signals spaced one pitch period apart are maximally correlated, and the pitch period is an intrinsic parameter of the audio signal.

In implementation, after each audio frame in the to-be-played audio data is decoded in turn during a voice call, the pitch period corresponding to each audio frame can be obtained; that is, decoding an audio frame yields its pitch period.

Optionally, the pitch period of each audio frame in the to-be-played audio data may be obtained in various ways. Several feasible ways are given below:

Mode one: if the audio frames in the to-be-played audio data record a pitch period, obtain the pitch period of each audio frame from that audio frame.

In implementation, if, in the codec parameters obtained for each audio frame after decoding, the status flag indicating whether the audio frame records a pitch period has the value 1, the audio frames in the to-be-played audio data record a pitch period, and the pitch period of each audio frame obtained by decoding may be used directly.

Mode two: if the audio frames in the to-be-played audio data do not record a pitch period, determine the pitch period of each audio frame based on a pitch period search algorithm and each decoded audio frame.

In implementation, if, in the codec parameters obtained for each audio frame after decoding, the status flag indicating whether the audio frame records a pitch period has the value 0, the audio frames in the to-be-played audio data do not record a pitch period. In that case, a pitch period search algorithm such as the autocorrelation method or the average magnitude difference method may be applied to the decoded audio frames to obtain the pitch period corresponding to each audio frame.
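The two modes can be sketched in Python as below, using a plain normalized autocorrelation search for Mode two. This is a minimal sketch, not the codec's actual search: the function names, the parameter names `recorded_flag` and `recorded_period`, and the default lag range are assumptions for illustration, and the pitch period is expressed in samples.

```python
import math


def estimate_pitch_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag
        if n <= 0:
            break
        cross = sum(samples[i] * samples[i + lag] for i in range(n))
        energy_a = sum(samples[i] ** 2 for i in range(n))
        energy_b = sum(samples[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(energy_a * energy_b)
        score = cross / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def pitch_period_for_frame(decoded, recorded_flag, recorded_period,
                           min_lag=20, max_lag=160):
    """Mode one if the status flag is 1, otherwise Mode two (search)."""
    if recorded_flag == 1:
        return recorded_period
    return estimate_pitch_period(decoded, min_lag, max_lag)
```

The lag range bounds the search to plausible pitch periods; without an upper bound, integer multiples of the true period would score equally well on a perfectly periodic signal.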

Optionally, once the pitch period of each audio frame in the to-be-played audio data has been obtained, step 102 may be handled in various ways, depending on the principle used to select the duration by which an audio frame is extended or shortened. Several feasible ways are given below:

Mode one: the pitch period corresponding to each audio frame in the to-be-played audio data is chosen as the duration by which that audio frame is extended or shortened. The corresponding processing may be as follows: if the amount of audio data is lower than the preset first threshold, extend each audio frame in the to-be-played audio data by one corresponding pitch period; if the amount of audio data is higher than the preset second threshold, shorten each audio frame in the to-be-played audio data by one corresponding pitch period.

In implementation, when the audio frames in the to-be-played audio data stored in the jitter buffer are extended or shortened, each audio frame may be extended or shortened by one of its own pitch periods. If the terminal detects that the amount of to-be-played audio data stored in the jitter buffer is lower than the first threshold, it may extend each audio frame in the to-be-played audio data by the duration of one corresponding pitch period; that is, different audio frames may be extended by different durations, but each is extended by one of its own pitch periods. If the terminal detects that the amount of to-be-played audio data stored in the jitter buffer is higher than the second threshold, it may shorten each audio frame in the to-be-played audio data by the duration of one corresponding pitch period; that is, different audio frames may be shortened by different durations, but each is shortened by one of its own pitch periods. In this way, each audio frame is only extended or shortened by its own pitch period, so the pitch period of the frame, and hence its fundamental frequency, is unchanged. Because the fundamental frequency is unchanged, the pitch is not altered, and the original audio frames are played at a different speed without a change in pitch.

Optionally, where each audio frame is extended or shortened by one corresponding pitch period, the data used to extend or shorten the frame may be obtained by merging the data of its first two pitch periods. Accordingly, the processing may be as follows: if the amount of audio data is lower than the preset first threshold, merge, in each audio frame in the to-be-played audio data, the data of the first pitch period and the second pitch period into the data of one pitch period, and insert the merged data between the first pitch period and the second pitch period; if the amount of audio data is higher than the preset second threshold, merge, in each audio frame in the to-be-played audio data, the data of the first pitch period and the second pitch period into the data of one pitch period, and replace the data of the first pitch period and the second pitch period with the merged data.

In implementation, if the amount of audio data is lower than the preset first threshold, the data of the first pitch period and the data of the second pitch period in each audio frame in the to-be-played audio data may be superimposed sample by sample. A first weight and a second weight, corresponding respectively to the data of the first pitch period and the data of the second pitch period, may be preset for the superposition; the two weights sum to 1 and may each be 0.5. As shown in Fig. 3, the superposition yields the data of one pitch period synthesized from the data of the first and second pitch periods. This data may be inserted between the first pitch period and the second pitch period, and the resulting audio frame, longer by one pitch period, serves as the processed audio frame. If the amount of audio data is higher than the preset second threshold, the data of the first pitch period and the data of the second pitch period in each audio frame are superimposed in the same way, with a first weight and a second weight that sum to 1 and may each be 0.5. As shown in Fig. 4, the superposition yields the data of one pitch period synthesized from the data of the first and second pitch periods. This data may replace the data of the first and second pitch periods, and the resulting audio frame, shorter by one pitch period, serves as the processed audio frame.
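The weighted merge and the insert/replace steps can be sketched in Python as follows. The function names are hypothetical, frames are modeled as lists of samples, the pitch period is given in samples, and the sketch assumes each frame contains at least two full pitch periods.

```python
def merge_periods(first, second, w1=0.5, w2=0.5):
    """Weighted sample-by-sample superposition of two pitch periods (w1 + w2 = 1)."""
    return [w1 * a + w2 * b for a, b in zip(first, second)]


def extend_frame(frame, period):
    """Insert the merged period between the first and second pitch periods."""
    first, second = frame[:period], frame[period:2 * period]
    merged = merge_periods(first, second)
    return first + merged + frame[period:]


def shorten_frame(frame, period):
    """Replace the first and second pitch periods with the merged period."""
    first, second = frame[:period], frame[period:2 * period]
    merged = merge_periods(first, second)
    return merged + frame[2 * period:]


frame = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(extend_frame(frame, 2))   # [1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 5.0, 6.0]
print(shorten_frame(frame, 2))  # [2.0, 3.0, 5.0, 6.0]
```

With a period of 2 samples, the extended frame is one period (2 samples) longer and the shortened frame one period shorter, while the remainder of the frame is left untouched.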

Mode two: the duration by which each audio frame is actually extended or shortened is chosen according to a preset duration by which each audio frame should be extended or shortened. The corresponding processing may be as follows: if the amount of audio data is lower than the preset first threshold, determine, according to the preset extension duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame; extend each audio frame in the to-be-played audio data by its corresponding unit processing duration. If the amount of audio data is higher than the preset second threshold, determine, according to the preset shortening duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame; shorten each audio frame in the to-be-played audio data by its corresponding unit processing duration.

In implementation, an extension duration and a shortening duration for each audio frame may be preset. If the amount of audio data is lower than the preset first threshold, the preset extension duration may be divided by the pitch period of each audio frame to obtain a quotient. If the quotient is an integer, it may be multiplied by the pitch period of the audio frame to obtain the unit processing duration by which the frame is actually extended; that is, the quotient is the multiple for the unit processing duration. The quotient may also not be an integer. In that case, its integer part may be taken (rounding the quotient down) and multiplied by the pitch period of the audio frame to obtain the unit processing duration by which the frame is actually extended; alternatively, the integer part plus 1 (rounding the quotient up) may be multiplied by the pitch period to obtain that duration. Either way, the unit processing duration is an integer multiple of the pitch period of the audio frame. For example, if the preset extension duration is 7 ms and the pitch period of a given audio frame is 3 ms, rounding down gives a unit processing duration of twice the pitch period, namely 6 ms, and rounding up gives three times the pitch period, namely 9 ms. Each audio frame in the to-be-played audio data may then be extended by its unit processing duration. If the amount of audio data is higher than the preset second threshold, the preset shortening duration may be divided by the pitch period of each audio frame to obtain a quotient. The quotient may not be an integer, in which case it may be rounded down or up: its integer part (rounding down) may be multiplied by the pitch period of the audio frame to obtain the unit processing duration by which the frame is actually shortened, or the integer part plus 1 (rounding up) may be multiplied by the pitch period to obtain that duration. Each audio frame in the to-be-played audio data may then be shortened by its unit processing duration.
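The quotient-and-rounding rule can be sketched in Python as below. The function name and the `round_up` parameter are hypothetical; durations are in milliseconds as in the 7 ms / 3 ms example above.

```python
import math


def unit_processing_duration(preset_ms, pitch_period_ms, round_up=False):
    """Return an integer multiple of the pitch period near the preset duration."""
    quotient = preset_ms / pitch_period_ms
    multiple = math.ceil(quotient) if round_up else math.floor(quotient)
    return multiple * pitch_period_ms


print(unit_processing_duration(7, 3))                 # 6  (2 x 3 ms, rounded down)
print(unit_processing_duration(7, 3, round_up=True))  # 9  (3 x 3 ms, rounded up)
print(unit_processing_duration(6, 3))                 # 6  (quotient is already an integer)
```

The same function serves both the extension and the shortening case, since only the preset duration passed in differs.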

Optionally, so that the durations by which the audio frames are extended approach the same duration, the difference between the unit processing duration of the previous audio frame and the preset extension duration may also be taken into account when determining the unit processing duration of the current audio frame. Accordingly, the processing may be as follows: for the first audio frame in the audio data to be played, the unit processing duration of the first audio frame is determined according to the preset extension duration; for each other audio frame in the audio data to be played, the unit processing duration of that audio frame is determined according to the preset extension duration and the difference between the unit processing duration of its previous audio frame and the preset extension duration, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame.

In implementation, if the data volume of the audio data is below the preset first threshold, the unit processing duration of the first audio frame in the audio data to be played may be determined from the preset extension duration by the method described above. For each other audio frame in the audio data to be played, in the rounding-down case, the difference between the unit processing duration of the previous audio frame and the preset extension duration may be added to the preset extension duration to obtain the duration by which the audio frame should be extended; from this duration, the unit processing duration of the audio frame may be determined using the rounding methods described above, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame. For example, if the preset extension duration is 7 ms, the pitch period of a certain audio frame is 3 ms, and the unit processing duration determined is 6 ms, then the unit processing duration differs from the preset extension duration by 1 ms. The duration by which the next audio frame should be extended may be this difference plus the preset extension duration, i.e. 8 ms (this value may be treated as the preset extension duration and combined with the pitch period of the audio frame to determine the unit processing duration). If the pitch period of the next audio frame is 2.5 ms, then with rounding down the unit processing duration is three times the pitch period, i.e. 7.5 ms. In the rounding-up case, the difference between the unit processing duration of the previous audio frame and the preset extension duration may be subtracted from the preset extension duration to obtain the duration by which the audio frame should be extended; from this duration, the unit processing duration of the audio frame may be determined using the rounding methods described above, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame. For example, if the preset extension duration is 7 ms, the pitch period of a certain audio frame is 3 ms, and the unit processing duration determined is 9 ms, then the unit processing duration differs from the preset extension duration by 2 ms. The duration by which the next audio frame should be extended may be the preset extension duration minus this difference, i.e. 5 ms (this value may be treated as the preset extension duration and combined with the pitch period of the audio frame to determine the unit processing duration). If the pitch period of the next audio frame is 3.5 ms, then with rounding up the unit processing duration is twice the pitch period, i.e. 7 ms.
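The frame-to-frame compensation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `compensated_durations` is hypothetical, and the sketch assumes the deviation of the previous frame from the preset duration is carried forward as a signed value, which reproduces both worked examples in the text.

```python
import math


def compensated_durations(preset_ms, pitch_periods_ms, round_up=False):
    """Unit processing duration per frame, where each frame's target is the
    preset duration corrected by the previous frame's deviation from it."""
    rounder = math.ceil if round_up else math.floor
    durations = []
    target = preset_ms  # the first frame uses the preset duration directly
    for pitch in pitch_periods_ms:
        if pitch <= 0:
            raise ValueError("pitch period must be positive")
        # Integer multiple of this frame's pitch period, never negative.
        multiple = max(0, rounder(target / pitch))
        unit = multiple * pitch
        durations.append(unit)
        # Assumption: the signed deviation (preset - unit) is carried forward,
        # so an undershoot raises the next target and an overshoot lowers it.
        target = preset_ms + (preset_ms - unit)
    return durations


# Examples from the text.
print(compensated_durations(7, [3, 2.5]))                 # [6, 7.5]
print(compensated_durations(7, [3, 3.5], round_up=True))  # [9, 7.0]
```

Carrying the deviation forward keeps the average change per frame close to the preset duration even though each individual change is constrained to whole pitch periods.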

So that the durations by which the audio frames are shortened approach the same duration, the difference between the unit processing duration of the previous audio frame and the preset shortening duration may also be taken into account when determining the unit processing duration of the current audio frame. Accordingly, the processing may be as follows: for the first audio frame in the audio data to be played, the unit processing duration of the first audio frame is determined according to the preset shortening duration; for each other audio frame in the audio data to be played, the unit processing duration of that audio frame is determined according to the preset shortening duration and the difference between the unit processing duration of its previous audio frame and the preset shortening duration, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame.

In implementation, if the data volume of the audio data is above the preset second threshold, the unit processing duration of the first audio frame in the audio data to be played may be determined from the preset shortening duration by the method described above. For each other audio frame in the audio data to be played, in the rounding-down case, the difference between the unit processing duration of the previous audio frame and the preset shortening duration may be added to the preset shortening duration to obtain the duration by which the audio frame should be shortened; from this duration, the unit processing duration of the audio frame may be determined using the rounding methods described above, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame. In the rounding-up case, the difference between the unit processing duration of the previous audio frame and the preset shortening duration may be subtracted from the preset shortening duration to obtain the duration by which the audio frame should be shortened; from this duration, the unit processing duration of the audio frame may be determined using the rounding methods described above, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame.
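One way to summarize the shortening rule is as a recurrence. The symbols below are introduced here for illustration and do not appear in the original text: S is the preset shortening duration, p_n the pitch period of the n-th audio frame, T_n the duration by which the n-th frame should be shortened, and U_n its unit processing duration. The sketch assumes the difference is treated as a signed quantity, so that the rounding-down case (U below S) adds to the next target and the rounding-up case (U above S) subtracts from it.

```latex
\begin{aligned}
T_1 &= S, \\
U_n &= k_n \, p_n, \qquad
k_n =
\begin{cases}
\lfloor T_n / p_n \rfloor & \text{(rounding down)} \\
\lceil T_n / p_n \rceil & \text{(rounding up)}
\end{cases} \\
T_n &= S + \left( S - U_{n-1} \right), \qquad n \ge 2 .
\end{aligned}
```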

Optionally, where each audio frame is extended by a unit processing duration, the data by which each audio frame is extended may be obtained by merging the data of its first two unit processing durations. Accordingly, the processing may be as follows: in each audio frame in the audio data to be played, the data of the first unit processing duration and the data of the second unit processing duration are merged into the data of one unit processing duration, and the merged data is inserted between the first unit processing duration and the second unit processing duration.

In implementation, if the data volume of the audio data is below the preset first threshold, the data of the first unit processing duration and the data of the second unit processing duration in each audio frame in the audio data to be played may be superimposed sample by sample. A first weight and a second weight, corresponding respectively to the data of the first unit processing duration and the data of the second unit processing duration, may be preset for the superposition; the first weight and the second weight sum to 1 and may each be 0.5. The superposition yields the data of one unit processing duration synthesized from the data of the first unit processing duration and the data of the second unit processing duration. This data may be inserted between the first unit processing duration and the second unit processing duration, and the resulting audio frame, lengthened by one unit processing duration, is taken as the processed audio frame corresponding to the original audio frame.
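The weighted merge and insertion can be sketched in Python as below. This is a minimal sketch under assumptions: the name `extend_frame` is hypothetical, a frame is represented as a plain list of samples, and `unit_len` is the unit processing duration already converted to a sample count (the text does not specify that conversion).

```python
def extend_frame(samples, unit_len, w1=0.5, w2=0.5):
    """Lengthen a frame by one unit by inserting a weighted blend of its
    first two units between them (hypothetical helper)."""
    if unit_len <= 0 or len(samples) < 2 * unit_len:
        raise ValueError("frame must contain at least two units")
    first = samples[:unit_len]
    second = samples[unit_len:2 * unit_len]
    # Sample-by-sample weighted superposition; the weights sum to 1.
    merged = [w1 * a + w2 * b for a, b in zip(first, second)]
    # Result layout: first unit, merged unit, then the second unit onward.
    return first + merged + samples[unit_len:]


frame = [1.0, 3.0, 5.0, 7.0, 9.0]
print(extend_frame(frame, 2))  # [1.0, 3.0, 3.0, 5.0, 5.0, 7.0, 9.0]
```

The output is exactly `unit_len` samples longer than the input, and the original samples are left unmodified on either side of the inserted unit.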

Where each audio frame is shortened by a unit processing duration, the shortening of each audio frame may be achieved by merging the data of its first two unit processing durations. Accordingly, the processing may be as follows: in each audio frame in the audio data to be played, the data of the first unit processing duration and the data of the second unit processing duration are merged into the data of one unit processing duration, and the merged data replaces the data of the first unit processing duration and the second unit processing duration.

In implementation, if the data volume of the audio data is above the preset second threshold, the data of the first unit processing duration and the data of the second unit processing duration in each audio frame in the audio data to be played may be superimposed sample by sample. A first weight and a second weight, corresponding respectively to the data of the first unit processing duration and the data of the second unit processing duration, may be preset for the superposition; the first weight and the second weight sum to 1 and may each be 0.5. The superposition yields the data of one unit processing duration synthesized from the data of the first unit processing duration and the data of the second unit processing duration. This data may replace the data of the first unit processing duration and the second unit processing duration, and the resulting audio frame, shortened by one unit processing duration, is taken as the processed audio frame corresponding to the original audio frame.
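The replacement step for shortening can be sketched in the same style. Again this is only an illustration under assumptions: `shorten_frame` is a hypothetical name, the frame is a plain list of samples, and `unit_len` is the unit processing duration expressed as a sample count.

```python
def shorten_frame(samples, unit_len, w1=0.5, w2=0.5):
    """Shorten a frame by one unit by replacing its first two units with
    their weighted blend (hypothetical helper)."""
    if unit_len <= 0 or len(samples) < 2 * unit_len:
        raise ValueError("frame must contain at least two units")
    first = samples[:unit_len]
    second = samples[unit_len:2 * unit_len]
    # Sample-by-sample weighted superposition; the weights sum to 1.
    merged = [w1 * a + w2 * b for a, b in zip(first, second)]
    # The merged unit stands in for both original units.
    return merged + samples[2 * unit_len:]


frame = [1.0, 3.0, 5.0, 7.0, 9.0]
print(shorten_frame(frame, 2))  # [3.0, 5.0, 9.0]
```

The output is exactly `unit_len` samples shorter than the input; everything after the second unit is passed through unchanged.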

In implementation, each audio frame in the audio data to be played is decoded in turn, and the decoded audio frames, once they have undergone the extension or shortening processing, are stored in a playback buffer; the system then plays the audio data to be played in the playback buffer in playback order.

Embodiment three

Based on the same technical concept, an embodiment of the present invention further provides a device for playing audio data. As shown in Fig. 5, the device comprises:

Detection module 510, configured to detect, during a voice call, the data volume of the audio data to be played that is stored in the jitter buffer;

Processing module 520, configured to: if the data volume of the audio data is below a preset first threshold, perform duration extension processing on the audio frames in the audio data to be played; if the data volume of the audio data is above a preset second threshold, perform duration shortening processing on the audio frames in the audio data to be played, wherein the first threshold is less than the second threshold;

Playing module 530, configured to play the processed audio data to be played in playback order.
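The decision made by the detection and processing modules can be sketched as a small Python function. This is a hedged illustration only: the names `Action` and `choose_action` are hypothetical, the data volume is expressed here in milliseconds of buffered audio (the text does not fix a unit), and the threshold values in the usage lines are arbitrary.

```python
from enum import Enum


class Action(Enum):
    EXTEND = "extend"
    SHORTEN = "shorten"
    NONE = "none"


def choose_action(buffered_ms, first_threshold_ms, second_threshold_ms):
    """Pick the frame processing for the current jitter-buffer level."""
    # The first threshold must be less than the second threshold.
    if first_threshold_ms >= second_threshold_ms:
        raise ValueError("first threshold must be less than second threshold")
    if buffered_ms < first_threshold_ms:
        return Action.EXTEND   # buffer running low: lengthen frames
    if buffered_ms > second_threshold_ms:
        return Action.SHORTEN  # buffer too full: shorten frames
    return Action.NONE         # between thresholds: play frames unchanged


# Arbitrary example thresholds of 40 ms and 120 ms.
print(choose_action(20, 40, 120))   # Action.EXTEND
print(choose_action(200, 40, 120))  # Action.SHORTEN
print(choose_action(80, 40, 120))   # Action.NONE
```

Keeping a band between the two thresholds in which no processing occurs avoids alternating between extension and shortening when the buffer level hovers around a single value.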

Optionally, as shown in Fig. 6, the device further comprises an acquisition module 540, configured to:

The processing module 520 is configured to:

Optionally, as shown in Fig. 7, the processing module 520 comprises:

First processing submodule 5201, configured to: if the data volume of the audio data is below the preset first threshold, then in each audio frame in the audio data to be played, merge the data of the first pitch period and the data of the second pitch period into the data of one pitch period, and insert the merged data between the first pitch period and the second pitch period;

Second processing submodule 5202, configured to: if the data volume of the audio data is above the preset second threshold, then in each audio frame in the audio data to be played, merge the data of the first pitch period and the data of the second pitch period into the data of one pitch period, and replace the data of the first pitch period and the second pitch period with the merged data.

Optionally, the acquisition module 540 is configured to:

The first processing submodule 5201 is configured to:

The second processing submodule 5202 is configured to:

Optionally, the first processing submodule 5201 is configured to:

The second processing submodule 5202 is configured to:

Optionally, the acquisition module 540 is configured to:

It should be noted that, when the device for playing audio data provided by the above embodiment plays audio data, the division into the functional modules described above is used only as an example. In practical applications, the above functions may be assigned to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the device for playing audio data provided by the above embodiment and the method embodiments for playing audio data belong to the same concept; for the specific implementation process, see the method embodiments, which is not repeated here.

Embodiment four

Please refer to Fig. 8, which shows a schematic structural diagram of the terminal involved in an embodiment of the present invention. The terminal may be used to implement the method for playing audio data provided in the above embodiments. Specifically:

Terminal 800 may comprise an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (wireless fidelity) module 170, a processor 180 including one or more processing cores, a power supply 190, and other components. Those skilled in the art will understand that the terminal structure shown in Fig. 8 does not constitute a limitation on the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Wherein:

RF circuit 110 may be used to receive and send signals while sending and receiving information or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 180 for processing; in addition, it sends uplink data to the base station. Typically, RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. RF circuit 110 may also communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.

Memory 120 may be used to store software programs and modules; processor 180 performs various functional applications and data processing by running the software programs and modules stored in memory 120. Memory 120 may mainly comprise a program storage area and a data storage area, wherein the program storage area may store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and the like, and the data storage area may store data created through use of terminal 800 (such as audio data or a phone book) and the like. In addition, memory 120 may comprise high-speed random access memory and may also comprise non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Correspondingly, memory 120 may also comprise a memory controller to provide processor 180 and input unit 130 with access to memory 120.

Input unit 130 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, input unit 130 may comprise a touch-sensitive surface 131 and other input devices 132. Touch-sensitive surface 131, also called a touch display screen or trackpad, can collect touch operations by the user on or near it (such as operations performed by the user on or near touch-sensitive surface 131 with a finger, stylus, or any other suitable object or accessory) and drive the corresponding connected device according to a preset program. Optionally, touch-sensitive surface 131 may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to processor 180, and can receive and execute commands sent by processor 180. Touch-sensitive surface 131 may be implemented in several types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides touch-sensitive surface 131, input unit 130 may also comprise other input devices 132. Specifically, other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys or an on/off key), a trackball, a mouse, a joystick, and the like.

Display unit 140 may be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of terminal 800; these graphical user interfaces may be composed of graphics, text, icons, video, and any combination thereof. Display unit 140 may comprise a display panel 141; optionally, display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, touch-sensitive surface 131 may cover display panel 141. When touch-sensitive surface 131 detects a touch operation on or near it, it passes the operation to processor 180 to determine the type of touch event, and processor 180 then provides the corresponding visual output on display panel 141 according to the type of touch event. Although in Fig. 8 touch-sensitive surface 131 and display panel 141 implement the input and output functions as two separate components, in some embodiments touch-sensitive surface 131 and display panel 141 may be integrated to implement the input and output functions.

Terminal 800 may also comprise at least one sensor 150, such as a light sensor, a motion sensor, or another sensor. Specifically, the light sensor may comprise an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of display panel 141 according to the ambient light level, and the proximity sensor can turn off display panel 141 and/or the backlight when terminal 800 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and, when stationary, the magnitude and direction of gravity; it can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Terminal 800 may also be configured with other sensors such as a gyroscope, barometer, hygrometer, thermometer, and infrared sensor, which are not described further here.

Audio circuit 160, loudspeaker 161, and microphone 162 can provide an audio interface between the user and terminal 800. Audio circuit 160 can convert received audio data into an electrical signal and transmit it to loudspeaker 161, which converts it into a sound signal for output. Conversely, microphone 162 converts the collected sound signal into an electrical signal, which audio circuit 160 receives and converts into audio data; the audio data is output to processor 180 for processing and then sent through RF circuit 110 to, for example, another terminal, or output to memory 120 for further processing. Audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and terminal 800.

WiFi is a short-range wireless transmission technology. Through WiFi module 170, terminal 800 can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Fig. 8 shows WiFi module 170, it is understood that the module is not an essential component of terminal 800 and may be omitted as required without changing the essence of the invention.

Processor 180 is the control center of terminal 800. It connects the various parts of the whole mobile phone through various interfaces and lines, and performs the various functions of terminal 800 and processes data by running or executing the software programs and/or modules stored in memory 120 and calling the data stored in memory 120, thereby monitoring the mobile phone as a whole. Optionally, processor 180 may comprise one or more processing cores. Preferably, processor 180 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It is understood that the modem processor need not be integrated into processor 180.

Terminal 800 also comprises a power supply 190 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to processor 180 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. Power supply 190 may also comprise one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other such components.

Although not shown, terminal 800 may also comprise a camera, a Bluetooth module, and the like, which are not described further here. Specifically, in this embodiment, the display unit of terminal 800 is a touch-screen display, and terminal 800 also comprises a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors, the one or more programs containing instructions for performing the following operations:

Playing the processed audio data to be played in playback order.

Optionally, the method further comprises:

A person of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.

The foregoing are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A method for playing audio data, characterized in that the method comprises:

playing the processed audio data to be played in playback order.

2. The method according to claim 1, characterized in that the method further comprises:

3. The method according to claim 2, characterized in that said extending each audio frame in the audio data to be played by one corresponding pitch period if the data volume of the audio data is below the preset first threshold, and shortening each audio frame in the audio data to be played by one corresponding pitch period if the data volume of the audio data is above the preset second threshold, comprises:

4. The method according to claim 1, characterized in that the method further comprises:

5. The method according to claim 4, characterized in that said extending each audio frame in the audio data to be played by the corresponding unit processing duration comprises:

6. The method according to either of claims 2 and 4, characterized in that said obtaining the pitch period of each audio frame in the audio data to be played comprises:

7. A device for playing audio data, characterized in that the device comprises:

8. The device according to claim 7, characterized in that the device further comprises an acquisition module, configured to:

the processing module being configured to:

9. The device according to claim 8, characterized in that the processing module comprises:

10. The device according to claim 7, characterized in that the acquisition module is configured to:

the first processing submodule being configured to:

the second processing submodule being configured to:

11. The device according to claim 10, characterized in that the first processing submodule is configured to:

the second processing submodule being configured to:

12. The device according to either of claims 8 and 10, characterized in that the acquisition module is configured to: