CN105245496B

CN105245496B - Method and apparatus for playing audio data

Info

Publication number: CN105245496B
Application number: CN201510536538.9A
Authority: CN
Inventors: 林成保
Original assignee: All Kinds Of Fruits Garden Guangzhou Network Technology Co Ltd
Current assignee: Bigo Technology Pte Ltd
Priority date: 2015-08-26
Filing date: 2015-08-26
Publication date: 2019-03-12
Anticipated expiration: 2035-08-26
Also published as: CN105245496A

Abstract

The invention discloses a method and apparatus for playing audio data, belonging to the field of Internet technology. The method includes: during a voice call, detecting the amount of audio data to be played that is stored in a jitter buffer; if the amount of audio data is below a preset first threshold, lengthening the duration of the audio frames in the audio data to be played; if the amount of audio data is above a preset second threshold, shortening the duration of the audio frames in the audio data to be played, where the first threshold is less than the second threshold; and playing the processed audio data in playback order. The invention can prevent playback gaps and dropped words.

Description

Method and apparatus for playing audio data

Technical field

The present invention relates to the field of Internet technology, and in particular to a method and apparatus for playing audio data.

Background art

With the development of Internet and communication technology, voice calls based on VoIP (Voice over Internet Protocol), a technology built on voice packet switching, are increasingly favored by users.

A voice call using VoIP typically proceeds as follows: of the two terminals in the call, one terminal sends compressed voice packets (each of which may contain multiple frames of audio data); the peer terminal receives each voice packet, decompresses it, stores the result in a jitter buffer, and plays the frames of audio data in the jitter buffer one after another.

In the implementation of the present invention, the inventor finds that the existing technology has at least the following problems:

With the call method described above, when the network is unstable, the receiving terminal may not receive any voice packets from the sending terminal for a long time, leaving the jitter buffer with no audio data; or it may receive a large number of voice packets in an instant, so that the audio data overflows the jitter buffer and is lost. Either case results in playback gaps or dropped words.

Summary of the invention

To solve the problems in the prior art, embodiments of the present invention provide a method and apparatus for playing audio data. The technical solution is as follows:

In a first aspect, a method for playing audio data is provided, the method comprising:

During a voice call, detecting the amount of audio data to be played that is stored in a jitter buffer;

If the amount of audio data is below a preset first threshold, lengthening the duration of the audio frames in the audio data to be played; if the amount of audio data is above a preset second threshold, shortening the duration of the audio frames in the audio data to be played, wherein the first threshold is less than the second threshold;

Playing the processed audio data to be played in playback order.

Optionally, the method further comprises:

Obtaining the pitch period of each audio frame in the audio data to be played;

In which case the step of lengthening the duration of the audio frames if the amount of audio data is below the preset first threshold, and shortening the duration of the audio frames if the amount of audio data is above the preset second threshold, comprises:

If the amount of audio data is below the preset first threshold, lengthening each audio frame in the audio data to be played by one corresponding pitch period; if the amount of audio data is above the preset second threshold, shortening each audio frame in the audio data to be played by one corresponding pitch period.

Optionally, the step of lengthening each audio frame by one corresponding pitch period if the amount of audio data is below the preset first threshold, and shortening each audio frame by one corresponding pitch period if the amount of audio data is above the preset second threshold, comprises:

If the amount of audio data is below the preset first threshold, then in each audio frame in the audio data to be played, merging the data of the first pitch period and the second pitch period into the data of a single pitch period, and inserting the merged data between the first pitch period and the second pitch period;

If the amount of audio data is above the preset second threshold, then in each audio frame in the audio data to be played, merging the data of the first pitch period and the second pitch period into the data of a single pitch period, and replacing the data of the first pitch period and the second pitch period with the merged data.

Optionally, the method further comprises:

Obtaining the pitch period of each audio frame in the audio data to be played;

If the amount of audio data is below the preset first threshold, determining, according to a preset extension duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame; and lengthening each audio frame in the audio data to be played by its corresponding unit processing duration;

If the amount of audio data is above the preset second threshold, determining, according to a preset shortening duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integer multiple of the pitch period of the corresponding audio frame; and shortening each audio frame in the audio data to be played by its corresponding unit processing duration.

Optionally, lengthening each audio frame in the audio data to be played by its corresponding unit processing duration comprises:

In each audio frame in the audio data to be played, merging the data of the first unit processing duration and the second unit processing duration into the data of a single unit processing duration, and inserting the merged data between the first unit processing duration and the second unit processing duration;

Shortening each audio frame in the audio data to be played by its corresponding unit processing duration comprises:

In each audio frame in the audio data to be played, merging the data of the first unit processing duration and the second unit processing duration into the data of a single unit processing duration, and replacing the data of the first unit processing duration and the second unit processing duration with the merged data.

Optionally, obtaining the pitch period of each audio frame in the audio data to be played comprises:

If the audio frames in the audio data to be played record a pitch period, obtaining the pitch period of each audio frame from the audio frame itself; if the audio frames in the audio data to be played do not record a pitch period, determining the pitch period of each audio frame based on a pitch period search algorithm and each decoded audio frame.

In a second aspect, an apparatus for playing audio data is provided, the apparatus comprising:

A detection module, configured to detect, during a voice call, the amount of audio data to be played that is stored in a jitter buffer;

A processing module, configured to lengthen the duration of the audio frames in the audio data to be played if the amount of audio data is below a preset first threshold, and to shorten the duration of the audio frames in the audio data to be played if the amount of audio data is above a preset second threshold, wherein the first threshold is less than the second threshold;

A playing module, configured to play the processed audio data to be played in playback order.

Optionally, the apparatus further comprises an acquisition module, configured to:

Obtain the pitch period of each audio frame in the audio data to be played;

The processing module is configured to:

Optionally, the processing module comprises:

A first processing submodule, configured to, if the amount of audio data is below the preset first threshold, merge, in each audio frame in the audio data to be played, the data of the first pitch period and the second pitch period into the data of a single pitch period, and insert the merged data between the first pitch period and the second pitch period;

A second processing submodule, configured to, if the amount of audio data is above the preset second threshold, merge, in each audio frame in the audio data to be played, the data of the first pitch period and the second pitch period into the data of a single pitch period, and replace the data of the first pitch period and the second pitch period with the merged data.

Optionally, the acquisition module is configured to:

Obtain the pitch period of each audio frame in the audio data to be played;

The first processing submodule is configured to:

The second processing submodule is configured to:

Optionally, the first processing submodule is configured to:

The second processing submodule is configured to:

Optionally, the acquisition module is configured to:

The technical solution provided by the embodiments of the present invention has the following beneficial effects:

In the embodiments of the present invention, during a voice call, the amount of audio data to be played that is stored in the jitter buffer is detected. If the amount is below a preset first threshold, the audio frames in the audio data to be played are lengthened; if it is above a preset second threshold, they are shortened, where the first threshold is less than the second threshold. The processed audio data is then played in playback order. In this way, when the jitter buffer holds little data, the buffered audio is played more slowly, which, on an unstable network, leaves more time for new audio data to arrive in the buffer. When the jitter buffer holds a lot of data, the buffered audio is played more quickly, freeing as much buffer space as possible to hold bursts of audio data received in an instant and preventing the jitter buffer from overflowing. Playback gaps and dropped words can thus be prevented.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a method for playing audio data provided by an embodiment of the present invention;

Fig. 2 is a schematic diagram of processing according to the amount of data in the jitter buffer, provided by an embodiment of the present invention;

Fig. 3 is a schematic diagram of duration extension processing provided by an embodiment of the present invention;

Fig. 4 is a schematic diagram of duration shortening processing provided by an embodiment of the present invention;

Fig. 5 is a structural schematic diagram of an apparatus for playing audio data provided by an embodiment of the present invention;

Fig. 6 is a structural schematic diagram of an apparatus for playing audio data provided by an embodiment of the present invention;

Fig. 7 is a structural schematic diagram of an apparatus for playing audio data provided by an embodiment of the present invention;

Fig. 8 is a structural schematic diagram of a terminal provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.

Embodiment one

An embodiment of the present invention provides a method for playing audio data. As shown in Fig. 1, the processing flow of the method may include the following steps:

Step 101: during a voice call, detect the amount of audio data to be played that is stored in the jitter buffer.

Step 102: if the amount of audio data is below a preset first threshold, lengthen the duration of the audio frames in the audio data to be played; if the amount of audio data is above a preset second threshold, shorten the duration of the audio frames in the audio data to be played, where the first threshold is less than the second threshold.

Step 103: play the processed audio data to be played in playback order.

Embodiment two

An embodiment of the present invention provides a method for playing audio data, executed by a terminal. The terminal may be a desktop terminal or a mobile terminal such as a mobile phone or tablet computer. The terminal may be provided with a processor, a memory, a transceiver, and a loudspeaker. The processor may be used to detect the amount of audio data to be played in the jitter buffer and to process that audio data according to the detection result; the memory may store the data needed and generated during the processing described below; the transceiver may send and receive data; and the loudspeaker may play the processed audio data. The terminal may also be provided with a decoder, which decodes received encoded audio frames, as well as a microphone, which captures the user's voice signal during a voice call, and an encoder, which encodes the voice signal captured by the terminal.

The processing flow shown in Fig. 1 is described in detail below with reference to specific implementations. The content may be as follows:

Step 101: during a voice call, detect the amount of audio data to be played that is stored in the jitter buffer.

Here, the jitter buffer can be used to store the audio data to be played that the terminal receives.

In an implementation, during a voice call based on voice packet switching, after the terminal at one end of the call (which may be called the sending terminal) sends a voice packet, the terminal at the other end can receive it. A voice packet may contain multiple frames of audio data. The receiving terminal unpacks the received voice packet and stores the audio data it contains in the jitter buffer.

A voice packet sent by the sending terminal can carry its sending time and its sequence number. Each time the terminal receives a voice packet, it unpacks it to obtain the frames of audio data it contains and the packet's sequence number, and checks whether that sequence number immediately follows the sequence number of the previously received voice packet. If it does, the frames of audio data in the packet can be stored in the jitter buffer. If other sequence numbers lie between the sequence number of the current packet and that of the previously received packet, the terminal can wait for a certain time before storing the frames of the current packet in the jitter buffer. If, during the wait, the terminal receives the voice packet whose sequence number immediately follows that of the previously received packet, it can store the frames of that packet in the jitter buffer first and then store the frames of the current packet. If the terminal receives that voice packet only after the wait has elapsed, it can still store the packet's frames in the jitter buffer, placing them immediately after the audio data of the previously received packet. The audio data to be played can thus be stored in the jitter buffer in playback order, that is, in the order in which the sending terminal generated it.
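The ordering behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the class name `JitterBuffer`, its method names, and the choice to ignore duplicate sequence numbers are hypothetical, and the waiting timer for missing packets is omitted. The sketch only shows that frames end up in sequence-number (playback) order even when packets arrive out of order.

```python
import bisect


class JitterBuffer:
    """Hypothetical jitter buffer that keeps packets sorted by sequence number."""

    def __init__(self):
        self._seqs = []     # sequence numbers, kept sorted
        self._packets = []  # frame lists, parallel to self._seqs

    def store(self, seq, frames):
        """Insert the frames of one unpacked voice packet at its ordered position."""
        i = bisect.bisect_left(self._seqs, seq)
        if i < len(self._seqs) and self._seqs[i] == seq:
            return  # assumption: a duplicate packet is ignored
        self._seqs.insert(i, seq)
        self._packets.insert(i, list(frames))

    def frames_in_play_order(self):
        """Return all buffered frames in the order they should be played."""
        return [frame for frames in self._packets for frame in frames]
```

A late packet stored after its successor is placed directly behind its predecessor's frames, which matches the placement rule in the text.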

A detection period for the amount of data in the jitter buffer can be preset. During a voice call, the terminal can periodically detect, according to the preset detection period, the amount of audio data to be played that is stored in the jitter buffer.

Step 102: if the amount of audio data is below a preset first threshold, lengthen the duration of the audio frames in the audio data to be played; if the amount of audio data is above a preset second threshold, shorten the duration of the audio frames in the audio data to be played.

Here, the first threshold is less than the second threshold.

In an implementation, during a voice call, after the audio data to be played is stored in the jitter buffer, each audio frame in it can be decoded in turn in storage order, yielding the decoded audio frame and the coder parameters corresponding to each audio frame.

Two thresholds on the amount of data in the jitter buffer can be preset, called the first threshold and the second threshold, where the first threshold is less than the second threshold. As shown in Fig. 2, if the terminal detects that the amount of audio data stored in the jitter buffer is below the preset first threshold, it can lengthen the duration of the audio frames to be played that are stored in the jitter buffer (these may be decoded audio frames), that is, slow down their playback. This effectively prevents playback gaps when, because the network is unstable, the terminal may receive no voice packets for a long time. If the terminal detects that the amount of audio data stored in the jitter buffer is above the preset second threshold, it can shorten the duration of the audio frames to be played that are stored in the jitter buffer (these may be decoded audio frames), that is, speed up their playback. This effectively prevents the jitter buffer from overflowing and causing playback stuttering when, because the network is unstable, the terminal receives a large number of voice packets in an instant. If the terminal detects that the amount of audio data stored in the jitter buffer lies between the preset first threshold and the preset second threshold, it need not process the duration of the audio frames to be played at all.
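The three-way decision above can be sketched as a small function. This is an illustration only: the names `Action` and `choose_action` are hypothetical, and measuring the buffered amount in milliseconds is an assumption (the text only speaks of an "amount of data").

```python
from enum import Enum


class Action(Enum):
    EXTEND = "extend"    # buffer is running low: slow playback down
    SHORTEN = "shorten"  # buffer is filling up: speed playback up
    NONE = "none"        # buffer level is between the thresholds


def choose_action(buffered_ms, first_threshold_ms, second_threshold_ms):
    """Pick the duration processing for the current jitter buffer level."""
    if first_threshold_ms >= second_threshold_ms:
        raise ValueError("the first threshold must be less than the second threshold")
    if buffered_ms < first_threshold_ms:
        return Action.EXTEND
    if buffered_ms > second_threshold_ms:
        return Action.SHORTEN
    return Action.NONE
```

A terminal would call such a function once per detection period and apply the returned action to the frames about to be played.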

Optionally, after each audio frame in the audio data to be played has been decoded in turn to obtain the decoded audio frames, the pitch period of each audio frame can be obtained. Accordingly, the processing can be as follows: obtain the pitch period of each audio frame in the audio data to be played.

Here, the pitch period is the reciprocal of the vocal-cord vibration frequency (called the fundamental frequency). Speech samples separated by one pitch period are maximally correlated. The pitch period is an intrinsic parameter of an audio signal.

In an implementation, after each audio frame in the audio data to be played is decoded in turn during a voice call, the pitch period corresponding to each audio frame can be obtained; that is, decoding an audio frame in the audio data to be played also yields the pitch period of that frame.

Optionally, the pitch period of each audio frame in the audio data to be played can be obtained in various ways. Several feasible ways are given below:

Mode one: if the audio frames in the audio data to be played record a pitch period, obtain the pitch period of each audio frame from the audio frame itself.

In an implementation, if, in the coder parameters obtained by decoding each audio frame in the audio data to be played, the status flag that indicates whether the audio frame records a pitch period has the value 1, the audio frames in the audio data to be played record a pitch period, and the pitch period can be read directly from each decoded audio frame.

Mode two: if the audio frames in the audio data to be played do not record a pitch period, determine the pitch period of each audio frame based on a pitch period search algorithm and each decoded audio frame.

In an implementation, if, in the coder parameters obtained by decoding each audio frame in the audio data to be played, the status flag that indicates whether the audio frame records a pitch period has the value 0, the audio frames in the audio data to be played do not record a pitch period. In that case a pitch period search algorithm, such as the autocorrelation method or the average magnitude difference method, can be applied to each decoded audio frame to compute the pitch period corresponding to that frame.
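As a rough illustration of mode two, the following sketch estimates a pitch period with a normalized autocorrelation search. The text names the autocorrelation method but gives no details, so the normalization, the function name `estimate_pitch_period`, and the `min_lag`/`max_lag` search bounds (in samples) are all assumptions.

```python
import math


def estimate_pitch_period(samples, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(samples) - lag  # number of overlapping samples at this lag
        if n <= 0:
            break
        num = sum(samples[i] * samples[i + lag] for i in range(n))
        energy_a = sum(samples[i] ** 2 for i in range(n))
        energy_b = sum(samples[i + lag] ** 2 for i in range(n))
        denom = math.sqrt(energy_a * energy_b)
        score = num / denom if denom > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

The search bounds matter in practice: if `max_lag` reaches twice the true period, a multiple of the pitch period can score just as well, so callers typically restrict the range to plausible speech pitch values.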

Optionally, where the pitch period of each audio frame in the audio data to be played has been obtained as above, step 102 can be carried out in various ways, depending on how the duration by which an audio frame is lengthened or shortened is chosen. Several feasible ways are given below:

Mode one: use the pitch period of each audio frame in the audio data to be played as the duration by which that frame is lengthened or shortened. The corresponding processing can be as follows: if the amount of audio data is below the preset first threshold, lengthen each audio frame in the audio data to be played by one corresponding pitch period; if the amount of audio data is above the preset second threshold, shorten each audio frame in the audio data to be played by one corresponding pitch period.

In an implementation, when the duration of the audio frames in the audio data to be played stored in the jitter buffer is lengthened or shortened, each audio frame can be lengthened or shortened by one of its own pitch periods. If the terminal detects that the amount of audio data to be played stored in the jitter buffer is below the first threshold, it can lengthen each audio frame by the duration of one pitch period of that frame; the frames may therefore be lengthened by different durations, but each is lengthened by exactly one of its own pitch periods. If the terminal detects that the amount of audio data to be played stored in the jitter buffer is above the second threshold, it can shorten each audio frame by the duration of one pitch period of that frame; again the frames may be shortened by different durations, but each is shortened by exactly one of its own pitch periods. Because each audio frame is only lengthened or shortened by one of its own pitch periods, its pitch period, and hence its fundamental frequency, is unchanged. Leaving the fundamental frequency unchanged leaves the pitch unchanged, so the original audio frames are changed in speed without being changed in pitch.

Optionally, where each audio frame is lengthened or shortened by one corresponding pitch period, the data used for lengthening or shortening can be obtained by merging the data of the first two pitch periods. Accordingly, the processing can be as follows: if the amount of audio data is below the preset first threshold, then in each audio frame in the audio data to be played, merge the data of the first pitch period and the second pitch period into the data of a single pitch period, and insert the merged data between the first pitch period and the second pitch period; if the amount of audio data is above the preset second threshold, then in each audio frame in the audio data to be played, merge the data of the first pitch period and the second pitch period into the data of a single pitch period, and replace the data of the first pitch period and the second pitch period with the merged data.

In an implementation, if the amount of audio data is below the preset first threshold, the data of the first pitch period and the data of the second pitch period in each audio frame in the audio data to be played can be superimposed sample by sample. A first weight for the data of the first pitch period and a second weight for the data of the second pitch period can be preset for the superposition; the two weights sum to 1 and can each be 0.5. As shown in Fig. 3, the superposition yields the data of a single pitch period synthesized from the data of the first and second pitch periods. This data can be inserted between the first pitch period and the second pitch period, and the resulting audio frame, which is one pitch period longer, is used as the processed audio frame. If the amount of audio data is above the preset second threshold, the data of the first pitch period and the data of the second pitch period in each audio frame in the audio data to be played can likewise be superimposed sample by sample, with a preset first weight and second weight that sum to 1 and can each be 0.5. As shown in Fig. 4, the superposition yields the data of a single pitch period synthesized from the data of the first and second pitch periods. This data can replace the data of the first pitch period and the second pitch period, and the resulting audio frame, which is one pitch period shorter, is used as the processed audio frame.
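The merge, insert, and replace operations described above can be sketched as follows. This is a minimal illustration, assuming a frame is a Python list of samples and the pitch period is given in samples; the function names are hypothetical, and the default weights of 0.5 follow the example values in the text.

```python
def merge_first_two_periods(frame, period, w1=0.5, w2=0.5):
    """Weighted sample-by-sample superposition of the first two pitch periods."""
    if period <= 0 or len(frame) < 2 * period:
        raise ValueError("the frame must contain at least two pitch periods")
    return [w1 * frame[i] + w2 * frame[period + i] for i in range(period)]


def extend_by_one_period(frame, period):
    """Insert the merged period between the first and second pitch periods."""
    merged = merge_first_two_periods(frame, period)
    return list(frame[:period]) + merged + list(frame[period:])


def shorten_by_one_period(frame, period):
    """Replace the first two pitch periods with the single merged period."""
    merged = merge_first_two_periods(frame, period)
    return merged + list(frame[2 * period:])
```

In both cases the frame length changes by exactly one pitch period, and the inserted or substituted samples are a blend of neighbouring periods rather than a plain copy, which keeps the transition smooth.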

Mode two needs the preset duration for extending or shortening according to each audio frame, chooses each audio frame and be actually subjected to prolong Duration that is long or shortening, corresponding treatment process can be such that if the data volume of audio data is lower than preset first threshold, Then according to preset extension duration, the corresponding processed in units duration of each audio frame is determined, wherein each processed in units duration is The integral multiple of the pitch period of corresponding audio frame；Each audio frame in audio data to be played is extended at corresponding unit Manage duration；If the data volume of audio data, which is higher than preset second threshold, determines each sound according to preset shortening duration The corresponding processed in units duration of frequency frame, wherein each processed in units duration is the integral multiple of the pitch period of corresponding audio frame； Each audio frame in audio data to be played is shortened into corresponding processed in units duration.

In an implementation, the corresponding extension duration of each audio frame can be preset and shorten duration.If audio data Data volume be lower than preset first threshold, can be all divided by the corresponding fundamental tone of each audio frame according to preset extensions duration Value pitch period corresponding with each audio frame can be multiplied by phase, an available quotient if the quotient is integer Obtain that each audio frame is corresponding to be actually subjected to extended processed in units duration, which is corresponding times of processed in units duration Number.The quotient may not be an integer, for such situation, can take the whole part of the quotient, and integer part is corresponding Value (being rounded quotient downwards) pitch period corresponding with each audio frame be multiplied to obtain the corresponding reality of each audio frame Extended processed in units duration is wanted, it can also be by value (quotient rounds up) after the corresponding value of integer part plus 1 and every The corresponding pitch period of a audio frame is multiplied to obtain that each audio frame is corresponding to be actually subjected to extended processed in units duration (i.e. unit Handling duration is the integral multiple of the corresponding pitch period of each audio frame), for example, preset extension duration is 7ms, a certain audio The 3ms when pitch period of frame, according to the method for lower rounding, processed in units duration can be twice of pitch period i.e. 6ms, if Using the method to round up, processed in units duration can be 3 times of pitch period i.e. 
9ms.It is then possible to audio to be played The corresponding processed in units duration of each audio frame that each audio frame in data extends.If the data volume of audio data Higher than preset second threshold, can be obtained according to preset shortening duration divided by the corresponding pitch period of each audio frame To a quotient, which may not be an integer, for such situation, can carry out taking downwards or upwards to the quotient It is whole, the whole part of the quotient can be taken, by the corresponding value of integer part (being rounded quotient downwards) and each audio frame pair The pitch period answered is multiplied to obtain the corresponding processed in units duration for being actually subjected to shorten of each audio frame, can also be by integer part Value (quotient rounds up) pitch period corresponding with each audio frame after corresponding value plus 1 is multiplied to obtain each audio The corresponding processed in units duration for being actually subjected to shorten of frame, it is then possible to which each audio frame treated in playing audio-fequency data shortens The corresponding processed in units duration of obtained each audio frame.
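The rounding rule above amounts to a short calculation, sketched here. The function name `unit_processing_duration` and the `round_up` flag are hypothetical; durations are assumed to share one unit (milliseconds in the text's example).

```python
import math


def unit_processing_duration(target_ms, pitch_period_ms, round_up=False):
    """Return the multiple of the pitch period chosen for the preset duration."""
    quotient = target_ms / pitch_period_ms
    # Note: with fractional durations, floating-point division can land just
    # below an exact integer quotient; integer units (e.g. samples) avoid this.
    multiple = math.ceil(quotient) if round_up else math.floor(quotient)
    return multiple * pitch_period_ms
```

With the values from the text, `unit_processing_duration(7, 3)` gives 6 and `unit_processing_duration(7, 3, round_up=True)` gives 9. The same function serves the shortening case when called with the preset shortening duration.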

Optionally, so that the durations by which the audio frames are extended approach the same duration, the difference between the unit processing duration of the previous audio frame and the preset extension duration can also be taken into account when determining the unit processing duration of the current audio frame. Correspondingly, the processing can be as follows: for the first audio frame in the audio data to be played, the unit processing duration of the first audio frame is determined according to the preset extension duration; for each other audio frame in the audio data to be played, the unit processing duration of that audio frame is determined according to the preset extension duration and the difference between the unit processing duration of its previous audio frame and the preset extension duration, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame.

In an implementation, if the data volume of the audio data is lower than the preset first threshold, then when determining the unit processing duration of each audio frame, the unit processing duration of the first audio frame in the audio data to be played can be determined from the preset extension duration by the method described above. For each other audio frame in the audio data to be played, in the rounding-down case described above, the difference between the unit processing duration of the previous audio frame and the preset extension duration can be added to the preset extension duration to obtain the duration by which that audio frame should be extended. From this duration, the unit processing duration of the audio frame can be determined by the method of mode two described above, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame. For example, if the preset extension duration is 7 ms, the pitch period of a certain audio frame is 3 ms and the determined unit processing duration is 6 ms, then the unit processing duration differs from the preset extension duration by 1 ms, and the duration by which the next audio frame should be extended can be the preset extension duration plus this difference, i.e. 8 ms (this value can be treated as the preset extension duration, and the unit processing duration can be determined from it and the pitch period of the audio frame). If the pitch period of the next audio frame is 2.5 ms, then with rounding down the unit processing duration can be 3 times the pitch period, i.e. 7.5 ms.

In the rounding-up case described above, the difference between the unit processing duration of the previous audio frame and the preset extension duration can be subtracted from the preset extension duration to obtain the duration by which each other audio frame should be extended. From this duration, the unit processing duration of the audio frame can be determined by the method of mode two described above, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame. For example, if the preset extension duration is 7 ms, the pitch period of a certain audio frame is 3 ms and the determined unit processing duration is 9 ms, then the unit processing duration differs from the preset extension duration by 2 ms, and the duration by which the next audio frame should be extended can be the preset extension duration minus this difference, i.e. 5 ms (this value can be treated as the preset extension duration, and the unit processing duration can be determined from it and the pitch period of the audio frame). If the pitch period of the next audio frame is 3.5 ms, then with rounding up the unit processing duration can be 2 times the pitch period, i.e. 7 ms.
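The carry-over of the difference from one frame to the next can be sketched as below. The function name `plan_unit_durations` is hypothetical. The sketch uses the signed difference (preset duration minus the previous unit processing duration), which reproduces both worked examples above; how frames after the second are meant to be handled, and the clamp to zero, are assumptions of this sketch rather than statements of the source.

```python
import math


def plan_unit_durations(preset_ms, pitch_periods_ms, round_up=False):
    """Return one unit processing duration (ms) per frame.

    The first frame targets preset_ms. Each later frame targets
    preset_ms adjusted by how far the previous frame's unit processing
    duration fell short of, or exceeded, preset_ms.
    """
    durations = []
    target = preset_ms
    for pitch in pitch_periods_ms:
        quotient = target / pitch
        multiple = math.ceil(quotient) if round_up else math.floor(quotient)
        multiple = max(multiple, 0)  # assumption: never a negative duration
        unit = multiple * pitch
        durations.append(unit)
        # Signed carry-over: a shortfall raises the next target,
        # an overshoot lowers it.
        target = preset_ms + (preset_ms - unit)
    return durations
```

For the rounding-down example, `plan_unit_durations(7, [3, 2.5])` yields 6 ms and then 7.5 ms; for the rounding-up example, `plan_unit_durations(7, [3, 3.5], round_up=True)` yields 9 ms and then 7 ms. The same sketch applies to shortening if `preset_ms` is taken to be the preset shortening duration.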

So that the durations by which the audio frames are shortened approach the same duration, the difference between the unit processing duration of the previous audio frame and the preset shortening duration can also be taken into account when determining the unit processing duration of the current audio frame. Correspondingly, the processing can be as follows: for the first audio frame in the audio data to be played, the unit processing duration of the first audio frame is determined according to the preset shortening duration; for each other audio frame in the audio data to be played, the unit processing duration of that audio frame is determined according to the preset shortening duration and the difference between the unit processing duration of its previous audio frame and the preset shortening duration, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame.

In an implementation, if the data volume of the audio data is higher than the preset second threshold, then when determining the unit processing duration of each audio frame, the unit processing duration of the first audio frame in the audio data to be played can be determined from the preset shortening duration by the method described above. For each other audio frame in the audio data to be played, in the rounding-down case described above, the difference between the unit processing duration of the previous audio frame and the preset shortening duration can be added to the preset shortening duration to obtain the duration by which that audio frame should be shortened. From this duration, the unit processing duration of the audio frame can be determined by the method of mode two described above, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame. In the rounding-up case described above, the difference between the unit processing duration of the previous audio frame and the preset shortening duration can be subtracted from the preset shortening duration to obtain the duration by which each other audio frame should be shortened. From this duration, the unit processing duration of the audio frame can be determined by the method of mode two described above, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame.

Optionally, in the case above where each audio frame is extended by a unit processing duration, the data by which each audio frame is extended can be obtained by merging the data of the first two unit processing durations. Correspondingly, the processing can be as follows: in each audio frame in the audio data to be played, the data of the first unit processing duration and the data of the second unit processing duration are merged into data of one unit processing duration, and the merged data is inserted between the first unit processing duration and the second unit processing duration.

In an implementation, if the data volume of the audio data is lower than the preset first threshold, the data of the first unit processing duration and the data of the second unit processing duration in each audio frame in the audio data to be played can be correspondingly superposed. A first weight and a second weight, corresponding respectively to the data of the first unit processing duration and the data of the second unit processing duration, can be preset for the superposition; the sum of the first weight and the second weight is 1, and each can be 0.5. After the superposition, data of one unit processing duration, synthesized from the data of the first unit processing duration and the data of the second unit processing duration, is obtained and can be inserted between the first unit processing duration and the second unit processing duration. The resulting audio frame, lengthened by one unit processing duration, is taken as the processed audio frame corresponding to that audio frame.
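The merge-and-insert step for extension can be sketched as follows. The name `extend_frame` is hypothetical, the frame is assumed to be a list of decoded samples, `unit_len` is assumed to be the unit processing duration expressed as a sample count, and leaving a too-short frame unchanged is an assumption of this sketch.

```python
def extend_frame(samples, unit_len, w1=0.5, w2=0.5):
    """Lengthen a frame by one unit processing duration.

    The first and second units are superposed with weights w1 and w2
    (summing to 1), and the merged unit is inserted between them.
    """
    if unit_len <= 0 or len(samples) < 2 * unit_len:
        return list(samples)  # assumption: frame too short, leave as is
    first = samples[:unit_len]
    second = samples[unit_len:2 * unit_len]
    merged = [w1 * a + w2 * b for a, b in zip(first, second)]
    # Layout: first unit, merged unit, second unit, rest of the frame.
    return list(first) + merged + list(samples[unit_len:])
```

The output is `unit_len` samples longer than the input, and the original samples are all retained in order.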

In the case above where each audio frame is shortened by a unit processing duration, the shortening of each audio frame can be achieved by merging the data of the first two unit processing durations. Correspondingly, the processing can be as follows: in each audio frame in the audio data to be played, the data of the first unit processing duration and the data of the second unit processing duration are merged into data of one unit processing duration, and the merged data replaces the data of the first unit processing duration and the second unit processing duration.

In an implementation, if the data volume of the audio data is higher than the preset second threshold, the data of the first unit processing duration and the data of the second unit processing duration in each audio frame in the audio data to be played can be correspondingly superposed. A first weight and a second weight, corresponding respectively to the data of the first unit processing duration and the data of the second unit processing duration, can be preset for the superposition; the sum of the first weight and the second weight is 1, and each can be 0.5. After the superposition, data of one unit processing duration, synthesized from the data of the first unit processing duration and the data of the second unit processing duration, is obtained and can replace the data of the first unit processing duration and the second unit processing duration. The resulting audio frame, shortened by one unit processing duration, is taken as the processed audio frame corresponding to that audio frame.
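The merge-and-replace step for shortening can be sketched in the same way. The name `shorten_frame` is hypothetical, the frame is assumed to be a list of decoded samples with `unit_len` given as a sample count, and leaving a too-short frame unchanged is an assumption of this sketch.

```python
def shorten_frame(samples, unit_len, w1=0.5, w2=0.5):
    """Shorten a frame by one unit processing duration.

    The first and second units are superposed with weights w1 and w2
    (summing to 1), and the merged unit replaces both of them.
    """
    if unit_len <= 0 or len(samples) < 2 * unit_len:
        return list(samples)  # assumption: frame too short, leave as is
    first = samples[:unit_len]
    second = samples[unit_len:2 * unit_len]
    merged = [w1 * a + w2 * b for a, b in zip(first, second)]
    # Layout: merged unit, then everything after the second unit.
    return merged + list(samples[2 * unit_len:])
```

The output is `unit_len` samples shorter than the input.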

In an implementation, after each audio frame in the audio data to be played has been decoded in turn, the decoded audio frames, both those that have not undergone extension or shortening processing and those that have, are stored in the playback buffer, and the system plays the audio data to be played in the buffer according to the playing sequence.

Embodiment three

Based on the same technical concept, an embodiment of the present invention further provides a device for playing audio data. As shown in Fig. 5, the device includes:

Detection module 510, configured to detect, during voice communication, the data volume of the audio data to be played that is stored in the jitter cache;

Processing module 520, configured to perform duration extension processing on the audio frames in the audio data to be played if the data volume of the audio data is lower than a preset first threshold, and to perform duration shortening processing on the audio frames in the audio data to be played if the data volume of the audio data is higher than a preset second threshold, wherein the first threshold is less than the second threshold;

Playing module 530, configured to play the processed audio data to be played according to the playing sequence.
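The decision made between the detection module and the processing module can be sketched as a single function. The names `Action` and `choose_action` are hypothetical, and the data volume is assumed to be any comparable number (for example, buffered milliseconds or bytes); the source does not fix a unit.

```python
from enum import Enum


class Action(Enum):
    EXTEND = "extend"    # jitter cache is running low
    SHORTEN = "shorten"  # jitter cache is too full
    NONE = "none"        # play frames unchanged


def choose_action(data_volume, first_threshold, second_threshold):
    """Pick the processing to apply for the detected data volume."""
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be less than second threshold")
    if data_volume < first_threshold:
        return Action.EXTEND
    if data_volume > second_threshold:
        return Action.SHORTEN
    return Action.NONE
```

Keeping the first threshold strictly below the second leaves a band between them in which frames are played without modification.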

Optionally, as shown in Fig. 6, the device further includes an acquisition module 540, configured to:

Obtain the pitch period of each audio frame in the audio data to be played;

The processing module 520 is configured to:

Optionally, as shown in Fig. 7, the processing module 520 includes:

First processing submodule 5201, configured to, if the data volume of the audio data is lower than the preset first threshold, merge, in each audio frame in the audio data to be played, the data of the first pitch period and the data of the second pitch period into data of one pitch period, and insert the merged data between the first pitch period and the second pitch period;

Second processing submodule 5202, configured to, if the data volume of the audio data is higher than the preset second threshold, merge, in each audio frame in the audio data to be played, the data of the first pitch period and the data of the second pitch period into data of one pitch period, and replace the data of the first pitch period and the second pitch period with the merged data.

Optionally, the acquisition module 540 is configured to:

Obtain the pitch period of each audio frame in the audio data to be played;

The first processing submodule 5201 is configured to:

The second processing submodule 5202 is configured to:

Optionally, the first processing submodule 5201 is configured to:

The second processing submodule 5202 is configured to:

Optionally, the acquisition module 540 is configured to:

It should be noted that, when the device for playing audio data provided by the above embodiment plays audio data, the division into the functional modules above is used only as an example. In practical applications, the functions above can be allocated to different functional modules as needed, that is, the internal structure of the device can be divided into different functional modules to complete all or part of the functions described above. In addition, the device for playing audio data provided by the above embodiment and the embodiments of the method of playing audio data belong to the same concept; for the specific implementation process, see the method embodiments, which is not repeated here.

Embodiment four

Referring to Fig. 8, which shows a schematic structural diagram of the terminal involved in the embodiments of the present invention, the terminal can be used to implement the method of playing audio data provided in the above embodiments. Specifically:

Terminal 800 may include an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (wireless fidelity) module 170, a processor 180 including one or more processing cores, a power supply 190 and other components. Those skilled in the art will understand that the terminal structure shown in Fig. 8 does not limit the terminal, which may include more or fewer components than illustrated, combine certain components, or use a different arrangement of components. Wherein:

RF circuit 110 can be used to receive and send signals during the sending and receiving of information or during a call. In particular, after downlink information from a base station is received, it is handed to one or more processors 180 for processing; in addition, uplink data is sent to the base station. In general, RF circuit 110 includes but is not limited to an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer and the like. In addition, RF circuit 110 can also communicate with networks and other devices by wireless communication. The wireless communication can use any communication standard or protocol, including but not limited to GSM (Global System for Mobile Communications), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service) and the like.

Memory 120 can be used to store software programs and modules, and processor 180 executes various functional applications and data processing by running the software programs and modules stored in memory 120. Memory 120 can mainly include a program storage area and a data storage area, wherein the program storage area can store an operating system, application programs required by at least one function (such as a sound playing function or an image playing function) and the like, and the data storage area can store data created through the use of terminal 800 (such as audio data or a phone book) and the like. In addition, memory 120 may include high-speed random access memory and can also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, memory 120 can also include a memory controller to provide processor 180 and input unit 130 with access to memory 120.

Input unit 130 can be used to receive input numeric or character information, and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, input unit 130 may include a touch-sensitive surface 131 and other input devices 132. Touch-sensitive surface 131, also referred to as a touch display screen or trackpad, collects touch operations by the user on or near it (such as operations performed by the user on or near touch-sensitive surface 131 with a finger, a stylus or any other suitable object or accessory) and drives the corresponding connection device according to a preset program. Optionally, touch-sensitive surface 131 may include two parts, a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device, converts it into contact coordinates, sends them to processor 180, and can receive and execute commands sent by processor 180. Furthermore, touch-sensitive surface 131 can be implemented in multiple types, such as resistive, capacitive, infrared and surface acoustic wave. In addition to touch-sensitive surface 131, input unit 130 can also include other input devices 132. Specifically, other input devices 132 can include but are not limited to one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick and the like.

Display unit 140 can be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of terminal 800; these graphical user interfaces can be composed of graphics, text, icons, video and any combination thereof. Display unit 140 may include a display panel 141; optionally, display panel 141 can be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode) or the like. Further, touch-sensitive surface 131 can cover display panel 141. When touch-sensitive surface 131 detects a touch operation on or near it, the operation is sent to processor 180 to determine the type of the touch event, and processor 180 then provides a corresponding visual output on display panel 141 according to the type of the touch event. Although in Fig. 8 touch-sensitive surface 131 and display panel 141 implement the input and output functions as two independent components, in some embodiments touch-sensitive surface 131 and display panel 141 can be integrated to implement the input and output functions.

Terminal 800 may also include at least one sensor 150, such as an optical sensor, a motion sensor and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of display panel 141 according to the brightness of the ambient light, and the proximity sensor can turn off display panel 141 and/or the backlight when terminal 800 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for functions related to vibration recognition (such as a pedometer and tapping). Other sensors that terminal 800 can also be configured with, such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, are not described here.

Audio circuit 160, loudspeaker 161 and microphone 162 can provide an audio interface between the user and terminal 800. Audio circuit 160 can convert received audio data into an electrical signal and transmit it to loudspeaker 161, which converts it into a sound signal for output. On the other hand, microphone 162 converts a collected sound signal into an electrical signal, which audio circuit 160 receives and converts into audio data; the audio data is then output to processor 180 for processing and afterwards sent, for example, to another terminal through RF circuit 110, or output to memory 120 for further processing. Audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and terminal 800.

WiFi is a short-range wireless transmission technology. Through WiFi module 170, terminal 800 can help the user send and receive e-mail, browse web pages, access streaming media and so on; it provides the user with wireless broadband Internet access. Although Fig. 8 shows WiFi module 170, it can be understood that the module is not an essential part of terminal 800 and can be omitted as needed without changing the essence of the invention.

Processor 180 is the control center of terminal 800. It connects all parts of the whole mobile phone through various interfaces and lines, and executes the various functions of terminal 800 and processes data by running or executing the software programs and/or modules stored in memory 120 and calling the data stored in memory 120, thereby monitoring the mobile phone as a whole. Optionally, processor 180 may include one or more processing cores. Preferably, processor 180 can integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into processor 180.

Terminal 800 further includes a power supply 190 (such as a battery) that supplies power to all components. Preferably, the power supply can be logically connected to processor 180 through a power management system, so that functions such as charging management, discharging management and power consumption management are implemented through the power management system. Power supply 190 can also include any components such as one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power adapter or inverter, and a power status indicator.

Although not shown, terminal 800 can also include a camera, a Bluetooth module and the like, which are not described here. Specifically, in this embodiment, the display unit of terminal 800 is a touch-screen display, and terminal 800 further includes a memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for performing the following operations:

Playing the processed audio data to be played according to the playing sequence.

Optionally, the method also includes:

Obtain the pitch period of each audio frame in the audio data to be played；

Optionally, the method also includes:

Obtain the pitch period of each audio frame in the audio data to be played；

Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc or the like.

The foregoing is merely the preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included in the protection scope of the present invention.

Claims

1. A method of playing audio data, characterized in that the method includes:

If the data volume of the audio data is lower than a preset first threshold, performing duration extension processing on the audio frames in the audio data to be played; if the data volume of the audio data is higher than a preset second threshold, performing duration shortening processing on the audio frames in the audio data to be played, wherein the first threshold is less than the second threshold;

Playing the processed audio data to be played according to the playing sequence;

The method also includes:

Obtain the pitch period of each audio frame in the audio data to be played；

The performing duration extension processing on the audio frames in the audio data to be played if the data volume of the audio data is lower than the preset first threshold, and performing duration shortening processing on the audio frames in the audio data to be played if the data volume of the audio data is higher than the preset second threshold, includes:

If the data volume of the audio data is lower than the preset first threshold, determining, according to a preset extension duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame; and extending each audio frame in the audio data to be played by the corresponding unit processing duration;

If the data volume of the audio data is higher than the preset second threshold, determining, according to a preset shortening duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame; and shortening each audio frame in the audio data to be played by the corresponding unit processing duration.

2. The method according to claim 1, characterized in that the method further includes:

Obtaining the pitch period of each audio frame in the audio data to be played;

If the data volume of the audio data is lower than the preset first threshold, extending each audio frame in the audio data to be played by 1 corresponding pitch period; if the data volume of the audio data is higher than the preset second threshold, shortening each audio frame in the audio data to be played by 1 corresponding pitch period.

3. The method according to claim 2, characterized in that the extending each audio frame in the audio data to be played by 1 corresponding pitch period if the data volume of the audio data is lower than the preset first threshold, and shortening each audio frame in the audio data to be played by 1 corresponding pitch period if the data volume of the audio data is higher than the preset second threshold, includes:

If the data volume of the audio data is lower than the preset first threshold, merging, in each audio frame in the audio data to be played, the data of the first pitch period and the data of the second pitch period into data of one pitch period, and inserting the merged data between the first pitch period and the second pitch period;

If the data volume of the audio data is higher than the preset second threshold, merging, in each audio frame in the audio data to be played, the data of the first pitch period and the data of the second pitch period into data of one pitch period, and replacing the data of the first pitch period and the second pitch period with the merged data.

4. The method according to claim 1, characterized in that the extending each audio frame in the audio data to be played by the corresponding unit processing duration includes:

In each audio frame in the audio data to be played, merging the data of the first unit processing duration and the data of the second unit processing duration into data of one unit processing duration, and inserting the merged data between the first unit processing duration and the second unit processing duration;

In each audio frame in the audio data to be played, merging the data of the first unit processing duration and the data of the second unit processing duration into data of one unit processing duration, and replacing the data of the first unit processing duration and the second unit processing duration with the merged data.

5. The method according to any one of claims 1 and 2, characterized in that the obtaining the pitch period of each audio frame in the audio data to be played includes:

If the audio frames in the audio data to be played record a pitch period, obtaining the pitch period of each audio frame from each audio frame in the audio data to be played; if the audio frames in the audio data to be played do not record a pitch period, determining the pitch period of each audio frame based on a pitch period search algorithm and each decoded audio frame.

6. A device for playing audio data, characterized in that the device includes:

A detection module, configured to detect, during voice communication, the data volume of the audio data to be played that is stored in the jitter cache;

A processing module, configured to perform duration extension processing on the audio frames in the audio data to be played if the data volume of the audio data is lower than a preset first threshold, and to perform duration shortening processing on the audio frames in the audio data to be played if the data volume of the audio data is higher than a preset second threshold, wherein the first threshold is less than the second threshold;

A playing module, configured to play the processed audio data to be played according to the playing sequence;

An acquisition module, configured to obtain the pitch period of each audio frame in the audio data to be played;

The processing module includes a first processing submodule and a second processing submodule;

Wherein the first processing submodule is configured to: if the data volume of the audio data is lower than the preset first threshold, determine, according to a preset extension duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame; and extend each audio frame in the audio data to be played by the corresponding unit processing duration;

The second processing submodule is configured to: if the data volume of the audio data is higher than the preset second threshold, determine, according to a preset shortening duration, the unit processing duration corresponding to each audio frame, wherein each unit processing duration is an integral multiple of the pitch period of the corresponding audio frame; and shorten each audio frame in the audio data to be played by the corresponding unit processing duration.

7. The device according to claim 6, characterized in that the device further includes an acquisition module, configured to:

Obtain the pitch period of each audio frame in the audio data to be played;

The processing module is configured to:

8. The device according to claim 7, characterized in that the first processing submodule is configured to, if the data volume of the audio data is lower than the preset first threshold, merge, in each audio frame in the audio data to be played, the data of the first pitch period and the data of the second pitch period into data of one pitch period, and insert the merged data between the first pitch period and the second pitch period;

The second processing submodule is configured to, if the data volume of the audio data is higher than the preset second threshold, merge, in each audio frame in the audio data to be played, the data of the first pitch period and the data of the second pitch period into data of one pitch period, and replace the data of the first pitch period and the second pitch period with the merged data.

9. The device according to claim 6, characterized in that the first processing submodule is configured to:

The second processing submodule is configured to:

10. The device according to any one of claims 6 and 7, characterized in that the acquisition module is configured to: