Background Art
With the development of audio compression technology, its applications have become increasingly widespread and the requirements placed on it increasingly demanding. For example, many existing products such as electronic toys, e-books and electronic dictionaries provide voice output, real-human pronunciation, and even recording and playback functions. In general, such electronic devices need to compress the input speech for storage and then decompress it as required for playback.
Existing audio compression, taking speech compression as an example, mainly follows the process shown in Fig. 1. First, after pre-processing, the input raw speech undergoes LPC (Linear Predictive Coding) analysis; the resulting LPC coefficients are converted into LSF (Line Spectral Frequency) coefficients, which are passed to the quantizer. After LPC analysis, the prediction residual undergoes a voiced/unvoiced (UV) decision and a pitch search, yielding pitch and UV information, which is also passed to the quantizer. The quantizer then combines the LSF coefficients, the pitch coefficient and the UV information into the output compressed bitstream.
The pre-processing of the speech signal mentioned above mainly divides continuous speech into a series of frames. In speech signal processing, the characteristics of speech are usually assumed to vary slowly. Therefore, when such a nonstationary signal is processed, it is usually divided into segments, and each segment is processed as a stationary signal. Each segment is called a frame, or a short-time segment. There are two ways of taking frames: the fixed-frame-number method and the non-fixed-frame-number method.
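The framing step can be illustrated with a minimal sketch. It assumes fixed-length, non-overlapping frames and zero-padding of the last frame; the function name `split_into_frames` and the 160-sample frame length (20 ms at 8000 Hz) are illustrative assumptions and are not taken from the original text.

```python
from typing import List, Sequence


def split_into_frames(samples: Sequence[float], frame_len: int) -> List[List[float]]:
    """Split a sample sequence into consecutive, non-overlapping frames.

    The last frame is zero-padded so that every frame has frame_len samples.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame.extend([0.0] * (frame_len - len(frame)))  # pad the final frame only
        frames.append(frame)
    return frames


# Hypothetical input: one second of silence at 8000 Hz, cut into 20 ms frames.
frames = split_into_frames([0.0] * 8000, frame_len=160)
```

Each frame produced this way can then be treated as a stationary signal by the later analysis stages.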
Decompression, shown in Fig. 2, is the reverse of the process in Fig. 1. First, the compressed bitstream passes through an inverse quantizer to recover the LSF coefficients, the pitch coefficient and the UV information. The LSF coefficients are converted back into LPC coefficients, and the pitch coefficient and UV information are used to synthesize an excitation signal. The LPC coefficients and the excitation signal then pass through LPC synthesis and post-processing, and the decoded speech is output.
Many coding schemes are suitable for speech compression. The linear predictive coding (LPC) described above is a form of parametric coding, in which characteristic parameters are extracted from the source signal in the frequency domain or another orthogonal transform domain and converted into digital codes for transmission. There is also waveform coding, such as pulse code modulation (PCM), and hybrid coding, such as multi-pulse excited linear predictive coding (MPLPC).
The pitch and tone information mentioned above are characteristic parameters of the input speech, and the processing of characteristic parameters is an important part of compressing a speech signal effectively.
Existing speech compression algorithms, in particular segmented compression algorithms, use the compression scheme above to compress the input speech file segment by segment, store characteristic parameters such as pitch data independently in each segment, and then save the compressed speech data and pitch data to a storage module.
Compressing a speech file in this way achieves a considerable compression ratio and satisfies general application requirements. However, as electronic devices become ever smaller and more portable, speech compressed in this way still occupies a relatively large amount of storage in many digital products, such as handheld electronic devices, and there remains a desire to compress it further.
Therefore, for today's electronic devices that need audio compression, providing a solution that further processes the compressed audio data on the basis of the existing compression scheme has become a problem the industry urgently needs to solve.
Summary of the Invention
The technical problem to be solved by the present invention is to provide a method and a device for processing audio data that, on the basis of the existing segmented compression scheme, further compress the compressed audio data without noticeably affecting sound quality.
The invention provides a method for processing audio data that further compresses audio data which has undergone segmented compression and in which characteristic parameter data is stored independently in each frame. First, characteristic parameter envelope curve data is generated for the frames from the characteristic parameter data of each frame of audio data; then the characteristic parameter data of each frame of audio data is replaced by that characteristic parameter envelope curve data.
In addition, the above method further comprises the following steps for editing the compressed audio data:
1) receiving operation information input by the user;
2) automatically generating a corresponding control command according to the operation information;
3) editing the stored compressed audio data according to the control command; and
4) displaying the waveform and envelope of the edited audio data.
The present invention further provides an audio data processing device comprising: a segmented compression module for performing segmented compression on audio data, with characteristic parameter data stored independently in each frame; a storage module for storing the compressed audio data; and a recompression module for further compressing the audio data after segmented compression. The recompression module in turn comprises: an envelope curve generation unit for generating characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data; and a characteristic parameter replacement unit for replacing the characteristic parameter data of each frame of audio data with that characteristic parameter envelope curve data.
In addition, the audio data processing device further comprises a decompression module for decompressing the audio data after compression editing, comprising: a characteristic parameter calculation unit for dynamically calculating the characteristic parameter data of each current frame of audio data from the characteristic parameter envelope curve data; and a segmented decompression unit for performing segmented decompression on each current frame of audio data according to its characteristic parameter data.
In addition, the audio data processing device further comprises an output module for outputting the decompressed audio data and the envelope curve.
In addition, the audio data processing device further comprises a user interaction module for receiving commands input by the user in order to carry out recompression editing and display control.
Because the present invention uses an envelope curve to approximate a characteristic parameter of each frame, for example the variation of pitch, and dynamically calculates the pitch value of the current frame during playback, it achieves compression. When data is exported, it suffices to remove the characteristic parameter data from the frames and substitute the envelope curve data. Since a sound file generally contains many frames, replacing the pitch data compresses the data. In addition, the invention provides output and interactive editing of the compressed speech: the user can flexibly and interactively modify the compressed audio data through various user commands, and the edited sound file is output in real time.
Detailed Description of the Embodiments
An analysis of existing audio data compression shows that conventional segmented compression records a characteristic parameter, for example the pitch value, separately for each frame. In practice, however, the sound remains satisfactory as long as the pitch varies within a certain range. There is therefore still room to further compress speech data that has already been compressed.
Specifically, the existing compression scheme performs segmented compression on a speech file, stores the pitch data independently in each segment, and then saves the compressed speech data and pitch data to a storage unit. In view of this, the present invention first proposes a method for processing audio data that further compresses audio data which has undergone segmented compression and in which characteristic parameter data is stored independently in each frame. As shown in Fig. 3, characteristic parameter envelope curve data is first generated for the frames from the characteristic parameter data of each frame of audio data (step 301); the characteristic parameter data of each frame of audio data is then replaced by that envelope curve data (step 302).
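Steps 301 and 302 can be sketched as follows. The text does not specify how the envelope curve is derived, so this sketch assumes the simplest possible choice: a fixed number of evenly spaced frames are kept as envelope nodes. The `Frame` type, the `Node` alias and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Frame:
    payload: bytes          # remaining compressed data of the frame (opaque here)
    pitch: Optional[int]    # per-frame pitch value; None once it has been replaced


Node = Tuple[int, float]    # (frame index, pitch value at that frame)


def generate_envelope(frames: List[Frame], num_nodes: int) -> List[Node]:
    """Step 301 (assumed strategy): keep num_nodes evenly spaced frames as nodes."""
    if num_nodes < 2 or len(frames) < 2:
        raise ValueError("need at least two nodes and two frames")
    last = len(frames) - 1
    indices = sorted({round(k * last / (num_nodes - 1)) for k in range(num_nodes)})
    return [(i, float(frames[i].pitch)) for i in indices]


def replace_with_envelope(frames: List[Frame], num_nodes: int) -> List[Node]:
    """Step 302: drop the per-frame pitch and keep only the envelope nodes."""
    envelope = generate_envelope(frames, num_nodes)
    for frame in frames:
        frame.pitch = None
    return envelope


# Hypothetical data: nine frames whose pitch rises by one per frame.
frames = [Frame(payload=b"", pitch=100 + i) for i in range(9)]
envelope = replace_with_envelope(frames, num_nodes=3)
```

After the call, the frames no longer carry pitch values and the short node list stands in for them; a real implementation might choose nodes by curve fitting rather than even spacing.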
It should be noted that although the present invention uses pitch as the example when describing the characteristic parameter data of audio data, and of speech data in particular, the characteristic parameter data can obviously also include other characteristic parameters, for example tone. Therefore, whatever the kind of audio data and whatever the characteristic parameter, all are covered by the present invention.
Fig. 4 is a flow chart of another embodiment of the method of the invention. First, segmented compression is performed on the input original audio data, for example a ".Wav" file, producing for example a ".Bin" file in which the characteristic parameter data is stored independently in each frame (step 401); note that if the input is already a ".Bin" file, this step can be omitted. Then characteristic parameter envelope curve data is generated for the frames from the characteristic parameter data of each frame of compressed audio data (step 402); the characteristic parameter data of each frame of audio data is replaced by that envelope curve data (step 403); the characteristic parameter data of each frame of audio data is dynamically calculated from the characteristic parameter envelope curve data in order to decompress each current frame of audio data (step 404); the decompression result is output (step 405); an edit command input by the user in response to the output result is obtained (step 406); modification, generation and replacement of the envelope curve are controlled according to the edit command in order to adjust the compression (step 407); the adjusted audio data is decompressed again and the result is output (step 408); and the compression adjustment and decompression steps are repeated according to the user's modification commands until editing is finished (step 409).
In the step of obtaining the edit command input by the user (step 406), if the characteristic parameter envelope curve needs to be changed, the user can repeatedly modify the characteristic parameter envelope curve data to generate a further simplified characteristic parameter envelope curve. The envelope curve can be modified by modifying its nodes; the modified characteristic parameter envelope curve data is then used to replace the characteristic parameter data of each frame of audio data.
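Node-based editing of the envelope curve might look like the following sketch. It assumes the envelope is a list of (frame index, pitch value) nodes; the two operations shown, changing a node's value and deleting an interior node, are illustrative guesses at what "modifying the nodes" involves, and the names `move_node` and `remove_node` are hypothetical.

```python
from typing import List, Tuple

Node = Tuple[int, float]  # (frame index, pitch value)


def move_node(envelope: List[Node], position: int, new_value: float) -> List[Node]:
    """Return a copy of the envelope with the value of one node changed."""
    frame_index, _ = envelope[position]
    edited = list(envelope)
    edited[position] = (frame_index, new_value)
    return edited


def remove_node(envelope: List[Node], position: int) -> List[Node]:
    """Return a simplified copy with one interior node removed."""
    if position <= 0 or position >= len(envelope) - 1:
        raise ValueError("only interior nodes can be removed")
    return envelope[:position] + envelope[position + 1:]


# Hypothetical envelope with four nodes; adjust one node, then drop another.
envelope = [(0, 100.0), (50, 130.0), (100, 120.0), (150, 90.0)]
simplified = remove_node(move_node(envelope, 1, 125.0), 2)
```

Both helpers return new lists, so the previous envelope stays available if the user wants to undo an edit after listening to the result.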
In the step of obtaining the edit command input by the user (step 406), if the display size of the waveform needs to be controlled, the method may further include controlling, according to user commands, the zoom-in, zoom-out and full-view (zoom all) display of the audio data waveform. The display of characteristic parameter envelope curves can also be selected according to user commands: the pitch envelope curve can be selected, as can the envelope curves of other features such as tone.
In the step of obtaining the edit command (step 406), if audio data needs to be input, the audio can be input directly from outside, or an audio file stored in the storage module can be selected and opened in response to a user command to download a sound file.
Corresponding to the processing method of the present invention, the invention also provides a device for processing audio data. As shown in Fig. 5, it may comprise a segmented compression module for performing segmented compression on audio data, with characteristic parameter data stored independently in each frame; the audio data after segmented compression is stored in storage module 502. The recompression device 501 of the invention comprises an envelope curve generation module 5011 for generating characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data, and a characteristic parameter replacement module 5012 for replacing the characteristic parameter data of each frame of audio data with that characteristic parameter envelope curve data.
The invention also provides a further embodiment of the audio data processing device. Fig. 6 is a schematic diagram of an embodiment of the interactive compression editing device of the invention, comprising a segmented compression module 601, a storage module 602, a decompression module 603, an output module 604, a recompression module 605 and a user interaction module 606. If the compressed file needs to be downloaded to other hardware for playback, a download-and-play module 607 may also be included.
Here, the segmented compression module 601 performs segmented compression on the input original audio data, for example a file in ".Wav" format, to form compressed audio data in ".Bin" format in which the characteristic parameter data is stored independently in each frame. Note that if the input audio data is already a file in ".Bin" format, the segmented compression module 601 can be omitted and the data stored directly in the storage module. The storage module 602 stores the compressed audio data, for example ".Bin" files. The recompression module 605 further compresses the audio data after segmented compression. The decompression module 603 decompresses the audio data after compression editing. The output module 604 outputs the decompressed audio data. The user interaction module 606 accepts commands input by the user in order to control compression editing.
The recompression module 605 comprises an envelope curve generation unit (not shown) for generating characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data, and a characteristic parameter replacement unit (not shown) for replacing the characteristic parameter data of each frame of audio data with that characteristic parameter envelope curve data.
The decompression module 603 comprises a characteristic parameter calculation unit (not shown) for dynamically calculating the characteristic parameter data of each current frame of audio data from the characteristic parameter envelope curve data, and a segmented decompression unit (not shown) for performing segmented decompression on each current frame of audio data according to its characteristic parameter data.
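The dynamic calculation performed by the characteristic parameter calculation unit can be sketched as an interpolation over the envelope nodes. The text does not state the interpolation rule, so piecewise-linear interpolation is assumed here; the function name `pitch_at` and the node layout are hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

Node = Tuple[int, float]  # (frame index, pitch value), sorted by frame index


def pitch_at(envelope: List[Node], frame_index: int) -> float:
    """Linearly interpolate the pitch of one frame from the envelope nodes."""
    indices = [i for i, _ in envelope]
    if frame_index <= indices[0]:
        return envelope[0][1]       # before the first node: hold its value
    if frame_index >= indices[-1]:
        return envelope[-1][1]      # after the last node: hold its value
    right = bisect_right(indices, frame_index)
    (x0, y0), (x1, y1) = envelope[right - 1], envelope[right]
    return y0 + (y1 - y0) * (frame_index - x0) / (x1 - x0)


# Hypothetical envelope of three nodes covering nine frames.
envelope = [(0, 100.0), (4, 120.0), (8, 80.0)]
pitches = [pitch_at(envelope, i) for i in range(9)]
```

The recovered per-frame values would then be handed to the segmented decompression unit in place of the pitch data that was removed from the frames.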
The user interaction module 606 comprises an output display control unit 6062 for controlling the output module to display the audio waveform and the envelope curve of the characteristic parameter, and an envelope curve modification unit 6061, through which the user modifies the displayed characteristic parameter envelope curve to generate further simplified characteristic parameter envelope curve data. The envelope curve can be modified by modifying its nodes, and the recompression module uses the modified characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data. If the displayed waveform needs further control, a waveform display control unit 6063 may also be included for controlling zoom-in, zoom-out and full-view (zoom all) display of the audio data waveform; it may also allow the pitch and tone envelope curves of the original file to be selected directly and displayed for reference. If a compressed audio file in the storage module 602 needs to be selected, a download unit 6064 may also be included for selecting the compressed audio file.
The envelope curves include the envelope curves of pitch and of tone.
The output module 604 comprises a graphical display output module and a sound playback output module.
In addition, the interactive compression editing device of the invention may further comprise a download module 607. Sound files in the storage module 602 can be downloaded to the download module 607 for playback, and the download module 607 can be an EMU board.
Fig. 7 is a schematic diagram of the user interface for compression editing of audio data using the interactive compression editing device and method of the invention. The output of the output module can include image display output and sound playback output; the image display output can show both the waveform and its envelope curve, or only one of them.
As can be seen from Fig. 7, curve 701 is the envelope curve drawn from the pitch value in each frame, and curve 702 is the envelope curve regenerated after the user issues an envelope modification command through the user interaction unit; the reduction in the number of data points is evident. By repeating such modifications and listening to the decompressed result at any time, the user can interactively compression-edit the compressed audio data.
As the embodiments of the invention show, because an envelope curve is used to approximate the pitch variation of each frame and the pitch value of the current frame is calculated dynamically during playback, compression is achieved. When data is exported, the pitch data in the frames is removed and replaced with the envelope curve data. Since a typical speech file contains many frames, replacing the pitch data compresses the data.
For example, take an original sound file LF1.Wav of size 74.3 KB. It is first compressed at a sampling rate of 8000 Hz and a coding bit rate of 2000 bps, giving a compressed file LF1.Bin of size 1.17 KB. The compression editing method of the invention is then applied to compress it further: the pitch data in the frames is replaced, which reduces the file by 7 × (number of frames) / 8 bytes, and an envelope curve file that substitutes for the pitch data is generated (generally 16 points; here 8 points are taken), of size 8 × 4 bytes. Because the number of frames is relatively large, compression is achieved, and a file LF1.Bin of size 1.03 KB is obtained.
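The size arithmetic in this example can be checked with a short sketch. The formula follows the figures given in the text (7 bits of pitch data removed per frame, 8 envelope points of 4 bytes each added). The frame count is not stated in the text; the value of 200 used below is a hypothetical figure, chosen only because it yields a saving close to the reported difference of about 0.14 KB between 1.17 KB and 1.03 KB.

```python
def net_saving_bytes(num_frames: int, pitch_bits: int = 7,
                     num_nodes: int = 8, bytes_per_node: int = 4) -> float:
    """Bytes saved by replacing per-frame pitch data with an envelope curve."""
    removed = pitch_bits * num_frames / 8   # pitch data taken out of the frames
    added = num_nodes * bytes_per_node      # envelope curve file that replaces it
    return removed - added


# Assumed frame count (not given in the text): 200 frames.
saving = net_saving_bytes(200)
```

With these parameters the envelope only pays for itself once the file has more than 36 frames, which is why the saving depends on the number of frames being relatively large.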
Moreover, using the output and interactive editing of compressed speech provided by the invention, the user can flexibly and interactively modify the compressed audio data through various user commands, and the edited sound file is output in real time.
It is also worth noting that the present invention can be extended to further compress audio data compressed by other compression algorithms. Any algorithm that performs segmented compression and storage and stores characteristic parameters, for example pitch data, independently in each segment is suitable for the present invention.