CN1892821A

CN1892821A - Method and apparatus for processing audio data

Info

Publication number: CN1892821A
Application number: CNA2005100806790A
Authority: CN
Inventors: 廖栋才; 李琳
Original assignee: BEIJING BEIYANG ELECTRONIC TECHNOLOGY Co Ltd; Sunplus Technology Co Ltd
Current assignee: BEIJING SUNPLUS-EHUE TECHNOLOGY CO., LTD.; Sunplus Technology Co Ltd
Priority date: 2005-07-06
Filing date: 2005-07-06
Publication date: 2007-01-10
Anticipated expiration: 2025-07-06
Also published as: CN100538820C

Abstract

The present invention discloses a method and device for processing audio data that has undergone segmented compression, with characteristic parameter data stored in each frame. First, characteristic parameter envelope curve data is generated from the characteristic parameter data of each frame of audio data; the envelope curve data then replaces the characteristic parameter data of each frame. During playback, the characteristic parameter data of each current frame is calculated dynamically from the characteristic parameter envelope curve data. Because an envelope curve approximately represents, and replaces, the variation of the characteristic parameter across the frames, the data is recompressed. Through various user commands, the user can flexibly and conveniently modify the compressed audio data interactively, and the modified sound file is output in real time.

Description

Method and device for processing audio data

Technical field

The present invention relates to a method and device for processing audio data, and in particular to a method and device for processing compressed audio data.

Background technology

With the development of audio compression technology, its applications have become increasingly widespread and the demands placed on it increasingly high. For example, many existing products such as electronic toys, e-books, and electronic dictionaries provide speech output, recorded human pronunciation, and even sound recording and playback. In general, such electronic devices need to compress the input speech for storage and then decompress it as required for playback.

Taking speech compression as an example, existing audio compression mainly follows the process shown in Fig. 1. First, the input raw speech is preprocessed and then subjected to LPC (Linear Predictive Coding) analysis. The resulting LPC coefficients are converted to LSF (Line Spectral Frequency) coefficients, which are passed to the quantizer. After the LPC analysis, a voiced/unvoiced (UV) decision and a pitch search are performed on the prediction residual to obtain the pitch and UV information, which are also passed to the quantizer. The quantizer combines the LSF coefficients, the pitch coefficient, and the UV information into the output compressed bitstream.
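As a rough illustration of the pitch search and UV decision in this pipeline, the sketch below estimates a pitch lag for one frame with a normalized autocorrelation search. This is one common textbook approach and is not stated by the source to be the method actually used. The function name `estimate_pitch`, the lag range, and the 0.5 voicing threshold are all illustrative assumptions.

```python
import math

def estimate_pitch(frame, min_lag=20, max_lag=60, voiced_threshold=0.5):
    """Return (lag, voiced) for one frame using normalized autocorrelation.

    Assumed, illustrative parameters: lags are in samples, the frame must be
    longer than max_lag, and a frame is called voiced when the best
    normalized correlation exceeds the threshold.
    """
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        a = frame[:-lag]   # frame without its last `lag` samples
        b = frame[lag:]    # frame delayed by `lag` samples
        energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        if energy == 0.0:
            continue       # silent segment: no usable correlation
        corr = sum(x * y for x, y in zip(a, b)) / energy
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    voiced = best_corr > voiced_threshold
    return (best_lag if voiced else 0), voiced
```

For a periodic frame, the returned lag is the period in samples; for a silent frame the function reports the frame as unvoiced with a lag of 0.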

The preprocessing of the speech signal mainly consists of dividing continuous speech into a series of frames. When processing a speech signal, it is usually assumed that the characteristics of speech change slowly. A non-stationary signal of this kind is therefore often processed in segments, with each short segment treated as a stationary signal. Each segment is called a frame, or a short-time segment. Frames may be taken using either a fixed or a non-fixed frame count.
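The framing step can be sketched as follows. This minimal example assumes fixed-length frames with zero-padding of the final frame; the source does not specify a frame length or a padding rule, so `split_into_frames` and its behavior are illustrative only.

```python
def split_into_frames(samples, frame_len):
    """Split a sample sequence into consecutive fixed-length frames.

    Assumption: the last frame is zero-padded up to frame_len.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    frames = []
    for start in range(0, len(samples), frame_len):
        frame = list(samples[start:start + frame_len])
        frame += [0] * (frame_len - len(frame))  # pad the final short frame
        frames.append(frame)
    return frames
```

Each returned frame can then be treated as a short stationary segment and analyzed independently.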

Decompression, shown in Fig. 2, is the reverse of the process in Fig. 1. First, the compressed bitstream passes through an inverse quantizer to recover the LSF coefficients, the pitch coefficient, and the UV information. The LSF coefficients are converted back to LPC coefficients, and the pitch coefficient and UV information are combined into an excitation signal. The LPC coefficients and the excitation signal are then passed through LPC synthesis and post-processing to output the decoded speech.

Several coding schemes are suitable for speech compression. Linear predictive coding (LPC), described above, is a form of parametric coding, in which characteristic parameters are extracted from the source signal in the frequency domain or another orthogonal transform domain and converted into digital codes for transmission. There are also waveform coding, such as pulse code modulation (PCM), and hybrid coding, such as multi-pulse excited linear predictive coding (MPLPC).

The pitch and UV information mentioned above are both characteristic parameters of the input speech, and processing the characteristic parameters is an important part of compressing a speech signal effectively.

Existing speech compression algorithms, in particular segmented compression algorithms, use the compression scheme described above to compress the input speech file segment by segment, store the characteristic parameters, for example the pitch data, independently in each segment, and then save the compressed speech data and pitch data to a memory module.

Compressing a speech file in this way achieves a substantial degree of compression and satisfies general application requirements. However, as electronic devices become ever smaller and more portable, speech compressed in this way still occupies a relatively large amount of storage in many digital products, such as handheld electronic devices, and further compression of the already compressed speech remains desirable.

Therefore, for current electronic devices that require audio compression, providing a solution that further processes the compressed audio data on the basis of the existing compression scheme has become an urgent problem in the industry.

Summary of the invention

The technical problem to be solved by the present invention is to provide a method and device for processing audio data that further compress already compressed audio data on the basis of the existing segmented compression scheme, without noticeably affecting sound quality.

The invention provides a method for processing audio data that further compresses audio data which has undergone segmented compression and in which characteristic parameter data is stored independently in each frame. First, characteristic parameter envelope curve data is generated for the frames from the characteristic parameter data of each frame of audio data. The characteristic parameter envelope curve data is then used to replace the characteristic parameter data of each frame of audio data.

In addition, the method of the invention also comprises the following steps for editing the compressed audio data:

1) receiving operation information input by the user;

2) automatically generating a corresponding control command from the operation information;

3) editing the stored compressed audio data according to the control command; and

4) displaying the waveform and envelope of the edited audio data.

The present invention further provides an audio data processing device comprising: a segmented compression module for performing segmented compression on audio data, with the characteristic parameter data stored independently in each frame; a memory module for storing the compressed audio data; and a recompression module for further compressing the audio data after segmented compression. The recompression module in turn comprises an envelope curve generation unit, which generates characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data, and a characteristic parameter replacement unit, which uses the characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data.

In addition, the audio data processing device further comprises a decompression module for decompressing the compressed and edited audio data. It comprises a characteristic parameter calculation unit, which dynamically calculates the characteristic parameter data of each current frame of audio data from the characteristic parameter envelope curve data, and a segmented decompression unit, which performs segmented decompression on each current frame of audio data according to that frame's characteristic parameter data.

In addition, the audio data processing device further comprises an output module for outputting the decompressed audio data and the envelope curve.

In addition, the audio data processing device further comprises a user interaction module for receiving commands input by the user in order to perform recompression editing and display control.

Because the present invention uses an envelope curve to approximate the variation of a characteristic parameter, for example the pitch, across the frames, and dynamically calculates the pitch value of the current frame during playback, it achieves a compression effect. When exporting data, it suffices to remove the characteristic parameter data from the frames and substitute the envelope curve data. Since a sound file generally contains many frames, replacing the pitch data compresses the data. In addition, the present invention provides output and interactive editing of the compressed speech: through various user commands, the user can flexibly modify the compressed speech data interactively and output the edited sound file in real time.

Description of drawings

Fig. 1 is a schematic diagram of an existing audio compression scheme;

Fig. 2 is a schematic diagram of an existing audio decompression scheme;

Fig. 3 is a flowchart of an embodiment of the audio data processing method of the present invention;

Fig. 4 is a flowchart of an embodiment of a method for interactive compression editing of audio data using the present invention;

Fig. 5 is a structural diagram of an embodiment of the audio data processing device of the present invention;

Fig. 6 is a schematic diagram of an embodiment of a device for interactive compression editing of audio data using the present invention; and

Fig. 7 is a schematic diagram of a user interface for compression editing of audio data using the present invention.

Embodiment

An analysis of existing audio data compression shows that conventional segmented compression records a characteristic parameter, for example the pitch value, separately for each frame. In practice, however, the sound remains satisfactory as long as the pitch varies within a certain range. There is therefore still room to further compress speech data that has already been compressed.

Specifically, the existing compression scheme performs segmented compression on a speech file, stores the pitch data independently in each segment, and then saves the compressed speech data and pitch data to a storage unit. To address this, the present invention first proposes a method for processing audio data that further compresses audio data which has undergone segmented compression and in which characteristic parameter data is stored independently in each frame. As shown in Fig. 3, characteristic parameter envelope curve data is first generated for the frames from the characteristic parameter data of each frame of audio data (step 301); the characteristic parameter envelope curve data is then used to replace the characteristic parameter data of each frame of audio data (step 302).
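Steps 301 and 302 can be sketched as below. The source does not say how the envelope curve is derived, so this example simply samples the per-frame pitch values at evenly spaced frame indices. The frame layout (a dict with a `"pitch"` key) and the names `build_envelope` and `recompress` are hypothetical.

```python
def build_envelope(pitches, num_nodes):
    """Step 301 (sketch): pick evenly spaced (frame_index, pitch) nodes."""
    n = len(pitches)
    if n == 0:
        return []
    if num_nodes < 2 or n == 1:
        return [(0, pitches[0])]
    num_nodes = min(num_nodes, n)
    nodes = []
    for i in range(num_nodes):
        idx = i * (n - 1) // (num_nodes - 1)  # first and last frame included
        nodes.append((idx, pitches[idx]))
    return nodes

def recompress(frames, num_nodes):
    """Step 302 (sketch): drop per-frame pitch, keep one shared envelope."""
    envelope = build_envelope([f["pitch"] for f in frames], num_nodes)
    stripped = [{k: v for k, v in f.items() if k != "pitch"} for f in frames]
    return stripped, envelope
```

The stripped frames no longer carry a pitch field; the few envelope nodes are stored once for the whole file instead.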

It should be noted that although the present invention is illustrated using pitch as the example of characteristic parameter data of audio data, in particular speech data, the characteristic parameter data may obviously also include other characteristic parameters, such as the voicing (UV) information. Therefore, whatever the kind of audio data and whatever the characteristic parameter, all are covered by the present invention.

Fig. 4 is a flowchart of another embodiment of the method of the invention. First, the input original audio data, for example a ".Wav" format file, undergoes segmented compression to form, for example, a ".Bin" format file, with the characteristic parameter data stored independently in each frame (step 401). Note that if the input is already a ".Bin" format file, this step can be omitted. Next, characteristic parameter envelope curve data is generated for the frames from the characteristic parameter data of each frame of compressed audio data (step 402), and the envelope curve data is used to replace the characteristic parameter data of each frame of audio data (step 403). The characteristic parameter data of each frame is then dynamically calculated from the characteristic parameter envelope curve data in order to decompress each current frame of audio data (step 404), and the decompression result is output (step 405). Based on the output, an edit or modification command input by the user is obtained (step 406). According to that command, the modification, generation, and replacement of the envelope curve are controlled so as to adjust the compression (step 407). The adjusted audio data is decompressed again and the result is output (step 408). The compression adjustment and decompression steps are then repeated according to the user modification commands obtained, until editing is finished (step 409).
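The dynamic calculation in step 404 might look like the following sketch, which recovers a per-frame pitch value from the envelope nodes by linear interpolation. Linear interpolation is an assumption; the source only says the value is calculated dynamically from the envelope curve data. The name `pitch_at` and the `(frame_index, pitch)` node layout are hypothetical.

```python
from bisect import bisect_right

def pitch_at(envelope, frame_index):
    """Interpolate the pitch of one frame from sorted (frame_index, pitch) nodes.

    Assumption: linear interpolation between neighbouring nodes, with the
    end values held constant outside the covered range.
    """
    if not envelope:
        raise ValueError("envelope must contain at least one node")
    xs = [x for x, _ in envelope]
    if frame_index <= xs[0]:
        return float(envelope[0][1])
    if frame_index >= xs[-1]:
        return float(envelope[-1][1])
    j = bisect_right(xs, frame_index)  # xs[j-1] <= frame_index < xs[j]
    x0, y0 = envelope[j - 1]
    x1, y1 = envelope[j]
    return y0 + (y1 - y0) * (frame_index - x0) / (x1 - x0)
```

During playback, a decoder would call such a function once per frame and hand the result to the segmented decompression stage in place of the stored pitch field.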

In the step of obtaining the edit or modification command input by the user (step 406), if the characteristic parameter envelope curve needs to be changed, the user can repeatedly modify the characteristic parameter envelope curve data to generate a further simplified characteristic parameter envelope curve. The envelope curve can be modified by editing its nodes. The modified characteristic parameter envelope curve data is then used to replace the characteristic parameter data of each frame of audio data.
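Node-level editing of the envelope could be modelled as in the sketch below. The two operations shown (moving a node's value and removing an interior node) and the names `move_node` and `remove_node` are illustrative assumptions; the source states only that the curve is modified through its nodes.

```python
def move_node(envelope, node_index, new_value):
    """Return a copy of the envelope with one node's value changed."""
    edited = list(envelope)
    frame_index, _ = edited[node_index]
    edited[node_index] = (frame_index, new_value)
    return edited

def remove_node(envelope, node_index):
    """Return a copy of the envelope without one interior node.

    Assumption: the first and last nodes are kept so the curve still
    spans every frame.
    """
    if not 0 < node_index < len(envelope) - 1:
        raise ValueError("only interior nodes can be removed")
    return envelope[:node_index] + envelope[node_index + 1:]
```

Both functions return new lists, so an editor can keep the previous envelope for comparison or undo.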

In the same step (step 406), if the display size of the waveform needs to be controlled, the method may further include controlling, according to user commands, the zoom-in, zoom-out, and full-view (zoom all) display of the audio data waveform. The envelope curve to display can also be selected by user command: the pitch envelope curve, or the envelope curve of another characteristic such as the voicing (UV) information.

In the same step (step 406), if audio data needs to be input, the audio can be input directly from an external source, or an audio file stored in the memory module can be selected and opened according to a user command to load a sound file.

Corresponding to the processing method, the present invention also provides a device for processing audio data. As shown in Fig. 5, the device may comprise a segmented compression module, which performs segmented compression on audio data and stores the characteristic parameter data independently in each frame; the audio data after segmented compression is stored in a memory module 502. The recompression device 501 of the present invention comprises an envelope curve generation module 5011, which generates characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data, and a characteristic parameter replacement module 5012, which uses the characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data.

The present invention also provides another embodiment of the audio data processing device. Fig. 6 is a schematic diagram of an embodiment of the interactive compression editing device of the present invention, which comprises a segmented compression module 601, a memory module 602, a decompression module 603, an output module 604, a recompression module 605, and a user interaction module 606. If the compressed file needs to be downloaded to other hardware for playback, the device may also comprise a download-and-play module 607.

The segmented compression module 601 performs segmented compression on the input original audio data, for example a ".Wav" format file, to form compressed audio data in ".Bin" format, with the characteristic parameter data stored independently in each frame. Note that if the input audio data is already a ".Bin" format file, the segmented compression module 601 can be omitted and the data stored directly in the memory module. The memory module 602 stores the compressed audio data, such as ".Bin" format files. The recompression module 605 further compresses the audio data after segmented compression. The decompression module 603 decompresses the compressed and edited audio data. The output module 604 outputs the decompressed audio data. The user interaction module 606 accepts commands input by the user to control the compression editing.

The recompression module 605 comprises an envelope curve generation unit (not shown), which generates characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data, and a characteristic parameter replacement unit (not shown), which uses the characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data.

The decompression module 603 comprises a characteristic parameter calculation unit (not shown), which dynamically calculates the characteristic parameter data of each current frame of audio data from the characteristic parameter envelope curve data, and a segmented decompression unit (not shown), which performs segmented decompression on each current frame of audio data according to that frame's characteristic parameter data.

The user interaction module 606 comprises an output display control unit 6062, which controls the output module to display the audio waveform and the characteristic parameter envelope curve, and an envelope curve modification unit 6061, through which the user modifies the displayed characteristic parameter envelope curve to generate further simplified characteristic parameter envelope curve data. The envelope curve can be modified by editing its nodes, and the recompression module uses the modified characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data. If further control over the displayed waveform is needed, the module may also comprise a waveform display control unit 6063, which controls zoom-in, zoom-out, and full-view (zoom all) display of the audio data waveform, and which may also allow the pitch and voicing (UV) envelope curves of the source file to be selected for display directly as a reference. If a compressed sound file in the memory module 602 needs to be selected, the module may also comprise a file loading unit 6064 for selecting the compressed sound file.

The envelope curves include the pitch envelope curve and the voicing (UV) envelope curve.

The output module 604 comprises a graphic display output module and a sound playback output module.

In addition, the interactive compression editing device of the present invention may further comprise a download module 607. Sound files in the memory module 602 can be downloaded to the download module 607 and played; the download module 607 may be an EMU board.

Fig. 7 is a schematic diagram of a user interface for compression editing of audio data using the interactive compression editing device and method of the present invention. The output of the output module may include image display output and sound playback output. The image display output may show both the waveform and its envelope curve, or only one of them.

As can be seen from Fig. 7, curve 701 is the envelope curve drawn from the pitch value of each frame, and curve 702 is the envelope curve regenerated after the user issues an envelope modification command through the user interaction unit; the reduction in the number of data points is clearly visible. By repeating such modifications and listening to the decompressed result at any time, the user can interactively compress and edit the compressed audio data.
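One way to obtain a curve with fewer data points, in the spirit of going from curve 701 to curve 702, is sketched below. In the source the simplification is driven by the user; this example instead uses a simple automatic rule as a stand-in: an interior node is dropped when it lies within a tolerance of the straight line from the previously kept node to the next node. The name `simplify_envelope` and the tolerance parameter are assumptions, and this greedy rule does not bound the accumulated error over long runs of dropped nodes.

```python
def simplify_envelope(envelope, tol):
    """Greedily drop interior nodes that a straight line already predicts.

    envelope: sorted list of (frame_index, value) nodes.
    tol: maximum allowed difference between a node and the line joining
         the last kept node to the node that follows it.
    """
    if len(envelope) <= 2:
        return list(envelope)
    kept = [envelope[0]]
    for i in range(1, len(envelope) - 1):
        x0, y0 = kept[-1]
        x, y = envelope[i]
        x1, y1 = envelope[i + 1]
        predicted = y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        if abs(predicted - y) > tol:
            kept.append(envelope[i])  # node carries real shape information
    kept.append(envelope[-1])
    return kept
```

Comparing `len(envelope)` before and after gives the reduction in stored nodes; whether the result is acceptable is still judged by listening to the decompressed audio.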

As the embodiments show, because an envelope curve approximates the pitch variation across the frames and the pitch value of the current frame is calculated dynamically during playback, a compression effect is achieved. When exporting data, the pitch data is removed from the frames and replaced with the envelope curve data. Since a speech file generally contains many frames, replacing the pitch data compresses the data.

For example, an original sound file LF1.Wav of 74.3 KB is first compressed at a sampling rate of 8000 Hz and a coding rate of 2000 bps, giving a compressed file LF1.Bin of 1.17 KB. The compression editing method of the present invention is then applied for further editing and compression. Replacing the pitch data in the frames reduces the file by 7 × (number of frames) / 8 bytes, after which an envelope curve file is generated to substitute for the pitch data (generally 16 points; 8 points are taken here), with a size of 8 × 4 bytes. Because the number of frames can be large, compression is achieved, yielding a file LF1.Bin of 1.03 KB.
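The size arithmetic in this example can be checked with a short sketch. It reads the "7 × frames / 8 bytes" term as 7 pitch bits per frame and the envelope as 8 nodes of 4 bytes each. The frame count is not given in the source; the value of 200 used below is a hypothetical figure chosen because it roughly reproduces the reported 1.17 KB to 1.03 KB reduction.

```python
def recompressed_size(bin_bytes, num_frames, pitch_bits=7,
                      nodes=8, bytes_per_node=4):
    """Estimate the file size after replacing per-frame pitch with an envelope.

    bin_bytes:   size of the segment-compressed file in bytes
    num_frames:  number of frames (assumed; not stated in the source)
    """
    saved = pitch_bits * num_frames / 8   # pitch bits removed, in bytes
    added = nodes * bytes_per_node        # envelope nodes stored once
    return bin_bytes - saved + added

original = 1.17 * 1024                    # 1.17 KB expressed in bytes
result = recompressed_size(original, num_frames=200)
print(round(result / 1024, 2))            # size in KB under these assumptions
```

The saving grows linearly with the number of frames while the envelope cost stays fixed, which is why the method pays off for files with many frames.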

Furthermore, with the output and interactive editing of compressed speech provided by the invention, the user can flexibly modify the compressed speech data interactively through various user commands and output the edited sound file in real time.

It is also worth noting that the present invention can be extended to further compress audio data compressed by other algorithms. Any algorithm that performs segmented compression and storage, and that stores characteristic parameters such as pitch data independently in each segment, is suitable for the present invention.

Claims

1. A method for processing audio data, which further compresses audio data that has undergone segmented compression and in which characteristic parameter data is stored independently in each frame, characterized by comprising the following steps:

generating characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data; and

using the characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data.

2. The method of claim 1, characterized by further comprising the following decompression step:

dynamically calculating the characteristic parameter data of each current frame of audio data from the characteristic parameter envelope curve data, and decompressing the audio data according to the characteristic parameter data.

3. The method of claim 2, characterized by further comprising the step of dynamically outputting the decompressed audio data.

4. The method of claim 3, characterized in that the dynamic output comprises displaying the waveform and envelope curve of the audio data and/or playing the audio data.

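5. The method of claim 4, characterized by further comprising the following steps for editing the compressed audio data:

1) receiving operation information input by the user;

2) automatically generating a corresponding control command from the operation information;

3) editing the stored compressed audio data according to the control command; and

4) displaying the waveform and envelope of the edited audio data.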

6. The method of claim 5, characterized in that receiving operation information in step 1) comprises receiving the user's modification of the envelope curve and generation of a simplified characteristic parameter envelope curve; and editing the stored compressed audio data in step 3) means using the modified characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data.

7. The method of claim 6, characterized in that the operation information input in step 1) further comprises zoom-in, zoom-out, and full-view display operations, and a file selection operation for selecting a sound file.

8. An audio data processing device, characterized by comprising:

a segmented compression module for performing segmented compression on audio data, with the characteristic parameter data stored independently in each frame;

a memory module for storing the compressed audio data; and

a recompression module for further compressing the audio data after segmented compression, comprising:

an envelope curve generation unit for generating characteristic parameter envelope curve data for the frames from the characteristic parameter data of each frame of audio data; and

a characteristic parameter replacement unit for using the characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data.

9. The device of claim 8, characterized by further comprising a decompression module for decompressing the compressed and edited audio data, comprising:

a characteristic parameter calculation unit for dynamically calculating the characteristic parameter data of each current frame of audio data from the characteristic parameter envelope curve data; and

a segmented decompression unit for performing segmented decompression on each current frame of audio data according to the characteristic parameter data of that frame.

10. The device of claim 9, characterized by further comprising an output module for outputting the decompressed audio data and the envelope curve.

11. The device of claim 8, characterized by further comprising a user interaction module for receiving commands input by the user in order to perform recompression editing and display control.

12. The device of claim 11, characterized in that the user interaction module further comprises an envelope modification unit, through which the user modifies the displayed characteristic parameter envelope curve to generate further simplified characteristic parameter envelope curve data, the recompression module using the modified characteristic parameter envelope curve data to replace the characteristic parameter data of each frame of audio data.

13. The device of claim 11, characterized in that the user interaction module further comprises:

a waveform display control unit for controlling zoom-in, zoom-out, and full-view display of the audio data waveform; and

a file loading unit for selecting and opening a compressed sound file in the memory module.

14. The device of claim 8, characterized by further comprising a download unit for downloading the compressed audio data in the storage unit to hardware.

15. The device of claim 9, characterized by further comprising a playback unit for dynamically playing the decompressed audio data.