CN1536557A

CN1536557A - DAC audio data compression and decompression technique

Info

Publication number: CN1536557A
Application number: CNA031093183A
Authority: CN
Inventors: 梁肇新
Original assignee: Individual
Current assignee: Individual
Priority date: 2003-04-07
Filing date: 2003-04-07
Publication date: 2004-10-13

Abstract

The invention discloses a DAC audio data compression and decompression technique. It samples data in a multichannel independent coding mode, can sample at 8K-1MHz, and supports multichannel processing with up to 32 channels, which removes the limits that existing techniques place on the audio sampling range and on the mode and number of channels processed. The data are converted from the time domain to the frequency domain, analyzed for human ear sensitivity on the basis of a natural acoustic model, and fuzzified with an exponent and mantissa coding method, which improves sound quality to more than 95% of the effect of the original sound. The transformed data are then compressed using data bit allocation and a non-integral bit compression method, which raises the compression ratio. The compressed blocks are merged into a DAC file or DAC audio data stream, and data frames are later read from the DAC file to be decompressed and played. The technique can be applied to multimedia devices such as computers and VCD players.

Description

DAC audio data compression and decompression technique

1. Technical field

This technique belongs to the field of high-quality audio processing, and in particular concerns high-quality sampling compression, compression of an arbitrary number of channels, and the corresponding decompression.

2. Technical background

The technical indicators that determine the quality of audio compression and decompression are mainly the sampling frequency, the sampling resolution (sampling precision), and the number of channels (ports). Existing audio compression and decompression techniques rely on a psychoacoustic model of human hearing. They sample audio data at frequencies between 20 Hz and 44.1 kHz, with a sampling resolution of 8 or 16 bits and between 1 and 8 sampled sound channels. They can handle only one channel (a channel here is simply a sound channel), two channels, four channels, or six channels, and each of these channel configurations is processed differently, so they cannot handle multilingual playback and cannot process multiple channels in a single pass. To compress the data, the prior art exploits the human ear's insensitivity to high-frequency sound and discards the data to which the ear is insensitive. High-frequency data are therefore lost, and frequency data above 48 kHz cannot be sampled, which directly limits sound quality.

3. Summary of the invention

To overcome the restricted sampling frequencies and channel configurations of existing audio compression and decompression techniques, and the poor playback sound quality that results, the invention provides the DAC compression and decompression technique (DAC is the abbreviation of the English "Digital Audio Compres"). It supports sampling compression at any frequency between 8K and 1MHz and multichannel processing with up to 32 channels. Compared at the same bit rate of 200K-300K (stereo), the sound quality of the invention can reach more than 95% of the original CD sound, whereas the prior art stays below 91%.

To solve the above technical problem, the invention provides the following technical scheme:

During audio data acquisition and compression, coding is based on a natural model of sound, and sampling can take place at any frequency between 8K and 1MHz without loss of frequency data. When the acquired data are converted from the time domain to the frequency domain, an exponent and mantissa coding analysis method is used. The exponent determines whether a sound is clear or fuzzy, and the larger the mantissa, the more accurate the sound. This coding ensures that high-frequency sound data are not lost, while fuzzy mantissas limit the amount of data after coding. The invention widens the range and flexibility of the compression frequency: compression can take place anywhere in the 8K-1MHz range, without having to choose one of the frequencies in general use today, such as 8K, 11.25K, 22.5K, 32K, 44.1K, or 96K (DVD). Because each channel is coded independently, multiple channels can be defined, up to 32.

After the audio data have been compressed, decompression and playback proceed as follows:

First, a DAC data frame is read from the file and divided into several data groups according to the frame marks, and each data group is then divided into data blocks. Next, the data blocks are decompressed by looking up the data allocation table. After decompression, the data are restored step by step to obtain the final frequency-domain data. Finally, the frequency-domain data are converted to the time domain, yielding the audio data for playback.

The beneficial effects of the invention are: (1) Sampling compression is possible at any frequency between 8K and 1MHz, which greatly improves sound quality, reaching more than 95% of the original CD sound. (2) Up to 32 channels of sound are supported, so multiple languages can be compressed together without multiple audio streams, which reduces system overhead. (3) The non-integral bit compression method greatly improves the data compression ratio without affecting sound quality.

4. Description of drawings

Fig. 1 illustrates the overall compression process of the invention.

Fig. 2 illustrates the channel data acquisition process of the invention.

Fig. 3 illustrates how time-domain data are converted to frequency-domain data.

Fig. 4 illustrates the frequency-domain data conversion process.

Fig. 5 illustrates the data compression process in detail.

Fig. 6 illustrates how data blocks are merged into data frames and saved.

Fig. 7 illustrates the overall decompression process of the technique.

Fig. 8 illustrates the layout of the exponent and mantissa data.

Fig. 9 illustrates the layout of the waveform data segments.

5. Specific implementation

The audio data acquisition, compression, and decompression processes of the invention are as follows:

Fig. 1 outlines the whole path from collecting data at a sound acquisition device, through data conversion and compression, to merging the result into a DAC file or stream. First, multi-channel data acquisition is carried out in the multichannel independent coding mode. The data are then converted from the time domain to the frequency domain to obtain frequency-domain data. Next, the data are analyzed on the basis of the natural acoustic model and fuzzified with the exponent and mantissa coding analysis method. The data are then compressed with the non-integral bit compression method. Finally, the processed data blocks are merged into data groups, several data groups are combined into a data frame, a sync mark is written, and the frame is written to a DAC file or DAC data stream. Each process is described in detail below.

Fig. 2 illustrates data acquisition. The invention acquires data from multiple channels in the multichannel independent coding mode and can acquire up to 32 channels. Multichannel independent coding means that, for the number of channels selected by the user, data acquisition on each channel is independent of the others and all channels can be acquired and compressed simultaneously. If the sound is in several languages, such as Mandarin and Cantonese, a different language can then be played clearly by selecting a different channel at playback. (The acquisition methods of the prior art can acquire only 1 to 8 channels and do not separate them independently, so multilingual playback is not possible.) Fig. 2A shows the individual sound channels, and Fig. 2B shows the audio data acquired from each channel.

Fig. 3 illustrates the conversion from the time domain to the frequency domain. The data are divided into segments of a specified length (for example, 512 data values per segment). Discontinuities arise between the waveform data of adjacent segments, as shown in Fig. 9, and these produce noise at playback. To solve this problem, a windowing method is applied in Fig. 3A to eliminate the differences between data segments, and the time-domain data are then converted to frequency-domain data by a Fourier transform or MDCT in Fig. 3B. (The windowing method generates cosine function values and adds them to the values of each data segment, thereby eliminating the differences.)
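The segmentation, windowing, and transform steps can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the window is a Hann (raised-cosine) window applied by multiplication, which is the common convention and differs from the addition the text describes, and a naive discrete Fourier transform stands in for whichever Fourier or MDCT implementation the invention actually uses.

```python
import cmath
import math


def split_segments(samples, seg_len=512):
    """Split samples into fixed-length segments, zero-padding the last one."""
    segments = []
    for start in range(0, len(samples), seg_len):
        seg = list(samples[start:start + seg_len])
        seg += [0.0] * (seg_len - len(seg))
        segments.append(seg)
    return segments


def cosine_window(n):
    """Hann (raised-cosine) window: 0 at both edges, rising toward 1 mid-segment."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(segment):
    # Assumption: the window multiplies the samples (usual convention),
    # so each segment fades to zero at its edges.
    window = cosine_window(len(segment))
    return [s * w for s, w in zip(segment, window)]


def dft(segment):
    """Naive O(n^2) discrete Fourier transform, for illustration only."""
    n = len(segment)
    return [
        sum(segment[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


if __name__ == "__main__":
    samples = [math.sin(2 * math.pi * 5 * i / 64) for i in range(128)]
    for seg in split_segments(samples, seg_len=64):
        spectrum = dft(apply_window(seg))
        print(len(spectrum))
```

A production encoder would use an FFT or MDCT with overlapping segments; the naive transform here is chosen only to keep the sketch free of dependencies.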

Fig. 4 illustrates the data conversion process. Once the frequency-domain data have been obtained, they are analyzed using the natural acoustic model.

Natural acoustic model: according to the sensitivity of the human ear to sound frequency, the data are analyzed into frequency data to which the ear is highly sensitive and frequency data to which it is less sensitive. The benefit is that data of different sensitivity can be handled separately: frequency data of high sensitivity are refined, and frequency data of low sensitivity are discarded or compressed. This both preserves high sound quality and keeps the data volume under control.

The data are first segmented in Fig. 4A, the segmented data are then analyzed in Fig. 4B, and finally the sensitivity grade of each data segment and the number of mantissa digits for its data are determined in Fig. 4C.

Through this analysis the data are divided into regions by sound sensitivity, and the mantissas for each region are set accordingly. Regions of higher sound sensitivity are given more mantissas so that the data are represented accurately. Regions of lower sound sensitivity are given fewer mantissas, which does not affect sound quality and saves data space.

After the analysis, each data region is fuzzified using the exponent and mantissa coding method.

The specific procedure is:

First, the exponent data segment is generated in Fig. 4D. The position of the most significant bit of each value in the original data segment is taken as its exponent, and new data are generated from this exponent. For example, the value 13 is 1101 in binary, so its most significant bit is at position 3 and the generated exponent is 3. Traversing the whole data segment yields a new exponent data segment, which has the same length as the original data segment.

The mantissa data are then generated in Fig. 4E according to the number of mantissa digits assigned to the data. First, the position of the next significant bit below the most significant bit of each value in the original data segment is taken, and subtracting it from the exponent gives the first mantissa value; if there is no further significant bit, the mantissa value is zero. This produces the first mantissa data segment. The same step is repeated in the next pass to take the next mantissa and produce the next mantissa data segment. Finally, in Fig. 4F the exponent data and mantissa data are merged, giving the result: exponent data segment + mantissa data segment 1 + mantissa data segment 2 + ... + mantissa data segment n, where n < 32. The following example illustrates the process.

For example, take the data region [13,6,8,7,9], which in binary is [1101,0110,1000,0111,1001]. First the exponents are taken, giving the exponent data segment [0011,0010,0011,0010,0011]. Two mantissas are taken here. Starting from the second-highest set bit of each source value, the first pass gives the first mantissa data segment [0001,0001,0000,0001,0011], and the next pass gives the second mantissa data segment [0011,0000,0000,0010,0000]. Finally the segments are merged to obtain the target data segment; see Fig. 8.
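The worked example can be reproduced with a short sketch. The function names are hypothetical, and the sketch assumes positive integer inputs, since the text does not say how a zero value is coded.

```python
def set_bit_positions(value):
    """Positions of the set bits of value, highest first (13 -> [3, 2, 0])."""
    return [p for p in range(value.bit_length() - 1, -1, -1) if (value >> p) & 1]


def encode(values, num_mantissas):
    """Return [exponent segment, mantissa segment 1, ..., mantissa segment n]."""
    segments = [[] for _ in range(num_mantissas + 1)]
    for v in values:
        if v <= 0:
            # Assumption: zero and negative values are outside this sketch.
            raise ValueError("this sketch only handles positive integers")
        bits = set_bit_positions(v)
        exponent = bits[0]  # position of the most significant bit
        segments[0].append(exponent)
        for k in range(1, num_mantissas + 1):
            # Mantissa k = exponent minus the position of the k-th lower
            # set bit, or 0 when no such bit exists.
            segments[k].append(exponent - bits[k] if k < len(bits) else 0)
    return segments


if __name__ == "__main__":
    exps, m1, m2 = encode([13, 6, 8, 7, 9], num_mantissas=2)
    print(exps)  # [3, 2, 3, 2, 3]
    print(m1)    # [1, 1, 0, 1, 3]
    print(m2)    # [3, 0, 0, 2, 0]
```

Taking fewer mantissas drops the lower set bits of each value, which is where the "fuzzy" loss of precision comes from.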

Fig. 5 illustrates the data compression process in detail. After fuzzification, the data must be compressed, and the non-integral bit compression method specific to the invention is used for this. Its advantage is that it stores as much data as possible in as few binary digits as possible. For example, five numbers each less than or equal to 2 can be stored in 8 binary digits, whereas the ordinary method needs 10 binary digits. This greatly improves the data compression ratio.
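The arithmetic behind this example can be checked directly, assuming that each of the five numbers takes one of the three values 0, 1, or 2 and that the ordinary method spends a whole number of bits on each value:

```latex
\begin{aligned}
\text{joint coding:}\quad & 3^{5} = 243 \le 2^{8} = 256
  \;\Rightarrow\; 8 \text{ bits for 5 values } (1.6 \text{ bits per value}),\\
\text{per-value coding:}\quad & \lceil \log_{2} 3 \rceil = 2 \text{ bits per value}
  \;\Rightarrow\; 5 \times 2 = 10 \text{ bits}.
\end{aligned}
```

Since $\log_2 3 \approx 1.585$, the joint code at 1.6 bits per value is close to the minimum for three equally likely values, which is why the number of bits per value is non-integral.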

Detailed process is: by in Fig. 5 A segments of source data being pressed certain-length and size of data division group recently mutually, the radix of this group is set in Fig. 5 B then, radix=group maximum number subtracts the group minimum number.Set radix, look into the storage bit number that the data bit allocation table comes specified data by Fig. 5 C.The data bit allocation table is a kind of data list structure that we set, and it stipulates data bits and the data number that each radix is shared, represents that such as: the list item of radix 25 numbers between the 0-2 store with 8 binary digits.Have 32 list items.

For example: top data segment is [3,2,3,3,1,1,0,1,3,3,0,0,2,0] through exponent mantissa coding back, and we can be divided into two groups to it, are respectively [3,3,3,3,3] and [2,2,2,1,1,1,0,0,0,0].First group radix is that 3, the second groups radix is 2.

After the grouping, in Fig. 5 D according to the storage bit number pooled data of data.Specific as follows:

We are provided with a buffer zone, and every group of data are read in one by one, at first check the radix of data place group.Tabling look-up according to radix obtains storage bit number and storage data number, checks then whether the data of buffer zone reach storage data number.If reach then these several numbers are merged into one group of binary number and write the sequence number and the group mark of table, this just forms compressed data set, last clear buffer.Otherwise, will count the adding buffer zone, continue the read next number.Repeat above-mentioned steps, all dispose up to all data.

For the first group, whose radix is 3, the table lookup shows that the entry for radix 3 stores 5 numbers between 0 and 3 in 9 binary digits, so this group is merged into one 9-bit binary number. For the second group, whose radix is 2, the entry for radix 2 stores 5 numbers between 0 and 2 in 8 binary digits. Because this group contains ten numbers, it is merged in two passes of 5 numbers each, forming two 8-bit numbers. All compressed data groups are then merged to form a target data block.
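A minimal sketch of the buffered merging step for the radix-2 group follows. Several details are assumptions rather than part of the text: the values are merged as digits of a base-(radix + 1) number with the first value most significant, the table excerpt contains only the radix-2 entry quoted above, and the table sequence number and group mark that accompany each compressed group are left out.

```python
# Hypothetical excerpt of the data bit allocation table: radix -> (count, bits).
# Only the radix-2 entry comes from the text; the full table has 32 entries.
BIT_ALLOCATION = {2: (5, 8)}


def pack_group(values, radix):
    """Merge every `count` values of a group into one integer of `bits` bits."""
    count, bits = BIT_ALLOCATION[radix]
    base = radix + 1  # values range over 0..radix
    packed = []
    buffer = []
    for v in values:
        if not 0 <= v <= radix:
            raise ValueError("value outside 0..radix")
        buffer.append(v)
        if len(buffer) == count:
            word = 0
            for x in buffer:
                word = word * base + x  # append one base-(radix + 1) digit
            assert word < (1 << bits)   # must fit the allotted storage bits
            packed.append(word)
            buffer.clear()
    if buffer:
        # Assumption: the text does not say how a short final run is stored.
        raise ValueError("group length is not a multiple of the table count")
    return packed


if __name__ == "__main__":
    print(pack_group([2, 2, 2, 1, 1, 1, 0, 0, 0, 0], radix=2))  # [238, 81]
```

Both results fit in 8 bits, so the ten values occupy 16 bits instead of the 20 bits that two bits per value would need.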

Fig. 6 illustrates the data merging process. The operations above produce target data blocks, and the number of target data blocks depends on the number of channels selected. In Fig. 6A several target data blocks are merged into a data group. In Fig. 6B several data groups are combined and a sync mark is written to form a data frame. In Fig. 6C the data frame is saved to a DAC file or DAC stream. This completes DAC compression.
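The block, group, and frame nesting can be sketched as below. The text does not specify the byte layout, so everything concrete here is hypothetical: the sync mark value, the one-byte counts, the two-byte length prefixes, and the function names are assumptions chosen only to make the nesting visible.

```python
import struct

SYNC_MARK = b"DAC0"  # hypothetical value; the text does not give the sync mark


def build_group(blocks):
    """Merge per-channel data blocks (bytes) into one data group."""
    out = struct.pack(">B", len(blocks))
    for block in blocks:
        out += struct.pack(">H", len(block)) + block
    return out


def build_frame(groups):
    """Combine several data groups and prepend the sync mark."""
    body = b"".join(struct.pack(">H", len(g)) + g for g in groups)
    return SYNC_MARK + struct.pack(">B", len(groups)) + body


def parse_frame(frame):
    """Inverse of build_frame: return a list of groups, each a list of blocks."""
    if frame[:len(SYNC_MARK)] != SYNC_MARK:
        raise ValueError("sync mark not found")
    pos = len(SYNC_MARK)
    (num_groups,) = struct.unpack_from(">B", frame, pos)
    pos += 1
    groups = []
    for _ in range(num_groups):
        (group_len,) = struct.unpack_from(">H", frame, pos)
        pos += 2
        end = pos + group_len
        (num_blocks,) = struct.unpack_from(">B", frame, pos)
        pos += 1
        blocks = []
        for _ in range(num_blocks):
            (block_len,) = struct.unpack_from(">H", frame, pos)
            pos += 2
            blocks.append(frame[pos:pos + block_len])
            pos += block_len
        if pos != end:
            raise ValueError("group length mismatch")
        groups.append(blocks)
    return groups


if __name__ == "__main__":
    groups = [[b"\xee\x51", b"\x01"], [b"abc"]]
    frame = build_frame([build_group(blocks) for blocks in groups])
    print(parse_frame(frame) == groups)
```

Length prefixes are one possible way to realize the frame and block marks mentioned in the text; a real stream format might use marker bytes instead.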

Fig. 7 illustrates the decompression process, which is the inverse of the compression technique of the invention. First, a DAC frame is read from the file and divided into several data groups according to the frame marks, and each data group is then divided into data blocks. Next, the data blocks are decompressed by looking up the data allocation table. After decompression, the data are restored step by step to obtain the final frequency-domain data. Finally, the frequency-domain data are converted to the time domain to obtain the final audio data for playback. Each operation is described in detail below.

Fig. 7A, data frame disassembly: a data frame is first read from the DAC file and divided into several data groups according to the frame marks. Each data group is then divided into target data blocks according to the block marks.

Fig. 7B, data decompression: each compressed data group is read in turn from the target data block, and the data bit allocation table is looked up using the table sequence number recorded in the group. This gives the number of values and the radix of the compressed data group. The merged binary number is then split into the individual values by repeated division; the number of divisions depends on the number of values in the compressed data group.

Fig. 7 C reduction of data process: by data decompression, we have obtained comprising the data segment of exponent data and mantissa data.Need reduce processing now.The data that we represent with Fig. 9 explain how to reduce.

At first, we read in the data of each exponent data section successively, according to their the new data of value generation.New data=2X, X are the data of each exponent data section, and the target data segment that obtains is [8,4,8,4,8], and we read in and get first mantissa's section then.Attention: the generation method and the exponent data of mantissa data are different, (index-mantissa) power of new data=2, and mantissa is not equal to zero.Mantissa's then new data that equal zero directly equal zero.We have obtained mantissa's target data segment [4,2,0,2,1] like this.It and target data segment added up obtained new target data segment [12,6,8,6,9].We read in next mantissa data section again and have obtained mantissa's target data segment [1,0,0,1,0] in order to last method.Add up with target data segment and to have obtained [13,6,8,7,9].The final data of transformation result that Here it is, we can compare with the process of above obfuscation.
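The restoration rule can be written as a short sketch (the function name and the list-of-segments input shape are assumptions made for illustration):

```python
def decode(segments):
    """Restore values from [exponent segment, mantissa segment 1, ...]."""
    exponents, mantissa_segments = segments[0], segments[1:]
    # Each exponent X contributes 2^X.
    result = [2 ** e for e in exponents]
    for mantissas in mantissa_segments:
        for i, m in enumerate(mantissas):
            # A nonzero mantissa contributes 2^(exponent - mantissa);
            # a zero mantissa contributes nothing.
            if m != 0:
                result[i] += 2 ** (exponents[i] - m)
    return result


if __name__ == "__main__":
    segments = [[3, 2, 3, 2, 3], [1, 1, 0, 1, 3], [3, 0, 0, 2, 0]]
    print(decode(segments[:1]))  # [8, 4, 8, 4, 8]
    print(decode(segments[:2]))  # [12, 6, 8, 6, 9]
    print(decode(segments))      # [13, 6, 8, 7, 9]
```

Decoding with fewer mantissa segments gives the coarser approximations seen in the intermediate steps of the example.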

Fig. 7 D is the transfer process of frequency domain to time domain: after reduction of data, we have obtained the frequency domain data section.We have adopted method in common to use Fourier transform or MDCT conversion that frequency domain data is transformed into time domain data.

At last, we send into passage to time domain data in Fig. 7 E, just can play by sound device.

Claims

1. A DAC audio data compression and decompression technique that supports sampling compression at any frequency between 8K and 1MHz and multichannel processing with up to 32 channels, and whose sound quality, compared at the same bit rate of 200K-300K (stereo), can reach more than 95% of the original CD sound. The technique comprises the following steps:

acquiring data from a plurality of channels in a multichannel independent coding mode;

converting the data from the time domain to the frequency domain to obtain frequency-domain data;

analyzing the data on the basis of a natural acoustic model and fuzzifying the data with an exponent and mantissa coding method;

compressing the data with a non-integral bit compression method;

merging the compressed blocks to obtain a DAC file or DAC audio data stream;

reading data frames from the DAC file, decompressing them, and playing them.

2. The method according to claim 1, wherein the step of acquiring data from a plurality of channels in a multichannel independent coding mode comprises: a multichannel audio acquisition device acquiring audio data on each of the channels selected by the user, simultaneously and independently of the other channels.

3. The method according to claim 1, wherein the step of analyzing the data on the basis of a natural acoustic model and fuzzifying the data with an exponent and mantissa coding method comprises:

analyzing the data on the basis of the natural acoustic model;

fuzzifying the data with the exponent and mantissa coding analysis method.

4. The method according to claim 3, wherein the step of analyzing the data on the basis of the natural acoustic model further comprises:

segmenting the data;

analyzing the data segments;

determining the sensitivity grade of each data segment according to the sensitivity of the human ear to sound.

5. The method according to claim 3, wherein the step of fuzzifying the data with the exponent and mantissa coding analysis method further comprises:

generating exponent data from the original data segment;

generating mantissa data according to the sensitivity grade of the data segment;

merging the exponent data segment and the mantissa data segments;

obtaining the converted data block.

6. The method according to claim 1, wherein the step of compressing the data with a non-integral bit compression method comprises:

grouping the data so that values of similar size fall in the same group;

generating the radix of each group of data;

looking up the data bit allocation table according to the radix of each group of data;

merging the data according to the number of storage bits and the number of stored values given by the data bit allocation table.

7. The method according to claim 1, wherein the step of merging the compressed blocks to obtain a DAC file or DAC audio data stream comprises:

merging data blocks into a data group, the number of data blocks depending on the number of sampled channels;

generating the target data group;

combining several data groups into a data frame;

generating the target data frame;

saving the target data frame as a DAC file.

8. The method according to claim 1, wherein the step of reading data frames from the DAC file, decompressing them, and playing them comprises:

reading a data frame from the DAC file and disassembling it;

decompressing according to the non-integral bit compression method;

restoring the data segment that contains exponent data and mantissa data;

converting from the frequency domain to the time domain;

sending the time-domain data to the channels so that they can be played by the sound device.