CN100386799C - Voice frame computation method for audio frequency decoding - Google Patents

Voice frame computation method for audio frequency decoding

Info

Publication number
CN100386799C
CN100386799C CNB2004100499251A CN200410049925A
Authority
CN
China
Prior art keywords
speech frame
file header
character
audio coding
filler
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB2004100499251A
Other languages
Chinese (zh)
Other versions
CN1702737A (en)
Inventor
林士生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ali Corp
Original Assignee
Ali Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ali Corp filed Critical Ali Corp
Priority to CNB2004100499251A priority Critical patent/CN100386799C/en
Publication of CN1702737A publication Critical patent/CN1702737A/en
Application granted granted Critical
Publication of CN100386799C publication Critical patent/CN100386799C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The present invention relates to a speech frame computation method for audio decoding that prevents the decoding errors which arise when a read error of the padding bit in the header information of the audio code causes the speech frame length to be miscalculated. The invention changes the read length of the bit data when decoding and then identifies the sync word therein, or a single character, so that the header address and the header information of the speech frame can be obtained; a method of truly calculating the speech frame length without referring to the padding bit is thus proposed here.

Description

Speech frame computation method of audio coding
Technical field
The present invention relates to a speech frame computation method of audio coding, and more particularly to a method that changes the read length of the bit data during decoding so as to obtain the true speech frame length without referring to the padding bit.
Background technology
The borderless nature of the network lets us obtain information of all kinds and share data, but bandwidth is limited and larger audio/video files cannot circulate smoothly; hence the development of MP3 (MPEG Layer 3) compression, a low-distortion data compression method that can compress audio data at a ratio of about 12:1. Within the range of human sensitivity to audio, the MP3 format uses a large compression ratio to reach a file size convenient for network transmission without losing sound quality. When a consumer listens to the music in an MP3 file, an MP3 decoder (i.e. a decompressor) must be used, as shown in Fig. 1, a schematic diagram of a prior-art MP3 decoder.
As shown in Fig. 1, a stream of MP3 code 11 converted from any audio source (such as an audio CD, WAV, etc.) is input to an MP3 decoder 10. After the MP3 decoder 10 decodes it, it is output to a sound output stage 15, which may be the output of a computer system or of the MP3 decoder, and is listened to through earphones or a loudspeaker (speaker). The MP3 decoder 10 roughly comprises an input data stream buffer 12 that receives the coded data, a decoder 13 that performs the decoding, and an output audio buffer 14 that outputs the decoded audio file.
In the above MP3 decoder 10, the decompression flow must first calculate the length of a speech frame (frame) in the MP3 coded data stream (bit stream), store the data in a buffer, and only then decompress it; when calculating the speech frame length, the true data stream can only be obtained with reference to the padding bit in the MP3 file header (header information). When the compression end performs the compression step, the padding bit is set to 1 if a non-integer compression sampling frequency (such as 44.1 kHz) is used; with an integer sampling frequency the padding bit is not set (it is 0). If this padding bit is wrong, the data stream is read incorrectly, for example one byte (8 bits) too many or too few, so the compressed data is decoded incorrectly; that is, during decoding the step of determining the speech frame length (in bytes) also goes wrong.
When the padding bit is read correctly:
If the padding bit is 1, the speech frame length is not an integer number of bytes, and one byte is added to make up the speech frame.
If the padding bit is 0, the speech frame length is an integer number of bytes, and no byte is added.
The error cases are as follows:
Error one: if the padding bit should be 0 but is read as 1, part of the next speech frame's header is taken as well, so the data is one byte too long; an error occurs during decompression, and the subsequent search for a speech frame skips over a speech frame.
Error two: if the padding bit should be 1 but is read as 0, one byte too little is taken, which likewise causes a decompression error.
The MP3 compression method takes the speech frame as its unit and uses a main-data-begin pointer to achieve optimization. Refer to Fig. 2A, a schematic diagram of the prior-art speech frame file format, which shows part of the data stream in MP3 coding, containing a first speech frame 21 and a second speech frame 22. The first speech frame 21 includes a first file header 23 made up of a plurality of bits, first main data 25 whose content is the audio, and an unoccupied first remaining space 27; the first file header 23 is also roughly divided into a first sync word 23a, other file header information 23b, and first side information 23c. Immediately after the first remaining space 27 comes the next speech frame: the second speech frame 22, similar to the first speech frame 21, comprises a second file header 24, second main data 26 and a second remaining space 28, the second file header 24 likewise comprising a second sync word 24a, other file header information 24b, and second side information 24c.
In the speech frames 21, 22 described above, the file headers 23, 24 contain side information 23c, 24c, in which the main-data-begin pointer can point to the remaining space 27, 28 of other speech frames so as to store the currently compressed file; MP3 compression thus uses the remaining space 27, 28 in the data stream to improve the compression ratio. The side information of the file headers 23, 24 also contains pointers indicating where each segment of the compressed audio file begins and ends, so the decoding end only needs to refer to the pointers in the file header to know the correct address of the compressed file in the data stream (bit stream) and decode the correct audio.
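To make the role of the main-data-begin pointer concrete, the following C sketch illustrates how a decoder might locate where a speech frame's main data actually starts. It is based on the commonly documented MPEG-1 Layer III layout (a 9-bit backward byte offset measured from the sync word); the names and the simplified reservoir handling are assumptions for illustration, not part of the patent.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch: in MPEG-1 Layer III the side information carries a
 * 9-bit main_data_begin field, a backward offset in bytes measured from the
 * sync word of the current frame.  Zero means the main data starts right
 * after the side information; a non-zero value means it starts inside the
 * remaining space (bit reservoir) of earlier speech frames.  A real decoder
 * must also skip the headers and side information of intervening frames
 * when walking backwards, which is omitted here for brevity.               */
typedef struct {
    size_t   header_pos;       /* byte offset of this frame's sync word      */
    size_t   side_info_len;    /* header + side information length in bytes  */
    uint16_t main_data_begin;  /* 9-bit backward offset from the sync word   */
} frame_info_t;

static size_t main_data_start(const frame_info_t *f)
{
    if (f->main_data_begin == 0)
        return f->header_pos + f->side_info_len;  /* no reservoir used        */
    return f->header_pos - f->main_data_begin;    /* points back into earlier
                                                      frames' remaining space */
}
```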
Fig. 2B shows the file header position in a prior-art speech frame. The figure shows the file header portion of the data stream, with only some of the bit fields serving as examples. Taking Fig. 2A and the first speech frame 21 as an example, the header comprises at least the sync word 23a and other file header information 23b. The sync word 23a consists of 12 bits; in MP3 (MPEG-1 Layer III) a sync word is represented by 111111111111, hexadecimal FFF, and marks the beginning of a speech frame. The other file header information 23b comprises an ID flag 201, a layer flag 202, an error-protection bit 203, a bit rate 204, a sampling frequency 205, a padding bit 206, a private bit 207, a mode flag 208, a mode extension 209, a copyright bit 210, an original bit 211 and an emphasis flag 212.
The prior-art method of calculating the speech frame (frame) length refers to the above sampling frequency 205, the bit rate 204 (i.e. the bit transmission rate) and the padding bit 206, and the speech frame length is obtained from the data these fields indicate:
Speech frame length = bit rate / sampling frequency * number of samples per speech frame
(Length = bitrate / sampling frequency * samples per frame) (formula one)
The setting of the padding bit indicates whether the speech frame length is an integral number of bytes: if the sampling frequency is a non-integer frequency such as 44.1 kHz, formula one yields a non-integer value, and the padding-bit information (0 or 1) must be used to make up one byte, i.e. to read one extra byte or to judge the sync word further when decoding. If the padding bit is wrong, however, the actual speech frame size is miscalculated; and if the first speech frame is miscalculated, this naturally affects the sync word and the file header of the next speech frame and produces decoding errors.
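As a worked illustration of formula one, the following short C program (a sketch under common MPEG-1 Layer III assumptions — 1152 samples per frame, 128 kbps and 44.1 kHz, values not taken from the patent) shows why a 44.1 kHz stream produces a non-integer byte length, so that the padding bit is needed to decide whether a frame carries one extra byte:

```c
#include <stdio.h>

int main(void)
{
    /* Formula one: length = bitrate / sampling frequency * samples per frame
     * (in bits); dividing by 8 gives bytes.                                  */
    double bitrate = 128000.0;  /* bits per second             */
    double fs      = 44100.0;   /* sampling frequency in Hz    */
    double samples = 1152.0;    /* samples per Layer III frame */

    double bytes = bitrate / fs * samples / 8.0;  /* = 417.96..., non-integer */
    int    base  = (int)bytes;                    /* 417 bytes when padding=0 */

    printf("exact length %.2f bytes -> %d bytes, or %d with the padding bit set\n",
           bytes, base, base + 1);
    return 0;
}
```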
To avoid the situation in which a read error of the padding bit causes the speech frame header to be read incorrectly and decompression to fail, the present invention proposes a method of obtaining the speech frame length without referring to this padding bit.
Summary of the invention
The present invention is a speech frame computation method of audio coding. To prevent a read error of the padding bit of the file header in the audio code from causing the speech frame length to be miscalculated, the method changes the read length of the bit data during decoding and then identifies the sync word therein, or a single character, thereby obtaining the true speech frame length without referring to the padding bit.
A primary object of the present invention is to decode a speech frame without referring to the padding bit of the file header in the speech frame. The steps of this speech frame length calculation method comprise: reading the previous speech frame data plus one extra byte and storing them in a buffer; identifying a complete sync word FFF character, or only a single F character; judging whether the sync word is present; if not, obtaining the file header address of the speech frame by means of the F character; discarding the extra byte in the buffer; and decoding the file header information of the speech frame. If the FFF character is present, the file header address of the speech frame is derived from it.
The steps of another speech frame length calculation method of the present invention comprise: reading the next speech frame data; judging one extra character code to determine whether it is a complete sync word; if not, storing the extra judged part in a buffer and repeating the step of judging one extra character code; if it is the sync word FFF, deriving the file header address of the speech frame; discarding the extra characters in the buffer; and decoding the file header information of the speech frame.
The present invention is described below in conjunction with the drawings and specific embodiments, which, however, do not limit the invention.
Description of drawings
Fig. 1 is a schematic diagram of a prior-art MP3 decoder;
Fig. 2A is a schematic diagram of a prior-art speech frame file format;
Fig. 2B is a schematic diagram of the file header position in a prior-art speech frame;
Fig. 3A is a schematic diagram of the speech frame file format;
Fig. 3B is a schematic diagram of the speech frame file header format;
Fig. 4 is a schematic diagram of character reading according to the present invention;
Fig. 5 is a flowchart of the steps of the first speech frame computation method of audio coding of the present invention;
Fig. 6 is a flowchart of the steps of the second speech frame computation method of audio coding of the present invention.
Embodiment
The MP3 compression method takes the speech frame (frame) as its unit. As shown in the speech frame file format schematic of Fig. 3A, the data stream (bit stream) is composed of a plurality of speech frames 30. The file header of each frame contains the sync word 31a, which marks the start address of the speech frame, and side information 31c, whose pointers identify where in the compressed file each item of main data or remaining space is located, together with flags recording the sampling frequency for the various modes or versions. Following the file header 31 comes the main data 32, which is where the compressed audio file lies, and then the unoccupied remaining space 33. The decoding end therefore only needs to refer to the pointers in the file header to know the correct address of the compressed file in the data stream and decode the correct audio.
Fig. 3B is a schematic diagram of the speech frame file header format. The file header 31 includes at least the sync word 31a and other file header information 31b. The sync word 31a is a 12-bit code, 111111111111 (FFF), representing the beginning of a speech frame, so when an MP3 file is decoded the correct address of a speech frame is judged by looking for the FFF character. The other file header information 31b includes a plurality of items such as the bit rate 34, the sampling frequency 35 and the padding bit 36. A speech frame is composed in units of bytes, and the padding bit 36 records, when compressing into MP3 format, whether the speech frame in question has a non-integer byte length (see the description of the prior art).
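The following C sketch illustrates how the fields named above could be extracted from a 4-byte speech frame header; the bit positions follow the widely documented MPEG-1 layout, and the struct and function names are illustrative assumptions rather than part of the patent:

```c
#include <stdint.h>

typedef struct {
    unsigned sync;         /* 12-bit sync word 31a, expected to be 0xFFF      */
    unsigned bitrate_idx;  /* 4-bit index selecting the bit rate 34           */
    unsigned fs_idx;       /* 2-bit index selecting the sampling frequency 35 */
    unsigned padding;      /* padding bit 36                                  */
} mp3_header_t;

/* Parse the four header bytes, most significant bit first. */
static mp3_header_t parse_header(const uint8_t h[4])
{
    uint32_t v = ((uint32_t)h[0] << 24) | ((uint32_t)h[1] << 16) |
                 ((uint32_t)h[2] << 8)  |  (uint32_t)h[3];
    mp3_header_t out;
    out.sync        = (v >> 20) & 0xFFF;  /* bits 31..20 */
    out.bitrate_idx = (v >> 12) & 0xF;    /* bits 15..12 */
    out.fs_idx      = (v >> 10) & 0x3;    /* bits 11..10 */
    out.padding     = (v >> 9)  & 0x1;    /* bit 9       */
    return out;
}
```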
The present invention uses the following two methods to avoid the situation in which a padding-bit error causes the speech frame length to be miscalculated and decompression to fail.
Method one: ignore the padding bit; that is, regardless of whether the padding bit is 0 or 1, always read one extra byte (11111111, i.e. FF) when reading a speech frame. When decoding MP3, the extra byte that is read is stored in the buffer 40. If the speech frame length was originally a non-integer number of bytes, so that one byte had to be added to make up an integer, the original method of judging the sync bits is unaffected: finding the character string FFF means the sync word has been found. If the speech frame length is an integer number of bytes, then because the character FF has already been read into the buffer 40, it is enough to find a single F to have found the sync word, as shown in Fig. 4. Finding the sync word locates the file header of the speech frame, so the header information can be decoded and the MP3 audio then decoded.
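A minimal C sketch of method one follows; it assumes a plain byte array for the stream and uses illustrative names, and is a sketch rather than a complete decoder:

```c
#include <stdint.h>
#include <stdbool.h>

/* Method one (illustrative sketch): the padding bit is ignored and one extra
 * byte is always read together with the previous speech frame, so stream[pos]
 * is that extra byte and stream[pos + 1] is the first byte not yet read.
 *
 * If the previous frame really needed the extra byte (non-integer length),
 * the complete sync word FFF appears at pos + 1.  If the frame length was an
 * integer number of bytes, the extra byte is already the leading 0xFF of the
 * next header, so only a single F nibble remains to be seen at pos + 1.      */
static bool locate_next_header(const uint8_t *stream, long pos, long *header_pos)
{
    const uint8_t extra = stream[pos];        /* the byte read in excess      */
    const uint8_t b0    = stream[pos + 1];
    const uint8_t b1    = stream[pos + 2];

    if (b0 == 0xFF && (b1 & 0xF0) == 0xF0) {  /* complete sync word FFF       */
        *header_pos = pos + 1;                /* extra byte is simply dropped */
        return true;
    }
    if (extra == 0xFF && (b0 & 0xF0) == 0xF0) { /* only a lone F nibble seen  */
        *header_pos = pos;                    /* header began at the byte     */
        return true;                          /* already sitting in the buffer */
    }
    return false;                             /* no sync word: resynchronise  */
}
```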
Method two: ignore the padding bit. The difference from method one is that no extra byte is read; instead, when first reading the next speech frame, one extra character code is judged. If it is not the sync word FFF, this byte is stored in the buffer, and the decoding action continues by judging the next character again in the following step.
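A corresponding C sketch of method two is given below, again with illustrative names and a plain byte array standing in for the input stream:

```c
#include <stdint.h>
#include <stddef.h>

/* Method two (illustrative sketch): no extra byte is read up front.  After
 * the next speech frame data has been read, one extra character at a time is
 * examined; a byte that does not begin the sync word FFF is appended to the
 * current frame buffer and the check is repeated, until the sync word marks
 * the header of the next speech frame.                                       */
static long scan_for_sync(const uint8_t *stream, long pos, long limit,
                          uint8_t *frame_buf, size_t *frame_len)
{
    while (pos + 1 < limit) {
        if (stream[pos] == 0xFF && (stream[pos + 1] & 0xF0) == 0xF0)
            return pos;                            /* sync word FFF found     */
        frame_buf[(*frame_len)++] = stream[pos++]; /* not a sync byte: keep it
                                                      with the current frame
                                                      and judge the next one  */
    }
    return -1;                                     /* ran out of data         */
}
```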
Refer to Fig. 5, the flowchart of the steps of method one of the speech frame computation method of audio coding of the present invention:
Step 501: the speech frame is the basic unit in the MP3 data stream, and its bit codes are recorded in sequence, including a file header, main data and remaining space. When audio decoding begins, in order to determine the length of the current speech frame, the previous speech frame data is read together with one extra byte and stored in a buffer, so that the address of the current speech frame can be judged;
Step 502: a complete sync word FFF, or only a single F character, is then sought in the data stream;
Step 503: is the FFF character present?
Step 504: if not, the length of this speech frame is an integer number of bytes; because the byte FF has already been read into the buffer, only a single F character is picked up;
Step 505: the file header address of the speech frame is then obtained from this F character, which also determines the length of the speech frame. If the FFF character is present, the length of this speech frame is a non-integer number of bytes, and the position where the sync word FFF is found is the file header of this speech frame, from which its length is likewise obtained;
Step 506: afterwards, the speech frame length having been obtained, the extra byte read in step 501 is discarded from the buffer;
Step 507: and, the speech frame length having been obtained, the file header information can be decoded and the MP3 audio then decoded.
This completes the decoding of this speech frame.
Fig. 6 is the flowchart of the steps of method two, the other speech frame computation method of audio coding of the present invention:
Step 601: the speech frame is the basic unit in the MP3 data stream, and its bit codes are recorded in sequence, including a file header, main data and remaining space. When audio decoding begins, the next speech frame data is read;
Step 602: one extra character code is examined and stored in the buffer;
Step 603: is it the sync word?
Step 604: if not, besides storing the extra examined character code in the buffer, the step of examining one extra character code in step 602 is executed again;
Step 605: if the sync word FFF is present, the length of this speech frame is a non-integer number of bytes, and the position where the sync word FFF is found is the file header of this speech frame, from which its length is obtained;
Step 606: afterwards, the speech frame length having been obtained, the extra character codes read in step 602 are discarded from the buffer;
Step 607: and, the speech frame length having been obtained, the file header information can be decoded and the MP3 audio then decoded.
This completes the decoding of this speech frame.
In summary, the speech frame computation method of audio coding of the present invention reads one extra byte of data when decoding, thereby obtaining the true speech frame length without referring to the padding bit and avoiding the misjudgment that occurs when the padding bit is wrong.
Of course, the present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the present invention, but all such corresponding changes and modifications shall belong to the protection scope of the claims appended to the present invention.

Claims (10)

1. A speech frame decoding method of audio coding, characterized in that the steps of the method include:
reading the previous speech frame data, and reading one extra byte;
identifying a complete sync word FFF character, or only a single F character;
obtaining the file header address of said speech frame; and
decoding the file header information of said speech frame.
2. The speech frame decoding method of audio coding as claimed in claim 1, characterized in that said speech frame is the basic unit in the MP3 format, and the method decodes said speech frame without referring to a padding bit of the file header in said speech frame.
3. The speech frame decoding method of audio coding as claimed in claim 1, characterized in that the extra byte read is stored in a buffer, and after the file header address of the speech frame is obtained, the extra byte in the buffer is discarded.
4. The speech frame decoding method of audio coding as claimed in claim 1, characterized in that the step of obtaining the file header address of the speech frame comprises: if it is judged that the sync word is not present, obtaining the file header address of the speech frame by means of the F character.
5. The speech frame decoding method of audio coding as claimed in claim 1, characterized in that the step of obtaining the file header address of the speech frame comprises: if it is judged that the sync word is present, the sync word is the file header address of the speech frame, and the length of the speech frame is obtained.
6. The speech frame decoding method of audio coding as claimed in claim 1, characterized in that said speech frame includes at least said file header and main data.
7. The speech frame decoding method of audio coding as claimed in claim 6, characterized in that said file header includes at least a padding bit, a sampling frequency and a bit rate field.
8. A speech frame decoding method of audio coding, said speech frame being the basic unit in the MP3 format, characterized in that the method decodes said speech frame without referring to a padding bit of the file header in said speech frame, the steps including:
reading the next speech frame data;
examining one extra character code, and storing it in a buffer;
judging whether it is a sync word;
if it is not said sync word, besides storing the extra examined character code in said buffer, performing the examining step again;
if it is said sync word, obtaining thereby the length of said speech frame;
discarding the extra character code in said buffer;
obtaining the file header address of said speech frame; and
decoding the file header information of said speech frame.
9. The speech frame decoding method of audio coding as claimed in claim 8, wherein said speech frame includes at least said file header and main data.
10. The speech frame decoding method of audio coding as claimed in claim 9, wherein said file header includes at least a padding bit, a sampling frequency and a bit rate field.
CNB2004100499251A 2004-05-28 2004-06-18 Voice frame computation method for audio frequency decoding Active CN100386799C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100499251A CN100386799C (en) 2004-05-28 2004-06-18 Voice frame computation method for audio frequency decoding

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
CN200410042641.X 2004-05-28
CN200410042641 2004-05-28
CNB2004100499251A CN100386799C (en) 2004-05-28 2004-06-18 Voice frame computation method for audio frequency decoding

Publications (2)

Publication Number Publication Date
CN1702737A CN1702737A (en) 2005-11-30
CN100386799C true CN100386799C (en) 2008-05-07

Family

ID=35632438

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100499251A Active CN100386799C (en) 2004-05-28 2004-06-18 Voice frame computation method for audio frequency decoding

Country Status (1)

Country Link
CN (1) CN100386799C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101964187B (en) * 2009-07-24 2012-08-08 晨星软件研发(深圳)有限公司 Automatic detection method of frame header
CN110086549B (en) 2019-04-02 2021-09-14 北京小米移动软件有限公司 Audio data transmission method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002027725A1 (en) * 2000-09-29 2002-04-04 Matsushita Electric Industrial Co., Ltd. Method of editing and reproducing compression audio data
CN1467635A (en) * 2002-06-21 2004-01-14 Intelligent mp3 error checking method and mechanism

Also Published As

Publication number Publication date
CN1702737A (en) 2005-11-30

Similar Documents

Publication Publication Date Title
CN101212648B (en) Method and device for synchronizing data flow and metadata of content
CN102047336B (en) Method and apparatus for generating or cutting or changing a frame based bit stream format file including at least one header section, and a corresponding data structure
CN101288117B (en) Method and apparatus for encoding/decoding audio data and extension data
CN106297844A (en) The universal container of voice data
US20080288263A1 (en) Method and Apparatus for Encoding/Decoding
JP4660275B2 (en) Information embedding apparatus and method for acoustic signal
CN1922657A (en) Decoding scheme for variable block length signals
JP2007514971A (en) MIDI encoding and decoding
JP4629495B2 (en) Information embedding apparatus and method for acoustic signal
US20020165720A1 (en) Methods and system for encoding and decoding a media sequence
CN100386799C (en) Voice frame computation method for audio frequency decoding
JP4770194B2 (en) Information embedding apparatus and method for acoustic signal
EP1508899B1 (en) Data recording device, method and program
JP2006195061A (en) Information embedding device for acoustic signal, information extracting device from acoustic signal and acoustic signal reproducing device
JP2006201527A (en) Device for embedding information in sound signal, device for extracting information from sound signal, and sound signal reproducing device and method
KR101438387B1 (en) Method and apparatus for encoding and decoding extension data for surround
JP4713180B2 (en) Device for extracting information from acoustic signals
US20050197830A1 (en) Method for calculating a frame in audio decoding
JP4876978B2 (en) Information embedding device for sound signal and device for extracting information from sound signal
CN101256775B (en) Method and apparatus for arranging MP3 bit string
KR101060490B1 (en) Method and device for calculating average bitrate of a file of variable bitrate, and audio device comprising said device
JP2000305588A (en) User data adding device and user data reproducing device
CN101162581A (en) Method for embedding and extracting tone color in MIDI document
JP2006243340A (en) Device for embedding information in sound signal, device for extracting information from sound signal, and device for reproducing sound signal
JP4760539B2 (en) Information embedding device for acoustic signals

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant