CN101814973B

CN101814973B - Rapid RTP packet aggregation method based on AMR audio frames

Info

Publication number: CN101814973B
Application number: CN2010101053972A
Authority: CN
Inventors: 欧志
Original assignee: Shenzhen Temobi Science and Technology Co Ltd
Current assignee: World (Shanghai) Technology Development Co., Ltd.
Priority date: 2010-01-29
Filing date: 2010-01-29
Publication date: 2013-07-03
Anticipated expiration: 2030-01-29
Also published as: CN101814973A

Abstract

The invention relates to a rapid RTP packet aggregation method based on AMR audio frames, comprising the following steps: 1, receiving an AMR audio frame, reading the audio frame header, determining the coding mode, and obtaining the AMR audio frame length L corresponding to that coding mode; 2, determining, from the audio frame length L, the total number N of AMR audio frames that can be aggregated in one RTP packet; 3, creating an RTP packet and filling in the RTP header information and the PayloadHeader information; 4, processing the frame header of the AMR audio frame to separate the frame header from the speech data in the audio frame; 5, filling the frame header and the speech data of the AMR audio frame into the RTP packet; and 6, receiving AMR audio frames in a loop and repeating Step 4 and Step 5 each time audio data is received, until the Nth audio frame has been received, wherein the frame headers and frame data of the audio frames are collectively referred to as the PayloadData information. The method effectively reduces network overhead, lowers the high packet loss rate caused by frequent data transmission, and greatly improves the quality of service of streaming media.

Description

A rapid RTP packet aggregation method based on AMR audio frames

[technical field]

The present invention relates to the field of wireless streaming media, and in particular to a rapid RTP packet aggregation method based on AMR audio frames.

[background technology]

AMR (Adaptive Multi-Rate) is mainly used for audio coding on mobile devices and offers a relatively high compression ratio. It comes in two types, AMR-WB (Adaptive Multi-Rate Wideband) and AMR-NB (Adaptive Multi-Rate Narrowband). The AMR audio frame format consists of two parts: a frame header and speech data.

Streaming media refers to audio, video, and multimedia files transmitted over a network as a stream. To support real-time transmission over the network, streaming audio and video are sent encapsulated in RTP, and each data type has its own dedicated RFC describing its RTP encapsulation and transmission scheme. An AMR audio frame occupies only a few bytes, so if every frame were packed into its own RTP packet, each frame would carry an additional 12-byte RTP header, which greatly increases the load on the network and wastes bandwidth. It is therefore necessary to concatenate the access units of multiple audio frames and encapsulate them in a single RTP packet, provided that the packet does not exceed the MTU and trigger IP fragmentation. (The MTU, or Maximum Transmission Unit, is the largest datagram size, in bytes, that a given layer of a communication protocol can pass. The MTU parameter is usually tied to the communication interface, such as a network interface card or a serial port.)

[summary of the invention]

The object of the invention is to provide a packet aggregation method that packs as many AMR audio frames as possible into a single RTP packet.

To achieve this object, the present invention proposes a rapid RTP packet aggregation method based on AMR audio frames, as follows:

Step 1: receive an AMR audio frame, read the audio frame header, extract the FT bits to obtain the FT value, determine the audio coding type, look up the coding mode from the audio coding type and the corresponding FT value, and obtain the AMR audio frame length L for that coding mode;

Step 2: determine, from the audio frame length L, the total number N of audio frames that can be aggregated in one RTP packet;

Step 3: create the RTP packet and fill in the RTP header information and the PayloadHeader information, wherein the M bit in the RTP header is set to 1 and the PT value is obtained from the SDP;

Step 4: process the AMR audio frame header: separate the frame header from the speech data in the audio frame; the FT (frame coding mode) bits in the frame header remain unchanged, the lowest two padding bits are cleared to 0, and the frame quality indicator bit Q is set to 1 for an undamaged frame and to 0 for a damaged frame;

Step 5: fill the frame header and the speech data of the AMR audio frame into the RTP packet;

Step 6: receive AMR audio frames in a loop, repeating Steps 4 and 5 each time audio data is received, until N audio frames have been received and packed into the RTP packet.

Compared with the prior art, this method greatly reduces network bandwidth usage, lowers network overhead, improves network throughput, and reduces the high packet loss rate caused by frequent data transmission.

[description of drawings]

Figure 1 shows the format of the AMR audio frame header before packing;

Figure 2 is a schematic diagram of the audio frame headers in an RTP packet containing three audio frames;

Figure 3 is a schematic diagram of an RTP packet containing three audio frames.

[embodiment]

AMR (Adaptive Multi-Rate) is mainly used for audio coding on mobile devices and offers a relatively high compression ratio. It comes in two types, AMR-WB (Adaptive Multi-Rate Wideband) and AMR-NB (Adaptive Multi-Rate Narrowband). The AMR audio frame format consists of a frame header followed by speech data. The frame header occupies one byte: the header proper is only six bits, but for practical reasons it is generally defined as one byte, with the last two bits padded with 0. Its definition is shown in Figure 1:

Where, P: padding bit, 1 bit, generally set to 0.

Q: 1 bit, the frame quality indicator. A value of 0 indicates that the frame is damaged.

FT (coding mode): 4 bits; the data rate and frame length for the mode can be found by table lookup.
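As an illustration of the header layout described above, the following sketch unpacks the three fields from a header byte. It assumes the bit order P, FT, Q, then two padding bits, from most to least significant bit (the order used for table-of-contents entries in RFC 3267); the names `AmrFrameHeader` and `parse_amr_frame_header` are hypothetical and not part of the invention.

```python
from typing import NamedTuple


class AmrFrameHeader(NamedTuple):
    p: int   # bit 7: padding bit (reused later as the "more frames follow" flag)
    ft: int  # bits 6..3: coding mode index
    q: int   # bit 2: frame quality indicator, 0 means the frame is damaged


def parse_amr_frame_header(byte: int) -> AmrFrameHeader:
    """Split a one-byte AMR frame header into its P, FT and Q fields."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("header must be a single byte")
    return AmrFrameHeader(
        p=(byte >> 7) & 0x1,
        ft=(byte >> 3) & 0xF,
        q=(byte >> 2) & 0x1,
    )


# 0x3C is 0b0011_1100: P=0, FT=7, Q=1, padding bits 00.
print(parse_amr_frame_header(0x3C))
```

The two lowest bits are ignored on input because the text defines them as padding.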

The main AMR-NB coding modes, with their data rates and frame lengths, are as follows:

Type                Rate                      Frame length (bytes)
Mode 0 - AMR 4.75   Encodes at 4.75 kbit/s    13
Mode 1 - AMR 5.15   Encodes at 5.15 kbit/s    14
Mode 2 - AMR 5.9    Encodes at 5.9 kbit/s     16
Mode 3 - AMR 6.7    Encodes at 6.7 kbit/s     18
Mode 4 - AMR 7.4    Encodes at 7.4 kbit/s     20
Mode 5 - AMR 7.95   Encodes at 7.95 kbit/s    21
Mode 6 - AMR 10.2   Encodes at 10.2 kbit/s    27
Mode 7 - AMR 12.2   Encodes at 12.2 kbit/s    32

The main AMR-WB coding modes are as follows:

Type                    Rate                       Frame length (bytes)
Mode 0 - AMR-WB 6.60    Encodes at 6.60 kbit/s     18
Mode 1 - AMR-WB 8.85    Encodes at 8.85 kbit/s     24
Mode 2 - AMR-WB 12.65   Encodes at 12.65 kbit/s    33
Mode 3 - AMR-WB 14.25   Encodes at 14.25 kbit/s    37
Mode 4 - AMR-WB 15.85   Encodes at 15.85 kbit/s    41
Mode 5 - AMR-WB 18.25   Encodes at 18.25 kbit/s    47
Mode 6 - AMR-WB 19.85   Encodes at 19.85 kbit/s    51
Mode 7 - AMR-WB 23.05   Encodes at 23.05 kbit/s    59
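The table lookup can be sketched as follows. The dictionary keys reuse the encoding names AMR and AMR-WB that the description associates with the SDP; the function name `frame_length` is hypothetical, and only the eight modes listed in each table above are covered.

```python
# Frame lengths in bytes, indexed by FT value, copied from the two tables.
AMR_NB_FRAME_LEN = [13, 14, 16, 18, 20, 21, 27, 32]
AMR_WB_FRAME_LEN = [18, 24, 33, 37, 41, 47, 51, 59]

FRAME_LEN_TABLES = {
    "AMR": AMR_NB_FRAME_LEN,     # narrowband
    "AMR-WB": AMR_WB_FRAME_LEN,  # wideband
}


def frame_length(codec: str, ft: int) -> int:
    """Return the frame length L for a coding type and FT value."""
    if codec not in FRAME_LEN_TABLES:
        raise ValueError(f"unknown coding type: {codec!r}")
    table = FRAME_LEN_TABLES[codec]
    if not 0 <= ft < len(table):
        raise ValueError(f"FT value {ft} is not in the table for {codec}")
    return table[ft]


print(frame_length("AMR", 0))     # 13
print(frame_length("AMR-WB", 7))  # 59
```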

The core idea of the present invention is a rapid RTP packet aggregation method for AMR audio. The speech frame length differs across AMR coding types and modes. For data carried over UDP, the method aggregates AMR data of any coding type and mode directly: based on the frame size of each AMR data type and coding mode, it calculates how many frames fit without exceeding the length of one MTU, and encapsulates as much AMR audio data as possible in a single RTP packet for transmission.

Aggregation principle: data shorter than one MTU is aggregated before transmission to save network bandwidth and improve network throughput. The encapsulation of AMR audio frames fully follows RFC 3267: in general, several audio frames are packed into one RTP packet whose total length must not exceed one MTU. When multiple audio frames are encapsulated, the format is:

RTP header + Payload header (1 byte) + Payload data.

When n AMR audio frames are to be packed, this can also be written as:

RTP header + Payload header (1 byte) + frame header 1 + frame header 2 + ... + frame header n + speech data 1 + speech data 2 + ... + speech data n.

The first 4 bits of the Payload header are the CMR (Codec Mode Request). According to RFC 3267, if the terminal does not define a preference for receiving any particular mode, the CMR value must be set to 15; the remaining bits are 0, so the value of the PayloadHeader is 0xF0. Payload data refers to all of the audio frame data, comprising the frame header and the frame data of each audio frame.

The RTP packet assembly process of the present invention is as follows:
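The value 0xF0 follows directly from placing CMR = 15 in the upper four bits of the byte. A minimal sketch, in which the function name `payload_header` is hypothetical:

```python
def payload_header(cmr: int = 15) -> int:
    """Build the 1-byte payload header: CMR in the high 4 bits, low 4 bits zero."""
    if not 0 <= cmr <= 15:
        raise ValueError("CMR is a 4-bit value")
    return (cmr << 4) & 0xFF


print(hex(payload_header()))  # 0xf0
```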

1. Receive an AMR audio frame, read the audio frame header, and extract the FT bits to obtain the FT value. Then, depending on the audio coding type (determined from the SDP: AMR denotes NB coding and AMR-WB denotes WB coding), look up the coding mode from the coding type and the corresponding FT value, and obtain the corresponding AMR audio frame length L.

2. Determine, from the audio frame length L, the total number N of audio frames that can be aggregated in one RTP packet. The formula is:

N = (MTU length - RTP header length - PayloadHeader length) / L.

For example, with an MTU length of 1400 bytes, L = 13 bytes (AMR-NB coding mode Mode 0), an RTP header length of 12 bytes, and an AMR PayloadHeader length of 1 byte, N = (1400 - 12 - 1) / 13 ≈ 106.7, which is rounded down to 106.
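The calculation in step 2 can be checked with a few lines of code. This is a sketch that assumes integer (floor) division, which is what the worked example's result of 106 implies; the function name `max_frames_per_packet` is hypothetical.

```python
def max_frames_per_packet(
    mtu: int,
    frame_len: int,
    rtp_header_len: int = 12,
    payload_header_len: int = 1,
) -> int:
    """N = (MTU - RTP header - PayloadHeader) / L, rounded down."""
    if frame_len <= 0:
        raise ValueError("frame length must be positive")
    available = mtu - rtp_header_len - payload_header_len
    return max(available // frame_len, 0)


print(max_frames_per_packet(1400, 13))  # 106
```

With the same 1400-byte MTU, the 32-byte frames of AMR-NB Mode 7 give N = 43.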

3. Create the RTP packet, one MTU in length, and fill in the RTP header information and the PayloadHeader information. In the RTP header, the M bit is set to 1, the PT value is obtained from the SDP, and the sequence number SeqNo, the timestamp TimeStamp, and the SSRC must be consistent with the description in the RTSP signaling. The value of the PayloadHeader is 0xF0.
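One possible way to lay out the 12-byte RTP header from step 3 is sketched below. The field layout (version 2, no padding, no extension, no CSRC entries) follows the fixed RTP header of RFC 3550 rather than anything stated in this document, and the payload type 97, sequence number, timestamp, and SSRC values are placeholders that a real sender would take from the SDP and the RTSP signaling.

```python
import struct


def build_rtp_header(
    payload_type: int,
    seq_no: int,
    timestamp: int,
    ssrc: int,
    marker: int = 1,
) -> bytes:
    """Pack the fixed 12-byte RTP header in network byte order."""
    first = 2 << 6  # version 2, padding 0, extension 0, CSRC count 0
    second = ((marker & 0x1) << 7) | (payload_type & 0x7F)
    return struct.pack(
        "!BBHII",
        first,
        second,
        seq_no & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )


# Placeholder values for illustration only.
header = build_rtp_header(payload_type=97, seq_no=1, timestamp=160, ssrc=0x12345678)
print(len(header), header.hex())
```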

4. Process the frame header: separate the frame header from the speech data in the audio frame. The FT (frame coding mode) bits in the frame header remain unchanged, the lowest two padding bits are cleared to 0, and the frame quality indicator bit Q is set to 1 for an undamaged frame and to 0 for a damaged frame;
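Step 4 amounts to a split followed by a few bit masks. The sketch below assumes the header byte layout P, FT, Q, two padding bits (most to least significant), and that the caller already knows whether the frame is damaged; the `damaged` flag and the function name `split_and_normalize` are hypothetical. The P bit is cleared here as an assumption, since the text assigns it only when the header is written into the packet.

```python
from typing import Tuple

FT_MASK = 0x78  # bits 6..3: coding mode, kept unchanged
Q_BIT = 0x04    # bit 2: frame quality indicator


def split_and_normalize(frame: bytes, damaged: bool = False) -> Tuple[int, bytes]:
    """Separate the header byte from the speech data and normalize the header."""
    if not frame:
        raise ValueError("empty AMR frame")
    header, speech = frame[0], frame[1:]
    normalized = header & FT_MASK  # keep FT, clear P and the two low padding bits
    if not damaged:
        normalized |= Q_BIT        # Q = 1 for an undamaged frame
    return normalized, speech


header, speech = split_and_normalize(bytes([0x3F, 0xAA, 0xBB]))
print(hex(header), speech.hex())  # 0x3c aabb
```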

5. Fill the frame header and the speech data of the AMR audio frame into the RTP packet. The following data are written directly, in order, to the corresponding positions in the RTP packet.

Fill position of the AMR audio frame header:

RTP header length + PayloadHeader length + (i - 1)

Frame header length: 1 byte

Fill position of the AMR audio frame speech data:

RTP header length + PayloadHeader length + N + (i - 1) * L

Frame data length: (L - 1) bytes

Here i is the sequence number of the audio frame being aggregated, numbered from 1 up to N (the total number of audio frames that one RTP packet can aggregate, as calculated in step 2), and L is the speech data length of the AMR audio frame.

If the frame sequence number is less than N, the frame is not the last frame in the RTP packet, and the padding bit P in its frame header is set to 1, indicating that another audio frame follows. The frame with sequence number N is the last audio frame; the most significant bit P of its frame header is cleared to 0, marking the end of the audio frame headers. Figure 2 is a schematic diagram of the audio frame headers in an RTP packet containing three audio frames, namely frame header 1, frame header 2, and frame header 3. The header format is identical to that of the AMR audio frame header before packing shown in Figure 1; only the value of the P bit differs.
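The two position formulas and the P-bit rule can be tried out on the three-frame case of Figure 2. In this sketch the stride of the speech data blocks is taken to be the speech data length, following the definition given with the formulas, so that consecutive blocks are contiguous; the function names and the 12-byte speech length are illustrative assumptions.

```python
RTP_HEADER_LEN = 12     # bytes
PAYLOAD_HEADER_LEN = 1  # bytes


def header_position(i: int) -> int:
    """Byte offset of frame header i (1-based) inside the RTP packet."""
    return RTP_HEADER_LEN + PAYLOAD_HEADER_LEN + (i - 1)


def speech_position(i: int, n: int, speech_len: int) -> int:
    """Byte offset of speech data block i (1-based) when n frames are aggregated."""
    return RTP_HEADER_LEN + PAYLOAD_HEADER_LEN + n + (i - 1) * speech_len


def set_p_bit(header: int, i: int, n: int) -> int:
    """Set P (bit 7) if another frame follows, clear it on the last frame."""
    return (header | 0x80) if i < n else (header & 0x7F)


# Three frames, each with 12 bytes of speech data (illustrative values).
n, speech_len = 3, 12
for i in range(1, n + 1):
    print(i, header_position(i), speech_position(i, n, speech_len),
          hex(set_p_bit(0x04, i, n)))
```

For these values the headers land at offsets 13, 14, and 15 and the speech data blocks at offsets 16, 28, and 40.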

6. Receive AMR audio frames in a loop, repeating steps 4 and 5 each time audio data is received, until N audio frames have been received. A complete RTP packet has then been assembled. Figure 3 is a schematic diagram of an RTP packet containing three audio frames: in the figure, RTP (12) denotes the RTP header, which occupies 12 bytes; PH (1) denotes the PayloadHeader, which occupies 1 byte; frame header 1 (1), frame header 2 (1), and frame header 3 (1) each occupy 1 byte; and speech data 1 (L), speech data 2 (L), and speech data 3 (L) each occupy L bytes, where, for example, L is 13 bytes for AMR-NB coding mode Mode 0.

The present invention aggregates AMR data of different coding types and modes directly. Based on the frame size of each AMR data type and coding mode, it calculates the maximum number of audio frames one RTP packet can carry without exceeding the length of one MTU, and encapsulates as much AMR audio data as possible in a single RTP packet for transmission. Because the AMR data is written directly into the RTP packet, no additional memory is consumed, so the method is efficient as well as reducing network overhead.

The above embodiments describe the present invention by way of example only; those skilled in the art may design various embodiments according to different practical needs without departing from the scope and spirit of the protection sought by the present invention.
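Putting the steps together, the layout of Figure 3 can be reproduced with a short sketch. It is not the patented implementation: it assumes all frames in one packet share a coding mode, treats every frame as undamaged, takes the speech data of each frame to be everything after its one-byte header, and uses a hypothetical function name `aggregate_frames` and a placeholder RTP header.

```python
from typing import List

RTP_HEADER_LEN = 12
PAYLOAD_HEADER_LEN = 1
PAYLOAD_HEADER = 0xF0  # CMR = 15, remaining bits 0


def aggregate_frames(frames: List[bytes], rtp_header: bytes, mtu: int = 1400) -> bytes:
    """Pack N equal-length AMR frames: all frame headers first, then all speech data."""
    if len(rtp_header) != RTP_HEADER_LEN:
        raise ValueError("RTP header must be 12 bytes")
    if not frames or not frames[0]:
        raise ValueError("at least one non-empty frame is required")
    n = len(frames)
    speech_len = len(frames[0]) - 1
    if any(len(f) != speech_len + 1 for f in frames):
        raise ValueError("all frames must use the same coding mode")

    base = RTP_HEADER_LEN + PAYLOAD_HEADER_LEN
    total = base + n + n * speech_len
    if total > mtu:
        raise ValueError("packet would exceed the MTU")

    packet = bytearray(total)
    packet[:RTP_HEADER_LEN] = rtp_header
    packet[RTP_HEADER_LEN] = PAYLOAD_HEADER
    for i, frame in enumerate(frames, start=1):
        header = (frame[0] & 0x78) | 0x04  # keep FT, Q = 1, low bits cleared
        if i < n:
            header |= 0x80                 # P = 1: another frame follows
        packet[base + (i - 1)] = header
        start = base + n + (i - 1) * speech_len
        packet[start:start + speech_len] = frame[1:]
    return bytes(packet)


# Three AMR-NB Mode 0 frames: 1 header byte (FT = 0) plus 12 bytes of dummy speech data.
frames = [bytes([0x00]) + bytes([k]) * 12 for k in (1, 2, 3)]
packet = aggregate_frames(frames, bytes([0x80, 0xE1]) + bytes(10))
print(len(packet), packet[12:16].hex())
```

Writing into a preallocated `bytearray` mirrors the direct-fill approach the description emphasizes: each header and speech block is copied once to its final offset.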

Claims

1. A rapid RTP packet aggregation method based on AMR audio frames, comprising the following steps:

Step 1: receive an AMR audio frame, read the audio frame header, extract the FT bits to obtain the FT value, determine the audio coding type, look up the coding mode from the audio coding type and the corresponding FT value, and obtain the AMR audio frame length L for that coding mode;

Step 2: determine, from the AMR audio frame length L, the total number N of AMR audio frames that can be aggregated in one RTP packet;

Step 3: create the RTP packet and fill in the RTP header information and the PayloadHeader information, wherein the M bit in the RTP header is set to 1 and the PT value is obtained from the SDP, SDP denoting the Session Description Protocol;

Step 4: process the AMR audio frame header: separate the frame header from the speech data in the AMR audio frame; the FT (frame coding mode) bits in the frame header remain unchanged, the lowest two padding bits are cleared to 0, and the frame quality indicator bit Q is set to 1 for an undamaged frame and to 0 for a damaged frame;

Step 5: fill the frame header and the speech data of the AMR audio frame into the RTP packet;

Step 6: receive AMR audio frames in a loop, repeating Steps 4 and 5 each time audio data is received, until N AMR audio frames have been received, thereby completing the packing of the N AMR audio frames into the RTP packet.

2. The rapid RTP packet aggregation method based on AMR audio frames according to claim 1, characterized in that: the AMR coding type comprises two coding types, AMR-NB and AMR-WB.

3. The rapid RTP packet aggregation method based on AMR audio frames according to claim 2, characterized in that: N in Step 2 is calculated as follows:

N = (MTU length - RTP header length - PayloadHeader length) / L

wherein: the MTU length is the length of the network MTU, in bytes; the RTP header length is 12 bytes; and the AMR PayloadHeader length is 1 byte.

4. The rapid RTP packet aggregation method based on AMR audio frames according to claim 1, characterized in that: in Step 3, the PT value is obtained from the SDP, and SeqNo, TimeStamp, and SSRC are consistent with the description in the RTSP signaling.

5. The rapid RTP packet aggregation method based on AMR audio frames according to claim 3, characterized in that: Step 5 further comprises:

51: determining the fill position of the AMR audio frame header in the RTP packet, and writing the frame header directly, in order, to the corresponding fill position in the RTP packet;

52: determining the fill position of the AMR audio frame speech data in the RTP packet, and writing the speech data directly, in order, to the corresponding fill position in the RTP packet.

6. The rapid RTP packet aggregation method based on AMR audio frames according to claim 5, characterized in that: in Step 51, the fill position of the AMR audio frame header in the RTP packet is determined as follows:

RTP header length + PayloadHeader length + (i - 1);

in Step 52, the fill position of the AMR audio frame speech data in the RTP packet is determined as follows:

RTP header length + PayloadHeader length + N + (i - 1) * L;

wherein: the RTP header length is 12 bytes, the PayloadHeader length is 1 byte, (i - 1) is expressed in bytes, and i is the sequence number of the audio frame being aggregated, running from 1 to N.

7. The rapid RTP packet aggregation method based on AMR audio frames according to claim 6, characterized in that: if the audio frame sequence number i is less than N, the AMR audio frame is not the last frame in the RTP packet, and the padding bit P in its frame header in the RTP packet is set to 1, indicating that another AMR audio frame follows; the frame whose sequence number is N is the last AMR audio frame, and the most significant bit P of its frame header is cleared to 0, marking the end of the AMR audio frame headers.

8. The rapid RTP packet aggregation method based on AMR audio frames according to claim 3, characterized in that: the value of the PayloadHeader information is 0xF0.

9. The rapid RTP packet aggregation method based on AMR audio frames according to claim 3, characterized in that: the MTU length is 1400 bytes, and L is 13 bytes, corresponding to AMR-NB coding mode Mode 0.