CN1641748A - Voice data storage method - Google Patents

Voice data storage method

Info

Publication number
CN1641748A
Authority
CN
China
Prior art keywords
correspondent
voice channel
speech
speech data
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004100004013A
Other languages
Chinese (zh)
Other versions
CN100456357C (en)
Inventor
王利军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CNB2004100004013A priority Critical patent/CN100456357C/en
Publication of CN1641748A publication Critical patent/CN1641748A/en
Application granted granted Critical
Publication of CN100456357C publication Critical patent/CN100456357C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for storing voice data in which the voice data is encoded by an AMR coder and then stored in a file. Because AMR-encoded voice data requires far less storage space than PCM data, the file created by the invention is well suited to being kept on a user terminal or transmitted over a network. The method can therefore support multi-party calls in which parties dynamically join or leave, call recording, voice e-mail, and even recording of an entire call, and so has broad application prospects.

Description

Voice data storage method
Technical field
The present invention relates to data storage technology, and in particular to a method for storing voice data.
Background Art
In 3G systems, besides normal calls, users generally also need functions such as phone pre-recording, recording the other party's speech, and even real-time recording of the entire call. The call content therefore has to be saved on the user terminal so that it can be played back when the user wants it. In addition, applications such as voice e-mail and IP telephony need to transmit voice data over the network, so the voice data must also be suitable for network storage and transmission.
Conventional call recording stores pulse code modulation (PCM) voice data on tape or in a mass-storage device. PCM is a digital sampling technique for analog signals that is commonly used for audio: the signal is generally sampled 8000 times per second, with 8 bits per sample, for a total of 64 kbit/s. Storing PCM data requires a storage device of very large capacity, which not only raises the cost of the user terminal but also affects its configuration and design. In addition, PCM data is ill suited to storing and transmitting voice files over a network.
Summary of the invention
The main object of the present invention is to provide a voice data storage method that saves the storage space taken by voice data and facilitates storing and transmitting voice data over a network.
This object of the invention is achieved through the following technical solution:
A voice data storage method comprising the following steps:
A. for encoded voice data in the form of adaptive multi-rate speech frames, determining, from the speech frames, the basic information of the voice channel of each of one or more communicating parties, and storing the basic information as a file header;
B. storing the frame type and sub-stream data of the speech frames as a file body, thereby forming a voice file.
The basic information of a voice channel comprises an adaptive multi-rate interface format, an ordering indication, the telephone number of the voice channel, and the start time of the voice channel.
Before step A, the method further comprises determining the coding type of the voice data to be stored, taking this coding type as the file type, and storing the file type.
Before step A, the method further comprises determining the number of voice channels by detecting the number of communicating parties, assigning a voice channel index to each voice channel, and storing the voice channel index as part of the basic information;
The speech frame information also comprises the voice channel index.
The adaptive multi-rate interface format in step A is adaptive multi-rate interface format 1.
The adaptive multi-rate interface format in step A is adaptive multi-rate interface format 2.
If the length of a speech frame in adaptive multi-rate interface format 1 is not an integer multiple of a byte, filler bits are stored starting from the low-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
If the length of a speech frame in adaptive multi-rate interface format 2 is not an integer multiple of a byte, filler bits are stored starting from the high-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
If a communicating party in step A dynamically joins during the call, the basic information of that party's voice channel is stored when the party joins, and the speech frames of that party's voice channel are stored starting from the time the party joins.
If a communicating party in step A dynamically leaves during the call, storage of the speech frames of that party's voice channel stops from the time the party leaves.
With the method of the present invention, AMR-encoded voice data can be stored in a file and kept on a user terminal or transmitted over a network. This not only saves a great deal of storage space but is also compatible with several AMR interface formats and with both sorted and unsorted bits, and it supports multi-party calls in which parties dynamically join and leave. The method can therefore meet requirements such as call recording, voice e-mail, and even real-time recording of an entire call, and has broad application prospects.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the frame structure of AMR interface format 1.
Fig. 2 is a schematic diagram of the frame structure of AMR interface format 2.
Fig. 3 is a flowchart of the method for storing AMR-encoded voice data according to the present invention.
Fig. 4 is a schematic diagram of the file structure of AMR-encoded voice data for a three-way call according to a preferred embodiment of the present invention.
Fig. 5 is a schematic diagram of the file structure of AMR-encoded voice data for voice e-mail according to a preferred embodiment of the present invention.
Fig. 6 is a schematic diagram of the file structure of AMR-encoded voice data for a multi-party call with parties dynamically joining and leaving, according to a preferred embodiment of the present invention.
Detailed Description of the Embodiments
To make the object, technical solution and advantages of the present invention clearer, the present invention is described further below with reference to the drawings and specific embodiments.
The present invention uses an advanced voice coding, namely adaptive multi-rate (AMR) coding. The AMR coding rate generally ranges from 4.75 kbit/s to 12.2 kbit/s; in other words, one second of AMR-encoded voice data needs 4750 to 12200 bits of storage, whereas one second of PCM-encoded voice data generally needs 64000 bits. The comparison shows that AMR-encoded voice data requires far less storage space than PCM-encoded voice data.
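The gap is easy to quantify. The following minimal sketch (in Python; the 10-second call length is an illustrative choice, not a figure from the patent) compares the raw storage needed for one voice channel:

```python
# Raw storage for one voice channel, ignoring framing overhead.
# PCM: 8000 samples/s * 8 bits/sample = 64000 bit/s.
# AMR: 4.75 kbit/s (lowest mode) to 12.2 kbit/s (highest mode).
seconds = 10

pcm_bytes = 64000 * seconds // 8       # 80000 bytes
amr_high_bytes = 12200 * seconds // 8  # 15250 bytes (AMR 12.2)
amr_low_bytes = 4750 * seconds // 8    # 5937 bytes (AMR 4.75)

print(f"PCM:      {pcm_bytes} bytes")
print(f"AMR 12.2: {amr_high_bytes} bytes")
print(f"AMR 4.75: {amr_low_bytes} bytes")
```

Even at its highest rate, AMR needs roughly a fifth of the space that PCM does.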
One AMR speech frame covers a 20 ms period, and the AMR sampling frequency is 8 kHz, so each frame corresponds to 160 samples. At present there are two AMR speech frame formats: AMR interface format 1 (AMR IF1) and AMR interface format 2 (AMR IF2).
The format of AMR IF1 is shown in Fig. 1. An AMR IF1 speech frame comprises a frame type, a frame quality indication, a mode indication, a mode request, a cyclic redundancy check (CRC), and three sub-streams of data A, B and C. The bits in an AMR IF1 frame are arranged with high-end alignment.
The format of AMR IF2 is shown in Fig. 2. An AMR IF2 speech frame comprises a frame type, the three sub-streams A, B and C, and filler bits.
AMR IF2 is byte-aligned: if the speech frame is not an integer multiple of a byte, filler bits are needed to pad it out, and the bits in an AMR IF2 frame are arranged with low-end alignment.
For an AMR IF1 frame, if the CRC check fails the frame content is discarded and replaced with an empty frame, and if the CRC check passes the integrity of the speech frame is guaranteed. The other AMR auxiliary fields (frame quality indication, mode indication, mode request and CRC) therefore need not be preserved; knowing the frame type is basically sufficient. The method of the present invention can thus uniformly store AMR speech frames in a structure identical to that of AMR IF2. Because the bit ordering of the two interface formats differs, an interface format flag (1 bit: 0 for IF1, 1 for IF2) is defined when the voice data is stored so that the two frame structures can be distinguished.
The bits of the three AMR sub-streams may be the bits output directly by the encoder, or the result of sorting those bits by subjective importance. Sorting follows the speech coder bit ordering table specified in the protocol: the most important bits are placed in sub-stream A, the importance of sub-streams B and C decreases in turn, and a CRC is computed over sub-stream A. The main purpose is that, for a given bit error rate, speech quality degrades less. To distinguish sorted bits from unsorted bits, an ordering indication (1 bit: 0 for unsorted, 1 for sorted) is defined when the speech frames are stored.
Based on the above provisions for AMR-encoded data, the present invention provides a method for recording AMR-encoded voice data that stores the voice data in a file so that it can be stored, recorded and transmitted over a network. The method comprises the following steps:
Step 301: determine the file type from the coding type of the voice data. If the coding type is AMR encoding, the file is a voice file storing AMR speech frames, and "AMR n" is stored. For ease of extension this field is 16 bytes long.
Step 302: determine the number of voice channels by detecting the number of connected communicating parties, with one voice channel per party, and store the channel count. An ordinary call has 2 voice channels, one for the calling party and one for the called party; a three-way call has 3 voice channels; and voice e-mail has 1. The maximum number of voice channels is 16, and this field is generally 8 bits long.
Step 303: store the channel index corresponding to the voice channel, a value between 0 and 15; this field is generally 4 bits long.
Step 304: determine the AMR interface format of the voice data on this channel by detecting the speech frame format of the party occupying the channel, and store the AMR interface format: 0 denotes AMR interface format 1 and 1 denotes AMR interface format 2. This field is generally 1 bit long.
Step 305: determine the ordering indication of the voice data on this channel by detecting the speech frame format of the party occupying the channel, and store the AMR speech frame ordering indication: 0 means the bits are not sorted by importance and 1 means they are. This field is generally 1 bit long.
Step 306: determine and store the telephone number of the party occupying this channel; this field is generally 8 bytes long.
Step 307: determine and store the connection time of the party occupying this channel; this field is generally 8 bytes long.
Step 308: store the voice channel index; this field is generally defined as 4 bits long.
Step 309: store the frame type of the voice data on this channel; this field is generally 4 bits long. There are 16 frame types, whose indices and corresponding frame contents are shown in Table 1. As the table shows, the 16 frame types correspond to 16 AMR coding formats, and different coding formats encode the voice data at different rates: for example, AMR 4.75 denotes a coding rate of 4.75 kbit/s and AMR 12.2 a coding rate of 12.2 kbit/s. AMR SID denotes an adaptive multi-rate silence frame, GSM-EFR SID a GSM enhanced full rate silence frame, TDMA-EFR SID a TDMA enhanced full rate silence frame, and PDC-EFR SID a personal digital cellular enhanced full rate silence frame; a silence frame is the speech frame type used while a communicating party is silent during the call.
Frame type index    Frame content
0                   AMR 4.75
1                   AMR 5.15
2                   AMR 5.90
3                   AMR 6.70
4                   AMR 7.40
5                   AMR 7.95
6                   AMR 10.2
7                   AMR 12.2
8                   AMR SID
9                   GSM-EFR SID
10                  TDMA-EFR SID
11                  PDC-EFR SID
12-14               For future use
15                  No Data
Table 1
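Table 1 can be expressed as a straightforward lookup table, for example as in the Python sketch below (the constant name FRAME_TYPES is ours, not from the patent):

```python
# Frame type index (4 bits, step 309) -> frame content, per Table 1.
FRAME_TYPES = {
    0: "AMR 4.75",
    1: "AMR 5.15",
    2: "AMR 5.90",
    3: "AMR 6.70",
    4: "AMR 7.40",
    5: "AMR 7.95",
    6: "AMR 10.2",
    7: "AMR 12.2",
    8: "AMR SID",
    9: "GSM-EFR SID",
    10: "TDMA-EFR SID",
    11: "PDC-EFR SID",
    12: "For future use",
    13: "For future use",
    14: "For future use",
    15: "No Data",
}

assert FRAME_TYPES[7] == "AMR 12.2"   # highest-rate speech frame
assert FRAME_TYPES[8] == "AMR SID"    # silence descriptor frame
```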
Step 310: store the A, B and C sub-stream data of the voice data on this channel; the sub-stream data may be unsorted data or sorted data.
Step 311: append filler bits so that the whole speech frame is an integer multiple of a byte in length.
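A minimal sketch of step 311 follows (the function name is ours; the example uses the fact that the A, B and C sub-streams of an AMR 12.2 frame carry 244 speech bits in total):

```python
def filler_bits(frame_bits: int) -> int:
    """Filler bits needed to pad a stored frame record (steps 308-311)
    up to a whole number of bytes."""
    return (-frame_bits) % 8

# Example: 4 bits channel index + 4 bits frame type + 244 sub-stream bits
# = 252 bits, so 4 filler bits are appended and the record occupies 32 bytes.
record_bits = 4 + 4 + 244
assert filler_bits(record_bits) == 4
assert (record_bits + filler_bits(record_bits)) // 8 == 32
```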
Steps 303 to 307 store the basic information of a voice channel, that is, of a communicating party. The method of the present invention reserves enough space to hold the voice channel basic information of up to 16 communicating parties. If the number of voice channels is N, steps 303 to 307 are executed N times, i.e., the basic information of every voice channel is stored in the reserved space. If a new party joins during the call, its voice channel basic information is added to the reserved space; if a party leaves during the call, its basic information remains in the reserved space, and once the reserved space is full, newly joining parties successively overwrite the space occupied by parties that have already left.
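With the field widths given in steps 303 to 307, the reserved per-channel records add up as follows (a back-of-the-envelope sketch under the assumption that exactly those five fields, and nothing else, are reserved for each of the 16 possible channels):

```python
# Bits per channel record (steps 303-307):
#   channel index 4 + interface format 1 + ordering indication 1
#   + telephone number 8*8 + start time 8*8
bits_per_channel = 4 + 1 + 1 + 8 * 8 + 8 * 8   # 134 bits
reserved_bits = 16 * bits_per_channel          # 2144 bits
print(reserved_bits / 8)                       # 268.0 bytes reserved for channel records
```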
Steps 308 to 311 store the speech frame information. Define the duration of one AMR speech frame (20 ms) as one time unit. If a call lasts M AMR time units, has N voice channels, and no party joins or leaves in between, the whole call contains M × N speech frames, so steps 308 to 311 are executed M × N times, i.e., the content of every speech frame of every channel is stored in turn. If a new party joins partway through, the speech frame information of the corresponding channel is stored starting from the moment that party joins; if a party leaves partway through, storage of the speech frame information of the corresponding channel stops from the moment that party leaves.
The result of the storage method of the present invention is a file holding AMR-encoded voice data: steps 301 to 307 store the file header and steps 308 to 311 store the file body. This file can be kept on a user terminal or transmitted over a network.
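Putting the field widths of steps 301 to 311 together, the file layout might be sketched as follows. This is only an illustrative sketch in Python: the class, function and field names are ours, the bit packing is MSB-first by assumption, and the byte alignment of the header is likewise an assumption the patent does not spell out.

```python
from dataclasses import dataclass
from typing import List


class BitWriter:
    """Accumulates bit fields MSB-first and packs them into bytes."""

    def __init__(self) -> None:
        self._bits: List[int] = []

    def write(self, value: int, nbits: int) -> None:
        for i in reversed(range(nbits)):
            self._bits.append((value >> i) & 1)

    def write_bytes(self, data: bytes, nbytes: int) -> None:
        for b in data[:nbytes].ljust(nbytes, b"\x00"):
            self.write(b, 8)

    def pad_to_byte(self) -> None:
        while len(self._bits) % 8:
            self._bits.append(0)            # filler bits (step 311)

    def getvalue(self) -> bytes:
        self.pad_to_byte()
        out = bytearray()
        for i in range(0, len(self._bits), 8):
            byte = 0
            for bit in self._bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


@dataclass
class Channel:                  # per-channel basic information (steps 303-307)
    index: int                  # 4 bits, 0..15
    interface_format: int       # 1 bit: 0 = AMR IF1, 1 = AMR IF2
    sorted_bits: int            # 1 bit: 0 = unsorted, 1 = sorted by importance
    phone_number: bytes         # 8 bytes
    start_time: bytes           # 8 bytes


def write_header(w: BitWriter, channels: List[Channel]) -> None:
    w.write_bytes(b"AMR n", 16)          # file type, 16 bytes (step 301)
    w.write(len(channels), 8)            # number of voice channels (step 302)
    for ch in channels:                  # one record per channel (steps 303-307)
        w.write(ch.index, 4)
        w.write(ch.interface_format, 1)
        w.write(ch.sorted_bits, 1)
        w.write_bytes(ch.phone_number, 8)
        w.write_bytes(ch.start_time, 8)
    w.pad_to_byte()                      # header alignment: our assumption


def write_frame(w: BitWriter, channel_index: int, frame_type: int,
                substream_bits: List[int]) -> None:
    w.write(channel_index, 4)            # step 308
    w.write(frame_type, 4)               # step 309
    for bit in substream_bits:           # A, B, C sub-stream bits (step 310)
        w.write(bit, 1)
    w.pad_to_byte()                      # filler bits (step 311)
```

A corresponding reader would consume the 16-byte type string, the channel count and the channel records, and then read frame records until the end of the file.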
Fig. 4 shows the file structure for storing AMR-encoded voice data in a three-way call embodiment. In this embodiment the three-way call lasts 10 seconds, no new party joins and no party leaves, so the number of voice channels is 3 and the call spans 500 AMR time units, giving 1500 speech frames in total. As Fig. 4 shows, the file contains the basic information of 3 voice channels and 1500 speech frames.
Fig. 5 shows the file structure for storing AMR-encoded voice data in a voice e-mail embodiment. This embodiment is a voice e-mail lasting 10 seconds, so the number of voice channels is 1 and the voice e-mail spans 500 AMR time units, giving 500 speech frames in total. As Fig. 5 shows, the file contains the basic information of 1 voice channel and 500 speech frames.
Fig. 6 shows the file structure of AMR-encoded voice data for a multi-party call with parties dynamically joining and leaving. This embodiment is a call lasting 10 seconds: three parties (A, B and C) talk together for the first 4 seconds, party A leaves at the start of the 5th second, and party D joins as a new communicating party at the start of the 6th second. As Fig. 6 shows, channel 1 holds the voice channel basic information of party A, channel 2 that of party B, channel 3 that of party C, and channel 4 stores the voice channel basic information of party D when D joins. Fig. 6 also shows that AMR time units 1 to 200 are a three-way call among A, B and C, giving 600 speech frames; AMR time units 201 to 300 are a two-party call between B and C, giving 200 speech frames; and AMR time units 301 to 500 are a three-way call among B, C and D, giving 600 speech frames. The whole call therefore contains 1400 speech frames, and the file contains the basic information of 4 voice channels and 1400 speech frames.
In a specific implementation, the method according to the present invention can be suitably adapted to the needs of particular circumstances. It should therefore be understood that the specific embodiments described here are merely illustrative and are not intended to limit the scope of protection of the present invention.

Claims (10)

1. A voice data storage method, characterized by comprising the following steps:
A. for encoded voice data in the form of adaptive multi-rate speech frames, determining, from the speech frames, the basic information of the voice channels of one or more communicating parties, and storing the basic information as a file header;
B. storing the frame type and sub-stream data of the speech frames as a file body, thereby forming a voice file.
2. The voice data storage method according to claim 1, characterized in that the basic information of a voice channel comprises an adaptive multi-rate interface format, an ordering indication, a voice channel telephone number and a voice channel start time.
3. The voice data storage method according to claim 1, characterized in that, before step A, the method further comprises determining the coding type of the voice data to be stored, taking the coding type as the file type, and storing the file type.
4. The voice data storage method according to claim 1, characterized in that, before step A, the method further comprises determining the number of voice channels by detecting the number of communicating parties, assigning a voice channel index to each voice channel, and storing the voice channel index as basic information;
and in that the speech frame information also comprises the voice channel index.
5. The voice data storage method according to claim 1, characterized in that the adaptive multi-rate interface format in step A is adaptive multi-rate interface format 1.
6. The voice data storage method according to claim 1, characterized in that the adaptive multi-rate interface format in step A is adaptive multi-rate interface format 2.
7. The voice data storage method according to claim 5, characterized in that, if the length of the speech frame is not an integer multiple of a byte, filler bits are stored starting from the low-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
8. The voice data storage method according to claim 6, characterized in that, if the length of the speech frame is not an integer multiple of a byte, filler bits are stored starting from the high-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
9. The voice data storage method according to claim 1, characterized in that, if a communicating party in step A dynamically joins during the call, the basic information of that party's voice channel is stored when the party joins, and the speech frames of that party's voice channel are stored starting from the time the party joins.
10. The voice data storage method according to claim 1, characterized in that, if a communicating party in step A dynamically leaves during the call, storage of the speech frames of that party's voice channel stops from the time the party leaves.
CNB2004100004013A 2004-01-06 2004-01-06 Voice data storage method Expired - Fee Related CN100456357C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100004013A CN100456357C (en) 2004-01-06 2004-01-06 Voice data storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100004013A CN100456357C (en) 2004-01-06 2004-01-06 Voice data storage method

Publications (2)

Publication Number Publication Date
CN1641748A true CN1641748A (en) 2005-07-20
CN100456357C CN100456357C (en) 2009-01-28

Family

ID=34866745

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100004013A Expired - Fee Related CN100456357C (en) 2004-01-06 2004-01-06 Voice data storage method

Country Status (1)

Country Link
CN (1) CN100456357C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1996286B (en) * 2006-01-06 2010-07-14 英华达(上海)电子有限公司 Method for saving and quickly searching speech information in electronic dictionary on portable device
WO2014190830A1 (en) * 2013-05-29 2014-12-04 小米科技有限责任公司 Sound recording synchronization method, apparatus, and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1256823B (en) * 1992-05-14 1995-12-21 Olivetti & Co Spa PORTABLE CALCULATOR WITH VERBAL NOTES.
ID15832A (en) * 1996-02-12 1997-08-14 Philips Electronics Nv AIRCRAFT CLIPS


Also Published As

Publication number Publication date
CN100456357C (en) 2009-01-28

Similar Documents

Publication Publication Date Title
US8386523B2 (en) Random access audio decoder
CN1136715C (en) Mobile radio telephone capable of recording/reproducing voice signal and method for controlling the same
US20080117906A1 (en) Payload header compression in an rtp session
US20020111812A1 (en) Method and apparatus for encoding and decoding pause informantion
CN104917671B (en) Audio-frequency processing method and device based on mobile terminal
US20030236674A1 (en) Methods and systems for compression of stored audio
US20040038715A1 (en) Methods of recording voice signals in a mobile set
US20100188967A1 (en) System and Method for Providing a Replacement Packet
CN1732512A (en) Method and device for compressed-domain packet loss concealment
JP2014512575A (en) Frame loss concealment for multi-rate speech / audio codecs
JP2001503233A (en) Method and apparatus for decoding variable rate data
CN1212607C (en) Predictive speech coder using coding scheme selection patterns to reduce sensitivity to frame errors
US8438167B2 (en) Method and device for recording media
WO2007132377A1 (en) Adaptive jitter management control in decoder
CN109644444B (en) Method, apparatus, device and computer readable storage medium for wireless communication
CN1364362A (en) Method of providing error protection for data bit flow
CN1575491A (en) Method and apparatus for decoding a coded digital audio signal which is arranged in frames containing headers
CN1255788A (en) Method and appts. for improving speech signal quality transmitted in radio communication installation
Gardner et al. QCELP: A variable rate speech coder for CDMA digital cellular
WO2007091927A1 (en) Variable frame offset coding
US20030046711A1 (en) Formatting a file for encoded frames and the formatter
CN1641748A (en) Voice data storage method
CN101043759A (en) Method for realizing data service through voice band data VBD mode and system thereof
US7127390B1 (en) Rate determination coding
US7362770B2 (en) Method and apparatus for using and combining sub-frame processing and adaptive jitter-buffers for improved voice quality in voice-over-packet networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090128

Termination date: 20200106