CN1641748A - Voice data storage method - Google Patents

Voice data storage method

Info

Publication number
CN1641748A
Authority
CN
China
Prior art keywords
correspondent
voice channel
speech
speech data
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2004100004013A
Other languages
Chinese (zh)
Other versions
CN100456357C (en)
Inventor
王利军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CNB2004100004013A priority Critical patent/CN100456357C/en
Publication of CN1641748A publication Critical patent/CN1641748A/en
Application granted granted Critical
Publication of CN100456357C publication Critical patent/CN100456357C/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a method for storing voice data in which the voice data is encoded by an AMR coder and then stored in a file. Because AMR-encoded voice data requires far less storage space than PCM data, the file created by the invention is well suited to being kept on a user terminal or transmitted over a network. The method can therefore support multi-party calls in which parties dynamically join or leave, call recording, voice e-mail, and even recording of an entire call, and so has broad application prospects.

Description

Voice data storage method
Technical field
The present invention relates to data storage technology, and in particular to a method for storing voice data.
Background Art
In 3G systems, besides normal calls, users generally also need functions such as phone pre-recording, recording the other party's speech, and even real-time recording of the entire call. The call content therefore has to be saved on the user terminal so that it can be played back when the user wants it. In addition, applications such as voice e-mail and IP telephony need to transmit voice data over the network, so the voice data must also be suitable for network storage and transmission.
Conventional call recording stores pulse code modulation (PCM) voice data on tape or in a mass-storage device. PCM is a digital sampling technique for analog signals that is commonly used for audio: the signal is generally sampled 8000 times per second, with 8 bits per sample, for a total of 64 kbit/s. Storing PCM data requires a storage device of very large capacity, which not only raises the cost of the user terminal but also affects its configuration and design. In addition, PCM data is ill suited to storing and transmitting voice files over a network.
Summary of the invention
The main object of the present invention is to provide a voice data storage method that saves the storage space taken by voice data and facilitates storing and transmitting voice data over a network.
This object of the invention is achieved through the following technical solution:
A voice data storage method comprising the following steps:
A. for encoded voice data in the form of adaptive multi-rate speech frames, determining, from the speech frames, the basic information of the voice channel of each of one or more communicating parties, and storing the basic information as a file header;
B. storing the frame type and sub-stream data of the speech frames as a file body, thereby forming a voice file.
The basic information of a voice channel comprises an adaptive multi-rate interface format, an ordering indication, the telephone number of the voice channel, and the start time of the voice channel.
Before step A, the method further comprises determining the coding type of the voice data to be stored, taking this coding type as the file type, and storing the file type.
Before step A, the method further comprises determining the number of voice channels by detecting the number of communicating parties, assigning a voice channel index to each voice channel, and storing the voice channel index as part of the basic information;
The speech frame information also comprises the voice channel index.
The adaptive multi-rate interface format in step A is adaptive multi-rate interface format 1.
The adaptive multi-rate interface format in step A is adaptive multi-rate interface format 2.
If the length of a speech frame in adaptive multi-rate interface format 1 is not an integer multiple of a byte, filler bits are stored starting from the low-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
If the length of a speech frame in adaptive multi-rate interface format 2 is not an integer multiple of a byte, filler bits are stored starting from the high-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
If a communicating party in step A dynamically joins during the call, the basic information of that party's voice channel is stored when the party joins, and the speech frames of that party's voice channel are stored starting from the time the party joins.
If a communicating party in step A dynamically leaves during the call, storage of the speech frames of that party's voice channel stops from the time the party leaves.
With the method of the present invention, AMR-encoded voice data can be stored in a file and kept on a user terminal or transmitted over a network. This not only saves a great deal of storage space but is also compatible with several AMR interface formats and with both sorted and unsorted bits, and it supports multi-party calls in which parties dynamically join and leave. The method can therefore meet requirements such as call recording, voice e-mail, and even real-time recording of an entire call, and has broad application prospects.
Brief Description of the Drawings
Fig. 1 is a schematic diagram of the frame structure of AMR interface format 1.
Fig. 2 is a schematic diagram of the frame structure of AMR interface format 2.
Fig. 3 is a flowchart of the method for storing AMR-encoded voice data according to the present invention.
Fig. 4 is a schematic diagram of the file structure of AMR-encoded voice data for a three-way call according to a preferred embodiment of the present invention.
Fig. 5 is a schematic diagram of the file structure of AMR-encoded voice data for voice e-mail according to a preferred embodiment of the present invention.
Fig. 6 is a schematic diagram of the file structure of AMR-encoded voice data for a multi-party call with parties dynamically joining and leaving, according to a preferred embodiment of the present invention.
Detailed Description of the Embodiments
To make the object, technical solution and advantages of the present invention clearer, the present invention is described further below with reference to the drawings and specific embodiments.
The present invention uses an advanced voice coding, namely adaptive multi-rate (AMR) coding. The AMR coding rate generally ranges from 4.75 kbit/s to 12.2 kbit/s; in other words, one second of AMR-encoded voice data needs 4750 to 12200 bits of storage, whereas one second of PCM-encoded voice data generally needs 64000 bits. The comparison shows that AMR-encoded voice data requires far less storage space than PCM-encoded voice data.
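The gap is easy to quantify. The following minimal sketch (in Python; the 10-second call length is an illustrative choice, not a figure from the patent) compares the raw storage needed for one voice channel:

```python
# Raw storage for one voice channel, ignoring framing overhead.
# PCM: 8000 samples/s * 8 bits/sample = 64000 bit/s.
# AMR: 4.75 kbit/s (lowest mode) to 12.2 kbit/s (highest mode).
seconds = 10

pcm_bytes = 64000 * seconds // 8       # 80000 bytes
amr_high_bytes = 12200 * seconds // 8  # 15250 bytes (AMR 12.2)
amr_low_bytes = 4750 * seconds // 8    # 5937 bytes (AMR 4.75)

print(f"PCM:      {pcm_bytes} bytes")
print(f"AMR 12.2: {amr_high_bytes} bytes")
print(f"AMR 4.75: {amr_low_bytes} bytes")
```

Even at its highest rate, AMR needs roughly a fifth of the space that PCM does.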
One AMR speech frame covers a 20 ms period, and the AMR sampling frequency is 8 kHz, so each frame corresponds to 160 samples. At present there are two AMR speech frame formats: AMR interface format 1 (AMR IF1) and AMR interface format 2 (AMR IF2).
The format of AMR IF1 is shown in Fig. 1. An AMR IF1 speech frame comprises a frame type, a frame quality indication, a mode indication, a mode request, a cyclic redundancy check (CRC), and three sub-streams of data A, B and C. The bits in an AMR IF1 frame are arranged with high-end alignment.
The format of AMR IF2 is shown in Fig. 2. An AMR IF2 speech frame comprises a frame type, the three sub-streams A, B and C, and filler bits.
AMR IF2 is byte-aligned: if the speech frame is not an integer multiple of a byte, filler bits are needed to pad it out, and the bits in an AMR IF2 frame are arranged with low-end alignment.
For an AMR IF1 frame, if the CRC check fails the frame content is discarded and replaced with an empty frame, and if the CRC check passes the integrity of the speech frame is guaranteed. The other AMR auxiliary fields (frame quality indication, mode indication, mode request and CRC) therefore need not be preserved; knowing the frame type is basically sufficient. The method of the present invention can thus uniformly store AMR speech frames in a structure identical to that of AMR IF2. Because the bit ordering of the two interface formats differs, an interface format flag (1 bit: 0 for IF1, 1 for IF2) is defined when the voice data is stored so that the two frame structures can be distinguished.
The bits of the three AMR sub-streams may be the bits output directly by the encoder, or the result of sorting those bits by subjective importance. Sorting follows the speech coder bit ordering table specified in the protocol: the most important bits are placed in sub-stream A, the importance of sub-streams B and C decreases in turn, and a CRC is computed over sub-stream A. The main purpose is that, for a given bit error rate, speech quality degrades less. To distinguish sorted bits from unsorted bits, an ordering indication (1 bit: 0 for unsorted, 1 for sorted) is defined when the speech frames are stored.
Based on the above provisions for AMR-encoded data, the present invention provides a method for recording AMR-encoded voice data that stores the voice data in a file so that it can be stored, recorded and transmitted over a network. The method comprises the following steps:
Step 301: determine the file type from the coding type of the voice data. If the coding type is AMR encoding, the file is a voice file storing AMR speech frames, and "AMR n" is stored. For ease of extension this field is 16 bytes long.
Step 302: determine the number of voice channels by detecting the number of connected communicating parties, with one voice channel per party, and store the channel count. An ordinary call has 2 voice channels, one for the calling party and one for the called party; a three-way call has 3 voice channels; and voice e-mail has 1. The maximum number of voice channels is 16, and this field is generally 8 bits long.
Step 303: store the channel index corresponding to the voice channel, a value between 0 and 15; this field is generally 4 bits long.
Step 304: determine the AMR interface format of the voice data on this channel by detecting the speech frame format of the party occupying the channel, and store the AMR interface format: 0 denotes AMR interface format 1 and 1 denotes AMR interface format 2. This field is generally 1 bit long.
Step 305: determine the ordering indication of the voice data on this channel by detecting the speech frame format of the party occupying the channel, and store the AMR speech frame ordering indication: 0 means the bits are not sorted by importance and 1 means they are. This field is generally 1 bit long.
Step 306: determine and store the telephone number of the party occupying this channel; this field is generally 8 bytes long.
Step 307: determine and store the connection time of the party occupying this channel; this field is generally 8 bytes long.
Step 308: store the voice channel index; this field is generally defined as 4 bits long.
Step 309: store the frame type of the voice data on this channel; this field is generally 4 bits long. There are 16 frame types, whose indices and corresponding frame contents are shown in Table 1. As the table shows, the 16 frame types correspond to 16 AMR coding formats, and different coding formats encode the voice data at different rates: for example, AMR 4.75 denotes a coding rate of 4.75 kbit/s and AMR 12.2 a coding rate of 12.2 kbit/s. AMR SID denotes an adaptive multi-rate silence frame, GSM-EFR SID a GSM enhanced full rate silence frame, TDMA-EFR SID a TDMA enhanced full rate silence frame, and PDC-EFR SID a personal digital cellular enhanced full rate silence frame; a silence frame is the speech frame type used while a communicating party is silent during the call.
Frame type index    Frame content
0                   AMR 4.75
1                   AMR 5.15
2                   AMR 5.90
3                   AMR 6.70
4                   AMR 7.40
5                   AMR 7.95
6                   AMR 10.2
7                   AMR 12.2
8                   AMR SID
9                   GSM-EFR SID
10                  TDMA-EFR SID
11                  PDC-EFR SID
12-14               For future use
15                  No Data
Table 1
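Table 1 can be expressed as a straightforward lookup table, for example as in the Python sketch below (the constant name FRAME_TYPES is ours, not from the patent):

```python
# Frame type index (4 bits, step 309) -> frame content, per Table 1.
FRAME_TYPES = {
    0: "AMR 4.75",
    1: "AMR 5.15",
    2: "AMR 5.90",
    3: "AMR 6.70",
    4: "AMR 7.40",
    5: "AMR 7.95",
    6: "AMR 10.2",
    7: "AMR 12.2",
    8: "AMR SID",
    9: "GSM-EFR SID",
    10: "TDMA-EFR SID",
    11: "PDC-EFR SID",
    12: "For future use",
    13: "For future use",
    14: "For future use",
    15: "No Data",
}

assert FRAME_TYPES[7] == "AMR 12.2"   # highest-rate speech frame
assert FRAME_TYPES[8] == "AMR SID"    # silence descriptor frame
```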
Step 310: store the A, B and C sub-stream data of the voice data on this channel; the sub-stream data may be unsorted data or sorted data.
Step 311: append filler bits so that the whole speech frame is an integer multiple of a byte in length.
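A minimal sketch of step 311 follows (the function name is ours; the example uses the fact that the A, B and C sub-streams of an AMR 12.2 frame carry 244 speech bits in total):

```python
def filler_bits(frame_bits: int) -> int:
    """Filler bits needed to pad a stored frame record (steps 308-311)
    up to a whole number of bytes."""
    return (-frame_bits) % 8

# Example: 4 bits channel index + 4 bits frame type + 244 sub-stream bits
# = 252 bits, so 4 filler bits are appended and the record occupies 32 bytes.
record_bits = 4 + 4 + 244
assert filler_bits(record_bits) == 4
assert (record_bits + filler_bits(record_bits)) // 8 == 32
```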
Steps 303 to 307 store the basic information of a voice channel, that is, of a communicating party. The method of the present invention reserves enough space to hold the voice channel basic information of up to 16 communicating parties. If the number of voice channels is N, steps 303 to 307 are executed N times, i.e., the basic information of every voice channel is stored in the reserved space. If a new party joins during the call, its voice channel basic information is added to the reserved space; if a party leaves during the call, its basic information remains in the reserved space, and once the reserved space is full, newly joining parties successively overwrite the space occupied by parties that have already left.
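With the field widths given in steps 303 to 307, the reserved per-channel records add up as follows (a back-of-the-envelope sketch under the assumption that exactly those five fields, and nothing else, are reserved for each of the 16 possible channels):

```python
# Bits per channel record (steps 303-307):
#   channel index 4 + interface format 1 + ordering indication 1
#   + telephone number 8*8 + start time 8*8
bits_per_channel = 4 + 1 + 1 + 8 * 8 + 8 * 8   # 134 bits
reserved_bits = 16 * bits_per_channel          # 2144 bits
print(reserved_bits / 8)                       # 268.0 bytes reserved for channel records
```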
Steps 308 to 311 store the speech frame information. Define the duration of one AMR speech frame (20 ms) as one time unit. If a call lasts M AMR time units, has N voice channels, and no party joins or leaves in between, the whole call contains M × N speech frames, so steps 308 to 311 are executed M × N times, i.e., the content of every speech frame of every channel is stored in turn. If a new party joins partway through, the speech frame information of the corresponding channel is stored starting from the moment that party joins; if a party leaves partway through, storage of the speech frame information of the corresponding channel stops from the moment that party leaves.
The result of the storage method of the present invention is a file holding AMR-encoded voice data: steps 301 to 307 store the file header and steps 308 to 311 store the file body. This file can be kept on a user terminal or transmitted over a network.
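Putting the field widths of steps 301 to 311 together, the file layout might be sketched as follows. This is only an illustrative sketch in Python: the class, function and field names are ours, the bit packing is MSB-first by assumption, and the byte alignment of the header is likewise an assumption the patent does not spell out.

```python
from dataclasses import dataclass
from typing import List


class BitWriter:
    """Accumulates bit fields MSB-first and packs them into bytes."""

    def __init__(self) -> None:
        self._bits: List[int] = []

    def write(self, value: int, nbits: int) -> None:
        for i in reversed(range(nbits)):
            self._bits.append((value >> i) & 1)

    def write_bytes(self, data: bytes, nbytes: int) -> None:
        for b in data[:nbytes].ljust(nbytes, b"\x00"):
            self.write(b, 8)

    def pad_to_byte(self) -> None:
        while len(self._bits) % 8:
            self._bits.append(0)            # filler bits (step 311)

    def getvalue(self) -> bytes:
        self.pad_to_byte()
        out = bytearray()
        for i in range(0, len(self._bits), 8):
            byte = 0
            for bit in self._bits[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


@dataclass
class Channel:                  # per-channel basic information (steps 303-307)
    index: int                  # 4 bits, 0..15
    interface_format: int       # 1 bit: 0 = AMR IF1, 1 = AMR IF2
    sorted_bits: int            # 1 bit: 0 = unsorted, 1 = sorted by importance
    phone_number: bytes         # 8 bytes
    start_time: bytes           # 8 bytes


def write_header(w: BitWriter, channels: List[Channel]) -> None:
    w.write_bytes(b"AMR n", 16)          # file type, 16 bytes (step 301)
    w.write(len(channels), 8)            # number of voice channels (step 302)
    for ch in channels:                  # one record per channel (steps 303-307)
        w.write(ch.index, 4)
        w.write(ch.interface_format, 1)
        w.write(ch.sorted_bits, 1)
        w.write_bytes(ch.phone_number, 8)
        w.write_bytes(ch.start_time, 8)
    w.pad_to_byte()                      # header alignment: our assumption


def write_frame(w: BitWriter, channel_index: int, frame_type: int,
                substream_bits: List[int]) -> None:
    w.write(channel_index, 4)            # step 308
    w.write(frame_type, 4)               # step 309
    for bit in substream_bits:           # A, B, C sub-stream bits (step 310)
        w.write(bit, 1)
    w.pad_to_byte()                      # filler bits (step 311)
```

A corresponding reader would consume the 16-byte type string, the channel count and the channel records, and then read frame records until the end of the file.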
Fig. 4 shows the file structure for storing AMR-encoded voice data in a three-way call embodiment. In this embodiment the three-way call lasts 10 seconds, no new party joins and no party leaves, so the number of voice channels is 3 and the call spans 500 AMR time units, giving 1500 speech frames in total. As Fig. 4 shows, the file contains the basic information of 3 voice channels and 1500 speech frames.
Fig. 5 shows the file structure for storing AMR-encoded voice data in a voice e-mail embodiment. This embodiment is a voice e-mail lasting 10 seconds, so the number of voice channels is 1 and the voice e-mail spans 500 AMR time units, giving 500 speech frames in total. As Fig. 5 shows, the file contains the basic information of 1 voice channel and 500 speech frames.
Fig. 6 shows the file structure of AMR-encoded voice data for a multi-party call with parties dynamically joining and leaving. This embodiment is a call lasting 10 seconds: three parties (A, B and C) talk together for the first 4 seconds, party A leaves at the start of the 5th second, and party D joins as a new communicating party at the start of the 6th second. As Fig. 6 shows, channel 1 holds the voice channel basic information of party A, channel 2 that of party B, channel 3 that of party C, and channel 4 stores the voice channel basic information of party D when D joins. Fig. 6 also shows that AMR time units 1 to 200 are a three-way call among A, B and C, giving 600 speech frames; AMR time units 201 to 300 are a two-party call between B and C, giving 200 speech frames; and AMR time units 301 to 500 are a three-way call among B, C and D, giving 600 speech frames. The whole call therefore contains 1400 speech frames, and the file contains the basic information of 4 voice channels and 1400 speech frames.
In a specific implementation, the method according to the present invention can be suitably adapted to the needs of particular circumstances. It should therefore be understood that the specific embodiments described here are merely illustrative and are not intended to limit the scope of protection of the present invention.

Claims (10)

1. A voice data storage method, characterized by comprising the following steps:
A. for encoded voice data in the form of adaptive multi-rate speech frames, determining, from the speech frames, the basic information of the voice channels of one or more communicating parties, and storing the basic information as a file header;
B. storing the frame type and sub-stream data of the speech frames as a file body, thereby forming a voice file.
2. The voice data storage method according to claim 1, characterized in that the basic information of a voice channel comprises an adaptive multi-rate interface format, an ordering indication, a voice channel telephone number and a voice channel start time.
3. The voice data storage method according to claim 1, characterized in that, before step A, the method further comprises determining the coding type of the voice data to be stored, taking the coding type as the file type, and storing the file type.
4. The voice data storage method according to claim 1, characterized in that, before step A, the method further comprises determining the number of voice channels by detecting the number of communicating parties, assigning a voice channel index to each voice channel, and storing the voice channel index as basic information;
and in that the speech frame information also comprises the voice channel index.
5. The voice data storage method according to claim 1, characterized in that the adaptive multi-rate interface format in step A is adaptive multi-rate interface format 1.
6. The voice data storage method according to claim 1, characterized in that the adaptive multi-rate interface format in step A is adaptive multi-rate interface format 2.
7. The voice data storage method according to claim 5, characterized in that, if the length of the speech frame is not an integer multiple of a byte, filler bits are stored starting from the low-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
8. The voice data storage method according to claim 6, characterized in that, if the length of the speech frame is not an integer multiple of a byte, filler bits are stored starting from the high-order bits of the last byte until the length of the speech frame is an integer multiple of a byte.
9. The voice data storage method according to claim 1, characterized in that, if a communicating party in step A dynamically joins during the call, the basic information of that party's voice channel is stored when the party joins, and the speech frames of that party's voice channel are stored starting from the time the party joins.
10. The voice data storage method according to claim 1, characterized in that, if a communicating party in step A dynamically leaves during the call, storage of the speech frames of that party's voice channel stops from the time the party leaves.
CNB2004100004013A 2004-01-06 2004-01-06 Voice data storage method Expired - Fee Related CN100456357C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100004013A CN100456357C (en) 2004-01-06 2004-01-06 Voice data storage method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100004013A CN100456357C (en) 2004-01-06 2004-01-06 Voice data storage method

Publications (2)

Publication Number Publication Date
CN1641748A true CN1641748A (en) 2005-07-20
CN100456357C CN100456357C (en) 2009-01-28

Family

ID=34866745

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100004013A Expired - Fee Related CN100456357C (en) 2004-01-06 2004-01-06 Voice data storage method

Country Status (1)

Country Link
CN (1) CN100456357C (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1996286B (en) * 2006-01-06 2010-07-14 英华达(上海)电子有限公司 Method for saving and quickly searching speech information in electronic dictionary on portable device
WO2014190830A1 (en) * 2013-05-29 2014-12-04 小米科技有限责任公司 Sound recording synchronization method, apparatus, and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IT1256823B (en) * 1992-05-14 1995-12-21 Olivetti & Co Spa PORTABLE CALCULATOR WITH VERBAL NOTES.
ID15832A (en) * 1996-02-12 1997-08-14 Philips Electronics Nv AIRCRAFT CLIPS


Also Published As

Publication number Publication date
CN100456357C (en) 2009-01-28

Similar Documents

Publication Publication Date Title
US8386523B2 (en) Random access audio decoder
CN1136715C (en) Mobile radio telephone capable of recording/reproducing voice signal and method for controlling the same
US20080117906A1 (en) Payload header compression in an rtp session
US20020111812A1 (en) Method and apparatus for encoding and decoding pause informantion
CN104917671B (en) Audio-frequency processing method and device based on mobile terminal
US20030236674A1 (en) Methods and systems for compression of stored audio
US20040038715A1 (en) Methods of recording voice signals in a mobile set
US20100188967A1 (en) System and Method for Providing a Replacement Packet
CN1732512A (en) Method and device for compressed-domain packet loss concealment
JP2014512575A (en) Frame loss concealment for multi-rate speech / audio codecs
JP2001503233A (en) Method and apparatus for decoding variable rate data
CN1212607C (en) Predictive speech coder using coding scheme selection patterns to reduce sensitivity to frame errors
US8438167B2 (en) Method and device for recording media
WO2007132377A1 (en) Adaptive jitter management control in decoder
CN109644444B (en) Method, apparatus, device and computer readable storage medium for wireless communication
CN1364362A (en) Method of providing error protection for data bit flow
CN1575491A (en) Method and apparatus for decoding a coded digital audio signal which is arranged in frames containing headers
CN1255788A (en) Method and appts. for improving speech signal quality transmitted in radio communication installation
Gardner et al. QCELP: A variable rate speech coder for CDMA digital cellular
WO2007091927A1 (en) Variable frame offset coding
US20030046711A1 (en) Formatting a file for encoded frames and the formatter
CN1641748A (en) Voice data storage method
CN101043759A (en) Method for realizing data service through voice band data VBD mode and system thereof
US7127390B1 (en) Rate determination coding
US7362770B2 (en) Method and apparatus for using and combining sub-frame processing and adaptive jitter-buffers for improved voice quality in voice-over-packet networks

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090128

Termination date: 20200106