CN1641748A - Voice data storage method - Google Patents
- Publication number
- CN1641748A
- Authority
- CN
- China
- Prior art keywords
- correspondent
- voice channel
- speech
- speech data
- frame
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Telephonic Communication Services (AREA)
Abstract
The invention discloses a method for storing voice data in which the voice data is encoded by an AMR coder and then stored in a file. AMR-encoded voice data requires far less storage space than PCM data, so the file produced by the invention is suitable for storage on a user terminal or transmission over a network. The method supports multi-party communication in which parties join or leave dynamically, and can meet the needs of call recording, voice e-mail, and even recording of an entire call. The invention therefore has broad application prospects.
Description
Technical field
The present invention relates to data storage technology, and in particular to a method for storing voice data.
Background technology
In 3G systems, besides ordinary calls, users commonly need functions such as pre-recorded telephone messages, recording of the other party's speech, and even real-time recording of an entire call. The call content must therefore be saved in the user terminal and played back when the user needs it. In addition, applications such as voice e-mail and Internet telephony need to transmit voice data over a network, so the voice data must be suitable for network storage and transmission.
Telephone recording has traditionally stored pulse code modulation (PCM) voice data on tape or a mass-storage device. PCM is a digital sampling technique for analog signals that is commonly used for audio: the signal is typically sampled 8000 times per second with 8 bits per sample, giving 64 kbit/s in total. Because storing PCM data requires a storage device of very large capacity, it raises the cost of the user terminal and constrains its physical design. PCM data is also poorly suited to storing and transmitting voice files over a network.
Summary of the invention
The main purpose of the present invention is to provide a method for storing voice data that saves storage space and facilitates storing and transmitting voice data over a network.
This purpose is achieved through the following technical solution:
A method for storing voice data, comprising the following steps:
A. The encoded voice data consists of adaptive multi-rate speech frames; determine the basic information of the voice channels of one or more communicating parties from the speech frames, and store the basic information as the file header;
B. Store the frame type and substream data of each speech frame as the file body, forming a voice file.
The basic information of a voice channel comprises the adaptive multi-rate interface format, the ordering indication, the voice channel telephone number, and the voice channel start time.
Before step A, the method further comprises determining the coding type of the stored voice data, taking that coding type as the file type, and storing the file type.
Before step A, the method further comprises determining the number of voice channels by detecting the number of communicating parties, assigning a voice channel index to each voice channel, and storing the voice channel index as part of the basic information;
The speech frame information also includes the voice channel index.
The adaptive multi-rate interface format in step A is adaptive multi-rate interface format 1.
The adaptive multi-rate interface format in step A is adaptive multi-rate interface format 2.
If the length of a speech frame in adaptive multi-rate interface format 1 is not a whole number of bytes, filler bits are stored in the last byte, starting from its low-order bits, until the frame length is a whole number of bytes.
If the length of a speech frame in adaptive multi-rate interface format 2 is not a whole number of bytes, filler bits are stored in the last byte, starting from its high-order bits, until the frame length is a whole number of bytes.
If a communicating party in step A joins dynamically during the call, that party's voice channel basic information is stored when the party joins, and the speech frames of that party's voice channel are stored starting from the time the party joins.
If a communicating party in step A leaves dynamically during the call, storage of the speech frames of that party's voice channel stops at the time the party leaves.
With the method of the present invention, AMR-encoded voice data can be stored in a file for saving on a user terminal or transmission over a network. This greatly reduces storage space, is compatible with both AMR interface formats and with both bit orderings, and supports multi-party calls in which parties join and leave dynamically. It can meet needs such as call recording, voice e-mail, and even real-time recording of an entire call, and has broad application prospects.
Description of drawings
Fig. 1 is a schematic diagram of the frame structure of AMR interface format 1.
Fig. 2 is a schematic diagram of the frame structure of AMR interface format 2.
Fig. 3 is a flowchart of the method for storing AMR-encoded voice data according to the present invention.
Fig. 4 is a schematic diagram of the AMR-encoded voice data file structure for a three-way call in a preferred embodiment of the present invention.
Fig. 5 is a schematic diagram of the AMR-encoded voice data file structure for a voice e-mail in a preferred embodiment of the present invention.
Fig. 6 is a schematic diagram of the AMR-encoded voice data file structure for a multi-party call with dynamic joining and leaving in a preferred embodiment of the present invention.
Embodiment
To make the purpose, technical solution, and advantages of the present invention clearer, the invention is described further below with reference to the drawings and specific embodiments.
The present invention uses an advanced speech coding scheme, adaptive multi-rate (AMR) coding. AMR coding rates generally range from 4.75 kbit/s to 12.2 kbit/s; that is, one second of AMR-encoded voice data requires 4750 to 12200 bits of storage, whereas one second of PCM-encoded voice data generally requires 64000 bits. The comparison shows that AMR-encoded voice data requires far less storage space than PCM-encoded voice data.
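The comparison above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the function names are hypothetical, and the figures cover the raw voice payload only, ignoring any file header or per-frame overhead.

```python
# Bit rates quoted in the description.
PCM_BITS_PER_SECOND = 8000 * 8      # 8000 samples/s, 8 bits each = 64000 bit/s
AMR_MIN_BITS_PER_SECOND = 4750      # lowest AMR rate, 4.75 kbit/s
AMR_MAX_BITS_PER_SECOND = 12200     # highest AMR rate, 12.2 kbit/s


def storage_bytes(bits_per_second: int, seconds: float) -> float:
    """Raw payload size in bytes for a recording of the given length."""
    return bits_per_second * seconds / 8


def pcm_to_amr_ratio(amr_bits_per_second: int) -> float:
    """How many times larger the PCM payload is than the AMR payload."""
    return PCM_BITS_PER_SECOND / amr_bits_per_second


if __name__ == "__main__":
    for rate in (AMR_MIN_BITS_PER_SECOND, AMR_MAX_BITS_PER_SECOND):
        print(rate, storage_bytes(rate, 10), round(pcm_to_amr_ratio(rate), 2))
```

For a 10-second recording this gives 80000 bytes of PCM payload against roughly 5938 to 15250 bytes of AMR payload, depending on the coding rate.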
After AMR encoding, the speech frame period is 20 ms and the AMR sampling frequency is 8 kHz, so each frame covers 160 samples. There are currently two AMR speech frame formats: AMR interface format 1 (AMR IF1) and AMR interface format 2 (AMR IF2).
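The relationship between frame period, sampling frequency, and frame count can be written out as a short sketch. The constant and function names below are hypothetical and chosen only for illustration.

```python
FRAME_PERIOD_MS = 20     # one AMR speech frame every 20 ms
SAMPLE_RATE_HZ = 8000    # AMR sampling frequency


def samples_per_frame() -> int:
    """Number of input samples covered by one AMR speech frame."""
    return SAMPLE_RATE_HZ * FRAME_PERIOD_MS // 1000


def frames_for_duration(seconds: float) -> int:
    """Number of whole 20 ms frame periods in a recording of this length."""
    return int(round(seconds * 1000)) // FRAME_PERIOD_MS


if __name__ == "__main__":
    print(samples_per_frame())        # samples in one frame
    print(frames_for_duration(10))    # frame periods in a 10-second call
```

A 10-second recording on one voice channel therefore spans 500 frame periods, which is the figure used in the embodiments.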
The format of AMR IF1 is shown in Fig. 1. An AMR IF1 speech frame comprises the frame type, frame quality indicator, mode indication, mode request, cyclic redundancy check (CRC), and the three substreams A, B, and C. Bits in an AMR IF1 frame are arranged with high-order alignment.
The format of AMR IF2 is shown in Fig. 2. An AMR IF2 speech frame comprises the frame type, the three substreams A, B, and C, and filler bits.
AMR IF2 is byte-aligned: if the speech frame is not a whole number of bytes, filler bits are added to complete it. Bits in an AMR IF2 frame are arranged with low-order alignment.
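The byte-alignment rule can be sketched as follows, assuming, as stated in the summary, that IF1 frames are padded from the low-order bits of the last byte and IF2 frames from the high-order bits. The function name `pack_bits`, the list-of-bits representation, and the filler value 0 are assumptions made for this illustration and are not specified by the text.

```python
from typing import List


def pack_bits(bits: List[int], high_first: bool, filler: int = 0) -> bytes:
    """Pack a bit sequence into whole bytes, padding only the last byte.

    high_first=True  : data fills each byte from its high-order bit down,
                       so filler ends up in the low-order bits (IF1 rule).
    high_first=False : data fills each byte from its low-order bit up,
                       so filler ends up in the high-order bits (IF2 rule).
    The filler value (0 by default) is an assumption of this sketch.
    """
    out = bytearray()
    for start in range(0, len(bits), 8):
        chunk = bits[start:start + 8]
        chunk = chunk + [filler] * (8 - len(chunk))  # pads the final chunk only
        value = 0
        for pos, bit in enumerate(chunk):
            shift = 7 - pos if high_first else pos
            value |= (bit & 1) << shift
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    frame = [1] * 95  # e.g. a 95-bit payload, which is not a whole number of bytes
    print(pack_bits(frame, high_first=True).hex())
    print(pack_bits(frame, high_first=False).hex())
```

With a 95-bit payload both calls return 12 bytes; only the position of the single filler bit in the last byte differs.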
For an AMR IF1 frame, if the CRC check fails, the frame content is discarded and replaced with an empty frame. If the CRC check passes, the quality of the speech frame is assured, so the other auxiliary AMR fields (frame quality indicator, mode indication, mode request, and CRC) need not be saved; knowing the frame type is essentially sufficient. The method of the present invention can therefore uniformly adopt the same structure as AMR IF2 for storing AMR speech frames. Because the two interface formats order their bits differently, an interface format field (1 bit: 0 for IF1, 1 for IF2) must be defined when the voice data is stored so that the two speech frame structures can be told apart.
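A minimal sketch of this reduction step is given below. It assumes that the "empty frame" is represented by frame type 15 ("No Data" in Table 1) with no substream bits; that mapping, the `StoredFrame` class, and the `reduce_if1_frame` function are hypothetical and are not stated in the text.

```python
from dataclasses import dataclass
from typing import List

NO_DATA_FRAME_TYPE = 15  # assumption: the "empty frame" is stored as "No Data"


@dataclass
class StoredFrame:
    """What is kept for each frame: the frame type and the A, B, C substream bits."""
    frame_type: int
    substream_bits: List[int]


def reduce_if1_frame(frame_type: int, substream_bits: List[int],
                     crc_ok: bool) -> StoredFrame:
    """Drop the auxiliary IF1 fields and keep only what the stored file needs.

    The frame quality indicator, mode indication, mode request, and CRC are
    not carried over. A frame whose CRC check failed is replaced by an
    empty frame.
    """
    if not crc_ok:
        return StoredFrame(NO_DATA_FRAME_TYPE, [])
    return StoredFrame(frame_type, list(substream_bits))


if __name__ == "__main__":
    print(reduce_if1_frame(7, [1, 0, 1], crc_ok=True))
    print(reduce_if1_frame(7, [1, 0, 1], crc_ok=False))
```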
The bits of the three AMR substreams may be the bits as output by the encoder, or the result of sorting those bits by subjective importance. The sorting follows the speech coder bit ordering tables specified by the protocol: the most important bits are concentrated in substream A, the importance of substreams B and C decreases in turn, and a CRC is computed over substream A. The main purpose is to limit the loss of voice quality at a given bit error rate. To distinguish sorted bits from unsorted bits, an ordering indication (1 bit: 0 for unsorted, 1 for sorted) must be defined when the speech frames are stored.
Based on the above provisions for AMR-encoded data, the present invention provides a method for recording AMR-encoded voice data that stores the voice data in a file for saving, recording, and transmission over a network. The method comprises the following steps:
Step 302: determine the number of voice channels by detecting the number of all connected parties, with one voice channel per party, and store this number. An ordinary call has 2 voice channels, one for the calling party and one for the called party; a three-way call has 3 voice channels; and a voice e-mail has 1. The number of voice channels can be at most 16, and the field is generally 8 bits long.
Step 303: store the channel index of each voice channel. Its value is between 0 and 15, and the field is generally 4 bits long.
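The field widths given so far (1-bit interface format, 1-bit ordering indication, 4-bit channel index, 8-bit channel count) can be illustrated with a small packing sketch. The bit positions chosen here, and the idea of sharing one byte between the three small fields, are purely hypothetical; the text gives only the field widths and value ranges, not their layout in the file header.

```python
def pack_channel_count(count: int) -> bytes:
    """Encode the number of voice channels (1 to 16) in one 8-bit field."""
    if not 1 <= count <= 16:
        raise ValueError("channel count must be between 1 and 16")
    return bytes([count])


def pack_channel_flags(interface_format: int, sorted_flag: int,
                       channel_index: int) -> int:
    """Combine three small fields into one byte (hypothetical layout).

    bit 5    : interface format (0 = IF1, 1 = IF2)
    bit 4    : ordering indication (0 = unsorted, 1 = sorted)
    bits 3-0 : voice channel index (0 to 15)
    """
    if interface_format not in (0, 1) or sorted_flag not in (0, 1):
        raise ValueError("flags must be 0 or 1")
    if not 0 <= channel_index <= 15:
        raise ValueError("channel index must be between 0 and 15")
    return (interface_format << 5) | (sorted_flag << 4) | channel_index


def unpack_channel_flags(value: int) -> tuple:
    """Inverse of pack_channel_flags: (interface_format, sorted_flag, index)."""
    return (value >> 5) & 1, (value >> 4) & 1, value & 0x0F


if __name__ == "__main__":
    packed = pack_channel_flags(interface_format=1, sorted_flag=0, channel_index=3)
    print(hex(packed), unpack_channel_flags(packed))
```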
Frame type index | Frame content |
---|---|
0 | AMR 4.75 |
1 | AMR 5.15 |
2 | AMR 5.90 |
3 | AMR 6.70 |
4 | AMR 7.40 |
5 | AMR 7.95 |
6 | AMR 10.2 |
7 | AMR 12.2 |
8 | AMR SID |
9 | GSM-EFR SID |
10 | TDMA-EFR SID |
11 | PDC-EFR SID |
12-14 | For future use |
15 | No Data |
Table 1
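Table 1 can be expressed as a lookup, and for the eight speech modes the number of speech bits per frame follows from the coding rate and the 20 ms frame period. This is an illustrative sketch only; the dictionary and function names are hypothetical, and the per-frame figures are derived arithmetically from the rates rather than taken from the text.

```python
FRAME_PERIOD_MS = 20

# Frame type index -> frame content, as listed in Table 1.
FRAME_TYPES = {
    0: "AMR 4.75", 1: "AMR 5.15", 2: "AMR 5.90", 3: "AMR 6.70",
    4: "AMR 7.40", 5: "AMR 7.95", 6: "AMR 10.2", 7: "AMR 12.2",
    8: "AMR SID", 9: "GSM-EFR SID", 10: "TDMA-EFR SID", 11: "PDC-EFR SID",
    15: "No Data",
}

# Coding rate in bit/s for the speech modes (indices 0 to 7).
SPEECH_RATES_BPS = {
    0: 4750, 1: 5150, 2: 5900, 3: 6700,
    4: 7400, 5: 7950, 6: 10200, 7: 12200,
}


def describe_frame_type(index: int) -> str:
    """Return the Table 1 description for a 4-bit frame type index."""
    if 12 <= index <= 14:
        return "For future use"
    if index not in FRAME_TYPES:
        raise ValueError("frame type index must be between 0 and 15")
    return FRAME_TYPES[index]


def speech_bits_per_frame(index: int) -> int:
    """Speech bits in one 20 ms frame for a speech mode (indices 0 to 7)."""
    return SPEECH_RATES_BPS[index] * FRAME_PERIOD_MS // 1000


if __name__ == "__main__":
    for idx in range(8):
        print(idx, describe_frame_type(idx), speech_bits_per_frame(idx))
```

None of the resulting frame sizes (95 to 244 bits) is a whole number of bytes once the frame type is added, which is why the filler-bit rules matter in practice.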
The storage method of the present invention produces a file containing AMR-encoded voice data. Steps 301 to 307 above store the file header, and steps 308 to 311 store the file body. The file can be saved on a user terminal or transmitted over a network.
Fig. 4 shows the file structure for storing AMR-encoded voice data in the three-way call embodiment. In this embodiment the three-way call lasts 10 seconds, no new party joins, and no party leaves, so the number of voice channels is 3. The call lasts 500 AMR time units (10 seconds at 20 ms per unit), so it comprises 3 × 500 = 1500 speech frames in total. As Fig. 4 shows, the file contains the basic information of 3 voice channels and 1500 speech frames.
Fig. 5 shows the file structure for storing AMR-encoded voice data in the voice e-mail embodiment. This embodiment is a voice e-mail lasting 10 seconds, so the number of voice channels is 1. The voice e-mail lasts 500 AMR time units, so it comprises 500 speech frames in total. As Fig. 5 shows, the file contains the basic information of 1 voice channel and 500 speech frames.
Fig. 6 shows the AMR-encoded voice data file structure for a multi-party call with dynamic joining and leaving. This embodiment is a call lasting 10 seconds. Three parties (A, B, and C) talk together for the first 4 seconds; party A leaves after 4 seconds (from the 5th second onward), and party D joins the call as a new party after 6 seconds. As Fig. 6 shows, channel 1 holds the voice channel basic information of party A, channel 2 that of party B, and channel 3 that of party C; the voice channel basic information of party D is stored in channel 4 when D connects. Fig. 6 also shows that AMR time units 1 to 200 are a three-way call among A, B, and C, with 600 speech frames; AMR time units 201 to 300 are a two-way call between B and C, with 200 speech frames; and AMR time units 301 to 500 are a three-way call among B, C, and D, with 600 speech frames. The whole call therefore has 1400 speech frames in total. The file contains the basic information of 4 voice channels and 1400 speech frames.
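The frame counts in the three embodiments can be reproduced with a short sketch. It assumes each voice channel is described by the first and last AMR time unit (inclusive) during which its party is connected; this representation and the function names are hypothetical.

```python
from typing import List, Tuple

# A channel is described by (first_unit, last_unit), both inclusive.
Channel = Tuple[int, int]


def count_speech_frames(channels: List[Channel]) -> int:
    """Total speech frames: one frame per channel per active AMR time unit."""
    return sum(last - first + 1 for first, last in channels)


def active_channels(channels: List[Channel], unit: int) -> int:
    """Number of channels that store a frame in the given AMR time unit."""
    return sum(1 for first, last in channels if first <= unit <= last)


if __name__ == "__main__":
    three_way = [(1, 500), (1, 500), (1, 500)]   # Fig. 4 embodiment
    voice_mail = [(1, 500)]                       # Fig. 5 embodiment
    dynamic = [
        (1, 200),    # A: leaves after time unit 200
        (1, 500),    # B
        (1, 500),    # C
        (301, 500),  # D: joins at time unit 301
    ]                                             # Fig. 6 embodiment
    for name, chans in (("three-way", three_way),
                        ("voice e-mail", voice_mail),
                        ("dynamic", dynamic)):
        print(name, len(chans), count_speech_frames(chans))
```

The three cases give 1500, 500, and 1400 speech frames respectively, matching the totals stated for Figs. 4, 5, and 6.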
In a concrete implementation, the method of the present invention may be suitably modified to meet the needs of the specific situation. The specific embodiments described here are therefore illustrative only and are not intended to limit the scope of protection of the present invention.
Claims (10)
1. A method for storing voice data, characterized in that it comprises the following steps:
A. The encoded voice data consists of adaptive multi-rate speech frames; determine the basic information of the voice channels of one or more communicating parties from the speech frames, and store the basic information as the file header;
B. Store the frame type and substream data of each speech frame as the file body, forming a voice file.
2. The method for storing voice data according to claim 1, characterized in that the basic information of a voice channel comprises the adaptive multi-rate interface format, the ordering indication, the voice channel telephone number, and the voice channel start time.
3. The method for storing voice data according to claim 1, characterized in that, before step A, the method further comprises determining the coding type of the stored voice data, taking that coding type as the file type, and storing the file type.
4. The method for storing voice data according to claim 1, characterized in that, before step A, the method further comprises determining the number of voice channels by detecting the number of communicating parties, assigning a voice channel index to each voice channel, and storing the voice channel index as part of the basic information;
The speech frame information also includes the voice channel index.
5. The method for storing voice data according to claim 1, characterized in that the adaptive multi-rate interface format in step A is adaptive multi-rate interface format 1.
6. The method for storing voice data according to claim 1, characterized in that the adaptive multi-rate interface format in step A is adaptive multi-rate interface format 2.
7. The method for storing voice data according to claim 5, characterized in that, if the length of the speech frame is not a whole number of bytes, filler bits are stored in the last byte, starting from its low-order bits, until the length of the speech frame is a whole number of bytes.
8. The method for storing voice data according to claim 6, characterized in that, if the length of the speech frame is not a whole number of bytes, filler bits are stored in the last byte, starting from its high-order bits, until the length of the speech frame is a whole number of bytes.
9. The method for storing voice data according to claim 1, characterized in that, if a communicating party in step A joins dynamically during the call, that party's voice channel basic information is stored when the party joins, and the speech frames of that party's voice channel are stored starting from the time the party joins.
10. The method for storing voice data according to claim 1, characterized in that, if a communicating party in step A leaves dynamically during the call, storage of the speech frames of that party's voice channel stops at the time the party leaves.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100004013A CN100456357C (en) | 2004-01-06 | 2004-01-06 | Voice data storage method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100004013A CN100456357C (en) | 2004-01-06 | 2004-01-06 | Voice data storage method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1641748A true CN1641748A (en) | 2005-07-20 |
CN100456357C CN100456357C (en) | 2009-01-28 |
Family
ID=34866745
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004100004013A Expired - Fee Related CN100456357C (en) | 2004-01-06 | 2004-01-06 | Voice data storage method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100456357C (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1996286B (en) * | 2006-01-06 | 2010-07-14 | 英华达(上海)电子有限公司 | Method for saving and quickly searching speech information in electronic dictionary on portable device |
WO2014190830A1 (en) * | 2013-05-29 | 2014-12-04 | 小米科技有限责任公司 | Sound recording synchronization method, apparatus, and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
IT1256823B (en) * | 1992-05-14 | 1995-12-21 | Olivetti & Co Spa | PORTABLE CALCULATOR WITH VERBAL NOTES. |
ID15832A (en) * | 1996-02-12 | 1997-08-14 | Philips Electronics Nv | AIRCRAFT CLIPS |
Also Published As
Publication number | Publication date |
---|---|
CN100456357C (en) | 2009-01-28 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20090128 Termination date: 20200106 |