WO2012161887A1 - Encoded packet selection from a first voice stream to create a second voice stream - Google Patents
- Publication number
- WO2012161887A1 (application PCT/US2012/033742)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- silence suppression
- packet
- voice stream
- scheme
- silence
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/10—Architectures or entities
- H04L65/1053—IP private branch exchange [PBX] functionality entities or arrangements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/60—Network streaming of media packets
- H04L65/70—Media network packetisation
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/54—Store-and-forward switching systems
- H04L12/56—Packet switching systems
- H04L12/5601—Transfer mode dependent, e.g. ATM
- H04L2012/5603—Access techniques
Definitions
- the invention relates generally to silence suppression in a voice stream.
- Voice calls over packet-switched networks are performed using a codec at a transmitting device to create encoded packets that represent a user's voice.
- the encoded packets are forwarded over the packet-switched network and decoded at a receiving device.
- Examples of transmitting and receiving devices comprise a mobile phone, smart phone, Voice over Internet Protocol (VoIP) terminal, personal computer, or other telephony devices.
- a mobile phone encodes a calling party's voice into a plurality of encoded packets.
- the mobile phone sends the encoded packets to the packet-switched network as a voice stream (e.g., a stream of packets).
- the packet-switched network forwards the voice stream to a mobile phone of a called party where the encoded packets are decoded for playback to the called party.
- the called party's voice is encoded and forwarded to the calling party for playback.
- the mobile phone employs an Enhanced Variable Rate Codec (EVRC) standard to create encoded packets from the user's voice.
- Each encoded packet represents a 20 millisecond sample of the user's voice and/or background noise.
- the encoded packet is created at one code rate of a plurality of predefined code rates and associated sizes, which are defined by the EVRC standard. Examples of code rates in EVRC comprise full rate, half rate, and eighth rate, which have packet sizes of 171 bits, 80 bits, and 16 bits, respectively.
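The frame sizes above translate directly into payload bit rates. The following is a minimal sketch that uses only the figures stated in the text (171, 80, and 16 bits per 20 millisecond frame); the names `FRAME_BITS` and `bitrate_bps` are illustrative and do not come from the EVRC standard.

```python
# Frame sizes in bits for the three EVRC code rates named in the text.
FRAME_BITS = {"full": 171, "half": 80, "eighth": 16}

# Each encoded packet represents a 20 millisecond sample.
FRAME_MS = 20


def bitrate_bps(rate: str) -> float:
    """Payload bit rate if every frame in one second used the given code rate."""
    frames_per_second = 1000 / FRAME_MS  # 50 frames per second
    return FRAME_BITS[rate] * frames_per_second


for name in FRAME_BITS:
    print(name, bitrate_bps(name))
```

Under these figures an eighth rate frame carries less than a tenth of the bits of a full rate frame, which is why eighth rate frames are the natural candidates for silence suppression.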
- Encoded packets or frames that are encoded at the eighth rate are generally used for samples that are predominantly background noise since they have a smaller frame size, use fewer network resources to be transmitted, and the background noise is not necessary for conversation between the users, as will be understood by those skilled in the art.
- eighth rate frames are encoded by the transmitting device and forwarded to the called party. Since background noise is generally not an important part of the conversation, some of the eighth rate frames can be removed from the voice stream for transmission over the packet-switched network and replaced or substituted with another eighth rate frame before playback at the receiving device. A complete elimination of background noise from the voice stream sounds to the called party as if the call has been dropped or otherwise ended.
- the invention in one implementation encompasses a method.
- a first voice stream for a packet-switched call is received from a calling party.
- the first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call.
- a subset of encoded packets are selected from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme.
- the second voice stream comprises the subset of encoded packets.
- the first silence suppression scheme is distinct from the second silence suppression scheme.
- the second voice stream is forwarded to a called party for the packet-switched call.
- the apparatus comprises a silence suppression interface component configured to receive a first voice stream for a packet-switched call from a calling party.
- the first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call.
- the silence suppression interface component is configured to select a subset of encoded packets from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme.
- the second voice stream comprises the subset of encoded packets.
- the first silence suppression scheme is distinct from the second silence suppression scheme.
- the silence suppression interface component is configured to forward the second voice stream to a called party for the packet-switched call.
- a further implementation of the invention encompasses an article.
- the article comprises one or more non-transitory processor-readable media storing instructions which, when executed by a processor, cause the processor to perform a method.
- the method comprises the step of receiving a first voice stream for a packet-switched call from a calling party, where the first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call.
- the method further comprises the step of selecting a subset of encoded packets from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme, where the second voice stream comprises the subset of encoded packets and where the first silence suppression scheme is distinct from the second silence suppression scheme.
- the method further comprises the step of forwarding the second voice stream to a called party for the packet-switched call.
- FIG. 1 is a representation of one embodiment of an apparatus that comprises a transmitting device, a packet-switched network, and a receiving device.
- FIG. 2 is a representation of one implementation of packet flow through the apparatus of FIG. 1, illustrating silence suppression in the packet-switched network.
- FIG. 3 is a representation of one embodiment of an apparatus that comprises a transmitting device, a packet-switched network, a receiving device, and a silence suppression interface component.
- FIG. 4 is a representation of another embodiment of the apparatus of FIG. 3, illustrating the silence suppression interface component combined or integral with a packet switch of the packet-switched network.
- FIG. 5 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a pass-through of an EVRC-NW
- FIG. 6 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a mediation between an EVRC-NW DTX silence suppression scheme and an RFC 4788 or similar silence suppression scheme.
- FIG. 7 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a pass-through of FIG. 5 with a different silence suppression interval.
- FIG. 8 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating the mediation of FIG. 6 with a different silence suppression interval.
- FIG. 9 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a mediation between mismatched codecs.
- FIG. 10 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating another mediation between mismatched codecs.
- FIG. 11 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating another variation of the pass-through of FIG. 5.
- FIG. 12 is a representation of another embodiment of the apparatus of FIG. 3, illustrating Voice over IP terminals as the transmitting and receiving devices.
- a silence suppression scheme comprises rules and/or parameters which can be used to "drop" or omit one or more eighth rate (or background noise) packets from a continuous stream of packets to create a discontinuous stream.
- a silence suppression scheme may be used to both reduce the number of encoded packets that represent background noise which are transmitted toward or across the packet-switched network and to maintain a certain amount of background noise or "comfort noise" in the voice stream, as will be appreciated by those skilled in the art.
- a silence suppression scheme is a discontinuous transmission (DTX) scheme.
- 3GPP2 3rd Generation Partnership Project 2
- C.S0076-0 and C.S0014-D www.3gpp2.org
- IETF Internet Engineering Task Force
- the transmitting and/or receiving device in one example may employ 3GPP2 compatible silence suppression (e.g., prior to sending a voice stream over the packet-switched network) while the packet-switched network employs the IETF compatible (e.g., RFC 4788, IETF draft draft-ietf-avt-rtp-evrc-nw-02, or similar) silence suppression.
- the 3GPP2 Enhanced Variable Rate Codec Narrowband-Wideband (EVRC-NW; described in C.S0014-D) describes a modified DTX scheme.
- a voice stream which conforms to the EVRC-NW silence suppression scheme may not conform to the RFC 4788 or similar (e.g., IETF draft-ietf-avt-rtp-evrc-nw-02) silence suppression scheme.
- an apparatus 100 of the prior art comprises a transmitting device 110, a packet-switched network 130, and a receiving device 140.
- the apparatus 100 is configured to carry a voice stream (i.e., voice bearer traffic) from a user (not shown) of the transmitting device 110 to a user (not shown) of the receiving device 140.
- the voice stream comprises a plurality of voice packets or frames encoded at different rates.
- One or more of the plurality of voice packets are forwarded from the transmitting device 110, over the packet-switched network 130, and to the receiving device 140.
- the transmitting device 110 and receiving device 140 in one implementation comprise mobile stations of a cellular communication network.
- the transmitting device 110 comprises a mobile station 114 in a calling party network (or call domain) 112 and the receiving device 140 comprises a mobile station 146 in a called party network (or call domain) 142.
- the mobile stations 114 and 146 are configured to communicate with respective base stations 116 and 144 over respective air interfaces 118 and 148, as will be appreciated by those skilled in the art.
- the base stations 116 and 144 in one example comprise cellular base stations.
- the packet-switched network 130 in this implementation comprises a core network, transport network, back-haul network, or other packet-switched network.
- the packet-switched network 130 in one example comprises one or more packet switches 132 and 134.
- Examples of the packet switches 132 and 134 comprise packet switch gateways (PSG), media gateways (MGW), packet frame selectors, routers, and other network devices.
- the packet-switched network 130 comprises or is coupled with a Transcoder Free Operation (TrFO) network, such as TrFO network 160. While only two packet switches are shown, the packet-switched network 130 may comprise additional packet switches. Accordingly, the packet switches 132 and 134 may be located in an interior or on edges of the packet-switched network 130, as will be appreciated by those skilled in the art.
- the packet switch 132 is configured to communicate with the base station 116 and the TrFO network 160 over interfaces 120 and 162, respectively.
- the packet switch 134 is configured to communicate with the base station 144 and the TrFO network 160 over interfaces 150 and 164, respectively.
- Examples of the interfaces 120, 162, 150, and 164 comprise wireline, wireless, fiber optic, or other communication paths, as will be appreciated by those skilled in the art.
- the packet-switched network 130 in one example comprises or is coupled with one or more packet switch controllers (PSC) 136 and 137.
- the packet switch controllers 136 and 137 comprise an access manager, mobile switching center (MSC), or mobile switching center emulation (MSCe) component.
- the packet switch controllers 136 and 137 in one example are configured to manage communications between the calling party network 112, the called party network 142, and the packet-switched network 130.
- the packet switch controllers 136 and 137 instruct the base stations 116 and 144 and packet switches 132 and 134 to create a voice bearer path for the packet-switched call, as will be appreciated by those skilled in the art.
- both packet switches 132 and 134 may be controlled by a single packet switch controller.
- a packet flow 200 of the prior art represents one example of a flow of packets for a call (e.g., voice stream) between the mobile stations 114 and 146 for the apparatus 100 of FIG. 1.
- the packet flow 200 shows a sample duration of 400 milliseconds of silence or background noise with 20 millisecond intervals between encoded packets.
- the call domains (e.g., mobile stations 114 and 146 and base stations 116 and 144)
- eighth rate encoded packets 250, 251, 252, ... 270 are forwarded from the mobile station 114 to the packet switch 132 via the base station 116.
- the packet switch 132 of FIG. 2 is configured to employ a silence suppression scheme across the TrFO network 160 to the packet switch 134.
- the silence suppression scheme of the packet switch 132 is compatible with or conforms to the RFC 4788 or similar DTX scheme (including IETF draft draft-ietf-avt-rtp-evrc-nw-02).
- the DTX scheme comprises Silence Insertion Description (SID) frames which are eighth rate packets sent during DTX periods of silence.
- a "guaranteed update interval" or silence suppression interval N is a number of packets from one SID frame to the next.
- the DTX scheme of C.S0076-0 allows for a silence suppression interval N in the range of one to 50, while the modified DTX scheme of C.S0014-D allows for a silence suppression interval N of 1, 4, or 8.
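The two ranges of allowed intervals can be expressed as a simple lookup. This is a minimal sketch based only on the ranges quoted above; the names `ALLOWED_INTERVALS` and `is_valid_interval` are illustrative, and the sketch does not model any other constraint the specifications may impose.

```python
# Allowed silence suppression intervals N, as quoted in the text.
ALLOWED_INTERVALS = {
    "C.S0076-0": set(range(1, 51)),  # one to 50, inclusive
    "C.S0014-D": {1, 4, 8},          # modified DTX scheme (EVRC-NW)
}


def is_valid_interval(spec: str, n: int) -> bool:
    """Return True if interval n is allowed by the named specification."""
    return n in ALLOWED_INTERVALS[spec]


print(is_valid_interval("C.S0076-0", 10))  # True
print(is_valid_interval("C.S0014-D", 10))  # False
```

An interval of 10 is valid under the first range but not under the second, which is one concrete way a stream conforming to one scheme can fail to conform to the other.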
- the packets dropped by the packet switch 132 are replaced or substituted by the packet switch 134.
- the packet switch 134 repeats or copies a last received packet on the silence suppression interval (e.g., packets 250, 260, and 270) so that it appears to the mobile station 146 (and base station 144) that a continual sequence of packets is sent from the packet switch 134.
- FIG. 2 illustrates that the packet switch 134 sends ten instances of the packet 250, one "original" packet and nine additional instances or copies, as will be appreciated by those skilled in the art.
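The drop-and-repeat behavior of packet flow 200 can be sketched as two small functions. This is an illustration only: it assumes an interval of N = 10 (implied by packets 250, 260, and 270 falling on the interval), treats the reference numerals 250 through 270 as consecutive 20 millisecond slots, and uses the hypothetical names `suppress` and `reconstruct`.

```python
def suppress(packets, interval):
    """Packet switch 132: keep only packets that fall on the interval."""
    return [(slot, p) for slot, p in enumerate(packets) if slot % interval == 0]


def reconstruct(kept, total_slots):
    """Packet switch 134: repeat the last received packet into empty slots."""
    by_slot = dict(kept)
    out, last = [], None
    for slot in range(total_slots):
        last = by_slot.get(slot, last)  # new packet if one arrived, else repeat
        out.append(last)
    return out


stream = list(range(250, 271))           # 21 eighth rate packets, 250..270
kept = suppress(stream, interval=10)     # packets sent across the TrFO network
played = reconstruct(kept, len(stream))  # continual sequence at the far side

print([p for _, p in kept])  # [250, 260, 270]
print(played.count(250))     # 10
```

Only three of the 21 packets cross the network, yet the receiving side still sees 21 packets, ten of which are instances of packet 250.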
- an apparatus 300 comprises the components of the apparatus 100 and further comprises a silence suppression interface (SSI) component 320.
- the SSI component 320 in one example comprises a network infrastructure device, application server, or other computing device.
- the SSI component 320 comprises a processor or computer that is configured to execute software or instructions stored in a memory.
- the SSI component 320 comprises an instance of a recordable data storage medium 322, as described herein.
- the SSI component 320 in one example comprises one or more parameters or rules for the first and/or second silence suppression schemes, for example, stored on the recordable data storage medium 322.
- the SSI component 320 in one example is configured to be updated with additional and/or newer silence suppression schemes or parameters or rules of silence suppression schemes to support alternative codecs, transmitting devices, and/or receiving devices.
- the SSI component 320 receives one or more parameters for a third silence suppression scheme and stores the parameters in the recordable data storage medium 322.
- the parameters and/or silence suppression schemes may be received over a network, human interface device (e.g., keyboard or terminal), or through another instance of the recordable data storage medium 322.
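One way to picture the stored parameters is a small keyed store that can be updated with a further scheme at run time. This is a hypothetical sketch: the class names, the single `interval` field, and the scheme labels and values are assumptions chosen for illustration, not a description of the recordable data storage medium 322.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemeParams:
    """Parameters for one silence suppression scheme (illustrative fields)."""
    name: str
    interval: int  # silence suppression interval N, in packets


class SchemeStore:
    """Stand-in for the parameters kept by the SSI component."""

    def __init__(self):
        self._schemes = {}

    def update(self, params: SchemeParams) -> None:
        # Adds a new scheme or replaces the parameters of an existing one.
        self._schemes[params.name] = params

    def get(self, name: str) -> SchemeParams:
        return self._schemes[name]

    def names(self):
        return sorted(self._schemes)


store = SchemeStore()
store.update(SchemeParams("first", interval=4))    # hypothetical values
store.update(SchemeParams("second", interval=10))
store.update(SchemeParams("third", interval=8))    # received later
print(store.names())
```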
- the SSI component 320 is shown as a separate component from the calling party network 112 and the packet-switched network 130.
- the SSI component 320 may be a device or component within the calling party network 112 or the packet-switched network 130.
- the SSI component 320 may also be combined, or formed integrally with a device or component of the calling party network 112, called party network 142, or packet-switched network 130, such as the base stations 116 or 144, the packet switch 132, or the packet switch 134.
- the air interfaces 118 and 148 may be replaced by a wireline, fiber optic, or other communication paths.
- the calling party network 112 and the called party network 142 may be the same network or comprise a landline network.
- an apparatus 400 represents another implementation of the apparatus 300 and illustrates the SSI component 320 combined with the packet switch 132.
- the SSI component 320 may be combined with or implemented by a codec module, voice quality enhancement module (e.g., acoustic echo canceller, noise suppressor, automatic gain controller, and noise compensator), packet frame selector, network switch, network bridge, or router, as will be appreciated by those skilled in the art.
- the SSI component 320 is configured to mediate between first and second silence suppression schemes.
- the SSI component 320 is configured to receive a first voice stream that conforms to the first silence suppression scheme, alter the first voice stream to create a second voice stream that conforms to the second silence suppression scheme, and forward the second voice stream, as will be appreciated by those skilled in the art.
- the SSI component 320 in one example is configured for bi-directional mediation between the transmitting device 110 and receiving device 140. For example, either device 110 or 140 can send voice streams and/or receive altered voice streams via the SSI component 320.
- In FIG. 5, one example of a packet flow 500 for the implementation of FIG. 4 is shown.
- the SSI component 320 in this implementation (shown combined with the packet switch 132) is configured for mediating between two distinct silence suppression schemes by "passing through" each eighth rate packet that is received (with the exception of "blanks," as described herein).
- the interfaces 118, 120, 148, and 150 are configured for EVRC-NW DTX streams and the interfaces 162 and 164 (FIG. 4) are configured for RFC 4788 or similar DTX streams.
- the packet flow 500 illustrates a potential problem of mediating between the calling and called party networks 112 and 142 using a newer silence suppression scheme while maintaining a silence suppression scheme currently used in the packet-switched network 130 between the networks 112 and 142.
- a first silence suppression scheme (e.g., EVRC-NW)
- a second silence suppression scheme (e.g., RFC 4788)
- the mobile station 114 sends an eighth rate packet after every fourth encoded packet; for example, eighth rate (or SID) packets 550, 554, 558, 562, 566, and 570 are sent upon the silence suppression interval.
- the eighth rate packets 550, 554, 558, 562, 566, and 570 in one example comprise synchronous packets or "non-critical" packets.
- the EVRC-NW DTX scheme further defines "critical" or asynchronous packets which comprise eighth rate packets that identify significant changes in background noise by the mobile encoder. Critical packets can be transmitted independently of the silence suppression interval.
- Examples of critical packets comprise packets 556 and 568 which occur outside of the silence suppression interval.
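From the position of a packet alone, an observer can tell whether it falls on the silence suppression interval. The sketch below is a heuristic based on position only: it assumes the reference numerals are consecutive 20 millisecond slots starting at 550 and an interval of N = 4, and the function name `classify_by_position` is illustrative. In the scheme itself the encoder marks a packet as critical because of a significant change in background noise, so a critical packet could in principle also coincide with the interval.

```python
def classify_by_position(packet_id: int, first_id: int, interval: int) -> str:
    """Label a packet by whether it falls on the silence suppression interval."""
    on_interval = (packet_id - first_id) % interval == 0
    return "synchronous" if on_interval else "asynchronous"


for pid in (550, 554, 556, 558, 562, 566, 568, 570):
    print(pid, classify_by_position(pid, first_id=550, interval=4))
```

Under these assumptions, packets 556 and 568 are the only ones that fall outside the interval, matching the examples of critical packets given above.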
- the mobile station 114 sends a plurality of encoded packets for the voice stream, comprising the eighth rate packets 550, 554, 556, 558, 562, 566, 568, and 570 to the base station 116.
- the base station 116 is configured to forward the synchronous and asynchronous packets along with "blanks" or blanked packets 580 to the packet switch 132 (and SSI component 320).
- the blanks 580 in one example provide an indication that the interface 120 or "line" is still live and that an update or packet is expected.
- the blanks 580 in this example are not forwarded to the packet switch 134 or TrFO network 160.
- the packet switch 132 forwards (e.g., selects and passes through) both the synchronous and asynchronous packets to the packet switch 134 (but not the blanks 580).
- the packet switch 134 forwards the synchronous and asynchronous packets to the base station 144 along with repeated packets to substitute for packets dropped by the packet switch 132.
- the packet switch 134 sends four instances of the packet 550, two instances of the packet 554, etc.
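The pass-through of packet flow 500 can be sketched by filtering out blanks at one switch and repeating packets at the other. This is an illustration under stated assumptions: reference numerals 550 through 570 are treated as consecutive 20 millisecond slots, blanks are modeled as `None`, and the function names are hypothetical.

```python
from collections import Counter

# Eighth rate packets sent by the mobile station; every other slot is a blank.
SENT = {550, 554, 556, 558, 562, 566, 568, 570}
incoming = [pid if pid in SENT else None for pid in range(550, 571)]


def pass_through(frames):
    """Packet switch 132: forward every packet except the blanks."""
    return [(slot, f) for slot, f in enumerate(frames) if f is not None]


def fill_gaps(forwarded, total_slots):
    """Packet switch 134: repeat the last received packet into empty slots."""
    by_slot = dict(forwarded)
    out, last = [], None
    for slot in range(total_slots):
        last = by_slot.get(slot, last)
        out.append(last)
    return out


forwarded = pass_through(incoming)
played = fill_gaps(forwarded, len(incoming))
instances = Counter(played)

print(len(forwarded))                  # 8
print(instances[550], instances[554])  # 4 2
```

Eight packets cross the TrFO network in this flow, compared with three in the flow of FIG. 2 over the same 400 milliseconds.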
- FIG. 5 is one embodiment of a solution which provides mediation between different silence suppression schemes by the SSI component 320. It shows an increase in traffic over the TrFO network 160, relative to FIG. 2, where silence is suppressed according to the RFC 4788 (or similar) DTX scheme. Further, this embodiment relies on the premise that EVRC-NW narrowband (NB) modes are compatible with EVRC-B vocoders. Here, the mobile stations are directed to use only EVRC-NW NB modes, and wideband (WB) mode 0 is disallowed, as will be appreciated by those skilled in the art.
- FIG. 5 illustrates an embodiment where the silence suppression scheme employed in either calling party network 112 or called party network 142 is adapted for an end-to-end silence suppression scheme by the silence suppression interface component 320.
- this solution lacks the flexibility to maintain an existing configuration (e.g., a silence suppression scheme) in the packet-switched network 130 that an operator of the packet-switched network 130 might prefer.
- the goals and operative components of the packet-switched network 130 may differ from the goals and operative components of the calling party network 112 and called party network 142.
- the SSI component 320 as configured in FIGS. 5, 7, and 11 may require extra resources to provide an end-to-end solution.
- In FIG. 6, a packet flow 600 for another implementation of the apparatus 300 is shown.
- the SSI component 320 (shown combined with the packet switch 132) is configured to mediate between the calling party network 112 and the packet-switched network 130.
- Packet flow 600 shows a 400 millisecond period of background noise through the voice stream.
- the packet switch 132 is configured to alter a voice stream received from the base station 116 according to the second silence suppression scheme. Analogously to packet flow 500 in FIG. 5, the synchronous packets 550, 554, 558, 562, 566, and 570, the asynchronous packets 556 and 568, and the blanks 580 are forwarded to the packet switch 132 via the base station 116. The packet switch 132 counts the synchronous packets, asynchronous packets, and blanks in determining when to apply the silence suppression interval. The packet switch 132 in one example alters the voice stream by selecting a subset of received packets from the plurality of received packets of the voice stream.
- the second voice stream comprises packets 550, 558, and 570, selected by the packet switch 132.
- the second voice stream is forwarded via the TrFO Network 160 to the packet switch 134.
- the packet switch 134 forwards the second voice stream to the base station 144 along with copies of the most recently received packets (e.g., packets 550 and 558) to fill in the remaining packets of the voice stream.
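The text does not spell out the exact selection rule, so the following is a sketch under an assumed rule: count every slot (packets and blanks alike) and, at each boundary of the second scheme's interval, select the most recently received packet that has not already been selected. The interval of 10 is an assumption carried over from the flow of FIG. 2, the reference numerals are treated as consecutive 20 millisecond slots, and real-time forwarding delay is ignored. The names `build_slots` and `select_subset` are hypothetical.

```python
def build_slots(sent_ids, first_id, last_id):
    """One entry per 20 ms slot: the packet id, or None for a blank."""
    return [pid if pid in sent_ids else None for pid in range(first_id, last_id + 1)]


def select_subset(frames, interval):
    """Assumed rule: at each interval boundary, pick the latest received packet."""
    selected = []
    latest = None
    for slot, frame in enumerate(frames):
        if frame is not None:
            latest = frame  # remember the most recent non-blank packet
        on_boundary = slot % interval == 0  # blanks count toward the interval
        if on_boundary and latest is not None and latest not in selected:
            selected.append(latest)
    return selected


fig6_input = build_slots({550, 554, 556, 558, 562, 566, 568, 570}, 550, 570)
second_stream = select_subset(fig6_input, interval=10)
print(second_stream)  # [550, 558, 570]
```

With these assumptions the rule reproduces the subset named for packet flow 600 (packets 550, 558, and 570). Other rules could produce the same subset, so this should be read as one consistent interpretation rather than the rule used in the figures.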
- the packet switch 132 passes through packets 550, 556, 558, 566, and 568 in conformance with the silence suppression scheme of the mobile station 114 and base station 116.
- the packet switch 134 passes the received packets along with copies to the base station 144.
- FIG. 8 illustrates a packet flow 800 similar to the packet flow 600 of FIG. 6.
- the SSI component 320 maintains the existing settings of the TrFO Network 160. Moreover, it adapts or alters the voice stream received from the base station 116 by selecting the subset of received packets.
- the SSI component 320 facilitates transmission between networks with distinct silence suppression schemes, for example, where the update intervals (or other parameters, rules, etc.) of the silence suppression schemes do not match between the calling party network 112 and the called party network 142.
- a packet flow 900 is similar to packet flow 800 but modified for mismatched codecs.
- the call domains can use different silence suppression schemes, or can use a voice encoder that does not contain a silence suppression scheme.
- the packet switch 132 alters the voice stream received from the base station 116 by selecting a subset of the received packets 550, 556, 558, 566, and 568.
- the altered voice stream comprises packets 550, 558, and 568.
- the packets 550, 558, and 568 are forwarded to the packet switch 134, which forwards the packets along with copies to the mobile station 146 via base station 144. Accordingly, two call domains with distinct silence suppression schemes are interfaced with a call domain using an older voice codec without a silence suppression scheme.
- the mobile station 114 is using an older voice encoder (e.g., EVRC-B) and is not performing silence suppression.
- FIG. 10 shows a "reverse" of the codecs for the voice stream of FIG. 9, or the application of the EVRC-NW DTX scheme to a first voice stream without silence suppression in the calling party network 112 as shown in FIG. 2.
- packet switch 132 applies silence suppression to the packets 251 through 259 and from 261 through 269 and passes packets 250, 260, and 270 to the packet switch 134.
- the packet switch 134 forwards the packets 250, 260, and 270 along with copies of packets 250 and 260.
- the base station 144 applies the silence suppression scheme of the mobile station 146 to the voice stream.
- a packet flow 1100 illustrates another variation of the "pass-through" of FIG. 5.
- the called party network 142 is not performing silence suppression and is using the EVRC-B codec.
- While portions of the above description are related to, for example, mobile stations, packet-switched networks, and codecs such as EVRC, those skilled in the art will appreciate that alternate implementations and embodiments are possible.
- Alternate implementations of mobile stations comprise mobile phones, smart phones, voicemail servers, Voice over Internet Protocol (VoIP) terminals, personal computers, session initiation protocol (SIP) devices, telephony devices, or other devices configured to receive speech.
- Alternate implementations of packet-switched networks comprise local area networks, wide area networks, the Internet, other packet-switched networks and combinations thereof.
- codecs comprise EVRC-A, EVRC-B, EVRC-WB, EVRC-NW, Long Term Evolution (LTE) codecs, Adaptive MultiRate (AMR) codecs, and other speech or audio codecs where packets comprising background noise may be omitted.
- additional embodiments comprise wireline or landline networks, private phone networks, voicemail services, and combinations thereof.
- While silence suppression schemes compatible with DTX and the above-mentioned 3GPP2 and IETF documents are described, alternative silence suppression schemes may be used with different rules, parameters, or criteria for dropping or omitting packets from the voice stream, as will be appreciated by those skilled in the art.
- the silence suppression interface component 320 is configured to mediate between the transmitting device 110 and the packet-switched network 130 (e.g., between first and second silence suppression schemes).
- the silence suppression interface component 320 is configured to mediate between one or more additional devices, networks, and/or call domains with one or more respective silence suppression schemes.
- the silence suppression interface component 320 is configured to provide mediation between one silence suppression scheme and a set of a number Z additional silence suppression schemes (e.g., "one-to-many" or 1-to-Z mediation).
- the silence suppression interface component 320 is configured to provide mediation between a second set of a number Y of silence suppression schemes and the first set of silence suppression schemes, such as Y-to-Z mediation.
- the silence suppression interface component 320 in one example comprises a 1 x Z or Y x Z decision table and/or matrix for mediation between multiple silence suppression schemes. Accordingly, the silence suppression interface component 320 is configured to mediate between multiple instances of the transmitting device 110, the receiving device 140, and/or the packet-switched network 130 with respective silence suppression schemes, as will be appreciated by those skilled in the art.
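A Y x Z decision table can be pictured as a mapping keyed by the pair of incoming and outgoing schemes. The sketch below is hypothetical: the scheme labels are taken from the examples in the text, while the `Action` values, the table entries, and the function name are assumptions made for illustration (here Y = 2 and Z = 2).

```python
from enum import Enum


class Action(Enum):
    PASS_THROUGH = "pass_through"    # forward received packets unchanged
    SELECT_SUBSET = "select_subset"  # forward only a selected subset


# Rows: scheme of the incoming stream. Columns: scheme of the outgoing stream.
# "none" stands for an encoder that performs no silence suppression.
DECISION_TABLE = {
    ("EVRC-NW DTX", "EVRC-NW DTX"): Action.PASS_THROUGH,
    ("EVRC-NW DTX", "RFC 4788"): Action.SELECT_SUBSET,
    ("none", "EVRC-NW DTX"): Action.SELECT_SUBSET,
    ("none", "RFC 4788"): Action.SELECT_SUBSET,
}


def mediation_action(incoming: str, outgoing: str) -> Action:
    """Look up how to mediate between two silence suppression schemes."""
    try:
        return DECISION_TABLE[(incoming, outgoing)]
    except KeyError:
        raise ValueError(f"no mediation rule for {incoming!r} -> {outgoing!r}")


print(mediation_action("EVRC-NW DTX", "RFC 4788"))
```

Keeping the table as data rather than branching logic means that supporting an additional scheme amounts to adding a row or a column.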
- an apparatus 1200 represents another implementation of the apparatus 300.
- the apparatus 1200 comprises the transmitting device 110, receiving device 140, and packet-switched network 130.
- the transmitting device 110 and receiving device 140 comprise Voice over IP terminals 1210 and 1240, respectively.
- the VoIP terminals 1210 and 1240 are configured to communicate over the packet-switched network 130.
- the VoIP terminal 1210 is configured to employ a first silence suppression scheme and to send a voice stream conforming to the first silence suppression scheme to the packet switch 132.
- the packet switch 132 comprises the SSI component 320 and is configured to alter the voice stream to conform to a second silence suppression scheme.
- the packet switch 132 selects a subset of packets from the voice stream to create a second voice stream that conforms to a second silence suppression scheme.
- the packet switch 134 is configured to replace or substitute one or more packets into the second voice stream before playback at the VoIP terminal 1240, as will be appreciated by those skilled in the art.
- the apparatus 300 in one example employs one or more non-transitory processor-readable media.
- the non-transitory processor-readable media store software (e.g., compiled or interpreted code), firmware and/or assembly language for performing (e.g., by a processor or computer) one or more portions of one or more implementations of the invention.
- Examples of a non-transitory processor-readable medium for the apparatus 300 comprise the recordable data storage medium 322 of the SSI component 320.
- the non-transitory processor-readable media for the apparatus 300 in one example comprise one or more of a magnetic, electrical, optical, biological, and atomic data storage medium.
- non-transitory processor-readable media comprise floppy disks, magnetic tapes, CD-ROMs, DVD-ROMs, hard disk drives, and electronic memory.
- the non-transitory processor-readable media comprise removable or portable devices, such as flash memory drives.
- the apparatus 300 in one example comprises a plurality of components such as one or more of electronic components, hardware components, and computer software components. A number of such components can be combined or divided in the apparatus 300.
- An example component of the apparatus 300 employs and/or comprises a set and/or series of computer instructions written in or implemented with any of a number of programming languages, as will be appreciated by those skilled in the art.
Abstract
In one implementation, a first voice stream for a packet-switched call is received from a calling party. The first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call. A subset of encoded packets are selected from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme. The second voice stream comprises the subset of encoded packets. The first silence suppression scheme is distinct from the second silence suppression scheme. The second voice stream is forwarded toward a called party for the packet-switched call.
Description
ENCODED PACKET SELECTION FROM A FIRST VOICE STREAM TO CREATE A SECOND VOICE STREAM
TECHNICAL FIELD
[01] The invention relates generally to silence suppression in a voice stream.
BACKGROUND
[02] Voice calls over packet-switched networks are performed using a codec at a transmitting device to create encoded packets that represent a user's voice. The encoded packets are forwarded over the packet-switched network and decoded at a receiving device. Examples of transmitting and receiving devices comprise a mobile phone, smart phone, Voice over Internet Protocol (VoIP) terminal, personal computer, or other telephony devices. In one implementation, a mobile phone encodes a calling party's voice into a plurality of encoded packets. The mobile phone sends the encoded packets to the packet-switched network as a voice stream (e.g., a stream of packets). The packet-switched network forwards the voice stream to a mobile phone of a called party where the encoded packets are decoded for playback to the called party. Analogously, the called party's voice is encoded and forwarded to the calling party for playback.
[03] In this implementation, the mobile phone employs an Enhanced Variable Rate Codec (EVRC) standard to create encoded packets from the user's voice. Each encoded packet represents a 20 millisecond sample of the user's voice and/or background noise. Based on the sample, the encoded packet is created at one code rate of a plurality of predefined code rates and associated sizes, which are defined by the EVRC standard. Examples of code rates in EVRC comprise full rate, half rate, and eighth rate, which have packet sizes of 171 bits, 80 bits, and 16 bits, respectively. Encoded packets or frames that are encoded at the eighth rate (e.g., eighth rate frames, eighth rate packets, or rate 1/8 frames) are generally used for samples that are predominantly background noise since they have a smaller frame size, use fewer network resources to be transmitted, and the background noise is not necessary for conversation between the users, as will be understood by those skilled in the art.
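By way of illustration only, the relationship between the packet sizes above and the resulting payload bit rates can be sketched as follows. The sketch assumes a continuous stream of 20 millisecond frames of a single size and ignores any transport overhead; the names `EVRC_FRAME_BITS` and `bit_rate_bps` are hypothetical and are not taken from the EVRC standard.

```python
# Illustrative sketch only: frame sizes are the values quoted in the text;
# the helper name and structure are assumptions for this example.
FRAME_MS = 20  # each encoded packet represents a 20 millisecond sample
EVRC_FRAME_BITS = {"full": 171, "half": 80, "eighth": 16}


def bit_rate_bps(frame_bits: int, frame_ms: int = FRAME_MS) -> float:
    """Payload bit rate for a continuous stream of frames of one size."""
    return frame_bits * 1000 / frame_ms


for name, bits in EVRC_FRAME_BITS.items():
    print(f"{name:>6} rate: {bits:3d} bits/frame -> {bit_rate_bps(bits):.0f} bit/s")
```

Under these assumptions an eighth rate frame consumes less than a tenth of the payload bits of a full rate frame, which is why eighth rate frames are the natural candidates for background noise.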
[04] When the calling party is generally silent, such as when listening to the called party or during a pause in conversation, eighth rate frames are encoded by the transmitting device and forwarded to the called party. Since background noise is generally not an important part of the conversation, some of the eighth rate frames can be removed from the voice stream for transmission over the packet-switched network and replaced or substituted with another eighth rate frame before playback at the receiving device. A complete elimination of background noise from the voice stream sounds to the called party as if the call has been dropped or otherwise ended.
SUMMARY
[05] The invention in one implementation encompasses a method. A first voice stream for a packet-switched call is received from a calling party. The first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call. A subset of encoded packets are selected from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme. The second voice stream comprises the subset of encoded packets. The first silence suppression scheme is distinct from the second silence suppression scheme. The second voice stream is forwarded to a called party for the packet-switched call.
[06] Another implementation of the invention encompasses an apparatus. The apparatus comprises a silence suppression interface component configured to receive a first voice stream for a packet-switched call from a calling party. The first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call. The silence suppression interface component is configured to select a subset of encoded packets from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme. The second voice stream comprises the subset of encoded packets. The first silence suppression scheme is distinct from the second silence suppression scheme. The silence suppression interface component is configured to forward the second voice stream to a called party for the packet-switched call.
[07] A further implementation of the invention encompasses an article. The article comprises one or more non-transitory processor-readable media storing instructions which, when executed by a processor, cause the processor to perform a method. The method comprises the step of receiving a first voice stream for a packet-switched call from a calling party, where the first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call. The method further comprises the step of selecting a subset of encoded packets from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme, where the second voice stream comprises the subset of encoded packets and where the first silence suppression scheme is distinct from the second silence suppression scheme. The method further comprises the step of forwarding the second voice stream to a called party for the packet-switched call.
DESCRIPTION OF THE DRAWINGS
[08] FIG. 1 is a representation of one embodiment of an apparatus that comprises a transmitting device, a packet-switched network, and a receiving device.
[09] FIG. 2 is a representation of one implementation of packet flow through the apparatus of FIG. 1, illustrating silence suppression in the packet-switched network.
[10] FIG. 3 is a representation of one embodiment of an apparatus that comprises a transmitting device, a packet-switched network, a receiving device, and a silence suppression interface component.
[11] FIG. 4 is a representation of another embodiment of the apparatus of FIG. 3, illustrating the silence suppression interface component combined or integral with a packet switch of the packet-switched network.
[12] FIG. 5 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a pass-through of an EVRC-NW DTX silence suppression scheme to an RFC 4788 or similar silence suppression scheme.
[13] FIG. 6 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a mediation between an EVRC-NW DTX silence suppression scheme and an RFC 4788 or similar silence suppression scheme.
[14] FIG. 7 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a pass-through of FIG. 5 with a different silence suppression interval.
[15] FIG. 8 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating the mediation of FIG. 6 with a different silence suppression interval.
[16] FIG. 9 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating a mediation between mismatched codecs.
[17] FIG. 10 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating another mediation between mismatched codecs.
[18] FIG. 11 is a representation of one implementation of a packet flow through the apparatus of FIG. 4, illustrating another variation to the pass-through of FIG. 5.
[19] FIG. 12 is a representation of another embodiment of the apparatus of FIG. 3, illustrating Voice over IP terminals as the transmitting and receiving devices.
DETAILED DESCRIPTION
[20] As described in the Background, some eighth rate packets can be removed from the voice stream to improve transmission efficiency of the voice stream. Removal of eighth rate packets is also known as silence suppression. For example, a silence suppression scheme comprises rules and/or parameters which can be used to "drop" or omit one or more eighth rate (or background noise) packets from a continuous stream of packets to create a discontinuous stream. A silence suppression scheme may be used to both reduce the number of encoded packets that represent background noise which are transmitted toward or across the packet-switched network and to maintain a certain amount of background noise or "comfort noise" in the voice stream, as will be appreciated by those skilled in the art.
[21] Current implementations of transmitting devices, receiving devices, and packet-switched networks can apply silence suppression schemes over one or more portions of the packet-switched network. One example of a silence suppression scheme is a discontinuous transmission (DTX) scheme. Examples of DTX schemes are described in 3rd Generation Partnership Project 2 (3GPP2) documents C.S0076-0 and C.S0014-D (www.3gpp2.org) and also in Internet Engineering Task Force (IETF) internet standard documents RFC 3558 and RFC 4788, and further in IETF internet draft "draft-ietf-avt-rtp-evrc-nw-02" (www.ietf.org), which are incorporated herein by reference.
[22] The transmitting and/or receiving device in one example may employ 3GPP2 compatible silence suppression (e.g., prior to sending a voice stream over the packet-switched network) while the packet-switched network employs the IETF compatible (e.g., RFC 4788, IETF draft draft-ietf-avt-rtp-evrc-nw-02, or similar) silence suppression. The 3GPP2 Enhanced Variable Rate Codec Narrowband-Wideband (EVRC-NW; described in C.S0014-D) describes a modified DTX scheme. Accordingly, a voice stream which conforms to the EVRC-NW silence suppression scheme may not conform to the RFC 4788 or similar (e.g., IETF draft-ietf-avt-rtp-evrc-nw-02) silence suppression scheme.
[23] Turning to FIG. 1, an apparatus 100 of the prior art comprises a transmitting device 110, a packet-switched network 130, and a receiving device 140. The apparatus 100 is configured to carry a voice stream (i.e., voice bearer traffic) from a user (not shown) of the transmitting device 110 to a user (not shown) of the receiving device 140. As described above, the voice stream comprises a plurality of voice packets or frames encoded at different rates. One or more of the plurality of voice packets are forwarded from the transmitting device 110, over the packet-switched network 130, and to the receiving device 140.
[24] The transmitting device 110 and receiving device 140 in one implementation comprise mobile stations of a cellular communication network. For example, the transmitting device 110 comprises a mobile station 114 in a calling party network (or call domain) 112 and the receiving device 140 comprises a mobile station 146 in a called party network (or call domain) 142. The mobile stations 114 and 146 are configured to communicate with respective base stations 116 and 144 over respective air interfaces 118 and 148, as will be appreciated by those skilled in the art. The base stations 116 and 144 in one example comprise cellular base stations.
[25] The packet-switched network 130 in this implementation comprises a core network, transport network, back-haul network, or other packet-switched network. The packet-switched network 130 in one example comprises one or more packet switches 132 and 134. Examples of the packet switches 132 and 134 comprise packet switch gateways (PSG), media gateways (MGW), packet frame selectors, routers, and other network devices. In alternative implementations, the packet-switched network 130 comprises or is coupled with a Transcoder Free Operation (TrFO) network, such as TrFO network 160. While only two packet switches are shown, the packet-switched network 130 may comprise additional packet switches. Accordingly, the packet switches 132 and 134 may be located in an interior or on edges of the packet-switched network 130, as will be appreciated by those skilled in the art.
[26] The packet switch 132 is configured to communicate with the base station 116 and the TrFO network 160 over interfaces 120 and 162, respectively. The packet switch 134 is configured to communicate with the base station 144 and the TrFO network 160 over interfaces 150 and 164, respectively. Examples of the interfaces 120, 162, 150, and 164 comprise wireline, wireless, fiber optic, or other communication paths, as will be appreciated by those skilled in the art.
[27] The packet-switched network 130 in one example comprises or is coupled with one or more packet switch controllers (PSC) 136 and 137. Examples of the packet switch controllers 136 and 137 comprise an access manager, mobile switching center (MSC), or mobile switching center emulation (MSCe) component. The packet switch controllers 136 and 137 in one example are configured to manage communications between the calling party network 112, the called party network 142, and the packet-switched network 130. For example, the packet switch controllers 136 and 137 instruct the base stations 116 and 144 and packet switches 132 and 134 to create a voice bearer path for the packet-switched call, as will be appreciated by those skilled in the art. In other implementations, both packet switches 132 and 134 may be controlled by a single packet switch controller.
[28] One or more of the transmitting device 110, the packet-switched network 130, and the receiving device 140 are configured to employ respective silence suppression schemes. Turning to FIG. 2, a packet flow 200 of the prior art represents one example of a flow of packets for a call (e.g., voice stream) between the mobile stations 114 and 146 for the apparatus 100 of FIG. 1. The packet flow 200 shows a sample duration of 400 milliseconds of silence or background noise with 20 millisecond intervals between encoded packets. In this example, the call domains (e.g., mobile stations 114 and 146 and base stations 116 and 144) are configured to forward the voice packets over interfaces 118, 120, 148, and 150 without a silence suppression scheme. Accordingly, eighth rate encoded packets 250, 251, 252, ... 270 (e.g., silence frames) are forwarded from the mobile station 114 to the packet switch 132 via the base station 116.
[29] The packet switch 132 of FIG. 2 is configured to employ a silence suppression scheme across the TrFO network 160 to the packet switch 134. In one example, the silence suppression scheme of the packet switch 132 is compatible with or conforms to the RFC 4788 or similar DTX scheme (including IETF draft draft-ietf-avt-rtp-evrc-nw-02). As described in 3GPP2 C.S0076-0, the DTX scheme comprises Silence Insertion Description (SID) frames which are eighth rate packets sent during DTX periods of silence. A "guaranteed update interval" or silence suppression interval N is a number of packets from one SID frame to the next. For example, a silence suppression interval N = 5 packets means that a SID frame is guaranteed to be sent once every 5 packets. While sending of the SID frame or packet is "guaranteed," the base station or another component may lose the packet in transit, but a frame update is guaranteed by the set interval, as will be appreciated by those skilled in the art. The DTX scheme of C.S0076-0 allows for a silence suppression interval N in the range of one to 50, while the modified DTX scheme of C.S0014-D allows for a silence suppression interval N of 1, 4, or 8.
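As a non-authoritative sketch, the permitted values of the silence suppression interval N stated above can be captured in a small lookup, together with the time between guaranteed SID updates that a given N implies at 20 milliseconds per packet. The scheme labels, `ALLOWED_N`, `is_allowed`, and `update_period_ms` are hypothetical names chosen for this example and do not appear in the referenced documents.

```python
# Illustrative sketch only: the permitted ranges are those stated in the
# text; the dictionary keys and helper names are assumptions.
FRAME_MS = 20  # interval between encoded packets

ALLOWED_N = {
    "C.S0076-0 DTX": set(range(1, 51)),    # one to 50
    "C.S0014-D modified DTX": {1, 4, 8},   # 1, 4, or 8
}


def is_allowed(scheme: str, n: int) -> bool:
    """True if interval n is permitted by the named scheme."""
    return n in ALLOWED_N[scheme]


def update_period_ms(n: int, frame_ms: int = FRAME_MS) -> int:
    """Time from one guaranteed SID frame to the next for interval n."""
    return n * frame_ms


print(update_period_ms(5))                          # N = 5 -> 100 ms
print(is_allowed("C.S0076-0 DTX", 10))              # True
print(is_allowed("C.S0014-D modified DTX", 10))     # False
```

The last two lines illustrate that an interval permitted by one scheme need not be permitted by the other, which is one way in which a voice stream conforming to one scheme may fail to conform to the other.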
[30] In the example of FIG. 2, the packet switch 132 employs a value N=10 for the silence suppression interval N of the silence suppression scheme. Accordingly, the packet switch 132 will forward one in ten of each eighth rate packet received from the base station 116. Referring to FIG. 2, the packet switch 132 forwards encoded packets 250, 260, and 270 to the packet switch 134 and "drops" or omits packets 251 through 259 and 261 through 269 during the period shown.
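The forwarding behavior of FIG. 2 can be sketched, for illustration only, as keeping every N-th packet of a continuous stream of eighth rate packets. The sketch assumes packets are identified by their reference numerals and that counting starts at the first packet shown; the function name `suppress` is hypothetical.

```python
# Illustrative sketch only: packets are represented by their reference
# numerals from FIG. 2; the function name is an assumption.
def suppress(packets, n):
    """Split a continuous stream into forwarded and dropped packets.

    Packets at positions 0, n, 2n, ... are forwarded; all others are dropped.
    """
    forwarded = [p for i, p in enumerate(packets) if i % n == 0]
    dropped = [p for i, p in enumerate(packets) if i % n != 0]
    return forwarded, dropped


# 400 ms of background noise at 20 ms per packet: packets 250 through 270.
stream = list(range(250, 271))
forwarded, dropped = suppress(stream, n=10)
print(forwarded)  # [250, 260, 270]
```

With N=10, the dropped packets are 251 through 259 and 261 through 269, in agreement with the period shown in FIG. 2.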
[31] Since the base station 144 and mobile station 146 are configured without silence suppression, the packets dropped by the packet switch 132 are replaced or substituted by the packet switch 134. For example, the packet switch 134 repeats or copies a last received packet on the silence suppression interval (e.g., packets 250, 260, and 270) so that it appears to the mobile station 146 (and base station 144) that a continual sequence of packets is sent from the packet switch 134. Accordingly, FIG. 2 illustrates that the packet switch 134 sends ten instances of the packet 250, one "original" packet and nine additional instances or copies, as will be appreciated by those skilled in the art.
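A minimal sketch of the substitution performed at the receiving side, under the assumption that each 20 millisecond slot is identified by the reference numeral of the packet originally encoded for it, might look as follows. The function name `fill_gaps` and the slot representation are hypothetical and serve only to illustrate repetition of the last received packet.

```python
# Illustrative sketch only: slots and packets share the reference numerals
# of FIG. 2; the function name and data layout are assumptions.
def fill_gaps(received, first_slot, last_slot):
    """Emit one packet per slot, repeating the most recently received packet.

    `received` is the set of slots in which a packet actually arrived.
    """
    out = []
    last = None
    for slot in range(first_slot, last_slot + 1):
        if slot in received:
            last = slot           # a fresh packet arrived in this slot
        if last is not None:
            out.append(last)      # the original, or a copy of the last one
    return out


continuous = fill_gaps({250, 260, 270}, first_slot=250, last_slot=270)
print(continuous.count(250))  # 10: one original and nine copies
```

The output stream again contains one packet per 20 millisecond slot, so the downstream base station and mobile station see a continual sequence of packets.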
[32] Turning to FIG. 3, one implementation of an apparatus 300 comprises the components of the apparatus 100 and further comprises a silence suppression interface (SSI) component 320. The SSI component 320 in one example comprises a network infrastructure device, application server, or other computing device. In another example, the SSI component 320 comprises a processor or computer that is configured to execute software or instructions stored in a memory. For example, the SSI component 320 comprises an instance of a recordable data storage medium 322, as described herein. The SSI component 320 in one example comprises one or more parameters or rules for the first and/or second silence suppression schemes, for example, stored on the recordable data storage medium 322. The SSI component 320 in one example is configured to be updated with additional and/or newer silence suppression schemes or parameters or rules of silence suppression schemes to support alternative codecs, transmitting devices, and/or receiving devices. For example, the SSI component 320 receives one or more parameters for a third silence suppression scheme and stores the parameters in the recordable data storage medium 322. The parameters and/or silence suppression schemes may be received over a network, human interface device (e.g., keyboard or terminal), or through another instance of the recordable data storage medium 322.
[33] In the implementation of FIG. 3, the SSI component 320 is shown as a separate component from the calling party network 112 and the packet-switched network 130. However, in alternate implementations, the SSI component 320 may be a device or component within the calling party network 112 or the packet-switched network 130. The SSI component 320 may also be combined, or formed integrally with a device or component of the calling party network 112, called party network 142, or packet-switched network 130, such as the base stations 116 or 144, the packet switch 132, or the packet switch 134. In other implementations, the air interfaces 118 and 148 may be replaced by a wireline, fiber optic, or other communication paths. Additionally, the calling party network 112 and the called party network 142 may be the same network or comprise a landline network.
[34] Turning to FIG. 4, an apparatus 400 represents another implementation of the apparatus 300 and illustrates the SSI component 320 combined with the packet switch 132. In alternative implementations, the SSI component 320 may be combined with or implemented by a codec module, voice quality enhancement module (e.g., acoustic echo canceller, noise suppressor, automatic gain controller, and noise compensator), packet frame selector, network switch, network bridge, or router, as will be appreciated by those skilled in the art.
[35] The SSI component 320 is configured to mediate between first and second silence suppression schemes. For example, the SSI component 320 is configured to receive a first voice stream that conforms to the first silence suppression scheme, alter the first voice stream to create a second voice stream that conforms to the second silence suppression scheme, and forward the second voice stream, as will be appreciated by those skilled in the art. The SSI component 320 in one example is configured for bi-directional mediation between the transmitting device 110 and receiving device 140. For example, either device 110 or 140 can send voice streams and/or receive altered voice streams via the SSI component 320.
[36] Turning to FIG. 5, one example of a packet flow 500 for the implementation of FIG. 4 is shown. The SSI component 320 in this implementation (shown combined with the packet switch 132) is configured for mediating between two distinct silence suppression schemes by "passing through" each eighth rate packet that is received (with the exception of "blanks," as described herein). In the packet flow 500, the calling party network 112 and called party network 142 are configured to employ Enhanced Variable Rate Codec Narrowband-Wideband (EVRC-NW) DTX streams for silence suppression with a silence suppression interval N=4, while the packet-switched network 130 and TrFO network 160 are configured to employ DTX streams in accordance with RFC 4788 or similar (e.g., IETF draft-ietf-avt-rtp-evrc-nw-02) for silence suppression with a silence suppression interval N=10. For example, the interfaces 118, 120, 148, and 150 (as shown in FIG. 4) are configured for EVRC-NW DTX streams and the interfaces 162 and 164 (FIG. 4) are configured for RFC 4788 or similar DTX streams.
[37] The packet flow 500 illustrates a potential problem of mediating between the calling and called party networks 112 and 142 using a newer silence suppression scheme while maintaining a silence suppression scheme currently used in the packet-switched network 130 between the networks 112 and 142. For example, a first silence suppression scheme (e.g., EVRC-NW) that employs asynchronous packets may have additional packets compared to a second silence suppression scheme (e.g., RFC 4788) that does not employ asynchronous packets.
[38] One solution to the problem is to select a subset of packets from a voice stream that conforms to the first silence suppression scheme such that the subset of packets is compatible with the second silence suppression scheme, since the concern is with silence suppression and not voice encoding. Yet another solution would be to apply the same silence suppression scheme end-to-end. However, this change in the silence suppression scheme would require considerable resources (e.g., upgrading the packet-switched network to support the newer EVRC-NW codec and silence suppression scheme).
[39] Referring to FIG. 5, the first silence suppression scheme comprises the EVRC-NW DTX scheme where the silence suppression interval N = 4. Accordingly, the mobile station 114 sends an eighth rate packet after every fourth encoded packet, for example, eighth rate (or SID) packets 550, 554, 558, 562, 566, and 570 are sent upon the silence suppression interval. The eighth rate packets 550, 554, 558, 562, 566, and 570 in one example comprise synchronous packets or "non-critical" packets. The EVRC-NW DTX scheme further defines "critical" or asynchronous packets which comprise eighth rate packets that identify significant changes in background noise by the mobile encoder. Critical packets can be transmitted independently of the silence suppression interval. Examples of critical packets comprise packets 556 and 568 which occur outside of the silence suppression interval. The mobile station 114 sends a plurality of encoded packets for the voice stream, comprising the eighth rate packets 550, 554, 556, 558, 562, 566, 568, and 570 to the base station 116.
[40] In the implementation of FIG. 5, the base station 116 is configured to forward the synchronous and asynchronous packets along with "blanks" or blanked packets 580 to the packet switch 132 (and SSI component 320). The blanks 580 in one example provide an indication that the interface 120 or "line" is still live and that an update or packet is expected. The blanks 580 in this example are not forwarded to the packet switch 134 or TrFO network 160.
[41] The packet switch 132 forwards (e.g., selects and passes through) both the synchronous and asynchronous packets to the packet switch 134 (but not the blanks 580). The packet switch 134 forwards the synchronous and asynchronous packets to the base station 144 along with repeated packets to substitute for packets dropped by the packet switch 132. For example, the packet switch 134 sends four instances of the packet 550, two instances of the packet 554, etc. The base station 144 applies a silence suppression scheme where N=4 for forwarding the voice stream to the mobile station 146.
[42] FIG. 5 is one embodiment of a solution which provides mediation between different silence suppression schemes by the SSI component 320. It shows an increase in traffic over the TrFO network 160, relative to FIG. 2, where silence is suppressed according to the RFC 4788 (or similar) DTX scheme. Further, this embodiment relies on the premise that EVRC-NW narrowband (NB) modes are compatible with EVRC-B vocoders. Here, the mobile stations are directed to use only EVRC-NW NB modes, and wideband (WB) mode 0 is disallowed, as will be appreciated by those skilled in the art.
[43] Thus, FIG. 5 illustrates an embodiment where the silence suppression scheme employed in either calling party network 112 or called party network 142 is adapted for an end-to-end silence suppression scheme by the silence suppression interface component 320. However, this solution lacks the flexibility to maintain an existing configuration in the packet-switched network 130 that an operator of the packet-switched network 130 might prefer. Likewise, it is not usable where configurations (e.g., silence suppression schemes) of the calling party network 112, the packet-switched network 130, or called party network 142 differ. The goals and operative components of the packet-switched network 130 may differ from the goals and operative components of the calling party network 112 and called party network 142. Advantageously, the SSI component 320 as configured in FIGS. 6 and 8-10 allows for the calling party network 112, called party network 142, and packet-switched network 130 to each manage the voice stream for the packet-switched call independently and according to their respective operators' preference. In contrast, the configuration of FIGS. 5, 7, and 11 may require extra resources to provide an end-to-end solution.
[44] Turning to FIG. 6, a packet flow 600 for another implementation of the apparatus 300 is shown. The SSI component 320 (shown combined with the packet switch 132) is configured to mediate between the calling party network 112 and the packet-switched network 130. In this example, the EVRC-NW DTX scheme where N = 4 is employed by the mobile station 114 as a first silence suppression scheme and a second silence suppression scheme (RFC 4788 or similar where N = 10) is employed between packet switches 132 and 134. Packet flow 600 shows a 400 millisecond period of background noise through the voice stream.
[45] The packet switch 132 is configured to alter a voice stream received from the base station 116 according to the second silence suppression scheme. Analogously to packet flow 500 in FIG. 5, the synchronous packets 550, 554, 558, 562, 566, and 570, the asynchronous packets 556 and 568, and the blanks 580 are forwarded to the packet switch 132 via the base station 116. The packet switch 132 counts the synchronous packets, asynchronous packets, and blanks in determining when to apply the silence suppression interval. The packet switch 132 in one example alters the voice stream by selecting a subset of received packets from the plurality of received packets of the voice stream. From the subset of received packets, the packet switch 132 creates a second voice stream that conforms to the silence suppression scheme of the packet-switched network 130. For example, the packets are selected by the packet switch 132 based on the parameters of the second silence suppression scheme. The packet switch 132 selects a most recently received packet on the silence suppression interval N = 10 for the packet-switched network 130, instead of the silence suppression interval N = 4 for the first silence suppression scheme, as will be appreciated by those skilled in the art.
[46] The second voice stream comprises packets 550, 558, and 570, selected by the packet switch 132. The second voice stream is forwarded via the TrFO Network 160 to the packet switch 134. The packet switch 134 forwards the second voice stream to the base station 144 along with copies of the most recently received packets (e.g., packets 550 and 558) to fill in the remaining packets of the voice stream. The base station 144 then applies a silence suppression scheme for the called party network 142 (e.g., where N = 4) to the received voice stream, as will be appreciated by those skilled in the art.
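The selection described for FIG. 6 can be sketched as follows, for illustration only. The sketch assumes that each 20 millisecond slot is identified by a reference numeral from 550 through 570, that slots without a synchronous or asynchronous packet carry a blank, and that counting for the second silence suppression interval starts at the first slot shown; the function name `mediate` is hypothetical.

```python
# Illustrative sketch only: slot numbering follows the reference numerals of
# FIG. 6; the function name and counting origin are assumptions.
def mediate(received, first_slot, last_slot, n):
    """Select, on every n-th slot, the most recently received real packet.

    Every slot is counted, whether it carries a synchronous packet, an
    asynchronous packet, or a blank; blanks themselves are never selected.
    """
    selected = []
    last = None
    for slot in range(first_slot, last_slot + 1):
        if slot in received:
            last = slot  # synchronous or asynchronous packet arrived
        on_interval = (slot - first_slot) % n == 0
        if on_interval and last is not None:
            selected.append(last)
    return selected


synchronous = {550, 554, 558, 562, 566, 570}   # first scheme, N = 4
asynchronous = {556, 568}                      # "critical" packets
second_stream = mediate(synchronous | asynchronous, 550, 570, n=10)
print(second_stream)  # [550, 558, 570]
```

At the second interval boundary the slot itself carries a blank, so the most recently received packet, 558, is selected, which yields the subset of packets 550, 558, and 570 identified above.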
[47] Turning to FIG. 7, a packet flow 700 is similar to packet flow 500 of FIG. 5, except that the silence suppression schemes of the calling party network 112 and the called party network 142 have a guaranteed update interval of N=8. The packet switch 132 passes through packets 550, 556, 558, 566, and 568 in conformance with the silence suppression scheme of the mobile station 114 and base station 116. The packet switch 134 passes the received packets along with copies to the base station 144. The base station 144 then applies a silence suppression scheme with N=8 for forwarding the packets to the mobile station 146. Accordingly, mobile station 146 receives only packets 550, 558, and 566, but not the critical packets 556 and 568 or the copies.
[48] FIG. 8 illustrates a packet flow 800 similar to the packet flow 600 of FIG. 6. However, packet flow 800 shows an update interval of N=10 for the packet-switched network 130 and an update interval of N=8 for the calling party network 112 and called party network 142. Accordingly, the SSI component 320 maintains the existing settings of the TrFO Network 160. Moreover, it adapts or alters the voice stream received from the base station 116 by selecting the subset of received packets. The SSI component 320 thus facilitates transmission between networks with distinct silence suppression schemes, for example where the update intervals (or other parameters, rules, etc.) of the silence suppression schemes do not match between the calling party network 112 and the called party network 142.
[49] Turning to FIG. 9, a packet flow 900 is similar to packet flow 800 but modified for mismatched codecs. In adapting systems to allow silence intervals in packetized speech data to be sent at different update intervals within a network, the call domains can use different silence suppression schemes, or can use a voice encoder that does not contain a silence suppression scheme. In packet flow 900, the mobile station 114 and base station 116 employ EVRC-NW, with N=8, whereas the codec for mobile station 146 and base station 144 is EVRC-B, with no silence suppression included. Here, the called party network 142 is not using any form of silence suppression. This can occur where the receiving side is using an older service option or codec, such as EVRC-B as shown, or where silence suppression is disabled by setting N=1.
[50] The packet switch 132 alters the voice stream received from the base station 116 by selecting a subset of the received packets 550, 556, 558, 566, and 568. The altered voice stream comprises packets 550, 558, and 568. The packets 550, 558, and 568 are forwarded to the packet switch 134, which forwards the packets along with copies to the mobile station 146 via base station 144. Accordingly, two call domains with distinct silence suppression schemes are interfaced with a call domain using an older voice codec without a silence suppression scheme.
[51] Turning to FIG. 10, a packet flow 1000 illustrates interfacing of mismatched codecs of a voice stream from EVRC-B to EVRC-NW, where N=10 for the silence suppression scheme in the packet-switched network 130. Here, the mobile station 114 is using an older voice encoder (e.g., EVRC-B) and is not performing silence suppression. The mobile station 146 is using a new voice codec including a silence suppression protocol, where N=8.
[52] FIG. 10 shows a "reverse" of the codecs for the voice stream of FIG. 9, or the application of the EVRC-NW DTX scheme to a first voice stream without silence suppression in the calling party network 112 as shown in FIG. 2. Specifically, packet switch 132 applies silence suppression to the packets 251 through 259 and 261 through 269 and passes packets 250, 260, and 270 to the packet switch 134. The packet switch 134 forwards the packets 250, 260, and 270 along with copies of packets 250 and 260. However, in FIG. 10, the base station 144 applies the silence suppression scheme of the mobile station 146 to the voice stream.
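Applying an interval to a stream that arrives without silence suppression, as in paragraph [52], can be sketched as a simple partition. The function `split_by_interval` is a hypothetical helper, and the sketch assumes that packets 250 through 270 occupy consecutive frame slots and that N = 10 is counted from packet 250.

```python
from typing import List, Tuple


def split_by_interval(first: int, last: int, interval: int) -> Tuple[List[int], List[int]]:
    """Partition a full-rate run of packets into passed and suppressed packets."""
    passed, suppressed = [], []
    for offset, packet in enumerate(range(first, last + 1)):
        if offset % interval == 0:
            passed.append(packet)  # falls on an update boundary, so it is forwarded
        else:
            suppressed.append(packet)  # dropped by the silence suppression scheme
    return passed, suppressed


passed, suppressed = split_by_interval(250, 270, interval=10)
print(passed)
print(len(suppressed))
```

With these assumptions the passed packets are 250, 260, and 270, and the suppressed packets are 251 through 259 and 261 through 269.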
[53] As shown in FIGS. 9 and 10, additional variations in silence suppression schemes exist and are contemplated to be within the scope of the principles herein. For example, adjustments may be made by the base stations 114 and 116 to agree to send the blanked packets and by the packet switches 132 and 134 being set to pass or not pass critical packets, in order to accommodate application of distinct silence suppression schemes in the SSI component 320.
[54] Turning to FIG. 11, a packet flow 1100 illustrates another variation of the "pass-through" of FIG. 5. In this example, the calling party network 112 employs the EVRC-NW DTX scheme with N=8. The called party network 142 is not performing silence suppression and is using the EVRC-B codec. The packet switch 132 in this example is configured to apply a silence suppression scheme using N=10.
[55] While portions of the above description are related to, for example, mobile stations, packet-switched networks, and codecs such as EVRC, those skilled in the art will appreciate that alternate implementations and embodiments are possible. Alternate implementations of mobile stations comprise mobile phones, smart phones, voicemail servers, Voice over Internet Protocol (VoIP) terminals, personal computers, session initiation protocol (SIP) devices, telephony devices, or other devices configured to receive speech. Alternate implementations of packet-switched networks comprise local area networks, wide area networks, the Internet, other packet-switched networks and combinations thereof. Alternate implementations of codecs comprise EVRC-A, EVRC-B, EVRC-WB, EVRC-NW, Long Term Evolution (LTE) codecs, Adaptive MultiRate (AMR) codecs, and other speech or audio codecs where packets comprising background noise may be omitted. Accordingly, additional embodiments comprise wireline or landline networks, private phone networks, voicemail services, and combinations thereof. Additionally, while silence suppression schemes compatible with DTX and the above-mentioned 3GPP2 and IETF documents are described, alternative silence suppression schemes may be used with different rules, parameters, or criteria for dropping or omitting packets from the voice stream, as will be appreciated by those skilled in the art.
[56] Referring to the implementation of FIG. 3, the silence suppression interface component 320 is configured to mediate between the transmitting device 110 and the packet-switched network 130 (e.g., between first and second silence suppression schemes). In alternative implementations, the silence suppression interface component 320 is configured to mediate between one or more additional devices, networks, and/or call domains with one or more respective silence suppression schemes. In a first example, the silence suppression interface component 320 is configured to provide mediation between one silence suppression scheme and a first set of a number Z of additional silence suppression schemes (e.g., "one-to-many" or 1-to-Z mediation). In a second example, the silence suppression interface component 320 is configured to provide mediation between a second set of a number Y of silence suppression schemes and the first set of silence suppression schemes, such as Y-to-Z mediation. The silence suppression interface component 320 in one example comprises a 1 x Z or Y x Z decision table and/or matrix for mediation between multiple silence suppression schemes. Accordingly, the silence suppression interface component 320 is configured to mediate between multiple instances of the transmitting device 110, the receiving device 140, and/or the packet-switched network 130 with respective silence suppression schemes, as will be appreciated by those skilled in the art.
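One way to picture the Y x Z decision table of paragraph [56] is as a lookup keyed by an ingress scheme and an egress scheme. The sketch below is illustrative only: the `Scheme` class, the scheme labels, and the two action strings are hypothetical, and the rule used to fill each cell (pass through when the update intervals match, otherwise reselect packets on the egress interval) is an assumption drawn from the examples of FIGS. 5 through 11 rather than a rule stated for the table itself.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Scheme:
    name: str      # hypothetical label for a silence suppression scheme
    interval: int  # update interval N; N = 1 means suppression is disabled


def build_table(ingress: List[Scheme], egress: List[Scheme]) -> Dict[Tuple[str, str], str]:
    """Build a Y x Z table mapping (ingress, egress) scheme names to an action."""
    table = {}
    for src, dst in product(ingress, egress):
        if src.interval == dst.interval:
            action = "pass-through"
        else:
            action = f"reselect on N={dst.interval}"
        table[(src.name, dst.name)] = action
    return table


ingress = [Scheme("evrc_nw_n4", 4), Scheme("evrc_nw_n8", 8), Scheme("evrc_b_off", 1)]
egress = [Scheme("core_n10", 10), Scheme("core_n8", 8)]
table = build_table(ingress, egress)
print(table[("evrc_nw_n4", "core_n10")])
```

A table of this kind keeps the per-call decision to a single lookup, and a 1-to-Z mediation is simply the case where the ingress list has one entry.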
[57] Turning to FIG. 12, an apparatus 1200 represents another implementation of the apparatus 300. The apparatus 1200 comprises the transmitting device 110, receiving device 140, and packet-switched network 130. In this implementation, the transmitting device 110 and receiving device 140 comprise Voice over IP terminals 1210 and 1240, respectively. The VoIP terminals 1210 and 1240 are configured to communicate over the packet-switched network 130. The VoIP terminal 1210 is configured to employ a first silence suppression scheme and to send a voice stream conforming to the first silence suppression scheme to the packet switch 132. The packet switch 132 comprises the SSI component 320 and is configured to alter the voice stream to conform to a second silence suppression scheme. Accordingly, the packet switch 132 selects a subset of packets from the voice stream to create a second voice stream that conforms to a second silence suppression scheme. The packet switch 134 is configured to replace or substitute one or more packets into the second voice stream before playback at the VoIP terminal 1240, as will be appreciated by those skilled in the art.
[58] The apparatus 300 in one example employs one or more non-transitory processor-readable media. The non-transitory processor-readable media store software (e.g., compiled or interpreted code), firmware and/or assembly language for performing (e.g., by a processor or computer) one or more portions of one or more implementations of the invention. Examples of a non-transitory processor-readable medium for the apparatus 300 comprise the recordable data storage medium 322 of the SSI component 320. The non-transitory processor-readable media for the apparatus 300 in one example comprise one or more of a magnetic, electrical, optical, biological, and atomic data storage medium. For example, the non-transitory processor-readable media comprise floppy disks, magnetic tapes, CD-ROMs, DVD-ROMs, hard disk drives, and electronic memory. In another example, the non-transitory processor-readable media comprise removable or portable devices, such as flash memory drives.
[59] The apparatus 300 in one example comprises a plurality of components such as one or more of electronic components, hardware components, and computer software components. A number of such components can be combined or divided in the apparatus 300. An example component of the apparatus 300 employs and/or comprises a set and/or series of computer instructions written in or implemented with any of a number of programming languages, as will be appreciated by those skilled in the art.
[60] The steps or operations described herein are just for example. There may be many variations to these steps or operations without departing from the spirit of the invention. For instance, the steps may be performed in a differing order, or steps may be added, deleted, or modified.
[61] Although example implementations of the invention have been depicted and described in detail herein, it will be apparent to those skilled in the relevant art that various modifications, additions, substitutions, and the like can be made without departing from the spirit of the invention and these are therefore considered to be within the scope of the invention as defined in the following claims.
Claims
1. A method, comprising the steps of:
receiving a first voice stream for a packet-switched call from a calling party, wherein the first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call;
selecting a subset of encoded packets from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme, wherein the second voice stream comprises the subset of encoded packets, wherein the first silence suppression scheme is distinct from the second silence suppression scheme; and
forwarding the second voice stream toward a called party for the packet-switched call.
2. The method of claim 1, wherein the first voice stream comprises a first discontinuous transmission (DTX) stream, wherein the second voice stream comprises a second DTX stream;
wherein the step of selecting the subset of encoded packets comprises the step of:
selecting the subset of encoded packets according to the second silence suppression scheme, such that the second DTX stream conforms to the second silence suppression scheme.
3. The method of claim 2, wherein the first and second silence suppression schemes comprise respective first and second silence suppression intervals;
wherein the plurality of encoded packets comprise at least one of:
a synchronous packet, sent upon the first silence suppression interval according to the first silence suppression scheme, or
an asynchronous packet, sent independently of the first silence suppression interval according to the first silence suppression scheme; wherein the step of selecting the subset of encoded packets comprises the step of:
selecting, upon the second silence suppression interval, a most recently received packet of the at least one of the synchronous packet or the asynchronous packet, wherein the subset of encoded packets comprises the most recently received packet;
wherein the step of forwarding the second DTX stream comprises the step of:
forwarding the most recently received packet upon the second silence suppression interval.
4. The method of claim 3, wherein the first silence suppression interval is distinct from the second silence suppression interval.
5. The method of claim 3, wherein the first silence suppression scheme conforms to an Enhanced Variable Rate Codec Narrowband-Wideband (EVRC-NW) standard;
wherein the synchronous packet comprises a rate 1/8th non-critical packet of the EVRC-NW standard;
wherein the second silence suppression scheme conforms to an Enhanced Variable Rate Codec B (EVRC-B) standard.
6. The method of claim 2, further comprising the steps of:
receiving one or more parameters of a third silence suppression scheme that is distinct from the first and second silence suppression schemes;
storing the one or more parameters of the third silence suppression scheme; and
selecting the subset of encoded packets based on the one or more parameters of the third silence suppression scheme.
7. An apparatus, comprising:
a silence suppression interface component configured to receive a first voice stream for a packet-switched call from a transmitting device;
wherein the first voice stream conforms to a first silence suppression scheme and comprises a plurality of encoded packets for the packet-switched call;
wherein the silence suppression interface component is configured to select a subset of encoded packets from the plurality of encoded packets to create a second voice stream that conforms to a second silence suppression scheme;
wherein the second voice stream comprises the subset of encoded packets;
wherein the first silence suppression scheme is distinct from the second silence suppression scheme;
wherein the silence suppression interface component is configured to forward the second voice stream toward a receiving device for the packet-switched call.
8. The apparatus of claim 7, wherein the first voice stream comprises a first discontinuous transmission (DTX) stream;
wherein the second voice stream comprises a second DTX stream.
9. The apparatus of claim 8, wherein the first and second silence suppression schemes comprise respective first and second silence suppression intervals;
wherein the plurality of encoded packets comprise at least one of:
a synchronous packet, sent upon the first silence suppression interval according to the first silence suppression scheme, or
an asynchronous packet, sent independently of the first silence suppression interval according to the first silence suppression scheme; wherein the silence suppression interface component is configured to select a most recently received packet of the at least one of the synchronous packet or the asynchronous packet upon the second silence suppression interval.
10. The apparatus of claim 7, wherein the silence suppression interface component is configured to mediate between a first set of a number Y silence suppression schemes and a second set of a number Z silence suppression schemes;
wherein the first set of silence suppression schemes comprises the first silence suppression scheme;
wherein the second set of silence suppression schemes comprises the second silence suppression scheme.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014512841A JP5868496B2 (en) | 2011-05-24 | 2012-04-16 | Selection of encoded packets from the first audio stream to create a second audio stream |
CN201280025092.8A CN103548141B (en) | 2011-05-24 | 2012-04-16 | Coding groups is selected to create the second voice flow from the first voice flow |
EP12718507.2A EP2715793B1 (en) | 2011-05-24 | 2012-04-16 | Encoded packet selection from a first voice stream to create a second voice stream |
KR1020137030870A KR101502315B1 (en) | 2011-05-24 | 2012-04-16 | Encoded packet selection from a first voice stream to create a second voice stream |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/068,924 | 2011-05-24 | ||
US13/068,924 US8751223B2 (en) | 2011-05-24 | 2011-05-24 | Encoded packet selection from a first voice stream to create a second voice stream |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2012161887A1 (en) | 2012-11-29 |
Family ID: 46025937
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2012/033742 WO2012161887A1 (en) | 2011-05-24 | 2012-04-16 | Encoded packet selection from a first voice stream to create a second voice stream |
Country Status (6)
Country | Link |
---|---|
US (1) | US8751223B2 (en) |
EP (1) | EP2715793B1 (en) |
JP (1) | JP5868496B2 (en) |
KR (1) | KR101502315B1 (en) |
CN (1) | CN103548141B (en) |
WO (1) | WO2012161887A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1475929A1 (en) * | 2003-05-07 | 2004-11-10 | Lucent Technologies Inc. | Control component removing encoded frames from isochronous telecommunication stream |
US20080027717A1 (en) * | 2006-07-31 | 2008-01-31 | Vivek Rajendran | Systems, methods, and apparatus for wideband encoding and decoding of inactive frames |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
FI105001B (en) * | 1995-06-30 | 2000-05-15 | Nokia Mobile Phones Ltd | Method for Determining Wait Time in Speech Decoder in Continuous Transmission and Speech Decoder and Transceiver |
JP4518714B2 (en) * | 2001-08-31 | 2010-08-04 | 富士通株式会社 | Speech code conversion method |
WO2006136901A2 (en) * | 2005-06-18 | 2006-12-28 | Nokia Corporation | System and method for adaptive transmission of comfort noise parameters during discontinuous speech transmission |
EP2276023A3 (en) * | 2005-11-30 | 2011-10-05 | Telefonaktiebolaget LM Ericsson (publ) | Efficient speech stream conversion |
US8532984B2 (en) * | 2006-07-31 | 2013-09-10 | Qualcomm Incorporated | Systems, methods, and apparatus for wideband encoding and decoding of active frames |
US8913512B2 (en) * | 2008-10-16 | 2014-12-16 | Telefonaktiebolaget Lm Ericsson (Publ) | Telecommunication apparatus, method, and computer program controlling sporadic data transmissions |
2011
- 2011-05-24 US US13/068,924 patent/US8751223B2/en not_active Expired - Fee Related
2012
- 2012-04-16 CN CN201280025092.8A patent/CN103548141B/en not_active Expired - Fee Related
- 2012-04-16 EP EP12718507.2A patent/EP2715793B1/en not_active Not-in-force
- 2012-04-16 KR KR1020137030870A patent/KR101502315B1/en active IP Right Grant
- 2012-04-16 JP JP2014512841A patent/JP5868496B2/en not_active Expired - Fee Related
- 2012-04-16 WO PCT/US2012/033742 patent/WO2012161887A1/en unknown
Non-Patent Citations (1)
Title |
---|
FANG QUALCOMM Z: "RTP payload format for Enhanced Variable Rate Narrowband-Wideband Codec (EVRC-NW); draft-ietf-avt-rtp-evrc-nw-02.txt", RTP PAYLOAD FORMAT FOR ENHANCED VARIABLE RATE NARROWBAND-WIDEBAND CODEC (EVRC-NW); DRAFT-IETF-AVT-RTP-EVRC-NW-02.TXT, INTERNET ENGINEERING TASK FORCE, IETF; STANDARDWORKINGDRAFT, INTERNET SOCIETY (ISOC) 4, RUE DES FALAISES CH- 1205 GENEVA, SWITZERLAN, no. 2, 30 November 2010 (2010-11-30), pages 1 - 28, XP015072770 * |
Also Published As
Publication number | Publication date |
---|---|
US8751223B2 (en) | 2014-06-10 |
KR20140002068A (en) | 2014-01-07 |
KR101502315B1 (en) | 2015-03-13 |
CN103548141B (en) | 2016-08-17 |
EP2715793B1 (en) | 2018-04-04 |
JP2014518054A (en) | 2014-07-24 |
JP5868496B2 (en) | 2016-02-24 |
EP2715793A1 (en) | 2014-04-09 |
CN103548141A (en) | 2014-01-29 |
US20120303364A1 (en) | 2012-11-29 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 12718507; Country of ref document: EP; Kind code of ref document: A1 |
 | ENP | Entry into the national phase | Ref document number: 20137030870; Country of ref document: KR; Kind code of ref document: A |
 | ENP | Entry into the national phase | Ref document number: 2014512841; Country of ref document: JP; Kind code of ref document: A |
 | NENP | Non-entry into the national phase | Ref country code: DE |