EP3228096B1 - Audio Terminal - Google Patents

Audio Terminal

Info

Publication number
EP3228096B1
Authority
EP
European Patent Office
Prior art keywords
audio
binaural
channel
terminal
signal
Prior art date
Legal status
Active
Application number
EP14777648.8A
Other languages
English (en)
French (fr)
Other versions
EP3228096A1 (de)
Inventor
Detlef Wiese
Lars Immisch
Hauke Krüger
Current Assignee
Binauric SE
Original Assignee
Binauric SE
Priority date
Filing date
Publication date
Application filed by Binauric SE
Publication of EP3228096A1
Application granted
Publication of EP3228096B1

Classifications

    • H04S 3/00: Systems employing more than two channels, e.g. quadraphonic
    • H04S 3/002: Non-adaptive circuits, e.g. manually adjustable or static, for enhancing the sound image or the spatial distribution
    • H04S 3/004: For headphones
    • H04S 2400/15: Aspects of sound capture and related signal processing for recording or reproduction
    • H04S 2420/01: Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTF's] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
    • H04R 5/04: Circuit arrangements, e.g. for selective connection of amplifier inputs/outputs to loudspeakers, for loudspeaker detection, or for adaptation of settings to personal preferences or hearing impairments
    • H04R 2420/07: Applications of wireless loudspeakers or wireless microphones

Definitions

  • The present invention generally relates to the field of audio data processing. More particularly, the present invention relates to an audio terminal.
  • Everybody uses a telephone - either using a wired telephone connected to the well-known PSTN (Public Switched Telephone Network) via cable or a modern mobile phone, such as a smartphone, which is connected to the world via wireless connections based on, e.g., UMTS (Universal Mobile Telecommunications System).
  • Companies such as Skype or Google offer services which employ novel speech codecs offering so-called HD-Voice quality (see, e.g., 3GPP TS 26.190, "Speech codec speech processing functions; Adaptive Multi-Rate - Wideband (AMR-WB) speech codec; Transcoding functions", 3GPP Technical Specification Group Services and System Aspects, 2001; ITU-T G.722.2, "Wideband coding of speech at around 16 kbit/s using Adaptive Multi-Rate Wideband (AMR-WB)", 2002).
  • Such speech signals cover a frequency bandwidth between 50 Hz and 7 kHz (so-called "wideband speech") or even more, for instance, a frequency bandwidth between 50 Hz and 14 kHz (so-called "super-wideband speech") (see 3GPP TS 26.290, "Audio codec processing functions; Extended Adaptive Multi-Rate - Wideband (AMR-WB+) codec; Transcoding functions", 3GPP Technical Specification Group Services and System Aspects, 2005) or an even higher frequency bandwidth (e.g., "full band speech").
  • Audio-3D - also denoted as binaural communication - is expected by the present inventors to be the next emerging technology in communication.
  • The benefit of Audio-3D in comparison to conventional (HD-)Voice communication lies in the use of a binaural instead of a monaural audio signal. Audio contents will be captured and played back by novel binaural terminals involving two microphones and two speakers, yielding an acoustical reproduction that better resembles what the remote communication partner really hears.
  • Binaural telephony is "listening to the audio ambience with the ears of the remote speaker", wherein the pure content of the recorded speech is extended by the capturing of the acoustical ambience.
  • The virtual representation of room acoustics in binaural signals is, preferably, based on differences in the time of arrival of the signals reaching the left and the right ear as well as attenuation and filtering effects caused by the human head, the body and the ears, allowing the location of sources also in the vertical direction.
  • Audio-3D is expected to represent the first radical change of the form of audio communication that has been known for more than 100 years, which society has named the telephone or phoning. It particularly targets a new mobile type of communication which may be called "audio portation".
  • Everybody being equipped with future binaural terminal equipment as well as a smartphone app to handle the communication will be able to effectively capture the acoustical environment, i.e., the acoustical events of real life, preferably as they are perceived with the two ears of the user, and provide them as captured, like a listening picture, to another user anywhere in the world.
  • With Audio-3D, communication partners will no longer feel distant (or, at least, will feel less distant) and, eventually, it may even result in a reduction of traveling - which indeed is an intelligent economic and ecological approach.
  • The present invention has been made in view of the above situation and considerations, and embodiments of the present invention aim at providing technology that may be used in various Audio-3D usage scenarios.
  • the term "binaural” or “binaurally” is not used in an as strict sense as in some publications, where only audio signals captured with an artificial head (also called “ Kunststoffkopf” ) are considered truly binaural. Rather the term is used here for audio any signals that compared to a conventional stereo signal more closely resemble the acoustical ambience as it would be perceived by a real human. Such audio signals may be captured, for instance, by the audio terminals described in more detail in sections 3 to 9 below.
  • WO 2012/061148 A1 discloses systems, methods and apparatuses for detecting head movement based on recorded sound signals.
  • US 2011/0280409 A1 discloses that a personalized hearing profile is generated for an ear-level device comprising a memory, microphone, speaker and processor. Communication is established between the ear-level device and a companion device having a user interface. A frame of reference is provided in the user interface, where positions in the frame of reference are associated with sound profile data. A position on the frame of reference is determined in response to user interaction with the user interface, along with certain sound profile data associated with that position. This data is transmitted to the ear-level device. Sound can be generated through the speaker based upon the audio stream data to provide real-time feedback to the user. The determining and transmitting steps are repeated until detection of an end event.
  • An audio system according to claim 1 is presented.
  • The conference bridge is adapted to monaurally mix the multi-channel audio data streamed from the first and the second audio terminal into the multi-channel audio data streamed from the third audio terminal to generate the multi-channel audio mix.
  • The conference bridge is further adapted to spatially position the monaurally mixed multi-channel audio data streamed from the first and the second audio terminal when generating the multi-channel audio mix.
  • The audio system further comprises a telephone comprising a microphone and a speaker.
  • The conference bridge is further connectable with the telephone, wherein the conference bridge is adapted to mix the multi-channel audio data streamed from the first and the second audio terminal into a single-channel audio mix comprising a single audio channel and to stream the single-channel audio mix to the telephone.
  • A basic configuration of an audio terminal 100 that may be used for Audio-3D is schematically and exemplarily shown in Fig. 1.
  • The audio terminal 100 comprises a first device 10 and a second device 20 which is separate from the first device 10.
  • In the first device 10, there are provided a first and a second microphone 11, 12 for capturing multi-channel audio data comprising a first and a second audio channel.
  • In the second device 20, there is provided a communication unit 21 for, here, voice and data communication.
  • The first and the second device 10, 20 are adapted to be connected with each other via a local wireless transmission link 30.
  • The first device 10 is adapted to stream the multi-channel audio data, i.e., the data comprising the first and the second audio channel, to the second device 20 via the local wireless transmission link 30, and the second device 20 is adapted to receive and process and/or store the multi-channel audio data streamed from the first device 10.
  • The first device 10 is an external speaker/microphone apparatus as described in detail in the unpublished International patent application PCT/EP2013/067534, filed on 23 August 2013.
  • It comprises a housing 17 that is formed in the shape of a (regular) icosahedron, i.e., a polyhedron with 20 triangular faces.
  • Such an external speaker/microphone apparatus, in this specification also designated as a "speakerbox", is marketed by the company Binauric SE under the name "BoomBoom".
  • The first and the second microphone 11, 12 are arranged at opposite sides of the housing 17, at a distance of, for example, about 12.5 cm.
  • Thereby, the multi-channel audio data captured by the two microphones 11, 12 can more closely resemble the acoustical ambience as it would be perceived by a real human (compared to a conventional stereo signal).
  • The audio terminal 100, here, in particular, the first device 10, further comprises a first and a second speaker 15, 16 for playing back multi-channel audio data comprising at least a first and a second audio channel.
  • For this, the audio terminal 100 is adapted to stream the multi-channel audio data from the second device 20 to the first device 10 via the local wireless transmission link 30, for instance, a transmission link complying with the Bluetooth standard, preferably, the current Bluetooth Core Specification 4.1.
  • The second device 20, here, is a smartphone, such as an Apple iPhone or a Samsung Galaxy.
  • The communication unit 21 supports voice and data communication via one or more mobile communication standards, such as GSM (Global System for Mobile Communication), UMTS (Universal Mobile Telecommunications System) or LTE (Long-Term Evolution). Additionally, it may support one or more further network technologies, such as WLAN (Wireless LAN).
  • The audio terminal 100, here, in particular, the first device 10, further comprises a third and a fourth microphone 13, 14 for capturing further multi-channel audio data comprising a third and a fourth audio channel.
  • The third and the fourth microphone 13, 14 are provided on the same side of the housing 17, at a distance of, for example, about 1.8 cm.
  • These microphones can be used to better classify audio capturing situations (e.g., the direction of arrival of the audio signals) and may thereby support stereo enhancement.
  • Moreover, when two speakerboxes are used together, the third and the fourth microphone 13, 14 of each of the two speakerboxes may be used to locate the position of the speakerboxes for allowing True Wireless Stereo in combination with stereo crosstalk cancellation (see below for details).
  • Further options for using the third and the fourth microphone 13, 14 are: to capture the acoustical ambience for reducing background noise with a noise cancelling algorithm (near speaker to far speaker); to measure the ambience volume level for automatically adjusting the playback level (loudness of music, voice prompts and far speaker) to a convenient listening level, e.g., to a lower volume late at night in a bedroom or to a loud playback in a noisy environment; and/or to detect the direction of sound sources (for example, a beamformer could focus on near speakers and attenuate unwanted sources more efficiently).
  • The local wireless transmission link 30, here, is a transmission link complying with the Bluetooth standard, preferably, the current Bluetooth Core Specification 4.1.
  • The standard provides a large number of different Bluetooth "profiles" (currently over 35), which are specifications regarding a certain aspect of a Bluetooth-based wireless communication between devices.
  • One of the profiles is the so-called Advanced Audio Distribution Profile (A2DP), which describes how stereo-quality audio data can be streamed from an audio source to an audio sink. This profile could, in principle, be used to also stream binaurally recorded audio data.
  • The multi-channel audio data are streamed according to the present invention using the Bluetooth Serial Port Profile (SPP) or the iPod Accessory Protocol (iAP).
  • SPP defines how to set up virtual serial ports and connect two Bluetooth enabled devices. It is based on 3GPP TS 07.10, "Terminal Equipment to Mobile Station (TE-MS) multiplexer protocol", 3GPP Technical Specification Group Terminals, 1997 and the RFCOMM protocol. It basically emulates a serial cable to provide a simple substitute for existing RS-232, including the control signals known from that technology.
  • SPP is supported, for example, by Android based smartphones, such as a Samsung Galaxy. For iOS based devices, such as the Apple iPhone, iAP provides a similar protocol that is likewise based on both 3GPP TS 07.10 and RFCOMM.
  • The synchronization between the first and the second audio channel is kept as far as possible during the transmission, since any synchronization problems may destroy the binaural cues or at least lead to the impression of moving audio sources. For instance, at a sampling rate of 48 kHz, the delay between the left and the right ear is limited to about 25 to 30 samples if the audio signal arrives from one side.
  • One preferred solution is to transmit synchronized audio data from each of the first and the second channel together in the same packet, ensuring that the synchronization between the audio data is not lost during transmission.
  • In particular, samples from the first and the second audio channel may preferably be packed into one packet for each segment; hence, there is no chance of deviation. Moreover, it is preferred that the audio data of the first and the second audio channel are generated by the first and the second microphone 11, 12 on the basis of the same clock or a common clock reference in order to ensure a substantially zero sample rate deviation.
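  • As an illustration of this per-segment packing, the following Python sketch (not part of the patent; the segment length of 128 samples and the 16-bit PCM format are arbitrary assumptions) packs synchronized left/right samples into a single payload so that both channels always travel in the same packet:

    import struct

    SEGMENT = 128  # samples per channel per packet (illustrative choice)

    def pack_segment(left, right):
        """Pack one segment of synchronized left/right 16-bit PCM samples
        into a single payload, so both channels always travel together."""
        assert len(left) == len(right) == SEGMENT
        payload = bytearray()
        for l, r in zip(left, right):
            payload += struct.pack(">hh", l, r)  # big-endian, left then right
        return bytes(payload)

    def unpack_segment(payload):
        """Recover the two channels; their relative timing cannot drift,
        because sample n of both channels sits in the same packet."""
        samples = struct.unpack(">" + "hh" * (len(payload) // 4), payload)
        return list(samples[0::2]), list(samples[1::2])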
  • The streamed audio data are preferably encoded, for example, using SBC (Low Complexity Subband Coding).
  • Here, the first device 10 is an external speaker/microphone apparatus which comprises a housing 17 that is formed in the shape of a (regular) icosahedron.
  • However, the first device 10 may also be something else.
  • For example, the housing may be formed substantially in a U-shape for being worn by a user on the shoulders around the neck; such an apparatus is in this specification also designated as a "shoulderspeaker" (not shown in the figures).
  • In this case, at least a first and a second microphone for capturing multi-channel audio data comprising a first and a second audio channel may be provided at the sides of the "legs" of the U-shape, at a distance of, for example, about 20 cm.
  • Alternatively, the first device may be an external speaker/microphone apparatus that is configured as an over- or on-the-ear headset, as an in-ear phone, or that is arranged on glasses worn by the user.
  • The captured multi-channel audio data comprising a first and a second audio channel may provide a better approximation of what a real human would hear than a conventional stereo signal, wherein the resemblance may become particularly good if the microphones are arranged as close as possible to (or even within) the ears of the user, as is possible, e.g., with headphones and in-ear phones.
  • The microphones may preferably be provided with structures that resemble the form of the human outer and/or inner ears.
  • The audio terminal 100, here, in particular, the first device 10, may also comprise an accelerometer (not shown in the figures) for measuring an acceleration and/or gravity thereof.
  • The audio terminal 100 is preferably adapted to control a function in dependence of the measured acceleration and/or gravity. For instance, it can be foreseen that the user can power up (switch on) the first device 10 by simply shaking it.
  • The audio terminal 100 can also be adapted to determine a misplacement thereof in dependence of the measured acceleration and/or gravity. For instance, it can be foreseen that the audio terminal 100 can determine whether the first device 10 is placed with an orientation that is generally suited for providing a good audio capturing performance.
  • The audio terminal 100 may comprise, in some scenarios, at least one additional second device (shown in a smaller size at the top of the figure) or, more generally, at least one further speaker for playing back audio data comprising at least a first audio channel, provided in a device that is separate from the first device 10.
  • While the second device 20, here, is a smartphone, it may also be, for example, a tablet PC, a stationary PC or a notebook with WLAN support, etc.
  • The audio terminal 100 preferably allows over-the-air flash updates and device control of the first device 10 from the second device 20 (including updates for voice prompts used to notify status information and the like to a user) over a reliable Bluetooth protocol.
  • For an Android based smartphone, such as a Samsung Galaxy, a custom RFCOMM Bluetooth service will preferably be used.
  • For an iOS based device, such as the Apple iPhone, the External Accessory Framework is preferably utilized. It is foreseen that the first device 10 supports at most two simultaneous control connections, be it to an Android based device or an iOS based device. If both are already connected, further control connections will preferably be rejected.
  • The iOS External Accessory protocol identifier may, for example, be a simple string like com.binauric.bconfig.
  • A custom service UUID of, for example, 0x5dd9a71c3c6341c6a3572929b4da78b1 may be used.
  • The speakerbox, here, comprises a virtual machine (VM) application executing at least part of the operations, as well as one or more flash memories.
  • Each message consists of a tag (16 bit, unsigned), followed by a length (16 bit, unsigned) and then the optional payload.
  • The length is always the size of the entire message in bytes, including the TL (tag-length) header. All integer values are preferably big-endian.
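  • A minimal sketch of this tag-length-payload framing in Python (hypothetical helper, not from the patent; it follows the reading that the length field covers the 4-byte TL header plus the payload):

    import struct

    def encode_message(tag, payload=b""):
        # Tag and length are 16-bit unsigned, big-endian.
        return struct.pack(">HH", tag, 4 + len(payload)) + payload

    def decode_message(data):
        # Returns (tag, payload, remaining bytes of the input buffer).
        tag, length = struct.unpack(">HH", data[:4])
        return tag, data[4:length], data[length:]

    # Example: a STATUS_REQUEST (tag 1) carries no payload.
    status_request = encode_message(1)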
  • The OTA (over-the-air) control operations preferably start at "HASH_REQUEST" and work on 8 Kbyte sectors.
  • The protocol is inspired by rsync: before transmitting flash updates, applications should compute the number of changed sectors by retrieving the hashes of all sectors, and then only transmit sectors that need updating.
  • Flash updates go to a secondary flash memory, which is used to update the primary flash only once its contents have been confirmed to be correct.
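  • The following sketch outlines such an rsync-inspired update from the application side (assumptions: "device" is a hypothetical transport object wrapping the HASH/ERASE/WRITE requests defined below, and the 64-bit sector hash algorithm is not specified in this text, so a truncated SHA-256 stands in):

    import hashlib

    SECTOR_SIZE = 8 * 1024  # 8 Kbyte sectors, as stated above

    def hash64(sector):
        # Placeholder for the firmware's uint64 sector hash (algorithm unknown).
        return int.from_bytes(hashlib.sha256(sector).digest()[:8], "big")

    def plan_flash_update(device, new_image):
        """Retrieve per-sector hashes, hash the new image locally and only
        transmit the sectors that actually differ."""
        sectors = [new_image[i:i + SECTOR_SIZE]
                   for i in range(0, len(new_image), SECTOR_SIZE)]
        remote = device.hash_request(range(len(sectors)))       # HASH_REQUEST
        changed = [i for i, s in enumerate(sectors) if hash64(s) != remote[i]]
        device.erase_request(changed)                           # ERASE_REQUEST
        for i in changed:
            device.write_request(i, offset=0, data=sectors[i])  # WRITE_REQUEST
        return changed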
  • Table 1: Requests and responses
    Request/Response | Tag | Payload | Comment
    STATUS_REQUEST | 1 | - |
    STATUS_RESPONSE | 2 | + | Signal strength, battery level, and accelerometer data
    VERSION_REQUEST | 3 | - |
    VERSION_RESPONSE | 4 | + | Firmware and prompt versions, language and variant
    SET_NAME_REQUEST | 5 | + | Set the name of the device
    SET_NAME_RESPONSE | 6 | + |
    STEREO_PAIR_REQUEST | 16 | + | Start special pairing for stereo operation (master)
    STEREO_PAIR_RESPONSE | 17 | + |
    STEREO_UNPAIR_REQUEST | 18 | - | Remove special stereo pairing (should be sent to master and slave)
    STEREO_UNPAIR_RESPONSE | 19 | + |
    COUNT_PAIRED_DEVICE_REQUEST | 128 | - |
    COUNT_PAIRED_DEVICE_RESPONSE | 129 | + | Returns the number of devices in the paired device list
    BINAURAL_RECORD_START_REQUEST | ? | - | Starts the automatic sending of BINAURAL_RECORD_AUDIO_RESPONSE packets (tag lost in extraction)
    BINAURAL_RECORD_START_RESPONSE | 401 | + |
    BINAURAL_RECORD_STOP_REQUEST | 402 | - | Stops the automatic sending of binaural audio packets
    BINAURAL_RECORD_STOP_RESPONSE | 403 | + |
    BINAURAL_RECORD_AUDIO_RESPONSE | 405 | + | Unsolicited packets containing SBC encoded audio data
  • Table 2 enumerates the status codes.
  • Table 2: Status codes
    Name | Code
    SUCCESS | 0
    INVALID_ARGUMENT | 1
    NOT_SUPPORTED | 3
    OTA_ERROR | 4
    PAIRED_DEV_DELETE_FAIL | 5
    STATUS_VALUE_READ_FAIL | 6
    OTA_ERROR_LOW_BATTERY | 7
    OPERATION_NOT_ALLOWED | 8
    UNKNOWN_ERROR | 0xFFFF
  • Table 3 illustrates the response to the STATUS_REQUEST, which has no parameters. It returns the current signal strength, battery level and accelerometer data.
  • Table 3: STATUS_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
    Signal strength | uint16 | The signal strength
    Battery level | uint16 | The battery level as integer percent value between 0 and 100
    Charging status | uint16 | One of: power_charger_disconnected (0), power_charger_disabled (1), power_charger_trickle (2), power_charger_fast (3), power_charger_boost_internal (4), power_charger_boost_external (5), power_charger_complete (6)
    Accelerometer x-value | int16 | Acceleration in thousandths of a g
    Accelerometer y-value | int16 | See above
    Accelerometer z-value | int16 | See above
  • Table 4 illustrates the response to the VERSION_REQUEST, which has no parameters. All strings in this response need only be null terminated if their values are shorter than their maximum length.
  • Table 4: VERSION_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
    Version | 40 byte string | The git tag (or hash if the version is not tagged)
    Prompt version | 40 byte string |
    Prompt variant | 32 byte string | The name of the variant (the name of the speaker or, more generally, the name of the prompt set)
  • Table 5 illustrates the SET_NAME_REQUEST.
  • This request allows setting the name of the speaker.
  • Table 5: SET_NAME_REQUEST
    Parameter Name | Size | Comment
    Name | 31 byte string | New name of the speaker
  • Table 6 illustrates the response to the SET_NAME_REQUEST.
  • Table 6: SET_NAME_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 7 illustrates the STEREO_PAIR_REQUEST. This request initiates the special pairing of two speakerboxes for stereo mode (True Wireless Stereo; TWS), which will be described in more detail below. It needs to be sent to both speakerboxes, in different roles. The decision which speakerbox is master and which is slave is arbitrary. The master device will become the right channel.
  • Table 7: STEREO_PAIR_REQUEST
    Parameter Name | Size | Comment
    BT address | 6 byte Bluetooth address | The address of the device to pair with
    Role | uint16 | 0: slave, 1: master
  • Table 8 illustrates the response to the STEREO_PAIR_REQUEST, which has no parameters.
  • Table 8: STEREO_PAIR_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 9 illustrates the response to the STEREO_UNPAIR_REQUEST, which has no parameters. It must be sent to both the master and the slave.
  • Table 9: STEREO_UNPAIR_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 10 illustrates the response to the COUNT_PAIRED_DEVICE_REQUEST, which has no parameters. It returns the number of paired devices.
  • Table 10: COUNT_PAIRED_DEVICE_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
    Count | uint16 | The number of paired devices
  • Table 11 illustrates the PAIRED_DEVICE_REQUEST. It allows requesting information about a paired device from the speakerbox.
  • Table 11: PAIRED_DEVICE_REQUEST
    Parameter Name | Size | Comment
    Index | uint16 | Index of the paired device to retrieve information on. Must be between 0 and count-1.
  • Table 12 illustrates the response to the PAIRED_DEVICE_REQUEST.
  • The smartphone's app needs to send this request for each paired device it is interested in. If, for some reason, the read of the requested information fails, the speakerbox will return a PAIRED_DEVICE_RESPONSE with just the status field. The remaining fields specified below will not be included in the response packet. Therefore, the actual length of the packet will vary depending on whether the required information can be supplied.
  • Table 12: PAIRED_DEVICE_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
    Index | uint16 | The index of the paired device
    BT address | 6 byte Bluetooth address | The address of the paired device
    Device class | 4 bytes | The most significant byte is zero, followed by 3 bytes with the BT Class of Device. If the class of device for that device is not available, it will be set to all zeroes.
    Name | 31 byte string, null terminated | The name of the device that was used for pairing
  • Table 13 illustrates the DELETE_PAIRED_DEVICE_REQUEST. It allows deleting paired devices from the speakerbox. It is permissible to delete the currently connected device, but this will make it necessary to pair with the current device again the next time the user connects to it. If no Bluetooth address is included in this request, all paired devices will be deleted.
  • Table 13: DELETE_PAIRED_DEVICE_REQUEST
    Parameter Name | Size | Comment
    Device | Single 6 byte Bluetooth address | Bluetooth address of the device to be removed from the paired device list (optional; see above)
  • Table 14 illustrates the response to the DELETE_PAIRED_DEVICE_REQUEST.
  • Table 14: DELETE_PAIRED_DEVICE_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 15 illustrates the ENTER_OTA_MODE_REQUEST. It will put the device in OTA mode. The firmware will drop all other profile links, thus stopping, e.g., the playback of music.
  • Table 16 illustrates the EXIT_OTA_MODE_REQUEST. If the payload of the request is non-zero in length, the requester wants to write the new flash contents to the primary flash. To avoid bricking the device, this operation must only succeed if the flash image hash can be validated. If the payload of the request is zero in length, the requester just wants to exit the OTA mode and continue without updating any flash contents.
  • Table 16: EXIT_OTA_MODE_REQUEST
    Parameter Name | Type | Comment
    Complete flash image hash | uint64 | A 64-bit hash of the complete 15.69 Mbit of flash. This is an extra sanity check. If the hash doesn't match, then the primary flash update operation will not occur. If this isn't present, then the OTA mode is exited without updating flash contents.
  • Table 17 illustrates the response to the EXIT_OTA_MODE_REQUEST.
  • Table 17: EXIT_OTA_MODE_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes. OTA_ERROR is returned if the hash does not match. SUCCESS is returned for a matching hash, and also for "exit OTA mode and continue without updating flash contents".
  • The EXIT_OTA_COMPLETE_REQUEST will shut down the Bluetooth transport link and kick the PIC to carry out the CSR8670 internal flash update operation. This message will only be acted upon if it follows an EXIT_OTA_MODE_RESPONSE with SUCCESS ("matching hash").
  • Table 18 illustrates the HASH_REQUEST. It requests the hash values for a number of sectors. The requester should not request more sectors than can fit in a single response packet.
  • Table 18: HASH_REQUEST
    Parameter Name | Type | Comment
    Sector Map | uint256 | A bit field of 251 bits (1 bit per flash sector). Sectors are 8 kByte in size. A bit being set indicates that a hash for that sector is requested. Note: bit 0 of the 32nd byte equates to sector 0.
  • Table 19 illustrates the response to the HASH_REQUEST.
  • Table 19: HASH_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes
    Hashes | array of uint64 | The requested hash values
  • Table 20 illustrates the READ_REQUEST. It requests a read of the data from flash. Each sector will be read in small chunks so as not to exceed the maximum response packet size of 128 bytes.
  • Table 20: READ_REQUEST
    Parameter Name | Type | Comment
    Sector number | uint16 | The sector (0-250) where the data should be read from. Each sector is 8 kByte in size.
    Sector Offset | uint16 | An offset from the start of the sector to read from. This offset is in units of uint16's.
    Count | uint16 | Number of words (uint16's) to read
  • Table 21 illustrates the response to the READ_REQUEST.
  • Table 21: READ_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes
    Data | uint16 array | The data read from the flash
  • Table 22 illustrates the ERASE_REQUEST. It requests a set of flash sectors to be erased.
  • Table 22: ERASE_REQUEST
    Parameter Name | Type | Comment
    Sector Map | uint256 | A bit field of 251 bits (1 bit per flash sector). Sectors are 8 kByte in size. A bit being set indicates that the sector should be erased. Note: bit 0 of the 32nd byte equates to sector 0.
  • Table 23 illustrates the response to the ERASE_REQUEST.
  • Table 23: ERASE_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 24 illustrates the WRITE_REQUEST. It writes a sector. This packet has - unlike all other packets - a maximum size of 8200 bytes, to be able to hold an entire 8 Kbyte sector.
  • Table 24: WRITE_REQUEST
    Parameter Name | Size | Comment
    Sector number | uint16 | The sector (0-250) where the data should be written
    Offset | uint16 | Offset within the sector, in units of uint16, at which to start writing
    Data | uint16 array | At most 4096 words (or 8192 bytes)
  • Table 25 illustrates the response to the WRITE_REQUEST.
  • Table 25: WRITE_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 26 illustrates the WRITE_KALIMBA_RAM_REQUEST. It writes to the Kalimba RAM. The overall request must not be larger than 128 bytes.
  • Table 26: WRITE_KALIMBA_RAM_REQUEST
    Parameter Name | Size | Comment
    Address | uint32 | Destination address
    Data | uint32 array | The length of the data may be at most 30 uint32 values (or 120 bytes)
  • Table 27 illustrates the response to the WRITE_KALIMBA_RAM_REQUEST.
  • Table 27: WRITE_KALIMBA_RAM_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 28 illustrates the EXECUTE_KALIMBA_REQUEST. It forces the Kalimba to execute from a given address.
  • Table 28: EXECUTE_KALIMBA_REQUEST
    Parameter Name | Size | Comment
    Address | uint32 | Address
  • Table 29 illustrates the response to the EXECUTE_KALIMBA_REQUEST.
  • Table 29: EXECUTE_KALIMBA_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes
  • Table 30 illustrates the response to the BINAURAL_RECORD_START_REQUEST.
  • Table 30: BINAURAL_RECORD_START_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
    Codec | uint16 | Codec list: PCM (linear), SBC, APT-X, Opus, G.729, AAC-HE, MPEG Layer 2, MPEG Layer 3
    Sampling rate | uint16 |
  • Table 31 illustrates the response to the BINAURAL_RECORD_STOP_REQUEST.
  • Table 31: BINAURAL_RECORD_STOP_RESPONSE
    Parameter Name | Type | Comment
    Status | uint16 | See Table 2: Status codes
  • The following Table 32 illustrates the BINAURAL_RECORD_AUDIO_RESPONSE.
  • This is an unsolicited packet that will be sent repeatedly from the speakerbox with new audio content (preferably, SBC encoded audio data from the binaural microphones), following a BINAURAL_RECORD_START_REQUEST. To stop the automatic sending of these packets, a BINAURAL_RECORD_STOP_REQUEST must be sent.
  • The first BINAURAL_RECORD_AUDIO_RESPONSE packet will contain the header below; subsequent packets will contain just the SBC frames (no header) until the total length of data sent is equal to the length in the header, i.e., a single BINAURAL_RECORD_AUDIO_RESPONSE packet may be fragmented across a large number of RFCOMM packets, depending on the RFCOMM frame size negotiated.
  • The header for BINAURAL_RECORD_AUDIO_RESPONSE is not sent with every audio frame. Rather, it is only sent approximately once per second to minimize the protocol overhead.
  • Table 32: BINAURAL_RECORD_AUDIO_RESPONSE
    Parameter Name | Size | Comment
    Status | uint16 | See Table 2: Status codes
    Number of SBC frames in payload | uint16 | The number of SBC packets contained in the "SBC packet stream" portion of this control protocol packet
    Number of SBC frames discarded | uint16 | If the radio link is not maintaining the required data rate to send all generated audio, then some SBC packets will need to be discarded and not sent. This parameter holds the number of SBC packets that were discarded between this control protocol packet and the last one. N.B. this parameter should be zero for successful streaming without audio loss.
    SBC packet stream | n bytes | A concatenated stream of SBC packets
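  • A reassembly sketch for this fragmented response (hypothetical helper, not from the patent; it assumes that the total payload length announced in the TL header of the first fragment is made available by the framing layer):

    import struct

    HEADER = ">HHH"  # status, SBC frames in payload, SBC frames discarded

    class AudioResponseReassembler:
        """Collects the SBC byte stream of one BINAURAL_RECORD_AUDIO_RESPONSE
        that may arrive spread over many RFCOMM packets."""

        def __init__(self, payload_length):
            self.remaining = payload_length
            self.sbc = bytearray()
            self.header_seen = False

        def feed(self, fragment):
            self.remaining -= len(fragment)
            if not self.header_seen:  # only the first fragment carries the header
                status, frames, discarded = struct.unpack_from(HEADER, fragment)
                if discarded:
                    print("warning: %d SBC frames discarded by the box" % discarded)
                fragment = fragment[struct.calcsize(HEADER):]
                self.header_seen = True
            self.sbc += fragment
            return self.remaining <= 0  # True once the response is complete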
  • Audio-3D aims to transmit speech contents as well as the acoustical ambience in which the speaker is currently located.
  • Audio-3D may also be used to create binaural "snapshots" (also called "moments" in this specification) of situations in life, to share acoustical experiences and/or to create diary-like flashbacks based on the strong emotions that can be triggered by the reproduction of the acoustical ambience of a life-changing event.
  • A remote speaker is located in a natural acoustical environment which is characterized by a specific acoustical ambience.
  • The remote speaker uses a mobile binaural terminal, e.g., comprising an over-the-ear headset connected with a smartphone via a local wireless transmission link (see sections 4 and 5 above), which connects to a local speaker.
  • The binaural terminal of the remote speaker captures the acoustical ambience using the binaural headset.
  • The binaural audio signal is transmitted to the local speaker, which allows the local speaker to participate in the acoustical environment in which the remote speaker is located substantially as if the local speaker would be there (which is designated in this specification as "audio portation").
  • Compared to a communication link based on conventional telephony, besides understanding the content of the speech emitted by the remote speaker, the local speaker preferably hears all acoustical nuances of the acoustical environment in which the remote speaker is located, such as the bird, the bat and the sounds of the beach.
  • A possible scenario "Sharing Audio Snapshots" is shown schematically and exemplarily in Fig. 3.
  • A user is at a specific location and enjoys his stay there.
  • He/she makes a binaural recording, denoted as the "Audio-3D-Snapshot", using an Audio-3D headset which is connected to a smartphone.
  • Once the snapshot is complete, the user also takes a photo of the location.
  • The binaural recording is tagged with the photo, the exact position (which is available in the smartphone), the date and time, and possibly a specific comment to identify this moment in time later on. All this information is uploaded to a virtual place, such as a social media network, at which people can share Audio-3D-Snapshots.
  • The user and those who share the uploaded contents can listen to the binaural content. Due to the additional information/data and the realistic impression that the Audio-3D-Snapshot can produce in the ears of the listener, the feelings the user may have had in the situation where he/she captured the Audio-3D-Snapshot can be reproduced in a way much more realistic than would be possible based on a photo or a single channel audio recording.
  • A possible scenario "Attending a Conference from Remote" is shown schematically and exemplarily in Fig. 4.
  • Audio-3D technology connects a single remote speaker to a conferencing situation with multiple speakers.
  • The remote speaker uses a binaural headset 202 which is connected to a smartphone (not shown in the figure) that operates a binaural communication link (realized, for example, by means of an app).
  • On the local side, one of the local speakers wears a binaural headset 201 to capture the signal or, alternatively, there is a binaural recording device on the local side which mimics the characteristics of a natural human head, such as an artificial head.
  • The remote person hears not only the speech content which the speakers on the local side emit, but also additional information which is inherent to the binaural signal transmitted via the Audio-3D communication link.
  • This additional information may allow the remote speaker to better identify the location of the speakers within the conference room. This, in particular, may enable the remote speaker to link specific speech segments to different speakers and may significantly increase the intelligibility even in case that all speakers talk at the same time.
  • A possible scenario "Multiple User Binaural Conference" is shown schematically and exemplarily in Fig. 5.
  • Two endpoints at remote locations M, N are connected via an Audio-3D communication link, with multiple communication partners on both sides.
  • One participant on each side has a "Master-Headset device” 301, 302, which is equipped with speakers and microphones. All other participants wear conventional stereo headsets 303, 304 with speakers only.
  • Due to the use of Audio-3D, a communication is enabled as if all participants would share one room. In particular, even if multiple speakers on both sides speak at the same time, the transmission of the binaural cues makes it possible to separate the speakers based on their different locations.
  • A possible scenario "Binaural Conference with Multiple Endpoints" is shown schematically and exemplarily in Fig. 6. This scenario is very similar to the scenario "Multiple User Binaural Conference" explained in section 7.4 above.
  • The main difference is that more than two groups are connected, e.g., three groups at remote locations M, N, O in Fig. 6.
  • An Audio-3D conference bridge 406 located in the network is used to connect all three parties.
  • A peer-to-peer connection from each of the groups to all other groups would, in principle, also be possible.
  • However, the overall number of data links then increases quadratically with the number of participating groups.
  • The purpose of the conference bridge 406 is to provide each participant group with a mix-down of the signals from all other participants. As a result, all participants involved in this communication situation have the feeling that all speakers are located at one place, such as in one room. In specific situations, it may, however, be useful to preserve the grouping of people participating in this communication. In that case, the conference bridge may employ sophisticated digital signal processing to relocate signals in the virtual acoustical space. For example, for the listeners in group 1, the participants from group 2 may be artificially relocated to the left side and the participants from group 3 may be artificially relocated to the right side of the virtual acoustical environment.
  • A possible scenario "Binaural Conference with Conventional Telephone Endpoints" is shown schematically and exemplarily in Fig. 7.
  • This scenario is very similar to the scenario "Binaural Conference with Multiple Endpoints", explained in section 7.5 above.
  • Two participants at remote location O are connected to the binaural conference situation via a conventional telephone link using a telephone 505.
  • The Audio-3D conference bridge 506 provides binaural signals to the two groups which are connected via an Audio-3D link.
  • The signals originating from the conventional telephone link are preferably extended to be located at a specific location in the virtual acoustical environment by HRTF (Head Related Transfer Function) rendering techniques (see, for example, G. Enzner et al., "Trends in Acquisition of Individual Head-Related Transfer Functions", The Technology of Binaural Listening, Springer-Verlag, pages 57 to 92, 2013).
  • Speech enhancement technologies such as bandwidth extension (see B. Geiser, "High-Definition Telephony over Heterogeneous Networks", PhD dissertation, Institute of Communication Systems and Data Processing, RWTH Aachen, 2012 ) are preferably employed to improve the overall communication quality.
  • The Audio-3D conference bridge 506 creates a mix-down from the binaural signals. Sophisticated mix-down techniques should preferably be employed to avoid comb filtering effects and the like in the binaural signals. Also, the binaural signals should preferably be processed by means of sophisticated signal enhancement techniques such as, e.g., noise reduction and dereverberation, to help the connected participants who listen to monaural signals captured in a situation with multiple speakers speaking at the same time from different directions.
  • Binaural conferences may be extended by means of a recorder which captures the audio signals of the complete conference and afterwards stores them as an Audio-3D snapshot for later recovery.
  • As a further example, consider a binaural conferencing situation (not shown in the figures) with three participants at different locations which all use a binaural terminal, such as an Audio-3D headset.
  • If the audio signals from all participants are mixed into an overall resulting signal at the same time, this may end up in a quite noisy acoustical result and in signal distortions due to the overlaying/mixing of three different binaural audio signals originating from the same environment. Therefore, the present invention foresees the following selection by a participant or by an automatic approach.
  • One participant of the binaural conference may select a master binaural signal, either from participant 1, 2 or 3.
  • Here, the signal from participant 3 has been selected.
  • The signals from participants 1 and 2 are then only monaural (preferably, being freed from the sounds related to the acoustical environment) and are mixed to the binaural signal from participant 3.
  • In an automatic approach, the binaural signal from the currently speaking participant is preferably always used, which means that there will be a switch of the binaural acoustical environment.
  • This concept may be realized by commonly known means, such as by detecting the current speaker by means of a level detection or the like.
  • Sophisticated signal processing algorithms may be employed to combine the recorded signals to form the best combination targeting a specific optimization criterion (e.g., to maximize the intelligibility).
  • A first example preferably consists of one or more of the following steps:
  • A second example preferably consists of one or more of the following steps:
  • The binaural cues inherent to the audio signal captured at the one side must be preserved until the audio signal reaches the ears of the connected partner at the other side.
  • The binaural cues are defined as the characteristics of the relations between the two channels of the binaural signal, which are commonly mainly expressed as the Interaural Time Differences (ITD) and the Interaural Level Differences (ILD) (see J. Blauert, "Spatial Hearing: The Psychophysics of Human Sound Localization", The MIT Press, Cambridge, Massachusetts, 1983).
  • The ITD cues influence the perception of the spatial location of acoustical events at low frequencies due to the time differences between the arrival of an acoustical wavefront at the left and the right human ear. Often, these cues are also denoted as phase differences between the two channels of the binaural signal.
  • The ILD binaural cues have a strong impact on the human perception at high frequencies.
  • The ILD cues are due to the shadowing and attenuation effects caused by the human head given signals arriving from a specific direction: the level tends to be higher at that side of the head which points into the direction of the origin of the acoustical event.
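  • The two cue types can be estimated from a short binaural frame, for example as in the following sketch (illustrative only, not from the patent; the cross-correlation lag approximates the ITD and the broadband energy ratio approximates the ILD):

    import numpy as np

    def estimate_binaural_cues(left, right, fs=48000):
        left = np.asarray(left, float)
        right = np.asarray(right, float)
        # ITD: lag of the cross-correlation maximum between the channels.
        corr = np.correlate(left, right, mode="full")
        lag = np.argmax(corr) - (len(right) - 1)  # in samples; sign gives the side
        itd_ms = 1000.0 * lag / fs
        # ILD: level difference in dB between the channels.
        ild_db = 10.0 * np.log10((np.sum(left ** 2) + 1e-12) /
                                 (np.sum(right ** 2) + 1e-12))
        return itd_ms, ild_db

  As noted in section 5 above, at 48 kHz the lag for a source located fully to one side would be on the order of 25 to 30 samples.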
  • Audio-3D can only be based on transmission channels for which the provider has end-to-end control.
  • An introduction of Audio-3D as a standard in public telephone networks seems to be unrealistic due to the lack of cooperation and interest of the big telecommunication companies.
  • Audio-3D should preferably be based on packet based transmission schemes, which requires technical solutions to deal with packet losses and delays.
  • For Audio-3D, new terminal devices are required. Instead of a single microphone in proximity to the mouth of the speaker, as commonly used in conventional telephony, two microphones are required for Audio-3D, which must be located in proximity to the natural location where human perception actually happens, hence close to the entrance of the ear canal.
  • A possible realization is shown in Fig. 8, based on an example of an artificial head equipped with a prototype headset for Audio-3D.
  • The microphone capsules are in close proximity to the entrance of the ear canal.
  • The shown headset is not closed; otherwise, the usage scenario "Multiple User Binaural Conference" would not be possible, since in that scenario, the local acoustical signals need to reach the ear of the speaker on a direct path also.
  • Closed headphones extended by a "hear-through" functionality, as well as loudspeaker-microphone enclosures combined with stereo crosstalk cancellation and stereo widening or wave field synthesis techniques, are optional variants of Audio-3D terminal devices (refer to section 8.4.2).
  • Special consideration has to be taken to realize Audio-3D, since currently available smartphones support only monaural input channels.
  • Some manufacturers, such as, e.g., Tascam (see www.tascam.com), offer soundcards which can be used in stereo input and output mode in combination with, e.g., an iPhone. It is very likely that the USB On-The-Go standard (OTG) will soon allow connecting USB compliant high-quality soundcards with smartphones.
  • Binaural signals should preferably be of a higher quality, since the binaural masking threshold level is known to be lower than the masking threshold for monaural signals (see B.C.J. Moore, "An Introduction to the Psychology of Hearing", Academic Press, 4th Edition, 1997).
  • A binaural signal transmitted from one location to the other should preferentially be of a higher quality compared to the signal transmitted in conventional monaural telephony. This implies that high-quality acoustical signal processing approaches should be realized, as well as audio compression schemes (audio codecs) which allow higher bit rates and therefore higher quality modes.
  • Audio-3D, in this example, is packet based and principally an interactive duplex application. Therefore, the end-to-end delay should preferably be as low as possible to avoid negative impacts on conversations, and the transmission should be able to deal with different network conditions. Therefore, jitter compensation methods, frame loss concealment strategies and audio codecs which adapt the quality and the delay with respect to a given instantaneous network characteristic are deemed crucial elements of Audio-3D applications.
  • Audio-3D applications shall be available for everybody. Therefore, simplicity in usage may also be considered a key feature of Audio-3D.
  • The functional units in a packet based Audio-3D terminal can be similar to those in a conventional VoIP terminal.
  • Two variants are considered in the following, of which the variant shown schematically and exemplarily in Fig. 9 is preferably foreseen for use in a headset terminal device as shown in Fig. 8 - which is a preferred solution -, whereas the variant shown schematically and exemplarily in Fig. 10 is preferably foreseen for use in a terminal device realized as a speakerbox, which may require additional signal processing for realizing a stereo crosstalk cancellation in the receiving direction and a stereo widening in the sending direction.
  • The Audio-3D terminal comprises two speakers and two microphones, which are associated with the left and the right ear of the person wearing the headset.
  • In the sending direction, the signal captured by each of the microphones is preferably processed by an acoustical echo canceller (AEC), a noise reduction (NR), an equalizer (EQ) and an automatic gain control (AGC).
  • The output from the AGC is finally fed into the source codec.
  • This source codec is preferably specifically suited for binaural signals and transforms the two channels of the audio signal into a stream of packets of a moderate data rate which fulfill the high quality constraints as defined in section 8.3 above.
  • The packets are finally transmitted to the connected communication partner via an IP link.
  • In the receiving direction, sequences of packets arrive from the connected communication partner.
  • The packets are fed into the adaptive jitter buffer unit (JB).
  • This jitter buffer has control of the decoder to reconstruct the binaural audio signal from the arriving packets, as well as of the frame loss concealment (FLC) functionality that performs error concealment in case packets have been lost or arrive too late.
  • Network delays, denoted as "jitter", are compensated by buffering a specific number of samples. The jitter buffer is adaptive, as the number of samples to be stored for jitter compensation may vary over time to adapt to given network characteristics. However, caution should be taken not to increase the end-to-end communication delay, which depends on the number of samples stored in the buffer before playback.
  • In case of lost packets, the decoder is preferably driven to perform a frame loss concealment. In some situations, however, a frame loss concealment cannot be performed by the decoder. In this case, the frame loss concealment unit is preferably driven to output audio samples that conceal the gap in the audio signal due to the missing audio samples.
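  • A minimal sketch of such an adaptive jitter buffer with concealment hooks (illustrative only, not from the patent; real implementations derive the target depth from measured network jitter rather than using a fixed value):

    class JitterBuffer:
        def __init__(self, target_depth=3):
            self.target_depth = target_depth  # frames to hold before playout
            self.frames = {}                  # sequence number -> encoded frame
            self.next_seq = 0

        def push(self, seq, frame):
            if seq >= self.next_seq:          # frames arriving too late are dropped
                self.frames[seq] = frame

        def pop(self, decoder):
            # Keep buffering until the target depth is reached.
            if self.next_seq not in self.frames and len(self.frames) < self.target_depth:
                return None
            frame = self.frames.pop(self.next_seq, None)
            self.next_seq += 1
            if frame is not None:
                return decoder.decode(frame)
            # Missing frame: the decoder (or a dedicated FLC unit) fills the gap.
            return decoder.conceal()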
  • The output signal from the jitter buffer is fed, here, into an optional noise reduction (NR) and an automatic gain control (AGC) unit.
  • In principle, these units are not necessary, since this functionality has been realized on the side of the connected communication partner. Nevertheless, they often make sense if the connected terminal does not provide the desired audio quality due to low bit rate source encoders or insufficient signal processing on the side of the connected terminal.
  • The following equalizer in the receiving direction is preferably used to individually equalize the headset speakers and to adapt the audio signals according to the subjective preferences of the user. It was found, e.g., by R. Bomhardt et al. (40. Jahrestagung für Akustik (DAGA), 2014) that an individual equalization can be crucial for a high-quality spatial perception of the binaural signals.
  • The processed signal is finally emitted by the speakers of the Audio-3D terminal headset.
  • In the speakerbox variant, a functional unit for a stereo widening (STW) as well as a functional unit for a stereo crosstalk cancellation (XTC) are added.
  • The stereo widening unit transforms a stereo signal captured by means of two microphones into a binaural signal. This enhancement is principally necessary if the two microphones are not at a distance which is identical (or close) to that of the ears in human perception, due to, e.g., a limited size of the speakerbox terminal device. Due to the knowledge of the capturing situation, the stereo widening unit can compensate for the lack of distance by artificially adding binaural cues such as increased interchannel phase differences for low frequencies and interchannel level differences for higher frequencies.
  • Stereo widening on the sending side in a communication scenario may be denoted as "side information based stereo widening".
  • Stereo widening may also be based solely on the received signal on the receiving side of a communication scenario. In that case, it is denoted as "blind stereo widening", since no side information is available in addition to the transmitted binaural signal.
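  • A deliberately naive stereo widening sketch (illustrative only, not from the patent; the crossover frequency, extra delay and extra level difference are arbitrary parameters, whereas a real STW unit would derive them from the known microphone geometry or, in the blind case, from the received signal itself):

    import numpy as np

    def widen(left, right, fs=48000, extra_delay_ms=0.3, extra_ild_db=3.0,
              crossover_hz=1500.0):
        """Exaggerate the interchannel delay below the crossover (ITD-like cue)
        and the interchannel level difference above it (ILD-like cue)."""
        n = len(left)
        f = np.fft.rfftfreq(n, 1.0 / fs)
        L, R = np.fft.rfft(left), np.fft.rfft(right)
        lo = f < crossover_hz
        # Additional antisymmetric phase (delay) difference at low frequencies.
        shift = np.exp(-1j * 2 * np.pi * f * (extra_delay_ms / 2000.0))
        L[lo] *= shift[lo]
        R[lo] *= np.conj(shift[lo])
        # Additional level difference at high frequencies.
        g = 10.0 ** (extra_ild_db / 40.0)
        L[~lo] *= g
        R[~lo] /= g
        return np.fft.irfft(L, n), np.fft.irfft(R, n)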
  • The stereo crosstalk cancelling unit is preferably used to aid the listener, who is located at a specific position, to perceive binaural signals. Mainly, it compensates for the loss of binaural cues due to the emission of the two channels via closely spaced speakers and a cross-channel interference (audio signals emitted by the right loudspeaker reaching the left ear and audio signals emitted by the left loudspeaker reaching the right ear).
  • The purpose of the stereo crosstalk canceller unit is to employ signal processing to emit signals which cancel out the undesired cross-channel interference signals reaching the ears.
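  • One common way to realize such a canceller, sketched below under stated assumptions (not the patent's method): per frequency bin, invert the 2x2 acoustic transfer matrix from the two loudspeakers to the two ears, with a regularization constant beta chosen arbitrarily here. The loudspeaker feeds for each bin are then obtained as W[k] applied to the binaural spectrum [b_left, b_right].

    import numpy as np

    def xtc_filters(H_ll, H_lr, H_rl, H_rr, beta=0.01):
        """Per-frequency crosstalk cancellation filters: regularized inversion
        of the 2x2 transfer matrix C(f) from the loudspeakers to the ears, so
        that the signal meant for the left ear is cancelled at the right ear
        and vice versa. Inputs are frequency responses of equal length."""
        n = len(H_ll)
        W = np.zeros((n, 2, 2), complex)
        for k in range(n):
            C = np.array([[H_ll[k], H_lr[k]],
                          [H_rl[k], H_rr[k]]])
            # W = (C^H C + beta I)^-1 C^H  (regularized pseudo-inverse)
            W[k] = np.linalg.solve(C.conj().T @ C + beta * np.eye(2), C.conj().T)
        return W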
  • A full two-channel acoustical echo canceller is preferably used, rather than two single channel acoustical echo cancellers.
  • The purpose of the Audio-3D conference bridge is to provide audio streams to the participants of a conference situation with more than two participants. Principally, to establish multiple peer-to-peer connections between all participating connections would be possible also; some of the functionalities performed by the conference bridge would then have to be realized in the terminals. However, the overall data rate involved would grow quadratically as a function of the number of participants and, therefore, would start to become inefficient already for a low number of connected participants.
  • The typical functionality to be realized in the conference bridge is shown schematically and exemplarily in Fig. 11, based on an exemplary setup composed of three participants, of which one is connected via a conventional telephone (PSTN; public switched telephone network) connection, whereas the other two participants are connected via a packet based Audio-3D link.
  • the conference bridge receives audio streams from all three endpoints, shown as the incoming gray arrows in the figure.
  • the streams originating from participants 1 and 2 contain binaural signals in Audio-3D quality, indicated by the double arrows, whereas the signal from participant 3 is only monaural and of narrow band quality.
  • the conference bridge creates one outgoing stream for each of the participants:
  • each participant receives the audio data from all participants but himself.
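  • as an illustration, such a per-participant "mix-minus" can be computed in one pass; the sketch assumes decoded, time-aligned PCM frames of equal shape per participant, which the bridge has to ensure anyway:

      import numpy as np

      def bridge_mixes(frames):
          # frames: list of per-participant PCM frames, each shaped
          # (samples, channels), already decoded and time-aligned.
          total = np.sum(frames, axis=0)
          # Each participant receives the sum of all OTHER participants.
          return [total - f for f in frames]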
  • Variants are possible to control the outgoing audio streams, e.g.,
  • incoming audio signals may be decoded and transformed into PCM (pulse code modulation) signals to be accessible for audio signal processing algorithms.
  • the signal processing functionalities in the PCM domain are similar to those functionalities realized in the terminals (e.g., adaptive jitter buffer) and shall not be explained in detail here.
  • a signal adaptation is preferentially used in both directions, from the telephone network to the Audio-3D network (Voice to Audio-3D) and from the Audio-3D network to the telephone network (Audio-3D to Voice).
  • the audio signals must be converted from narrowband to Audio-3D quality and from monaural to binaural, as shown schematically and exemplarily in Fig. 12 .
  • the monaural signal is transformed into a binaural signal.
  • So-called spatial rendering (SR) is employed for this purpose in most cases.
  • HRTFs (head related transfer functions) mimic the effect of the temporal delay caused by a signal reaching the one ear before the other and the attenuation effects caused by the human head.
  • an additional binaural reverberation can be useful (SR+REV).
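  • a minimal sketch of such a spatial rendering by convolution, assuming a pair of head related impulse responses (the time-domain counterpart of HRTFs) for the desired virtual position, e.g., taken from a measured database; the optional wet path illustrates the SR+REV variant, and all names are illustrative:

      import numpy as np
      from scipy.signal import fftconvolve

      def spatially_render(mono, hrir_left, hrir_right, brir=None):
          # Convolve the monaural (e.g., PSTN) signal with the left and
          # right head related impulse responses of the target position.
          left = fftconvolve(mono, hrir_left)[:len(mono)]
          right = fftconvolve(mono, hrir_right)[:len(mono)]
          if brir is not None:
              # Optional binaural reverberation (SR+REV): add a wet path
              # rendered with a binaural room impulse response pair.
              left += fftconvolve(mono, brir[0])[:len(mono)]
              right += fftconvolve(mono, brir[1])[:len(mono)]
          return left, right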
  • the binaural signal must be converted into a monaural signal which is compliant with a conventional telephone.
  • the audio bandwidth must be limited and the signal must be converted from binaural to mono, as shown schematically and exemplarily in Fig. 13 .
  • an intelligent down-mix is preferably realized, such that undesired comb effects and spectral colorations are avoided.
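  • a naive sketch of such an intelligent down-mix, assuming a single dominant source: the dominant interchannel delay is estimated and compensated before summing, so that it does not turn into a comb filter in the mono sum; the function name and lag search range are illustrative:

      import numpy as np

      def aligned_downmix(left, right, max_lag=64):
          # Estimate the dominant interchannel delay by cross-correlation.
          corr = np.correlate(left, right, mode='full')
          center = len(corr) // 2
          lag = int(np.argmax(corr[center - max_lag:center + max_lag + 1])) - max_lag
          # Compensate the delay before summing (np.roll wraps around; a
          # real implementation would use a proper fractional delay line).
          return 0.5 * (left + np.roll(right, lag))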
  • additional signal processing / speech enhancements may preferably be implemented, such as a noise reduction and a dereverberation that may help the listener to better follow the conference.
  • the binaural cues as introduced in section 8.1 above must be preserved. To do so, the sensitivity of human perception with respect to phase shifts in binaural signals is preferably taken into account in the source codec. VoIP applications tend to transfer different media types in independent streams of data and to synchronize them on the receiver side. This procedure makes sense for audio and video due to the use of different recording and playback clocks.
  • the receiver side synchronization is not very critical, since a temporal shift between audio and video can be tolerated unless it exceeds 15 to 45 milliseconds (see Advanced Television Systems Committee, "ATSC Implementation Subcommittee Finding: Relative Timing of Sound and Vision for Broadcast Operations", IS-191, 2003 ).
  • the two channels of a binaural signal should preferably be captured using one physical device with one common clock rate to prevent signal drifting.
  • the synchronization on the receiver side cannot be realized at all, or only with an immense signal processing effort, at an accuracy which allows preserving the ITD binaural cues as defined in section 8.1 above.
  • a transmission of the encoded binary data taken from two independent instances of the same monaural source encoder, one for each binaural channel, in one data packet is the simplest approach, as long as the left and the right binaural channel are captured sample- and frame-synchronously, i.e., both are recorded by ADCs (analog-to-digital converters) which are driven by the same clock or a common clock reference.
  • This approach yields a data rate which is twice the data rate of a monaural HD-Voice communication terminal.
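  • a hypothetical packet layout along these lines, carrying both encoded channels behind one common sequence number and timestamp so that they cannot drift apart in the network; the header fields are assumptions of this sketch:

      import struct

      HEADER = '!HIHH'  # sequence number, timestamp, two payload lengths

      def pack_binaural_packet(seq, timestamp, enc_left, enc_right):
          # Both payloads come from two instances of the same monaural
          # encoder fed with sample-synchronous frames.
          return struct.pack(HEADER, seq, timestamp, len(enc_left),
                             len(enc_right)) + enc_left + enc_right

      def unpack_binaural_packet(packet):
          seq, ts, n_l, n_r = struct.unpack_from(HEADER, packet)
          off = struct.calcsize(HEADER)
          return seq, ts, packet[off:off + n_l], packet[off + n_l:off + n_l + n_r]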
  • sophisticated approaches to exploit the redundancies in both channels may be a promising solution to decrease the overall data rate (see, e.g., H.
  • VoIP transmission schemes in general rely on the so-called User Datagram Protocol (UDP) rather than the Transmission Control Protocol (TCP).
  • packets emitted by one side of the communication usually arrive in time, but may also arrive with a significant delay (denoted as "network jitter").
  • packets may also get lost during the transmission (denoted as "frame loss").
  • the network jitter characteristics observed in real applications are in general strongly time-varying.
  • An example with strongly variable network jitter is a typical WiFi router used in many households nowadays.
  • packets may not be transmitted via the WiFi transmission link for a couple of hundred milliseconds if a microwave oven is used which produces disturbances in the frequency band used by WiFi, or if a Bluetooth link is used in parallel. Therefore, a good jitter buffer should preferably be managed adaptively and should adapt to the instantaneous network quality, which must be observed by the Audio-3D communication application.
  • Such a jitter buffer is denoted as an adaptive jitter buffer.
  • the number of samples stored in the jitter buffer (the fill height) is preferably modified by the employment of approaches for signal modifications such as the waveform similarity overlap-add (WSOLA) approach (see W. Verhelst and M. Roelands, "An overlap-add technique based on waveform similarity (WSOLA) for high quality time-scale modification of speech", IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 554 to 557, 1993 ), a phase vocoder (see M. Dolson, "The phase vocoder: A tutorial", Computer Music Journal, Vol. 10, No. 4, pages 14 to 27, 1986 ) approach or similar techniques.
  • the goal during this adaptation is to play the signal with an increase or decrease of speed without producing artifacts which are audible, also denoted as "Time-Stretching".
  • time stretching is achieved by re-assembling the signal from signal segments originating from the past or the future.
  • the exact signal synthesis process may be different for the left and the right channel of a binaural signal due to independent WSOLA processing instances.
  • Arbitrary phase shifts may be the result, which do not really produce audible artifacts, but which may lead to a manipulation of the ITD cues in Audio-3D and may destroy or modify the spatial localization of audio events.
  • a preferred approach which does not influence the ITD binaural cues is to use an adaptive resampler.
  • the core component is a flexible resampler, the output sample rate of which can be modified continuously during operation.
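  • a minimal sketch of such an adaptive resampler, based on linear interpolation with one common time base for both channels; step sizes close to 1.0 (e.g., 1.002 to drain the buffer by roughly 0.2 %) keep the speed change typically inaudible while leaving the interchannel (ITD) relation intact:

      import numpy as np

      def resample_frame(frame, step):
          # frame: (n, 2) array holding one buffered frame of the binaural
          # signal; step < 1.0 stretches it (the jitter buffer fills),
          # step > 1.0 compresses it (the jitter buffer drains).
          n = frame.shape[0]
          t = np.arange(0.0, n - 1, step)  # common read positions
          i = t.astype(int)
          frac = (t - i)[:, None]
          # One time base for both channels: the ITD cues are scaled
          # identically and the spatial localization survives.
          return (1.0 - frac) * frame[i] + frac * frame[i + 1]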
  • signal levels are preferably adapted such that the transmitted signal appears neither too loud nor too low in volume.
  • this increases the perceived communication quality since, e.g., a source encoder works better for signals with a higher level than for lower levels and the intelligibility is higher for higher level signals.
  • the ILD binaural cues are based on level differences in the two channels of a binaural signal. Given two AGC instances which operate independently on the left and the right channel, these cues may be destroyed since the level differences are removed. Thus, a usage of conventional AGCs which operate independently may not be suitable for Audio-3D. Instead, the gain control for the left channel should preferably somehow be coupled to the gain control for the right channel.
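  • a sketch of such a coupled gain control, with one common gain derived from the louder channel and applied to both; frame size, target level and smoothing constant are illustrative values:

      import numpy as np

      def coupled_agc(left, right, target_rms=0.1, frame=512, alpha=0.05):
          gain, out_l, out_r = 1.0, left.copy(), right.copy()
          for s in range(0, len(left) - frame + 1, frame):
              # One common level estimate from the louder channel...
              rms = max(np.sqrt(np.mean(left[s:s + frame] ** 2)),
                        np.sqrt(np.mean(right[s:s + frame] ** 2)), 1e-9)
              # ...steers one common gain, applied to BOTH channels, so
              # the interchannel level differences (ILD cues) survive.
              gain += alpha * (target_rms / rms - gain)
              out_l[s:s + frame] *= gain
              out_r[s:s + frame] *= gain
          return out_l, out_r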
  • the signals are recorded with devices which mimic the influence of real ears (for example, an artificial head in general has "average ears" which shall approximate the impact of the ears of a large number of persons) or by using headset devices with a microphone in close proximity to the ear canal (see section 8.4.1).
  • the ears of the person who listens to the recorded signals and the ears which have been the basis for the binaural recording are not identical.
  • an equalizer can be used in the sending direction in Figs. 9 and 10 to compensate for possible deviations of the microphone characteristics related to the left and the right channel of the binaural recordings.
  • an equalizer may also be useful to adapt to the hearing preference of the listener to attenuate or amplify specific frequencies.
  • attenuations and amplifications of parts of the binaural signal may also be realized in the equalizer according to the needs of the person wearing the binaural terminal device to increase the overall intelligibility.
  • some care has to be taken to not destroy or manipulate the ILD binaural cues.
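  • one simple way to respect this constraint is to run the identical equalizer curve on both channels, as in the following sketch (a parallel peaking-filter equalizer; the band parameters are illustrative):

      import numpy as np
      from scipy.signal import iirpeak, lfilter

      def linked_equalizer(left, right, fs, bands):
          # bands: list of (center_hz, q, gain) tuples; gain > 0 boosts,
          # gain < 0 attenuates the band around center_hz.
          for f0, q, gain in bands:
              b, a = iirpeak(f0, q, fs=fs)
              # Identical coefficients and gains on both channels: the
              # timbre changes, the ILD cues do not.
              left = left + gain * lfilter(b, a, left)
              right = right + gain * lfilter(b, a, right)
          return left, right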
  • a goal of Audio-3D is the transmission of speech contents as well as a transparent reproduction of the ambience in which the acoustical contents have been recorded. In this sense, a noise reduction which removes acoustical background noise may not seem useful at first glance.
  • At least stationary undesired noises should preferably be removed to increase the conversational intelligibility.
  • in Audio-3D, a more accurate classification of the recording situation should be performed to distinguish between "desired" and "undesired" background noises.
  • two microphones rather than only one help in this classification process by locating audio sources in a given room environment.
  • additional sensors such as an accelerometer or a compass may support the auditory scene analysis.
  • noise reduction is based on the attenuation of those frequencies of the recorded signal at which noise is present, such that the speech is left unaltered whereas the noise is suppressed as much as possible.
  • two noise reduction instances operating on the left and the right channel independently may destroy or manipulate the binaural ILD cues.
  • approaches have been developed for binaural hearing aids in the past (see T. Lotter, “Single and Multimicrophone Speech Enhancement for Hearing Aids", PhD dissertation, Institute of Communication Systems and Data Processing, RWTH Aachen, 2004 ; M. Jeub, "Joint Dereverberation and Noise Reduction for Binaural Hearing Aids and Mobile Phones", PhD dissertation, Institute of Communication Systems and Data Processing, RWTH Aachen, 2012 ).
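  • the common idea of these binaural approaches can be sketched as a spectral subtraction with one shared gain per frequency bin, computed from both channels and applied to both so that the ILD cues are preserved; the noise estimate is assumed to be given, and all names are illustrative:

      import numpy as np

      def common_gain_noise_reduction(spec_l, spec_r, noise_psd, floor=0.1):
          # spec_l / spec_r: FFT of the current left / right frame;
          # noise_psd: estimate of the stationary noise power per bin.
          psd = 0.5 * (np.abs(spec_l) ** 2 + np.abs(spec_r) ** 2)
          # One COMMON spectral gain, limited by a floor against
          # musical-noise artifacts...
          gain = np.maximum(1.0 - noise_psd / np.maximum(psd, 1e-12), floor)
          # ...applied to both channels alike.
          return gain * spec_l, gain * spec_r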
  • for acoustical echo compensation, in general an approach is followed which is composed of an acoustical echo canceller and a statistical postfilter.
  • the acoustical echo canceller part is based on the estimation of the "real" physical acoustical path between speaker and microphone by means of an adaptive filter. Once determined, the estimate of the acoustical path is afterwards used to approximate the undesired acoustical echo signal recorded by the microphones of the terminal device.
  • the approximation of the acoustical echo and the acoustical echo signal inherent to the recorded signal are finally cancelled out by means of destructive superposition (see S. Haykin, "Adaptive Filter Theory", Prentice Hall, 4th Edition, 2001 ).
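  • a single-channel sketch of the adaptive-filter part, using the classical NLMS algorithm (one common variant of the adaptive filters treated in the Haykin reference); filter order and step size are illustrative, and the statistical postfilter is omitted:

      import numpy as np

      def nlms_echo_canceller(far, mic, order=256, mu=0.5, eps=1e-6):
          # far: loudspeaker (far-end) signal, mic: microphone signal.
          w = np.zeros(order)   # adaptive estimate of the acoustical path
          x = np.zeros(order)   # delay line of recent far-end samples
          out = np.empty(len(mic))
          for n in range(len(mic)):
              x = np.roll(x, 1)
              x[0] = far[n]
              echo_hat = w @ x                 # approximated acoustical echo
              e = mic[n] - echo_hat            # destructive superposition
              w += mu * e * x / (x @ x + eps)  # NLMS adaptation step
              out[n] = e
          return out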
  • Audio-3D headset terminal devices a strong coupling between speaker and microphone is present, which is due to the close proximity of the microphone and the speaker (see, for example, Fig. 8 ) and which produces a strong undesired acoustical echo.
  • a well-designed adaptive filter may reduce this acoustical echo by a couple of dB but may never remove it completely. The remaining acoustical echo can still be audible and may be very confusing in terms of the perception of a binaural signal, given that two independent instances of an acoustical echo compensator are operated for the left and the right channel. Phantom signals may appear to be present, located at arbitrary positions in the acoustical scenery.
  • a postfilter is therefore considered to be of great importance here, but it may have a negative impact on the ILD binaural cues due to an independent manipulation of the signal levels of the left and the right channel of the binaural signal.
  • the hardware setup to consume binaural contents if not using a headset device is expected to be composed of two loudspeakers, for instance, two speakerboxes, being placed in a typical stereo playback scenario.
  • Such a stereo hardware setup is not optimal for binaural contents as it suffers from cross channel interferences: Signals emitted by the left of the two loudspeakers of the stereo playback system will reach the right ear and signals emitted by the right speaker will reach the left ear.
  • the two channels of a captured binaural signal to be emitted by the two involved speakers are pre-processed by means of linear filtering in order to minimize the amount of cross channel interference. In principle, this employs cancellation techniques based on fixed filtering, as described, e.g., in B. B. Bauer, "Stereophonic Earphones and Binaural Loudspeakers," Journal of the Audio Engineering Society, Vol. 9, No. 2, pages 148 to 151, 1961.
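  • for a symmetric listener position, such a fixed crosstalk canceller amounts to inverting the 2x2 acoustic matrix per frequency bin, as in the following sketch; the ipsilateral and contralateral responses are assumed to be known (measured or modeled), and the regularization is a crude illustration:

      import numpy as np

      def xtc_filters(h_ipsi, h_contra, eps=1e-3):
          # h_ipsi / h_contra: frequency responses from a loudspeaker to
          # the same-side / opposite-side ear (symmetric setup assumed).
          det = h_ipsi * h_ipsi - h_contra * h_contra
          det = det + eps * np.max(np.abs(det))  # regularize near-zeros
          return h_ipsi / det, -h_contra / det   # direct and cross filter

      def apply_xtc(spec_l, spec_r, c_direct, c_cross):
          # Pre-filter the binaural spectra so that, after the acoustic
          # 2x2 path, each ear receives (approximately) only its channel.
          return (c_direct * spec_l + c_cross * spec_r,
                  c_cross * spec_l + c_direct * spec_r)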
  • the pre-processing required for crosstalk cancellation depends heavily on the physical location and characteristics of the involved loudspeakers. Normally, there is no common convention among users for placing stereo loudspeakers, e.g., in the context of a home cinema.
  • the location of the stereo speakers is fixed and users are assumed to be located in front of the display at a specific distance.
  • a carefully designed set of pre-processing filter coefficients is preferably sufficient to cover most use-cases.
  • the position of the loudspeakers is definitely not fixed.
  • stereo enhancement techniques may preferably be employed to transform a stereo signal into an approximately binaural signal.
  • the main principle of these stereo enhancement techniques is to artificially modify the captured stereo audio signals to reconstruct lost binaural cues artificially.
  • any audio recording is nowadays simply played back by devices without taking into account how it was captured, e.g., whether it is a mono, a stereo, a surround sound or a binaural recording, and/or whether the playback device is a speakerbox, a headset, a surround sound equipment, a loudspeaker arrangement in a car or the like.
  • the maximum that can be expected today is that a mono signal is automatically played back on both loudspeakers, right and left, or on both headset speakers, left and right, or that a surround sound signal is down-mixed to two speakers if the surround sound is indicated.
  • ignoring the nature of the audio signal may result in an audio quality which is not satisfactory for the listener.
  • a binaural signal might be played back via loudspeakers and a surround sound signal might be played back via headphones.
  • another example might occur as binaurally recorded sounds, provided by music labels or broadcasters, become more widely distributed in the market.
  • while 3D algorithms for enhancing the flat audio field of a stereo signal exist and are being applied, such devices or algorithms cannot distinguish between stereo signals and binaurally recorded signals. Thus, they would even apply 3D processing to already binaurally recorded signals. This needs to be avoided, because it could result in a severely impaired sound quality that does not at all match the target of the audio signal supplier, whether it is a broadcaster or the music industry.
  • the audio terminal 100 shown in Fig. 1 generates metadata provided with the multi-channel audio data, wherein the metadata indicates that the multi-channel audio data is binaurally captured.
  • the metadata further indicates one or more of: a type of the first device, a microphone use case, a microphone attenuation level, a beamforming processing profile, a signal processing profile and an audio encoding format.
  • a suitable metadata format could be defined as follows:
    Device ID: 3 bit to indicate a setup of the first and the second microphone, e.g.,
      '000' BoomBoom
      '001' Shoulderspeaker
      '010' Headset over the ear
      '011' Headset on the ear
      '100' In-Ear
      '101' Headset over the ear with artificial ears
      '110' Headset on the ear with artificial ears
      '111' In-Ear with hear-through
    Microphone Use Case: 3 bit to indicate the use case of the microphones, e.g.
    Level Setup: 32 bit (4 x 8 bit) or more to indicate the respective attenuation of the microphones, e.g.,
      'Bit 0-7' Attenuation of microphone 1 in dB
      'Bit 8-15' Attenuation of microphone 2 in dB
      'Bit 16-23' Attenuation of microphone 3 in dB
      'Bit 24-31' Attenuation of microphone 4 in dB
    Beamforming Processing Profile: 2 bit to indicate which beamforming algorithm has been applied to the microphones, e.g.,
      '00' Beamforming algorithm 1
      '01' Beamforming algorithm 2
      '10' Beamforming algorithm 3
      '11' Beamforming algorithm 4
    Signal Processing Profile: 4 bit to indicate which algorithms have been applied to the microphones, e.g.,
      '00' Signal processing 1
      '01' Signal processing 2
      '10' Signal processing 3
      '11' Signal processing 4
  • Encoding Algorithm Format: 2 to 4 bit to indicate the encoding algorithm being used, such as SBC, apt
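  • a hypothetical serialization of these fields, purely to illustrate how compactly such metadata can travel with the audio stream; the byte layout is an assumption of this sketch, not part of the format definition above:

      def pack_audio3d_metadata(device_id, mic_use_case, beamforming,
                                signal_profile, encoding, attenuations_db):
          # byte 0: 3-bit device ID | 3-bit microphone use case |
          #         2-bit beamforming processing profile
          # byte 1: 4-bit signal processing profile | 4-bit encoding format
          # bytes 2..: one attenuation value in dB per microphone
          head = (device_id & 0x07) | ((mic_use_case & 0x07) << 3) \
                 | ((beamforming & 0x03) << 6)
          tail = (signal_profile & 0x0F) | ((encoding & 0x0F) << 4)
          return bytes([head, tail] + [a & 0xFF for a in attenuations_db])

      # e.g., a headset on the ear with artificial ears ('110'):
      meta = pack_audio3d_metadata(0b110, 0b000, 0b00, 0b0001, 0b0010,
                                   [0, 0, 6, 6])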
  • the metadata preferably indicates a position of the two speakers relative to each other.
  • while the audio terminal 100 described with reference to Fig. 1 comprises a first device 10 and a second device 20 which is separate from the first device 10, this does not have to be the case.
  • other audio terminals according to the present invention which may be used for Audio-3D may be integrated terminals, in which both (a) at least a first and a second microphone for capturing multi-channel audio data comprising a first and a second audio channel, and (b) a communication unit for voice and/or data communication, are provided in a single first device.
  • a connection via a local wireless transmission link may not be needed and the concepts and technologies described in sections 7 to 9 above could also be realized in an integrated terminal.
  • an audio terminal which realizes the concepts and technologies described in sections 7 to 9 above could comprise a first and a second device which are adapted to be connected with each other via a wired link.
  • also conceivable is an audio terminal which comprises only one of (a) at least a first and a second microphone and (b) at least one of a first and a second speaker, the former being preferably usable for recording multi-channel audio data comprising at least a first and a second audio channel, and the latter being preferably usable for playing back multi-channel audio data comprising at least a first and a second audio channel.
  • while the audio terminal 100 described with reference to Fig. 1 comprises a communication unit 21 for voice and/or data communication, other audio terminals according to the present invention which may be used for Audio-3D may comprise, additionally or alternatively, a recording unit (not shown in the figures) for recording the captured multi-channel audio data comprising a first and a second audio channel.
  • a recording unit preferably comprises a non-volatile memory, such as a hard disk drive or a flash memory, in particular, a flash RAM.
  • the memory may be integrated into the audio terminal or the audio terminal may provide an interface for inserting an external memory.
  • the audio terminal 100 further comprises an image capturing unit (not shown in the figures) for capturing a still or moving picture, preferably, while capturing the multi-channel audio data, wherein the audio terminal 100 is adapted to provide, preferably, automatically or substantially automatically, an information associating the captured still or moving picture with the captured multi-channel audio data.
  • the audio terminal 100 may further comprise a text inputting unit for inputting text, preferably, while capturing the multi-channel audio data, wherein the audio terminal 100 is adapted to provide, preferably, automatically or substantially automatically, an information associating the inputted text with the captured multi-channel audio data.
  • the audio terminal 100 is adapted to provide, preferably, by means of the communication unit 21, the multi-channel audio data such that a remote user is able to listen to the multi-channel audio data.
  • the audio terminal 100 may be adapted to communicate the multi-channel audio data to a remote audio terminal via a data communication, e.g., a suitable Voice-over-IP communication.
  • the first and the second microphone 11, 12 and the first speaker 15 can be provided in a headset, for instance, an over-the-ear or on-the-ear headset, or an in-ear phone.
  • Audio-3D is preferably realized not with narrowband audio data but with wideband or even super-wideband or full band audio data. These latter cases may be referred to as HD-Audio-3D. The various technologies described above are adapted to deal with such high definition audio content.
  • a single unit or device may fulfill the functions of several items recited in the claims.
  • the mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used to advantage.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Telephone Function (AREA)

Claims (4)

  1. Audio system (400; 500) for providing a communication between at least three remote locations (M, N, O), comprising a first audio terminal (401) located at a first remote location (M), a second audio terminal (402) located at a second remote location (N), a third audio terminal (405) located at a third remote location (O), and a conference bridge (406; 506) which is connectable to the first, the second and the third audio terminal (401, 402, 405; 501, 502, 505) via a transmission link, preferably a dial-in or IP transmission link, each supporting at least a first and a second audio channel, wherein each of the first, the second and the third audio terminal (401, 402, 405) comprises at least a first and a second microphone (11, 12) for capturing multi-channel audio data comprising at least a first and a second audio channel, and at least a first speaker (15) for playing back audio data comprising at least a first audio channel, wherein the first and the second microphone (11, 12) and the first speaker (15) are provided in a headset or an in-ear phone, wherein each of the first, the second and the third terminal (401, 402, 405) further comprises at least a second speaker for playing back audio data comprising at least a second audio channel, wherein the conference bridge (406; 506) is adapted to generate, for each audio terminal of the first, the second and the third audio terminal (401, 402, 405), a multi-channel audio mix of multi-channel audio data streamed from all other audio terminals of the first, the second and the third audio terminal (401, 402, 405), and to stream the multi-channel audio mix to the audio terminal for which it is generated, wherein the multi-channel audio mix comprises at least a first and a second audio channel.
  2. Audio system (400) according to claim 1, wherein the conference bridge (406) is adapted to monaurally mix the multi-channel audio data streamed from the first and the second audio terminal (401, 402) with the multi-channel audio data streamed from the third audio terminal (405) in order to generate the multi-channel audio mix.
  3. Audio system (400) according to claim 2, wherein the conference bridge (406) is further adapted to spatially position the monaurally mixed multi-channel audio data streamed from the first and the second audio terminal (401, 402) when generating the multi-channel audio mix.
  4. Audio system (500) according to any one of claims 1 to 3, further comprising a telephone (505) which comprises a microphone and a speaker, wherein the conference bridge (506) is further connectable to the telephone (505), wherein the conference bridge (506) is adapted to mix the multi-channel audio data streamed from the first and the second audio terminal (501, 502) into a single-channel audio mix comprising a single audio channel, and to stream the single-channel audio mix to the telephone (505).
EP14777648.8A 2014-10-01 2014-10-01 Audioanschluss Active EP3228096B1 (de)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2014/071083 WO2016050298A1 (en) 2014-10-01 2014-10-01 Audio terminal

Publications (2)

Publication Number Publication Date
EP3228096A1 EP3228096A1 (de) 2017-10-11
EP3228096B1 true EP3228096B1 (de) 2021-06-23

Family

ID=51655751

Family Applications (1)

Application Number Title Priority Date Filing Date
EP14777648.8A Active EP3228096B1 (de) 2014-10-01 2014-10-01 Audioanschluss

Country Status (2)

Country Link
EP (1) EP3228096B1 (de)
WO (1) WO2016050298A1 (de)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106126185A (zh) * 2016-08-18 2016-11-16 北京塞宾科技有限公司 Bluetooth-based holographic sound field recording and communication device and system
CN114047902B (zh) * 2017-09-29 2024-06-14 苹果公司 File format for spatial audio
WO2019157069A1 (en) * 2018-02-09 2019-08-15 Google Llc Concurrent reception of multiple user speech input for translation
CN110351690B (zh) * 2018-04-04 2022-04-15 炬芯科技股份有限公司 Intelligent voice system and voice processing method thereof
CN111385775A (zh) * 2018-12-28 2020-07-07 盛微先进科技股份有限公司 Wireless transmission system and method thereof
TWI700953B (zh) * 2018-12-28 2020-08-01 盛微先進科技股份有限公司 Wireless transmission system and method thereof
KR102565882B1 (ko) * 2019-02-12 2023-08-10 삼성전자주식회사 Sound output device including a plurality of microphones and method for processing sound signals using a plurality of microphones
CN110444232B (zh) * 2019-07-31 2021-06-01 国金黄金股份有限公司 Recording control method and device for a speaker box, storage medium and processor
EP4300994A4 (de) * 2021-04-30 2024-06-19 Samsung Electronics Co., Ltd. Method and electronic device for recording audio data from multiple devices
CN117795978A (zh) * 2021-09-28 2024-03-29 深圳市大疆创新科技有限公司 Audio capture method and system, and computer-readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8379871B2 (en) * 2010-05-12 2013-02-19 Sound Id Personalized hearing profile generation with real-time feedback
US8855341B2 (en) * 2010-10-25 2014-10-07 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for head tracking based on recorded sound signals
US8767996B1 (en) * 2014-01-06 2014-07-01 Alpine Electronics of Silicon Valley, Inc. Methods and devices for reproducing audio signals with a haptic apparatus on acoustic headphones

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
None *

Also Published As

Publication number Publication date
EP3228096A1 (de) 2017-10-11
WO2016050298A1 (en) 2016-04-07

Similar Documents

Publication Publication Date Title
EP3228096B1 (de) Audioanschluss
EP3276905B1 (de) System for audio communication by means of LTE
US8918197B2 (en) Audio communication networks
US8073125B2 (en) Spatial audio conferencing
US20080004866A1 (en) Artificial Bandwidth Expansion Method For A Multichannel Signal
JP2018528479A (ja) Adaptive noise suppression for super wideband music
US20090150151A1 (en) Audio processing apparatus, audio processing system, and audio processing program
US20140226842A1 (en) Spatial audio processing apparatus
US20030044002A1 (en) Three dimensional audio telephony
TW200901744A (en) Headset having wirelessly linked earpieces
US20070109977A1 (en) Method and apparatus for improving listener differentiation of talkers during a conference call
KR20110069112A (ko) Method of rendering binaural stereo in a hearing aid system, and hearing aid system
US20220369034A1 (en) Method and system for switching wireless audio connections during a call
US20220038769A1 (en) Synchronizing bluetooth data capture to data playback
US20150092950A1 (en) Matching Reverberation in Teleconferencing Environments
US20170223474A1 (en) Digital audio processing systems and methods
US11503405B2 (en) Capturing and synchronizing data from multiple sensors
EP2901668A1 (de) Verfahren zur verbesserung der wahrgenommenen kontinuität in einem räumlichen telekonferenzsystem
WO2021180115A1 (zh) Recording method and recording system for true wireless earphones
BRPI0715573A2 Process and device for the acquisition, transmission and reproduction of sound events for communication applications
TWM626327U System for distributing audio signals between a plurality of communication devices respectively corresponding to a plurality of users
CN111225102A (zh) Bluetooth audio signal transmission method and device
US12010496B2 (en) Method and system for performing audio ducking for headsets
US20220368554A1 (en) Method and system for processing remote active speech during a call
Rothbucher et al. Backwards compatible 3d audio conference server using hrtf synthesis and sip

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20170818

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20190415

GRAP Despatch of communication of intention to grant a patent

Free format text: ORIGINAL CODE: EPIDOSNIGR1

RIC1 Information provided on ipc code assigned before grant

Ipc: H04S 3/00 20060101AFI20201221BHEP

Ipc: H04R 5/04 20060101ALN20201221BHEP

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: GRANT OF PATENT IS INTENDED

RIC1 Information provided on ipc code assigned before grant

Ipc: H04R 5/04 20060101ALN20210113BHEP

Ipc: H04S 3/00 20060101AFI20210113BHEP

INTG Intention to grant announced

Effective date: 20210128

GRAS Grant fee paid

Free format text: ORIGINAL CODE: EPIDOSNIGR3

GRAA (expected) grant

Free format text: ORIGINAL CODE: 0009210

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE PATENT HAS BEEN GRANTED

AK Designated contracting states

Kind code of ref document: B1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

REG Reference to a national code

Ref country code: GB

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: CH

Ref legal event code: EP

REG Reference to a national code

Ref country code: DE

Ref legal event code: R096

Ref document number: 602014078295

Country of ref document: DE

Ref country code: AT

Ref legal event code: REF

Ref document number: 1405344

Country of ref document: AT

Kind code of ref document: T

Effective date: 20210715

REG Reference to a national code

Ref country code: IE

Ref legal event code: FG4D

REG Reference to a national code

Ref country code: LT

Ref legal event code: MG9D

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: FI

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: BG

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210923

Ref country code: HR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

REG Reference to a national code

Ref country code: AT

Ref legal event code: MK05

Ref document number: 1405344

Country of ref document: AT

Kind code of ref document: T

Effective date: 20210623

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: NO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210923

Ref country code: SE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: RS

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: LV

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: GR

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210924

REG Reference to a national code

Ref country code: NL

Ref legal event code: MP

Effective date: 20210623

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: SM

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: SK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: CZ

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: EE

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: AT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: RO

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: PT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20211025

Ref country code: NL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: ES

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: PL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

REG Reference to a national code

Ref country code: DE

Ref legal event code: R097

Ref document number: 602014078295

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: DK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

PLBE No opposition filed within time limit

Free format text: ORIGINAL CODE: 0009261

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: NO OPPOSITION FILED WITHIN TIME LIMIT

REG Reference to a national code

Ref country code: CH

Ref legal event code: PL

26N No opposition filed

Effective date: 20220324

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: AL

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

REG Reference to a national code

Ref country code: BE

Ref legal event code: MM

Effective date: 20211031

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MC

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

REG Reference to a national code

Ref country code: DE

Ref legal event code: R082

Ref document number: 602014078295

Country of ref document: DE

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LU

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20211001

Ref country code: IT

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

Ref country code: BE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20211031

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: LI

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20211031

Ref country code: CH

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20211031

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: IE

Free format text: LAPSE BECAUSE OF NON-PAYMENT OF DUE FEES

Effective date: 20211001

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: HU

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT; INVALID AB INITIO

Effective date: 20141001

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: CY

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

PG25 Lapsed in a contracting state [announced via postgrant information from national office to epo]

Ref country code: MK

Free format text: LAPSE BECAUSE OF FAILURE TO SUBMIT A TRANSLATION OF THE DESCRIPTION OR TO PAY THE FEE WITHIN THE PRESCRIBED TIME-LIMIT

Effective date: 20210623

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: DE

Payment date: 20240318

Year of fee payment: 10

Ref country code: GB

Payment date: 20240325

Year of fee payment: 10

PGFP Annual fee paid to national office [announced via postgrant information from national office to epo]

Ref country code: FR

Payment date: 20240325

Year of fee payment: 10