WO2023213395A1 - Echo cancellation for i/o user devices performing user terminal emulation as a cloud computing service - Google Patents


Info

Publication number
WO2023213395A1
Authority
WO
WIPO (PCT)
Prior art keywords
speaker
microphone
data
user
user terminal
Prior art date
Application number
PCT/EP2022/062065
Other languages
French (fr)
Inventor
Hans Hannu
Peter ÖKVIST
Tommy Arngren
Daniel Lindström
Original Assignee
Telefonaktiebolaget Lm Ericsson (Publ)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Telefonaktiebolaget Lm Ericsson (Publ) filed Critical Telefonaktiebolaget Lm Ericsson (Publ)
Priority to PCT/EP2022/062065 priority Critical patent/WO2023213395A1/en
Publication of WO2023213395A1 publication Critical patent/WO2023213395A1/en

Links

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M9/00Arrangements for interconnection not involving centralised switching
    • H04M9/08Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic
    • H04M9/082Two-way loud-speaking telephone systems with means for conditioning the signal, e.g. for suppressing echoes for one or both directions of traffic using echo cancellers

Definitions

  • the present disclosure relates to providing communication services through user terminals of a wireless communications system.
  • I/O input and/or output
  • IODs networked input and/or output devices
  • the IODs individually or in combination have user interface (UI) capabilities to provide an I/O user interface for the user to interface with a user terminal emulation application of the server to perform a communication service.
  • UI user interface
  • when a server-based emulation of a user terminal involves separate IODs, e.g., an “IOD-microphone” and a separate “IOD-speaker”, the resulting configuration can be subject to undesirable echo caused by audio from the IOD-speaker being fed back by the IOD-microphone to the user terminal emulation application of the server.
  • Some embodiments disclosed herein are directed to a user terminal emulation server for providing communication services using sets of input and/or output (I/O) user devices.
  • the user terminal emulation server includes at least one processor and at least one memory storing program code that is executable by the at least one processor to perform operations.
  • the operations provide first data flows between a first user terminal emulation application and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation application to perform a communication service through a network entity providing communication services.
  • the first data flows include a first microphone data flow received from a first microphone in the first set of the I/O user devices and a first speaker data flow sent to a first speaker in the first set of the I/O user devices.
  • the operations obtain a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker.
  • the operations cancel a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset.
  • Some other related embodiments disclosed herein are directed to a method by a user terminal emulation server for providing communication services using sets of I/O user devices.
  • the method includes providing first data flows between a first user terminal emulation application and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation application to perform a communication service through a network entity providing communication services.
  • the first data flows include a first microphone data flow received from a first microphone in the first set of the I/O user devices and a first speaker data flow sent to a first speaker in the first set of the I/O user devices.
  • the method obtains a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker.
  • the method cancels a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset.
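The embodiment above can be sketched as a simple time-shift-and-subtract canceller. This is only an illustration of the claimed idea, assuming the time offset has already been converted to a sample count and the echo path is approximated by a single gain (both assumptions are mine, not values from the disclosure):

```python
import numpy as np

def cancel_speaker_echo(mic_data, speaker_data, time_offset_samples, echo_gain=0.5):
    """Cancel the speaker echo component of microphone data by time-shifting
    the speaker reference by the measured offset and subtracting a scaled
    copy from the microphone signal.

    mic_data, speaker_data: 1-D arrays of audio samples.
    time_offset_samples: elapsed time (in samples) between when speaker data
        is sent and when it is played out through the speaker.
    echo_gain: assumed attenuation of the echo path (illustrative value).
    """
    # Align the speaker reference with the echo it produced in the mic signal.
    shifted = np.zeros_like(mic_data)
    n = min(len(speaker_data), len(mic_data) - time_offset_samples)
    if n > 0:
        shifted[time_offset_samples:time_offset_samples + n] = speaker_data[:n]
    # Subtract the estimated echo component.
    return mic_data - echo_gain * shifted
```

In practice the echo path is not a single gain, which is why the disclosure's Figure 11 points to adaptive filtering; this sketch only shows the role the time offset plays.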
  • the centralized server based approach emulates a user terminal using one or more networked I/O user devices that are proximately located to a user, and which individually or in combination have user interface (UI) capabilities to provide an I/O user interface for the user to interface with a user terminal emulation application of the server to perform a communication service.
  • UI user interface
  • the server can be remotely located from the I/O user devices and, in particular, from microphone(s) and speaker(s), embodiments disclosed herein can effectively perform echo cancellation so that a speaker echo component which is present in microphone data received in a microphone data flow can be effectively cancelled.
  • Figure 1 illustrates a system with a user terminal emulation server that operationally integrates sets of I/O user devices that are proximately located to users to logically form virtualized user terminals providing communication services in accordance with some embodiments of the present disclosure
  • Figure 2 illustrates a block diagram of the user terminal emulation server communicating with various elements of a cellular system to provide communication services in accordance with some embodiments of the present disclosure
  • Figure 3 illustrates a block diagram of the user terminal emulation server communicating in a different manner with various elements of a cellular system to provide communication services in accordance with some other embodiments of the present disclosure
  • Figure 4 illustrates the user terminal emulation server 100 executing two user terminal emulation applications to provide concurrent communication services through two sets of I/O user devices;
  • FIGS 5, 6, 7, and 8 illustrate flowcharts of operations that may be performed by a user terminal emulation server in accordance with some embodiments of the present disclosure
  • Figure 9 illustrates a block diagram of hardware circuit components of an I/O user device which are configured to operate in accordance with some embodiments
  • Figure 10 illustrates a block diagram of hardware circuit components of a user terminal emulation server that are configured to operate in accordance with some embodiments of the present disclosure.
  • Figure 11 illustrates a circuit for acoustic echo cancellation using adaptive filtering.
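The adaptive-filtering structure of Figure 11 is conventionally realized with a normalized LMS (NLMS) filter. A minimal sketch follows; the filter length and step size are illustrative choices and not values from the disclosure:

```python
import numpy as np

def nlms_echo_canceller(mic, speaker_ref, filter_len=8, mu=0.5, eps=1e-8):
    """Acoustic echo cancellation by adaptive filtering, in the style of
    Figure 11: an NLMS filter adapts an estimate of the speaker-to-microphone
    echo path from the speaker reference signal and subtracts the estimated
    echo from the microphone signal, leaving the near-end audio."""
    w = np.zeros(filter_len)       # adaptive echo-path estimate
    buf = np.zeros(filter_len)     # most recent speaker reference samples
    out = np.empty(len(mic))
    for i in range(len(mic)):
        buf = np.roll(buf, 1)      # shift the reference delay line
        buf[0] = speaker_ref[i]
        e = mic[i] - w @ buf       # error = echo-cancelled output sample
        out[i] = e
        w += mu * e * buf / (buf @ buf + eps)  # NLMS coefficient update
    return out
```

Unlike the fixed-gain sketch earlier in this document, the filter here learns the echo-path impulse response, so it tolerates an unknown room response and drifting gain.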
  • Various embodiments disclosed herein are directed to improvements in operation of a centralized server based approach for emulating a user terminal using one or more networked input and/or output (I/O) user devices that are proximately located to a user, and which individually or in combination have user interface (UI) capabilities to provide an I/O user interface for the user to interface with a user terminal emulation application of the server to perform a communication service.
  • I/O input and/or output
  • UI user interface
  • Some potential advantages of these embodiments include that a user can obtain a communication service without the necessity of a traditional all-inclusive feature-rich user terminal, e.g., a conventional smartphone, mobile phone, tablet computer, etc.
  • a user terminal emulation server can utilize the available UI capability of one or more I/O user devices that are proximate to a user to provide user terminal functionality for a communication service.
  • the server-based approach can provide low cost adaptable communication services to users.
  • Dynamic allocation of I/O user device capabilities whenever and wherever the I/O user devices are in the proximity of a user enables efficient and flexible use of existing hardware, such as televisions, conference phones, laptops, surveillance cameras, connected household appliances, connected cars, etc., that is capable of providing the necessary UI functionality to a user during a communication service.
  • the user thereby has reduced or no need to carry an expensive and all-inclusive user terminal, e.g. smart phone, that includes all necessary UI capabilities, display device, keyboard, speakers, etc.
  • the user may instead carry a hardware device, referred to as a "UserTag” or “user tag", which operates to identify the user over a wireless communication interface, such as a near field communication (NFC) interface, to one or more of the I/O user devices.
  • a wireless communication interface such as a near field communication (NFC) interface
  • NFC near field communication
  • a user terminal emulation server can run a terminal emulation application (also referred to as "SoftUE”) which may be instantiated in a one-to-one mapping to each user for providing a communication service between a network entity and the user.
  • SoftUE terminal emulation application
  • Figure 1 illustrates a system with a user terminal emulation server 100 that can use one or more I/O user devices 130 that is/are proximately located to users to logically emulate a user terminal providing a communication service in accordance with some embodiments of the present disclosure.
  • the user terminal emulation server 100 may operationally integrate the UI capabilities of a set of the I/O user devices 130 to logically emulate a user terminal providing communication services in accordance with some embodiments of the present disclosure.
  • the user terminal emulation server 100 may be a cloud resource that is networked and remote from the I/O user devices 130, or may be more proximately located on a shared network with the I/O user devices 130.
  • the user terminal emulation server 100 is configured to communicate with the I/O user device(s) 130 that are proximate to a user for use in providing UI capabilities during a communication service.
  • Users may carry a hardware tag, a.k.a. UserTag or “user tag”, which is capable of transmitting a unique user identifier through a communications interface, such as a nearfield communications interface (e.g., Bluetooth, BLE, NFC, RFID, etc., or combinations thereof), for receipt by one or more of the I/O user devices 130 which are proximately located to the user.
  • a communications interface such as a nearfield communications interface (e.g., Bluetooth, BLE, NFC, RFID, etc., or combinations thereof)
  • a nearfield communications interface e.g., Bluetooth, BLE, NFC, RFID, etc., or combinations thereof
  • One type of UserTag can be a low-complexity stand-alone electronic device having limited capability for transmitting an identifier through a near-field communications interface, and which may perform authentication operations.
  • Another type of UserTag can be a smartphone or smartwatch having cellular connectivity that transmits a cellular identity (e.g., from a SIM card) or an application identity through a cellular interface or a near-field communications interface and is configured to perform authentication operations such as described herein.
  • the user identifier may alternatively or additionally be operationally determined by the user proving user credentials through one of the I/O user devices 130 and/or biometrics operations performed by, e.g., one or more of the I/O user devices 130.
  • the biometrics operations may include, without limitation, one or more of voice recognition, image/face recognition, eye recognition, fingerprint recognition, or a combination thereof.
  • the user identity may be determined based on credentials provided by the user when, e.g., logging into an application or account.
  • the user identity may be provided by a cell phone using information from the subscription SIM and proximity of the cell phone to one or more of the I/O user devices 130 can be determined using the phone’s near-field communications (NFC) capability.
  • NFC near-field communications
  • a user identifier, a UserTag identifier, and a user terminal emulation application 110 can be logically associated with each other in a database 120 during a user registration process or as part of another setup process.
  • a user may obtain an account login identifier (serving as the user identifier) that is registered in the database 120 as being associated with a UserTag identifier for a physical UserTag that has been provided to (e.g., purchased by) the user and being associated with a user terminal application 110 that emulates a user terminal having defined capabilities (e.g., a cell phone providing cellular and over-the-top voice-over-IP communication services).
  • the user terminal emulation server 100 may maintain in the database 120 network addresses of I/O user devices 130 and UI capabilities of the I/O user devices 130.
  • the capabilities of the I/O user devices 130 may be logically arranged in the database 120 based on the type of UI capability provided, e.g., display device, microphone, speaker, keyboard, and may be further arranged based on a quality of service provided by the UI capability.
  • the user terminal emulation server 100 may register a network address of one of the user terminal emulation applications 110 and an identity of a user with a network entity 150 providing communication services.
  • the network entity 150 provides a communication service function 140 which may, for example, correspond to an over-the-top Voice Over Internet Protocol (VoIP) service, Netflix service, Facebook service, Microsoft Teams meeting service, Internet browser service, a cellular communication service, etc.
  • VoIP Voice Over Internet Protocol
  • the user terminal emulation application 110 is executed by the user terminal emulation server 100.
  • a user terminal emulation application 110 may run one or more applications that are normally run by a smart phone, such as a Netflix application, Facebook application, Microsoft Teams application, Internet browser application, etc., to provide a communication service for a user through, e.g., the network entity 150.
  • a different instantiation of the user terminal emulation application 110 may be hosted by the server 100 for each user who is to be provided communication services (i.e., illustrated user terminal emulation applications #1-#N corresponding to users 1-N).
  • the user terminal emulation application 110 may perform registration of the user with the network entity 150 and setup of a communication service with a user responsive to communication requests.
  • the operation to register the network address of the user terminal emulation application and the identity of the user with the network entity can include registering the network address of the user terminal emulation application 110 and the identity of the user with a network server of a VoIP communication service provider.
  • the operation to register the network address of the user terminal emulation application and the identity of the user with the network entity can include registering the network address of the user terminal emulation application 110 and the identity of the user with a Home Subscriber Server (HSS) or other network node of a core network operated by a cellular communication service provider.
  • HSS Home Subscriber Server
  • the user terminal emulation server 100 may receive the registration messages from the I/O user devices using the Session Initiation Protocol (SIP)/Session Description Protocol (SDP), where each of the registration messages identifies the network address and the UI capability of one of the I/O user devices.
  • the communication request may be received from the network entity 150 using the SIP/SDP, and the operation to provide communication sessions between the user terminal emulation application 110 and each of the I/O user devices in the set, and between the user terminal emulation application 110 and the requesting user terminal, may be performed using the SIP/SDP.
  • a registration message from an I/O user device can include, for example, an IP address and port number, MAC address, fully qualified domain name (FQDN), and/or another network address, and further include information identifying the UI capability of the I/O user device.
  • the I/O user device may respond to becoming powered-on by communicating the registration message to the user terminal emulation server 100.
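The registration flow above lends itself to a simple capability-indexed store. The following sketch shows one possible shape of a registration message and of the database 120 update; every field name and value here is an illustrative assumption, not defined by the disclosure:

```python
# Hypothetical registration message from an IOD-speaker that has just
# powered on; field names and values are illustrative only.
registration_message = {
    "network_address": {"ip": "192.0.2.10", "port": 5060},
    "fqdn": "iod-speaker-1.example.net",
    "ui_capability": {"type": "speaker", "num_speakers": 2, "max_loudness_db": 90},
}

def register_device(database, message):
    """Index the device by its UI capability type, mirroring how the
    server 100 arranges capabilities in the database 120 (sketch)."""
    cap_type = message["ui_capability"]["type"]
    database.setdefault(cap_type, []).append(message)

database_120 = {}
register_device(database_120, registration_message)
```

Arranging entries by capability type makes the later select-by-data-type lookup a single dictionary access; a fuller implementation would also order entries by quality of service, as the disclosure suggests.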
  • the user terminal emulation server 100 receives a communication request from the network entity 150 for establishing a communication service between the user and a requesting user terminal, e.g., a cellular phone, computer with Microsoft Teams application, etc. Responsive to the communication request, the user terminal emulation server 100 identifies one or more of the I/O user devices 130, which may be registered in the database, that are proximately located to a location of the user and are determined, based on the UI capabilities identified by the database 120 for the set of I/O user devices and based on content of the communication request, to satisfy a capability rule for being individually usable or combinable to provide an I/O user interface for the user to interface with the user terminal emulation application 110 to provide the communication service.
  • the user terminal emulation server 100 provides one or more communication sessions between the user terminal emulation application 110 and the one or more I/O user devices 130 and between the user terminal emulation application 110 and the requesting user terminal via the network entity 150.
  • the communication request that is received by the user terminal emulation application 110 may contain an indication of a minimum UI capability that must be provided to the user during the communication service, such as: speaker only; combination of speaker and microphone; display only; combination of display device, speaker, and microphone; etc.
  • a UI capability rule, which can be used by the server 100 to determine whether a communication service can be provided and by which set of I/O user devices, may thereby be defined based on the minimum UI capability that is indicated by the communication request.
  • the user terminal emulation server 100 then routes communication traffic that is received from at least one of the I/O user devices in the set toward the requesting user terminal via the network entity 150.
  • the user terminal emulation server 100 selects one of the I/O user devices from among the set of I/O user devices based on matching characteristics of the data type to the UI capabilities identified by the database 120 for the one of the I/O user devices, and then routes the data of the data type toward the network address of the selected one of the I/O user devices.
  • the server 100 may also combine data streams that are received from the I/O user devices in the set, and route the combined data streams towards the requesting user terminal, e.g., via the network entity 150.
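The select-and-route step just described can be sketched as a capability match over database entries. The matching rule shown, exact capability-type equality, is my simplification; a real rule would also weigh quality-of-service attributes as the disclosure notes:

```python
def select_device(database, data_type):
    """Return the network address of a registered I/O user device whose
    UI capability matches the characteristics of the incoming data type
    (illustrative match: exact capability-type equality)."""
    candidates = database.get(data_type, [])
    return candidates[0]["network_address"] if candidates else None

def route_traffic(database, data_type, payload, send):
    """Route a payload of the given data type toward the matched device,
    as the user terminal emulation server 100 does for received traffic."""
    address = select_device(database, data_type)
    if address is not None:
        send(address, payload)
    return address
```

The `send` callable stands in for whatever transport the server uses toward the I/O user device; it is a placeholder, not an API from the disclosure.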
  • the user terminal emulation server 100 may be responsible for tracking which I/O user devices are proximately located to a present location of the user.
  • the server 100 may receive presence reports from individual ones of the I/O user devices containing their network address and an identifier of a user who is determined by the I/O user device to be proximately located.
  • an I/O user device may read a user tag through a NFC communication interface, may sense biometric information from the user, and/or may perform other operations to detect presence of a user and to identify the user. Responsive to the presence reports, the server 100 updates the database 120 to indicate which user identifiers are proximately located to which of the I/O user devices.
  • a set of I/O user devices 130 has been determined by the instantiated user terminal emulation application #1 to be proximately located to a location of a first user carrying UserTag#1, and to further have UI capabilities that are combinable to satisfy the UI capability rule for providing a combined I/O user interface for the first user to use during a requested communication service.
  • Application #1 responsively uses that set of I/O user devices 130 to provide a combined I/O user interface for use by the first user during a communication service, e.g., via network entity 150, between the first user and another user terminal.
  • Another set of I/O user devices 130 has been determined by the instantiated user terminal emulation application #2 to be proximately located to a location of a second user carrying UserTag#2, and to further have UI capabilities that are combinable to satisfy the UI capability rule for providing a combined I/O user interface for the second user to use during a requested communication service.
  • Application #2 responsively uses that set of I/O user devices 130 to provide a combined I/O user interface for use by the second user during a communication service, e.g., via network entity 150, between the second user and yet another user terminal.
  • Figure 1 also illustrates that another set of I/O user devices 130 is not proximately located to either UserTag#1 or UserTag#2.
  • the communication request which is requesting the establishment of communication service with an identified user may be initiated by the network entity 150 using the network address of the user terminal emulation application and identity of the user which were earlier registered with the network entity 150.
  • the communication request may additionally or alternatively be generated by one of the I/O user devices 130 responsive to a command received from a proximately located user.
  • a user may operate a user interface provided by one of the I/O user devices 130 to initiate a combined audio and video call with another user.
  • the command is handled by the user terminal emulation server 100, e.g., by the IODH or the application 110 for that user.
  • the application 110 performs the identifying, providing, routing, selecting, and combining operations described above to set up and operate a communication service between the user and the other user via the network entity 150.
  • a speaker device is one of the I/O user devices 130 in the set capable of playing a received audio stream and a microphone device is one of the I/O user devices 130 in the set capable of sensing audio to output a microphone stream.
  • Operations by the user terminal emulation application include updating the database 120 based on content of registration messages from the speaker device and the microphone device to identify network addresses of the speaker device and the microphone device, and to identify UI capabilities of the speaker device as having a speaker capability and the microphone device as having a microphone capability.
  • the speaker UI capabilities may identify a number of speakers provided, sound loudness capability, and/or other operational characteristics.
  • the microphone UI capabilities may identify a number of microphones provided, sensitivity of the microphones, directionality capabilities of the microphone(s), and/or other operational characteristics.
  • the speaker device and the microphone device are each identified as belonging to the set of I/O user devices that are determined to be proximately located to the location of the user (e.g., UserTag#1) and are further determined, based on the UI capabilities identified by the database 120, to satisfy the UI capability rule for being used individually or combined to provide a combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service.
  • further operations are performed to route a microphone stream received from the microphone device toward the requesting user terminal (e.g., via network entity 150).
  • the operations select the speaker device based on matching an audio characteristic of the audio stream to the speaker capability identified by the database for the speaker device, and then route the audio stream toward the network address of the speaker device.
  • the example embodiment may include, when a display device is one of the I/O user devices in the set capable of displaying a received video stream, operations that update the database 120 based on content of registration messages to identify a network address of the display device and to identify UI capabilities of the display device as having a display capability.
  • the display UI capabilities may identify a screen display size, aspect ratio, pixel resolution, video frame rates supported, whether the display device supports shared use via a split-screen configuration, and/or other operational characteristics.
  • the display device is also identified as among the set of I/O user devices that is determined to be proximately located to the location of the user and are further determined, based on the UI capabilities identified by the database 120, to satisfy the UI capability rule for being used individually or combined to provide the combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service.
  • further operations respond to receipt of a video stream as communication traffic from the requesting user terminal by selecting the display device based on matching a video characteristic of the video stream to the display capability identified by the database 120 for the display device, and then routing the video stream toward the network address of the display device.
  • the operations for routing the audio stream and the video stream toward the network addresses of the speaker device and the display device may include when audio data and video data are received within a same stream from the requesting user terminal through a first communication session: separating the audio data from the video data; routing the audio data toward the network address of the speaker device through a second communication session; and routing the video data toward the network address of the display device through the second communication session or a third communication session.
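The separate-and-route operation above amounts to demultiplexing one inbound session into per-device outbound sessions. A minimal sketch follows; the `(media_type, payload)` framing is an illustrative assumption, not a format from the disclosure:

```python
def demux_and_route(frames, speaker_addr, display_addr, send):
    """Split an interleaved audio/video stream received through a first
    communication session, forwarding audio toward the speaker device and
    video toward the display device through their own sessions."""
    for media_type, payload in frames:
        if media_type == "audio":
            send(speaker_addr, payload)
        elif media_type == "video":
            send(display_addr, payload)
```

As with the routing sketch earlier, `send` is a placeholder for the server's per-session transport toward each I/O user device.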
  • the operations may update the database 120 based on content of a registration message to identify a network address of the camera device and to identify a UI capability of the camera device as having a camera capability.
  • the camera UI capabilities may identify a camera pixel count, image quality, light sensitivity, and/or other operational characteristics.
  • the camera device may be identified as a member of the set of I/O user devices that are determined to be proximately located to the location of the user and is further determined, based on the UI capability identified by the database 120, to satisfy the UI capability rule for being used individually or combined with the other I/O user devices in the set to provide the combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service. Based on determining that the camera device satisfies the UI capability rule, further operations are performed to route the camera stream received from the camera device toward the requesting user terminal, e.g., via the network entity 150.
  • the operations for routing the microphone stream received from the microphone device and the camera stream received from the camera device toward the requesting user terminal may include: receiving the microphone stream from the microphone device through a first communication session; receiving the camera stream from the camera device through the first communication session or a second communication session; combining the microphone stream and camera stream in a combined stream; and routing the combined stream toward the requesting user terminal through a third communication session, e.g., via the network entity 150.
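The combining step above is the inverse of demultiplexing: uplink streams arriving over separate sessions are merged into one combined stream toward the requesting user terminal. The timestamped-tuple framing below is an illustrative assumption:

```python
def mux_streams(mic_frames, camera_frames):
    """Combine a microphone stream and a camera stream into one combined
    stream ordered by capture timestamp, ready to be routed toward the
    requesting user terminal (each frame is assumed to be a
    (timestamp, media_type, payload) tuple)."""
    return sorted(mic_frames + camera_frames, key=lambda f: f[0])
```

Ordering by capture timestamp is one simple way to keep the combined audio and video mutually synchronized before routing.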
  • the example embodiment may include, when a keyboard device is one of the I/O user devices in the set capable of outputting key selection data responsive to key selections by a user among keys of the keyboard device, the operations can update the database 120 based on content of a registration message to identify a network address of the keyboard device and to identify a UI capability of the keyboard device as having a keyboard capability.
  • the keyboard device capabilities may identify a key count, indication of whether the keyboard is a physical keyboard or a touch sensitive input device, and/or other keyboard capabilities.
  • the keyboard device is further identified as a member of the set of I/O user devices that are determined to be proximately located to the location of the user and is further determined, based on the UI capability identified by the database 120, to satisfy the UI capability rule for being used individually or combined with the other I/O user devices in the set to provide the combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service. Based on determining that the keyboard device satisfies the UI capability rule, further operations are performed to identify commands formed by the key selection data received from the keyboard and to perform operations that have been predefined as being triggered based on receipt of the identified commands.
  • the operations for routing the key selection data received from the keyboard device and the microphone stream received from the microphone device may include: receiving the key selection data from the keyboard device through a first communication session; receiving the microphone stream from the microphone device through the first communication session or a second communication session; combining the key selection data and the microphone stream in a combined stream; and routing the combined stream toward the requesting user terminal through a third communication session, e.g., via the network entity 150.
  • FIG. 2 is a block diagram illustrating the user terminal emulation server 100 as an element of an operator service node 202 within a cellular system 200.
  • the communication service function 140 of the network entity may be provided by the operator service node 202 or may be reached through external infrastructure 240, e.g., the Internet.
  • the server 100 may, for example, be implemented in the radio access network 220 to provide edge computing with faster responsiveness or may be implemented within another node of the cellular system 200.
  • the user terminal emulation server 100 can include an I/O user device handler (IODH) 212, a control function (CF) 214, the instantiated user terminal emulation applications 110, and a service gateway (GW) 216.
  • a user terminal emulation application 110 may perform one or more user applications which are provided by a smart phone, such as a Netflix application, Facebook application, Microsoft Teams application, Internet browser application, etc.
  • the IODH 212 may perform operations to manage the I/O user devices, such as to handle maintenance of the database 120, perform registration of the user terminal emulation applications 110, and/or control operational characteristics of the managed I/O user devices.
  • the IODH 212 may operate to register with a Microsoft Teams server the IP address of a Microsoft Teams application, which is run by or interfaced to the user terminal emulation application 110, and the user's Microsoft Teams name.
  • the CF 214 may be responsible for assigning an IP address to each user terminal emulation application 110.
  • the IP address to be assigned by the CF 214 may be received from the core network 210 functionality such as a PDN-GW.
  • the service GW 216 may interconnect the user terminal emulation server 100 to a PSTN network, packet data network gateway of a 3GPP (3rd Generation Partnership Project) system, etc.
  • the cellular system 200 can include a Core Network 210 having a Home Subscriber Server (HSS), a Policy and Charging Rules Function (PCRF), gateway (GW) and Mobility Management Entity (MME) providing control signaling related to mobile terminal mobility and security for the radio access.
  • HSS contains subscriber-related information and provides support functionality for user authentication and user access to the system.
  • the PCRF enables QoS control per data flow and radio bearer, by setting QoS rules for each data flow, based on operator set policies and subscriber information.
  • the GW can include a Serving GW (S-GW) and a Packet Data Network GW (PDN-GW), where the S-GW interconnects the core network 210 with the radio access network 220 and routes incoming and outgoing packets for the I/O user devices 232 and/or 130 and the user terminals 230.
  • the PDN-GW interconnects the core network 210 with external infrastructure 240, such as the Internet, and allocates IP-addresses and performs policy control and charging.
  • Some I/O user devices 232 having cellular communication capability can communicate via, e.g., eNBs or other radio access nodes of a Radio Access Network 220 with the operator service node 202 via the core network 210.
  • the user terminal emulation server 100 may handle setup of a communication service between a selected set of the I/O user devices that are proximate to a user and a remote user terminal 230 (e.g., smart phone) via the cellular system 200.
  • Figure 3 is a block diagram illustrating the user terminal emulation server 100 communicating in a different manner with various elements of a cellular system 200, which may operate as the network entity 140 (Fig. 1), to provide communication services in accordance with some embodiments of the present disclosure.
  • the system of Figure 3 differs from the system of Figure 2 by the user terminal emulation server 100 being an Internet service within external infrastructure 240 outside of the cellular system 200.
  • the CF 214 may determine the IP address to be assigned to different ones of the user terminal emulation applications 110 based on signaling from the Internet service within the external infrastructure 240.
  • This use case involves a user, with a UserTag or other way of being identified, being proximately located to I/O user devices 130 having different UI capabilities when an incoming call is received by the user terminal emulation server.
  • Although these operations are described in the context of identifying a user through a physical UserTag carried by the user, they are not limited thereto and may be used with any other way of identifying a user, such as by the user entering identifying information, e.g., user credentials, through one of the I/O user devices 130 and/or by sensing biometric information that identifies the user.
  • a user terminal emulation application 110 may be instantiated or otherwise activated responsive to an incoming call (service, session) targeting the UserTag.
  • the user terminal emulation application 110 can identify subscriptions associated with the UserTag (i.e. the physical user) and preferred methods of communication (e.g., audio not video, audio and video, etc.) that have been specified by the user, and determines the UI capabilities of the I/O user devices that will be needed to satisfy the UI capabilities which may be specified for the incoming communication session.
  • the user terminal emulation application 110 may ask the IODH to identify which I/O user devices 130 are proximately located to the UserTag, and may further ask the IODH to determine or may determine itself whether the identified I/O user devices 130 are usable individually or combinable to satisfy the UI capabilities specified by the incoming communication session.
  • the user terminal emulation application 110 and/or the IODH may receive an ACK or NACK back on whether a sufficient set of I/O user devices 130 can be used to provide the communication service. If ACK, then the IODH also sets the state of the I/O user devices 130 in the set to in-use to avoid another user terminal emulation application 110 attempting to utilize the same I/O user devices 130 that are presently in use.
  • the user terminal emulation application 110 and/or the IODH can take different actions to setup a reduced UI capability communication service with the user depending on user settings, e.g. only allow sound-based communications instead of a combination of sound and video responsive to when no display device is presently available for use.
  • An example of no display device being available may occur when the only display device that is proximately located to the user is presently being used by another user to receive information from another user terminal emulation application during an ongoing communication service or when no display device is proximately located to the user.
  • a UserTag enters a room and signals its presence to any proximately located and capable I/O user device in the room using a discovery beacon signal.
  • one or more of the I/O user devices determines presence of the UserTag by polling, such as by periodically transmitting discovery beacon signals that trigger responsive signaling by the UserTag.
  • the I/O user devices that receive signaling indicating presence of the UserTag report to the IODH in the user terminal emulation server along with a network address of the I/O user device (e.g., IP address, port number, MAC address, FQDN, etc.).
  • the user terminal emulation application corresponding to the specific user, i.e., the UserTag, is updated with respect to the detected user's presence.
  • the IODH may operate to receive the notifications from the I/O user devices proximately located to the UserTag.
  • Further UI capability discovery (synchronization) communications are performed between the user terminal emulation server and the I/O user devices.
  • the I/O user devices are associated to the user in the database, along with associated indicated service subscriptions and combinable UI capabilities provided by the set of I/O user devices which are proximately located to the UserTag.
  • One or more of the I/O user devices may be selected for default call reception ACK/NACK.
  • the user via the UserTag is now known to be reachable within the system through an identified set of I/O user devices with identified UI capabilities (e.g., speakers yes/no, display yes/no, microphone yes/no, keyboard yes/no, etc.), thereby creating a logical virtualized user terminal through which the user may be provided with a communication service.
  • the user may initiate a communication service through a touchscreen, voice command sensed by a microphone, performing a defined gesture observable by a camera, and/or other input provided to one of the proximately located I/O user devices.
  • An incoming session (e.g., video call) from a requesting user terminal which is directed to the user (UserTag) arrives at the user terminal emulation server for the user carrying the UserTag.
  • the individual or combinable UI capabilities of the available I/O user devices are compared to the UI requirements of the incoming session.
  • the user terminal emulation server may renegotiate the required UI capabilities (e.g., QoS) of the incoming session.
  • the user terminal emulation server prompts, via one or more of the available I/O user devices (e.g., a pre-selected answer device), the user carrying the UserTag to provide a session request answer (ACK/NACK).
  • the user responds through the pre-selected answer device to accept (ACK) or reject (NACK) the incoming session, to provide signaling to the user terminal emulation server.
  • operations route an audio stream from the requesting user terminal to one of the I/O user devices in the set that has a speaker capability via one or more sessions, and route a video stream from the requesting user terminal to another one of the I/O user devices in the set that has a display capability via one or more sessions.
  • a data stream that is received from one of the I/O user devices in the set through one or more sessions is routed toward the requesting user terminal.
  • when two or more data streams are received through one or more sessions from the I/O user devices, they can be combined into a combined data stream that is routed toward the requesting user terminal.
  • the user terminal emulation server may perform operations to continuously monitor presence of the I/O user devices to determine when one or more of the I/O user devices is no longer proximately located to the user such that it can no longer be included as part of the combined UI to be provided during the ongoing communication session.
  • the user terminal emulation server may substitute the UI capability of another I/O user device to the set being used by the user for the ongoing communication session responsive to a previous member of the set no longer having required presence.
  • This use case involves a user, with a UserTag, being proximately located to I/O user devices 130 having different UI capabilities when an outgoing call (communication session) is received by the user terminal emulation server.
  • the I/O user devices 130 are associated to the identified user via the user terminal emulation server 100 which handles all communications sessions for the user while the associated I/O user devices 130 are managed by an IODH.
  • a user terminal emulation application 110 may be instantiated or otherwise activated responsive to an outgoing call being requested by a user carrying the UserTag.
  • the user may initiate an outgoing call through a touchscreen, voice command sensed by a microphone, performing a defined gesture observable by a camera, and/or other input provided to one of the proximately located I/O user devices.
  • the user terminal emulation application 110 can identify subscriptions associated with the UserTag and preferred methods of communication (e.g., audio only, audio not video, audio and video, etc.) that have been specified by the user, and determines the UI capabilities of the I/O user devices that will be needed to satisfy the UI capabilities which may be specified for the outgoing call.
  • the user terminal emulation application 110 may ask the IODH to identify which I/O user devices 130 are proximately located to the UserTag, and may further ask the IODH to determine or may determine itself whether the identified I/O user devices 130 are individually useable or combinable to satisfy the UI capabilities specified by the outgoing call.
  • the user terminal emulation application 110 and/or the IODH may receive an ACK or NACK back on whether one or a set of I/O user devices 130 can be used to provide the communication service. If ACK, then the IODH also sets the state of the one or more I/O user devices 130 in the set to in-use to avoid another user terminal emulation application 110 attempting to utilize the same I/O user device(s) 130 that are presently in use. In case of NACK, the user terminal emulation application 110 and/or the IODH can take different actions to setup a reduced UI capability communication service with the user depending on user settings, e.g., only allow sound instead of the preferred sound and video responsive to when no display device is presently available for use (e.g., when presently used by another user terminal emulation application 110 or when none is proximately located to the UserTag).
  • a UserTag enters a room and signals its presence to any proximately located and capable I/O user device in the room using a discovery beacon signal.
  • one or more of the I/O user devices determines presence of the UserTag by polling, such as by periodically transmitting discovery beacon signals that trigger responsive signaling by the UserTag.
  • the I/O user devices that receive signaling indicating presence of the UserTag report to the IODH in the user terminal emulation server along with a network address of the I/O user device (e.g., IP address, port number, MAC address, FQDN, etc.).
  • the user terminal emulation application corresponding to the specific user, i.e., the UserTag, is updated with respect to the detected user's presence.
  • the IODH may operate to receive the notifications from the I/O user devices proximately located to the UserTag. Further UI capability discovery (synchronization) communications are performed between the user terminal emulation server and the I/O user devices.
  • the I/O user devices are associated to the user in the database, along with associated indicated service subscriptions and combinable UI capabilities provided by the set of I/O user devices which are proximately located to the UserTag. One or more of the I/O user devices may be selected for default call reception ACK/NACK.
  • the user via the UserTag is now known to be reachable within the system through an identified set of I/O user devices with identified UI capabilities (e.g., speakers yes/no, display yes/no, microphone yes/no, keyboard yes/no, etc.), thereby creating a logical virtualized user terminal through which the user may be provided with a communication service.
  • the user may initiate a communication service through a touchscreen, voice command sensed by a microphone, performing a defined gesture observable by a camera, and/or other input provided to one of the proximately located I/O user devices.
  • a user carrying the UserTag uses the UI of one of the I/O user devices to trigger an outgoing call (e.g., video call), which triggers signaling of the outgoing call to the user terminal emulation server.
  • the IODH queries the user (e.g., displays a message, generates a sound, etc.) through one of the I/O user devices proximately located to the user to request the user to select among available types of communication methods that can be presently used for the outgoing call.
  • One of the I/O user devices provides responsive signaling to the IODH indicating the user's selected type of communication method for the outgoing call.
  • the user terminal emulation server communicates an outgoing session stream request to the network entity 150, where the request may include an identifier of the calling user, identifier of the user terminal of the called user, and a quality of service for the communication session.
  • the user terminal emulation server receives a communication session acceptance (ACK) or denial (NACK) from the network entity 150.
  • the user terminal emulation server may attempt to renegotiate the requested communication session such as at a lower quality of service.
  • the user terminal emulation server selects one of the I/O user devices from among the set of I/O user devices based on matching characteristics of the data type to the UI capabilities identified by the database for the one of the I/O user devices, and then routes the data of the data type toward the network address of the selected one of the I/O user devices.
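The matching of a data type against device UI capabilities can be sketched as a simple capability lookup. The capability vocabulary ("speaker", "display"), the device record layout, and the function name are illustrative assumptions, not the disclosure's data model.

```python
def select_device(devices, data_type):
    """Select an I/O user device whose UI capability matches the
    characteristics of the data type, skipping devices marked in-use.

    `devices` is an illustrative list of dicts with 'capabilities'
    (a set of strings) and an optional 'in_use' flag.
    """
    # Illustrative mapping from data type to the required UI capability.
    required = {"audio": "speaker", "video": "display"}.get(data_type)
    if required is None:
        return None
    for device in devices:
        if required in device["capabilities"] and not device.get("in_use"):
            return device
    return None
```

The data of the matched data type would then be routed toward the network address stored in the database for the selected device.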
  • the data originating ones of the I/O user devices transmit data streams through one or more sessions to the user terminal emulation server, which may combine the data streams into a combined data stream that is routed toward the called user terminal via the network entity 150.
  • the user terminal emulation server may continuously monitor presence of the I/O user devices to determine when one or more of the I/O user devices is no longer proximately located to the user such that it can no longer be included as part of the combined UI to be provided during the ongoing communication session.
  • the user terminal emulation server may substitute the UI capability of another I/O user device to the set being used by the user for the ongoing communication session responsive to a previous member of the set no longer having required presence.
  • FIG. 11 illustrates a circuit for acoustic echo cancellation using adaptive filtering.
  • an audio signal x(k), provided to the speaker, is fed into a channel estimator filter, e.g., an adaptive acoustic echo cancellation (AEC) filter, which produces the signal d̂(k).
  • d̂(k) is the estimated signal of the original speaker data when it is affected by the estimated "impulse response" of the environment, referred to as ĥ, which includes the speaker and a microphone.
  • the environment may correspond to a room in which the speaker and microphone reside. Accordingly, for this example the environment impulse response is also referred to as a room impulse response, hₑ.
  • the room/environment impulse response may be estimated as room impulse responses (RIRs), based on computing a cross correlation between a signal played out through a speaker at point A, and the signal received through a microphone at point B. Improved results may be obtained when the signal played out is white noise, or a maximum length sequence. No prior knowledge needs to be exploited when computing the RIR, which is assumed to be the cross correlation between played and received signals.
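The cross-correlation estimate described above can be sketched as follows. The function name and the normalization by the played signal's energy are illustrative assumptions.

```python
import numpy as np

def estimate_rir(played, received, rir_len=2048):
    """Estimate a room impulse response (RIR) as the cross correlation
    between the signal played out through the speaker and the signal
    received through the microphone."""
    # Full cross correlation of the received signal against the played signal.
    xcorr = np.correlate(received, played, mode="full")
    # Index of zero lag; keep only non-negative lags up to rir_len taps.
    zero_lag = len(played) - 1
    rir = xcorr[zero_lag:zero_lag + rir_len]
    # Normalize by the energy of the played signal (e.g., white noise).
    energy = float(np.dot(played, played))
    return rir / energy if energy > 0.0 else rir
```

With a white-noise excitation, the peak of the estimated RIR falls at the acoustic delay between play-out at point A and capture at point B.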
  • the RIR is estimated based on localization of reflective boundaries in an enclosed space.
  • the approach infers room geometry based on the positions of loudspeakers and real or image microphones, which are computed using sets of times of arrival (TOAs) obtained from room impulse responses (RIRs).

Difficulties with the Existing Approaches
  • the respective round-trip times RTT-mic and RTT-speaker may be different, and may vary differently.
  • the respective IOD-mic(s) and IOD-speaker(s) each have an internal clock with differing characteristics (e.g., drift, stability, etc.), such that the sound capture in the mic-IOD(s) may drift and cause separation in time between the played sound from the IOD-speaker(s) and the capture time in the mic-IOD(s).
  • the respective time sequence step k in x(k) may be different from the time step in s(k).
  • this can be expressed as x(k₁) and s(k).
  • Prior existing approaches do not consider k₁ ≠ k, which needs to be considered in the scenario of the user terminal emulation server 100 residing far away in a networked cloud computing resource solution.
  • Figure 4 illustrates the user terminal emulation server 100 executing two user terminal emulation applications 110 (i.e., App. #1 and App. #2) to provide concurrent communication services through two sets of I/O user devices (i.e., set 430a-430c and set 440a-440c).
  • Figures 5, 6, 7, and 8 illustrate flowcharts of operations that may be performed by a user terminal emulation server in accordance with some embodiments of the present disclosure.
  • a first user terminal emulation App. #1 110 provides 500, via a network 420 (public and/or private network(s)), first data flows between itself and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation App. #1 110 to perform a communication service through a network entity providing communication services.
  • the first data flows include a first microphone data flow received from a first microphone 430a in the first set and a first speaker data flow sent to a first speaker 430b in the first set.
  • the first data flows may include other types of data flows, such as a first video data flow sent to a first display device 430c in the first set.
  • a second user terminal emulation App. #2 110 provides 500 second data flows between itself and a second set of the I/O user devices that are proximately located to a location of a second user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the second user to interface with the second user terminal emulation App. #2 110 to perform a communication service through a network entity providing communication services.
  • the second data flows includes a second microphone data flow received from a second microphone 440a in the second set and a second speaker data flow sent to a second speaker 440b in the second set.
  • the second data flows may include other types of data flows, such as a second video data flow sent to a second display device 440c in the second set.
  • the microphone 430a and speaker 430b are sufficiently close that speaker data sent through the first speaker data flow by the first user terminal emulation App. #1 110 is acoustically played out by the first speaker 430b and sensed by the first microphone 430a, and the first speaker output is undesirably fed-back as a speaker echo component of the microphone data sent by the first microphone 430a through the first microphone data flow to the first user terminal emulation App. #1 110.
  • the microphone 440a and speaker 440b are sufficiently close that speaker data sent through the second speaker data flow by the second user terminal emulation App.
  • #2 110 is acoustically played out by the second speaker 440b and sensed by the second microphone 440a, and the second speaker output is undesirably fed-back as a speaker echo component of the microphone data sent by the second microphone 440a through the second microphone data flow to the second user terminal emulation App. #2 110. Still further, the two sets of I/O user devices may be sufficiently close to each other to create audio interference with each other, such that the output of respective first and second speakers 430b and 440b are undesirably fed-back as an echo component in the microphone data of the respective second and first microphones 440a and 430a.
  • the user terminal emulation server 100 includes an echo cancellation circuit (or software module) 400 which is configured to obtain 502 a time offset indicating an elapsed time between when speaker data is sent through one of the speaker data flows and when the speaker data is played-out through one of the speakers, and is further configured to cancel 504 a speaker echo component of microphone data received in the microphone data flow based on time shifting and combining the speaker data and the microphone data using the time offset.
  • the echo cancellation circuit 400 operates to obtain 502 a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker 430b.
  • the echo cancellation circuit 400 operates to cancel 504 a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset.
  • the operations determine 600 an impulse response of a spatial area which includes the first speaker 430b and the first microphone 430a.
  • the operations to cancel 602 (also shown as 504 in Figure 5) the speaker echo component of the microphone data received in the first microphone data flow then include time shifting the speaker data using the first time offset, filtering the time shifted speaker data using the impulse response, and subtracting the filtered and time shifted speaker data from the microphone data.
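The cancel operation of block 602 (time shift, filter, subtract) can be sketched as follows. Sample-domain buffering and filter adaptation are omitted, and the function and parameter names are illustrative assumptions.

```python
import numpy as np

def cancel_echo(mic_data, speaker_data, h_hat, offset_samples):
    """Cancel the speaker echo component of the microphone data:
    time shift the speaker data by the obtained time offset, filter it
    through the estimated impulse response h_hat, and subtract the
    result from the microphone data."""
    # Time shift: delay the speaker samples by the obtained offset.
    shifted = np.concatenate(
        [np.zeros(offset_samples), speaker_data])[:len(mic_data)]
    # Filter the time shifted speaker data with the impulse response.
    echo_estimate = np.convolve(shifted, h_hat)[:len(mic_data)]
    # Subtract the filtered, time shifted speaker data.
    return mic_data - echo_estimate
```

When the time offset and impulse response estimates are accurate, the returned signal approximates the near-end (user) sound alone.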
  • the echo cancellation circuit 400 operates to obtain 502 a second time offset indicating an elapsed time between when speaker data is sent through the second speaker data flow and when the speaker data is played-out through the second speaker 440b.
  • the echo cancellation circuit 400 operates to cancel 504 a speaker echo component of microphone data received in the second microphone data flow, based on time shifting and combining the speaker data and the microphone data using the second time offset.
  • the operations may further determine 600 a second impulse response of a spatial area which includes the second speaker 440b and the second microphone 440a.
  • the operation to cancel 602 (also shown as 504 in Figure 5) the speaker echo component of the microphone data received in the second microphone data flow, then includes time shifting the speaker data using the second time offset, filtering the time shifted speaker data using the second impulse response, and subtracting the filtered and time shifted speaker data from the microphone data.
  • the user terminal emulation server 100 will generate (via one of the user terminal emulation applications) an RTP packet stream for the IOD-speaker, which the IOD-speaker will play out according to the information in the RTP header and RTP payload (audio frames).
  • an RTP packet stream will be generated from the IOD-microphone to the user terminal emulation server 100.
  • the term "n" represents a time offset from the common (current) time base k.
  • the audio signal x(k) is processed through a filter, hₑ, that mimics the impulse response of the room (environment) where the IOD-speaker and IOD-microphone reside.
  • the echo canceller operations take the received signal y(k) and subtract the filtered signal x(k₁)*ĥ.
  • the IOD-speaker sends back information needed for the user terminal emulation server 100 to determine the offset n.
  • the offset n can be similar for many played audio samples. Accordingly, information about the offset n can be carried in a control message from the IOD-speaker to the user terminal emulation server 100.
  • the information may be carried in an RTCP (Real-time Transport Control Protocol) message, more specifically an RTCP APP message.
  • the offset n is encoded as an RTP timestamp and is interpreted as the offset from the RTP timestamp in the received RTP packet to when the speaker played the frame(s).
  • the control message would also contain the sequence number of the referenced received RTP packet.
  • the information can be an RTP timestamp and a sequence number.
  • the IOD speaker can deduce the relation between the NTP timestamp and the sender's RTP timestamp and can thus send back a correct offset n expressed as an RTP timestamp offset.
  • the user terminal emulation server 100 determines the relation between the RTP timestamp in the received RTP packets and the NTP and, therefrom, determines the timing between sent audio to the IOD speaker and received audio from the IOD microphone.
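The RTP/NTP relation described above can be sketched as follows, assuming the standard pairing of an NTP time with an RTP timestamp as carried in an RTCP Sender Report. The 32-bit wrap handling, the 48 kHz audio clock rate, and the function names are illustrative assumptions.

```python
def rtp_to_wallclock(rtp_ts, sr_rtp_ts, sr_ntp_seconds, clock_rate_hz):
    """Map an RTP timestamp to wall-clock (NTP) seconds using the
    (RTP timestamp, NTP time) pair from an RTCP Sender Report."""
    # RTP timestamps are 32-bit; compute a signed, wrap-aware delta.
    delta = (rtp_ts - sr_rtp_ts) % (1 << 32)
    if delta > (1 << 31):
        delta -= (1 << 32)
    return sr_ntp_seconds + delta / clock_rate_hz

def playout_offset_seconds(offset_rtp_units, clock_rate_hz=48000):
    """Convert the reported offset n, expressed in RTP timestamp units,
    into seconds (the 48 kHz clock rate is an assumption)."""
    return offset_rtp_units / clock_rate_hz
```

From these mappings the server can relate the send time of speaker audio to the receive time of microphone audio on a common time base.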
  • the operation to obtain 502 the first time offset may include to receive a Real-time Transport Control Protocol, RTCP, message from the first speaker.
  • the RTCP message contains timing information indicating when an earlier speaker data was received through the first speaker data flow and played-out through the first speaker.
  • the operations then determine the first time offset based on the timing information contained in the RTCP message.
  • the operation to obtain 502 the time offset includes to receive a Real-time Transport Protocol, RTP, message from the first speaker.
  • the RTP message contains a sequence number and a RTP timestamp indicating when earlier speaker data was received through the first speaker data flow and played-out through the first speaker.
  • the operations then determine the first time offset based on the sequence number and the RTP timestamp contained in the RTP message.
  • the time offset further indicates elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server 100 (e.g., user terminal emulation App. #1 110) in the first microphone data flow.
  • the operations determine the elapsed time between when the first microphone 430a senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server 100 (e.g., user terminal emulation App. #1 110) in the first microphone data flow, based on timing information contained in a Real-time Transport Control Protocol, RTCP, message received from the first microphone 430a.
  • the operation to obtain 502 the time offset includes to determine the elapsed time between when the first microphone 430a senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server 100 (e.g., user terminal emulation App. #1 110) in the first microphone data flow, based on a sequence number and a Real-time Transport Protocol, RTP, timestamp carried in an RTP message received from the first microphone 430a.
  • the user terminal emulation applications may request the timing information from the speakers and/or the microphones.
  • the operations may include determining when a timing re-calibration event has occurred.
  • the timing re-calibration event may be determined based on at least one of packet jitter, lost packets, and/or latency measured in the first speaker data flow and/or the first microphone data flow. Responsive to determining occurrence of the timing re-calibration event, the operations send a time offset information request to the speaker(s) and/or the microphone(s) associated with the timing recalibration event, e.g., which microphone has experienced excessive packet jitter and/or latency.
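A minimal sketch of such a re-calibration trigger follows; the threshold values are illustrative assumptions, as the disclosure does not specify any thresholds.

```python
def needs_recalibration(jitter_ms, lost_packets, latency_ms,
                        jitter_limit_ms=30.0,
                        loss_limit=5,
                        latency_limit_ms=150.0):
    """Return True when measured packet jitter, packet loss, or latency
    on a speaker/microphone data flow indicates that a timing
    re-calibration event has occurred. Thresholds are illustrative."""
    return (jitter_ms > jitter_limit_ms
            or lost_packets > loss_limit
            or latency_ms > latency_limit_ms)
```

When the trigger fires, a time offset information request would be sent to the speaker(s) and/or microphone(s) associated with the event.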
  • the offset value(s) obtained for a particular I/O user device or for a set, group or a cluster of I/O user devices is saved by the user terminal emulation server 100 (e.g., by a managing node thereof) for later use in upcoming communication sessions.
  • the upcoming communication session may be related to the same user terminal emulation server or may be related to a different user terminal emulation server which uses the same set, or part of the set, of previous I/O user devices.
  • the managing node may apply echo cancelling parameters from a previous first user session that used the same or a similarly situated set of I/O user devices as the second user is to start using for a communication service.
  • the user terminal emulation server 100 may include a repository 410 of echo cancellation parameters.
  • the operations store 700 the impulse response, i.e., data characterizing the determined impulse response, in a data structure (repository) 410 in at least one memory with a logical association to an identifier of the first speaker 430b and/or an identifier of the first microphone 430a.
  • the operations later retrieve 702 the impulse response from the data structure 410 in the at least one memory using the identifier of the first speaker 430b and/or the identifier of the first microphone 430a.
  • the operation to cancel 602 the speaker echo component of the microphone data received in the first microphone data flow can include using the impulse response retrieved from the data structure 410 in the at least one memory to filter 704 the time shifted speaker data.
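A minimal sketch of the repository 410 and the store 700 / retrieve 702 operations, assuming a plain in-memory mapping keyed by speaker and microphone identifiers (the class and method names are hypothetical):

```python
class EchoParamRepository:
    """Stand-in for the data structure (repository) 410: impulse responses
    logically associated with (speaker identifier, microphone identifier)."""

    def __init__(self):
        self._store = {}

    def store(self, speaker_id, mic_id, impulse_response):
        self._store[(speaker_id, mic_id)] = impulse_response

    def retrieve(self, speaker_id, mic_id):
        # Returns None when no response was previously stored for the pair.
        return self._store.get((speaker_id, mic_id))

repo = EchoParamRepository()
repo.store("spk-430b", "mic-430a", [1.0, 0.4, 0.1])
h = repo.retrieve("spk-430b", "mic-430a")  # → [1.0, 0.4, 0.1]
```

A retrieval miss (e.g., a new device pairing) would trigger a fresh impulse response determination rather than reuse.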
  • the operations store 700 the second impulse response in the data structure (repository) 410 with a logical association to an identifier of the second speaker 440b and/or an identifier of the second microphone 440a.
  • the operations later retrieve 702 the impulse response from the data structure 410 using the identifier of the second speaker 440b and/or the identifier of the second microphone 440a.
  • the operation to cancel 602 the speaker echo component of the microphone data received in the second microphone data flow can include using the second impulse response retrieved from the data structure 410 to filter 704 the time shifted speaker data.
  • the operations determine 800 an impulse response of a spatial area which includes the first speaker 430b, the first microphone 430a, the second speaker 440b, and the second microphone 440a.
  • the operations store 802 the impulse response in a data structure 410 in the at least one memory with a logical association to an identifier of the second speaker 440b and/or an identifier of the second microphone 440a.
  • the operations provide 804 second data flows between the second user terminal emulation App. and the second set of the I/O user devices.
  • the second data flows include a second microphone data flow received from the second microphone 440a in the second set of the I/O user devices and a second speaker data flow sent to the second speaker 440b in the second set of the I/O user devices.
  • the operations obtain 806 a second time offset indicating an elapsed time between when a second speaker data is sent through the second speaker data flow and when the second speaker data is played-out through the second speaker 440b.
  • the operations cancel 808 a speaker echo component of a second microphone data received in the second microphone data flow based on: retrieving 810 the impulse response from the data structure 410 in the at least one memory using the identifier of the second speaker 440b and/or the identifier of the second microphone 440a; and time shifting the second speaker data using the second time offset, filtering the time shifted second speaker data using the impulse response, and subtracting 812 the filtered and time shifted second speaker data from the second microphone data.
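The time-shift/filter/subtract steps of the cancel operation above can be sketched as follows, assuming sample-aligned data and an offset already converted to a whole number of samples; the function names and the direct-form convolution are illustrative, not the claimed implementation.

```python
def convolve(signal, h):
    """Direct-form FIR filtering of `signal` with impulse response `h`."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, hk in enumerate(h):
            if n - k >= 0:
                out[n] += hk * signal[n - k]
    return out

def cancel_echo(mic, speaker, impulse_response, offset_samples):
    """Subtract the estimated speaker echo from the microphone samples."""
    # Time shift: delay the speaker data by the measured play-out offset.
    shifted = [0.0] * offset_samples + speaker[:len(mic) - offset_samples]
    echo_estimate = convolve(shifted, impulse_response)
    return [m - e for m, e in zip(mic, echo_estimate)]

# A microphone signal that is pure echo is cancelled to zero in this toy case.
residual = cancel_echo([0.0, 0.5, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.5], 1)
```

A production implementation would use FFT-based filtering and adaptive updates (as in the adaptive-filter circuit of Figure 11) rather than this toy direct form.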
  • the operations limit reuse of the echo cancelling parameters discussed above to a predefined or determined period of time during which the set of I/O user devices may be assumed to have sufficiently non-varying (stable) offset values.
  • the acoustic speaker output from the impulse data is sensed by an IOD-microphone and fed-back to the managing node as a component of microphone data.
  • Sending of the impulse data may be triggered based on the managing node determining that no person(s) are present in the room/environment for which the impulse response is to be determined.
  • Upon reception of the acoustic play-out of the impulse data as a component of the microphone data, and after a respective time offset between the sending of the impulse signal to the IOD-speaker and receipt of the microphone data from the IOD-microphone, the managing node derives the estimated room/environment response (he).
  • the playout-listen operations for the room/environment for determining the impulse response are repeated for each respective individual speaker-to-microphone pairing.
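The per-pairing playout-listen loop might be sketched as below, under the simplifying assumption that playing an ideal impulse lets the delay-trimmed microphone capture stand in directly for the estimated response he; the play/capture callable and all names are placeholders.

```python
def estimate_room_responses(pairs, play_and_capture, offset_samples):
    """pairs: iterable of (speaker_id, mic_id) pairings.
    play_and_capture(spk, mic): plays the impulse through `spk` and returns the
    raw samples recorded by `mic` (placeholder for the real round trip).
    offset_samples: per-pair measured transport/play-out delay, in samples."""
    responses = {}
    for spk, mic in pairs:
        recorded = play_and_capture(spk, mic)
        # Trim the leading samples covering the transport and play-out delay;
        # what remains approximates the room response he for this pairing.
        responses[(spk, mic)] = recorded[offset_samples[(spk, mic)]:]
    return responses

def _fake_play_and_capture(spk, mic):
    return [0.0, 0.0, 1.0, 0.3]  # stand-in for a real recording

responses = estimate_room_responses([("spk-1", "mic-1")], _fake_play_and_capture,
                                    {("spk-1", "mic-1"): 2})
# responses[("spk-1", "mic-1")] == [1.0, 0.3]
```

Looping over every pairing yields one stored response per speaker-microphone combination, matching the repeated playout-listen procedure above.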
  • operations to determine the impulse response are inhibited until a determination is made that the subject room is empty.
  • the empty room determination may be based on at least one of: determining absence of NFC, Bluetooth, and/or other RF transmitters associated with users in the room; determining no person(s) are present in video from a camera viewing the room; determining no motion has been detected by a motion detector; and/or determining absence of UserTags in the room. Avoiding determination of the impulse response while person(s) are present may improve accuracy of the determination by avoiding sound reflective/absorptive effects of the person(s) on the sounding characteristics of the room/environment.
  • the operations may determine when an impulse response re-calibration event has occurred, which triggers re-determination of the impulse response of the room/environment.
  • the impulse response re-calibration event may be determined based on at least one of: detecting a threshold change in distance between the first speaker and the first microphone; determining that another speaker and/or microphone has been added or removed from the first set of I/O user devices used to provide the combined I/O user interface for the first user; and determining that another communication service has started being provided or has ceased being provided through a second set of I/O user devices that are proximately located to the first set of I/O user devices. Responsive to determining that the impulse response re-calibration event has occurred, the operations determine the impulse response of the spatial area which includes the first speaker and the first microphone.
  • the corresponding operations by the echo cancellation circuit 400 of the user terminal emulation server 100 may include, for each combination of speaker and microphone among the first set of the I/O user devices, the operations: send impulse data in a packet including a timestamp to the speaker for play out; receive a response packet including a play out time delay indicating elapsed time between when the impulse data in the packet was sent and when the impulse data was acoustically played-out through the speaker; and receive from the microphone an impulse microphone data in a packet including a microphone timestamp indicating when the packet was sent by the microphone.
  • the operations determine an impulse response of a spatial area which includes the first set of the I/O user devices based on using the play out time delays and the microphone timestamps received from each combination of speaker and microphone among the first set of the I/O user devices.
  • FIG. 9 is a block diagram of hardware circuit components of an I/O user device 130 which are configured to operate in accordance with some embodiments.
  • the I/O user device 130 can include a wired/wireless network interface circuit 902, a near field communication circuit 920, at least one processor circuit 900 (processor), and at least one memory circuit 910 (memory).
  • the processor 900 is connected to communicate with the other components.
  • the memory 910 stores program code (e.g., user terminal emulation application(s) 110) that is executed by the processor 900 to perform operations disclosed herein.
  • the processor 900 may include one or more data processing circuits (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across one or more data networks.
  • the processor 900 is configured to execute the program code in the memory 910, described below as a non-transitory computer readable medium, to perform some or all of the operations and methods for one or more of the embodiments disclosed herein for a mobile electronic device.
  • the I/O user device 130 can include one or more UI component devices, including without limitation, microphone(s) 940, speaker(s) 950, camera(s) 930, display device(s) 960, and other user input interface(s) 970.
  • FIG 10 is a block diagram of hardware circuit components of a user terminal emulation server 100 which are configured to operate in accordance with some embodiments.
  • the user terminal emulation server 100 can include a wired/wireless network interface circuit 1050, a database 1060 (e.g., any one or more of a listing of I/O user devices, UI capabilities of the I/O user devices, communication protocols used to communicate with the I/O user devices, known proximities to user identifiers, identifiers of user tags, and/or time offsets stored in data structures associated with identified I/O user devices), a display device 1030, a user input interface 1040 (e.g., keyboard or touch sensitive display), at least one processor circuit 1000 (processor), and at least one memory circuit 1010 (memory).
  • the processor 1000 is connected to communicate with the other components.
  • the memory 1010 stores user terminal emulation application(s) 110 and an echo cancellation module that is executed by the processor 1000 to perform operations disclosed herein.
  • the processor 1000 may include one or more data processing circuits (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across one or more data networks.
  • the processor 1000 is configured to execute computer program instructions in the memory 1010, described below as a non-transitory computer readable medium, to perform some or all of the operations and methods for one or more of the embodiments disclosed herein for a mobile electronic device.
  • Some or all operations described above as being performed by the user terminal emulation server 100 and the I/O user devices 130 may alternatively be performed by the other one, and/or by another node that is part of a cloud computing resource.
  • those operations can be performed as a network function that is close to the edge, such as in a cloud server or a cloud resource of a telecommunications network operator, e.g., in a CloudRAN or a core network, and/or may be performed by a cloud server or a cloud resource of a media provider, e.g., iTunes service provider or Spotify service provider.
  • the terms “comprise”, “comprising”, “comprises”, “include”, “including”, “includes”, “have”, “has”, “having”, or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof.
  • the common abbreviation “e.g.”, which derives from the Latin phrase “exempli gratia” may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item.
  • the common abbreviation “i.e.”, which derives from the Latin phrase “id est,” may be used to specify a particular item from a more general recitation.
  • Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits.
  • These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).

Abstract

A method by a user terminal emulation server provides communication services using sets of I/O user devices. First data flows are provided between a first user terminal emulation application and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation application to perform a communication service through a network entity providing communication services. A first time offset is obtained indicating an elapsed time between when speaker data is sent through a first speaker data flow and when the speaker data is played-out through a first speaker. A speaker echo component of microphone data received in the first microphone data flow is cancelled based on time shifting and combining the speaker data and the microphone data using the first time offset.

Description

ECHO CANCELLATION FOR I/O USER DEVICES PERFORMING USER TERMINAL EMULATION AS A CLOUD COMPUTING SERVICE
TECHNICAL FIELD
[0001] The present disclosure relates to providing communication services through user terminals of a wireless communications system.
BACKGROUND
[0002] The market for user terminals is driven by the quest to provide users with increasingly advanced communication and other operational features within the constraints of a portable handheld form factor. The development requirements for user terminals are increasingly complex as designers seek to integrate a greater variety of user interfaces and advanced operational features within the portable handheld form factor. Advancements in operational features have required more highly integrated and faster processing circuits with greater circuit densities, which becomes more difficult under constraints on costs and power consumption.
[0003] This all-inclusive feature-rich approach for user terminal development does not satisfy all of the myriad of differing desires held by consumers seeking solutions for the rapidly expanding variety of communication services. Moreover, the always-connected expectations of today's society obligates users to vigilantly keep their user terminals within reach or risk being unable to timely receive or initiate communication services.
[0004] Centralized server-based approaches have been proposed for emulating a user terminal using one or more networked input and/or output (I/O) user devices, also "IODs", that are proximately located to a user. The IODs individually, or in combination, have user interface (UI) capabilities to provide an I/O user interface for the user to interface with a user terminal emulation application of the server to perform a communication service.
[0005] When a server-based emulation of a user terminal involves separate IODs, e.g., an “IOD-microphone” and a separate “IOD-speaker”, the resulting configuration can be subject to undesirable echo by audio from the IOD-speaker being fed-back by the IOD-microphone to the user terminal emulation application of the server.
SUMMARY
[0006] Some embodiments disclosed herein are directed to a user terminal emulation server for providing communication services using sets of input and/or output (I/O) user devices. The user terminal emulation server includes at least one processor and at least one memory storing program code that is executable by the at least one processor to perform operations. The operations provide first data flows between a first user terminal emulation application and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation application to perform a communication service through a network entity providing communication services. The first data flows include a first microphone data flow received from a first microphone in the first set of the I/O user devices and a first speaker data flow sent to a first speaker in the first set of the I/O user devices. The operations obtain a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker. The operations cancel a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset.
[0007] Some other related embodiments disclosed herein are directed to a method by a user terminal emulation server for providing communication services using sets of I/O user devices. The method includes providing first data flows between a first user terminal emulation application and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation application to perform a communication service through a network entity providing communication services. The first data flows include a first microphone data flow received from a first microphone in the first set of the I/O user devices and a first speaker data flow sent to a first speaker in the first set of the I/O user devices. The method obtains a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker. The method cancels a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset. [0008] Some potential advantages of these and related embodiments include that a user can receive and initiate communication services without the necessity of a traditional all- inclusive feature-rich user terminal. The centralized server based approach emulates a user terminal using one or more networked I/O user devices that are proximately located to a user, and which individually or combinable have user interface (UI) capabilities to provide an I/O user interface for the user to interface with a user terminal emulation application of the server to perform a communication service. 
Although the server can be remotely located from the I/O user devices and, in particular, from microphone(s) and speaker(s), embodiments disclosed herein can effectively perform echo cancellation so that a speaker echo component which is present in microphone data received in a microphone data flow can be effectively cancelled.
[0009] Other user terminal emulation servers and related methods according to embodiments of the inventive subject matter will be or become apparent to one with skill in the art upon review of the following drawings and detailed description. It is intended that all such additional user terminal emulation servers and related methods be included within this description, be within the scope of the present inventive subject matter, and be protected by the accompanying claims. Moreover, it is intended that all embodiments disclosed herein can be implemented individually or combined in any way and/or combination.
BRIEF DESCRIPTION OF THE DRAWINGS
[0010] Aspects of the present disclosure are illustrated by way of example and are not limited by the accompanying drawings. In the drawings:
[0011 ] Figure 1 illustrates a system with a user terminal emulation server that operationally integrates sets of I/O user devices that are proximately located to users to logically form virtualized user terminals providing communication services in accordance with some embodiments of the present disclosure;
[0012] Figure 2 illustrates a block diagram of the user terminal emulation server communicating with various elements of a cellular system to provide communication services in accordance with some embodiments of the present disclosure;
[0013] Figure 3 illustrates a block diagram of the user terminal emulation server communicating in a different manner with various elements of a cellular system to provide communication services in accordance with some other embodiments of the present disclosure; [0014] Figure 4 illustrates the user terminal emulation server 100 executing two user terminal emulation applications to provide concurrent communication services through two sets of I/O user devices;
[0015] Figures 5, 6, 7, and 8 illustrate flowcharts of operations that may be performed by a user terminal emulation server in accordance with some embodiments of the present disclosure;
[0016] Figure 9 illustrates a block diagram of hardware circuit components of an I/O user device which are configured to operate in accordance with some embodiments;
[0017] Figure 10 illustrates a block diagram of hardware circuit components of a user terminal emulation server that are configured to operate in accordance with some embodiments of the present disclosure; and
[0018] Figure 11 illustrates a circuit for acoustic echo cancellation using adaptive filtering.
DETAILED DESCRIPTION
[0019] Inventive concepts will now be described more fully hereinafter with reference to the accompanying drawings, in which examples of embodiments of inventive concepts are shown. Inventive concepts may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of various present inventive concepts to those skilled in the art. It should also be noted that these embodiments are not mutually exclusive. Components from one embodiment may be tacitly assumed to be present or used in another embodiment.
[0020] Various embodiments disclosed herein are directed to improvements in operation of a centralized server-based approach for emulating a user terminal using one or more networked input and/or output (I/O) user devices that are proximately located to a user, and which individually, or in combination, have user interface (UI) capabilities to provide an I/O user interface for the user to interface with a user terminal emulation application of the server to perform a communication service.
[0021] Some potential advantages of these embodiments include that a user can obtain a communication service without the necessity of a traditional all-inclusive feature-rich user terminal, i.e., a conventional smartphone, mobile phone, tablet computer, etc. A user terminal emulation server can utilize the available UI capability of one or more I/O user devices that are proximate to a user to provide user terminal functionality for a communication service. The server-based approach can provide low cost adaptable communication services to users.
[0022] Dynamic allocation of I/O user device capabilities whenever and wherever the I/O user devices are in the proximity of a user enables efficient and flexible use of existing hardware, such as televisions, conference phones, laptops, surveillance cameras, connected household appliances, connected cars, etc., that is capable of providing necessary UI functionality to a user during a communication service. The user thereby has reduced or no need to carry an expensive and all-inclusive user terminal, e.g., a smartphone, that includes all necessary UI capabilities, display device, keyboard, speakers, etc. The user may instead carry a hardware device which operates to identify the user, referred to as a "UserTag" or "user tag", over a wireless communication interface, such as a near field communication (NFC) interface, to one or more of the I/O user devices. Various embodiments disclosed herein may disrupt the traditional handset-centric mobile communication industry as the features and capabilities of what forms a user terminal are no longer constrained to the domain of mobile phone manufacturers. A user terminal emulation server can run a terminal emulation application (also referred to as "SoftUE") which may be instantiated in a one-to-one mapping to each user for providing a communication service between a network entity and the user.
[0023] Figure 1 illustrates a system with a user terminal emulation server 100 that can use one or more I/O user devices 130 that is/are proximately located to users to logically emulate a user terminal providing a communication service in accordance with some embodiments of the present disclosure. The user terminal emulation server 100 may operationally integrate the UI capabilities of a set of the I/O user devices 130 to logically emulate a user terminal providing communication services in accordance with some embodiments of the present disclosure.
[0024] Referring to Figure 1, the user terminal emulation server 100 may be a cloud resource that is networked and remote from the I/O user devices 130, or may be more proximately located on a shared network with the I/O user devices 130. The user terminal emulation server 100 is configured to communicate with the I/O user device(s) 130 proximate to a user for use in providing UI capabilities during a communication service. [0025] Users may carry a hardware tag, a.k.a. "UserTag" or "user tag", which is capable of transmitting a unique user identifier through a communications interface, such as a near-field communications interface (e.g., Bluetooth, BLE, NFC, RFID, etc., or combinations thereof), for receipt by one or more of the I/O user devices 130 which are proximately located to the user. One type of UserTag can be a low-complexity stand-alone electronic device having limited capability for transmitting an identifier through a near-field communications interface, and which may perform authentication operations. Another type of UserTag can be a smartphone or smartwatch having cellular connectivity that transmits a cellular identity (e.g., from a SIM card) or an application identity through a cellular interface or a near-field communications interface and is configured to perform authentication operations such as described herein.
[0026] The user identifier may alternatively or additionally be operationally determined by the user providing user credentials through one of the I/O user devices 130 and/or biometrics operations performed by, e.g., one or more of the I/O user devices 130. The biometrics operations may include, without limitation, one or more of voice recognition, image/face recognition, eye recognition, fingerprint recognition, or a combination thereof. The user identity may be determined based on credentials provided by the user when, e.g., logging into an application or account. The user identity may be provided by a cell phone using information from the subscription SIM, and proximity of the cell phone to one or more of the I/O user devices 130 can be determined using the phone’s near-field communications (NFC) capability.
[0027] A user identifier, a UserTag identifier, and a user terminal emulation application 110 can be logically associated with each other in a database 120 during a user registration process or as part of another setup process. For example, during a user registration process a user may obtain an account login identifier (serving as the user identifier) that is registered in the database 120 as being associated with a UserTag identifier for a physical UserTag that has been provided to (e.g., purchased by) the user and being associated with a user terminal application 110 that emulates a user terminal having defined capabilities (e.g., a cell phone providing cellular and over-the-top voice-over-IP communication services).
[0028] The user terminal emulation server 100 may maintain in the database 120 network addresses of I/O user devices 130 and UI capabilities of the I/O user devices 130. The capabilities of the I/O user devices 130 may be logically arranged in the database 120 based on the type of UI capability provided, e.g., display device, microphone, speaker, keyboard, and may be further arranged based on a quality of service provided by the UI capability. [0029] The user terminal emulation server 100 may register a network address of one of the user terminal emulation applications 110 and an identity of a user with a network entity 150 providing communication services. The network entity 150 provides a communication service function 140 which may, for example, correspond to an over-the-top Voice Over Internet Protocol (VoIP) service, Netflix service, Facebook service, Microsoft Teams meeting service, Internet browser service, a cellular communication service, etc. The user terminal emulation application 110 is executed by the user terminal emulation server 100. A user terminal emulation application 110 may run one or more applications that are normally run by a smart phone, such as a Netflix application, Facebook application, Microsoft Teams application, Internet browser application, etc., to provide a communication service for a user through, e.g., the network entity 150.
[0030] As illustrated in Figure 1, a different instantiation of the user terminal emulation application 110 may be hosted by the server 100 for each user who is to be provided communication services (i.e., illustrated user terminal emulation applications #1-#N corresponding to users 1-N). The user terminal emulation application 110 may perform registration of the user with the network entity 150 and setup of a communication service with a user responsive to communication requests.
[0031] When the communication service function 140 of the network entity 150 is a VoIP service, the operation to register the network address of the user terminal emulation application and the identity of the user with the network entity can include registering the network address of the user terminal emulation application 110 and the identity of the user with a network server of a VoIP communication service provider.
[0032] When the communication service function 140 of the network entity 150 is a cellular communication service, the operation to register the network address of the user terminal emulation application and the identity of the user with the network entity can include registering the network address of the user terminal emulation application 110 and the identity of the user with a Home Subscriber Server (HSS) or other network node of a core network operated by a cellular communication service provider.
[0033] The user terminal emulation server 100 may receive the registration messages from the I/O user devices using the Session Initiation Protocol (SIP)/Session Description Protocol (SDP), where each of the registration messages identifies the network address and the UI capability of one of the I/O user devices. The communication request may be received from the network entity 150 using the SIP/SDP, and the operation to provide communication sessions between the user terminal emulation application 110 and each of the I/O user devices in the set, and between the user terminal emulation application 110 and the requesting user terminal, may be performed using the SIP/SDP.
[0034] A registration message from an I/O user device can include, for example, an IP address and port number, MAC address, fully qualified domain name (FQDN), and/or another network address, and further include information identifying the UI capability of the I/O user device. The I/O user device may respond to becoming powered-on by communicating the registration message to the user terminal emulation server 100.
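A hypothetical example of such a registration message, here modeled as a simple dictionary; the field names are illustrative assumptions and not drawn from the disclosure, which only lists the kinds of information (address forms and UI capability) a registration may carry.

```python
# Illustrative registration payload an I/O user device might send at power-on.
registration = {
    "network_address": {"ip": "192.0.2.10", "port": 5060},
    "mac_address": "00:11:22:33:44:55",
    "fqdn": "mic-kitchen.example.com",
    "ui_capabilities": ["microphone"],
}

def is_valid_registration(msg):
    """Server-side sanity check: at least one address form plus a non-empty
    UI capability list must be present before the device is added to the database."""
    has_address = any(k in msg for k in ("network_address", "mac_address", "fqdn"))
    return has_address and bool(msg.get("ui_capabilities"))
```

In a real deployment this payload would be carried in a SIP/SDP registration as described in paragraph [0033], not as a bare dictionary.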
[0035] The user terminal emulation server 100 receives a communication request from the network entity 150 for establishing a communication service between the user and a requesting user terminal, e.g., a cellular phone, computer with Microsoft Teams application, etc. Responsive to the communication request, the user terminal emulation server 100 identifies one or more of the I/O user devices 130, which may be registered in the database, that are proximately located to a location of the user and are determined, based on the UI capabilities identified by the database 120 for the set of I/O user devices and based on content of the communication request, to satisfy a capability rule for being individually usable or combinable to provide an I/O user interface for the user to interface with the user terminal emulation application 110 to provide the communication service.
[0036] The user terminal emulation server 100 provides one or more communication sessions between the user terminal emulation application 110 and the one or more I/O user devices 130 and between the user terminal emulation application 110 and the requesting user terminal via the network entity 150. The communication request that is received by the user terminal emulation application 110 may contain an indication of a minimum UI capability that must be provided to the user during the communication service, such as: speaker only; combination of speaker and microphone; display only; combination of display device, speaker, and microphone; etc. A UI capability rule which can be used by the server 100 to determine whether a communication service can be provided and by which set of I/O user devices, may thereby be defined based on the minimum UI capability that is indicated by the communication request.
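A minimal sketch of such a UI capability rule, assuming capabilities are represented as simple sets: the set of I/O user devices satisfies the rule when their combined capabilities cover the minimum UI capability indicated by the communication request.

```python
def satisfies_capability_rule(device_capabilities, minimum_required):
    """device_capabilities: one capability set per candidate I/O user device.
    minimum_required: capabilities the communication request indicates as
    the minimum UI capability (e.g. speaker and microphone)."""
    combined = set().union(*device_capabilities) if device_capabilities else set()
    return minimum_required <= combined

# A speaker-only device and a microphone-only device are combinable to
# satisfy an audio call, but not a call that also requires a display.
devices = [{"speaker"}, {"microphone"}]
ok_audio_call = satisfies_capability_rule(devices, {"speaker", "microphone"})
ok_video_call = satisfies_capability_rule(devices, {"speaker", "microphone", "display"})
```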
[0037] The user terminal emulation server 100 then routes communication traffic that is received from at least one of the I/O user devices in the set toward the requesting user terminal via the network entity 150. In some embodiments, for each data type that is received as communication traffic from the requesting user terminal, the user terminal emulation server 100 selects one of the I/O user devices from among the set of I/O user devices based on matching characteristics of the data type to the UI capabilities identified by the database 120 for the one of the I/O user devices, and then routes the data of the data type toward the network address of the selected one of the I/O user devices.
[0038] As will be explained in further detail below, the server 100 may also combine data streams that are received from the I/O user devices in the set, and route the combined data streams toward the requesting user terminal, e.g., via the network entity 150.
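The per-data-type device selection described above can be sketched as a lookup from data type to required capability, followed by a match against the registered devices. The data-type-to-capability mapping here is an illustrative assumption.

```python
# Assumed mapping from incoming data type to the UI capability that can
# render it; a real system might also match finer characteristics such as
# codec or resolution.
DATATYPE_TO_CAPABILITY = {"audio": "speaker", "video": "display"}

def select_device(database, data_type):
    """Return the network address of a registered device whose UI
    capability matches the data type, or None if no device matches."""
    needed = DATATYPE_TO_CAPABILITY[data_type]
    for address, capabilities in database.items():
        if needed in capabilities:
            return address
    return None  # no match; the service may be renegotiated

database = {"10.0.0.5": {"speaker"}, "10.0.0.7": {"display"}}
audio_target = select_device(database, "audio")
video_target = select_device(database, "video")
```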
[0039] The user terminal emulation server 100 (e.g., the application 110 or an I/O user device handler described below) may be responsible for tracking which I/O user devices are proximately located to a present location of the user. The server 100 may receive presence reports from individual ones of the I/O user devices containing their network address and an identifier of a user who is determined by the I/O user device to be proximately located. For example, an I/O user device may read a user tag through an NFC communication interface, may sense biometric information from the user, and/or may perform other operations to detect presence of a user and to identify the user. Responsive to the presence reports, the server 100 updates the database 120 to indicate which user identifiers are proximately located to which of the I/O user devices.
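The presence-report handling can be sketched as maintaining a user-to-devices mapping that is updated as reports arrive (e.g., after an NFC UserTag read). The names below are illustrative assumptions.

```python
from collections import defaultdict

class PresenceTracker:
    """Tracks which I/O user devices are proximately located to which user."""
    def __init__(self):
        self.proximate = defaultdict(set)  # user_id -> device addresses

    def on_presence_report(self, device_address, user_id):
        # Called when a device reports that it detected the user nearby.
        self.proximate[user_id].add(device_address)

    def on_presence_lost(self, device_address, user_id):
        # Called when the device no longer senses the user.
        self.proximate[user_id].discard(device_address)

tracker = PresenceTracker()
tracker.on_presence_report("10.0.0.5", "UserTag#1")
tracker.on_presence_report("10.0.0.6", "UserTag#1")
tracker.on_presence_lost("10.0.0.6", "UserTag#1")
```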
[0040] With further reference to the example system of Figure 1, a set of I/O user devices 130 has been determined by the instantiated user terminal emulation application #1 to be proximately located to a location of a first user carrying UserTag#1, and to further have UI capabilities that are combinable to satisfy the UI capability rule for providing a combined I/O user interface for the first user to use during a requested communication service. Application #1 responsively uses that set of I/O user devices 130 to provide a combined I/O user interface for use by the first user during a communication service, e.g., via network entity 150, between the first user and another user terminal.
[0041] Similarly, another set of I/O user devices 130 has been determined by the instantiated user terminal emulation application #2 to be proximately located to a location of a second user carrying UserTag#2, and to further have UI capabilities that are combinable to satisfy the UI capability rule for providing a combined I/O user interface for the second user to use during a requested communication service. Application #2 responsively uses that set of I/O user devices 130 to provide a combined I/O user interface for use by the second user during a communication service, e.g., via network entity 150, between the second user and yet another user terminal.
[0042] Figure 1 also illustrates that another set of I/O user devices 130 is not proximately located to either UserTag#1 or UserTag#2.
[0043] As explained above, the communication request which is requesting the establishment of communication service with an identified user may be initiated by the network entity 150 using the network address of the user terminal emulation application and identity of the user which were earlier registered with the network entity 150. However, the communication request may additionally or alternatively be generated by one of the I/O user devices 130 responsive to a command received from a proximately located user. For example, a user may operate a user interface provided by one of the I/O user devices 130 to initiate a combined audio and video call with another user. The user terminal emulation server 100 (e.g., the IODH or the application 110 for that user) receives the communication request along with the identity of the user, which may be sensed via the UserTag. The application 110 performs the identifying, providing, routing, selecting, and combining operations described above to set up and operate a communication service between the user and the other user via the network entity 150.
[0044] Further example systems and related operations will now be described to further illustrate how I/O user devices having different UI capabilities can be operationally used or combined to provide a combined UI that can be used by a user to satisfy the communication requirements of a communication service.
[0045] Further illustrative operations are described regarding an example embodiment in which a speaker device is one of the I/O user devices 130 in the set capable of playing a received audio stream and a microphone device is one of the I/O user devices 130 in the set capable of sensing audio to output a microphone stream. Operations by the user terminal emulation application include updating the database 120 based on content of registration messages from the speaker device and the microphone device to identify network addresses of the speaker device and the microphone device, and to identify UI capabilities of the speaker device as having a speaker capability and the microphone device as having a microphone capability. The speaker UI capabilities may identify a number of speakers provided, sound loudness capability, and/or other operational characteristics. The microphone UI capabilities may identify a number of microphones provided, sensitivity of the microphones, directionality capabilities of the microphone(s), and/or other operational characteristics. The speaker device and the microphone device are each identified as belonging to the set of I/O user devices that are determined to be proximately located to the location of the user (e.g., UserTag#1) and are further determined, based on the UI capabilities identified by the database 120, to satisfy the UI capability rule for being used individually or combined to provide a combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service. Based on determining that the speaker device and the microphone device satisfy the UI capability rule, further operations are performed to route a microphone stream received from the microphone device toward the requesting user terminal (e.g., via network entity 150).
When an audio stream is received as communication traffic from the requesting user terminal, the operations select the speaker device based on matching an audio characteristic of the audio stream to the speaker capability identified by the database for the speaker device, and then route the audio stream toward the network address of the speaker device.
[0046] The example embodiment may include, when a display device is one of the I/O user devices in the set capable of displaying a received video stream, the operations update the database 120 based on content of registration messages to identify network addresses of the display device, and to identify UI capabilities of the display device as having a display capability. The display UI capabilities may identify a screen display size, aspect ratio, pixel resolution, video frame rates supported, whether the display device supports shared use via a split-screen configuration, and/or other operational characteristics. The display device is also identified as among the set of I/O user devices that are determined to be proximately located to the location of the user and are further determined, based on the UI capabilities identified by the database 120, to satisfy the UI capability rule for being used individually or combined to provide the combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service. Based on determining that the speaker device, the display device, and the microphone device satisfy the UI capability rule, further operations respond to receipt of a video stream as communication traffic from the requesting user terminal by selecting the display device based on matching a video characteristic of the video stream to the display capability identified by the database 120 for the display device, and then routing the video stream toward the network address of the display device.
[0047] In the example embodiment the operations for routing the audio stream and the video stream toward the network addresses of the speaker device and the display device, respectively, may include when audio data and video data are received within a same stream from the requesting user terminal through a first communication session: separating the audio data from the video data; routing the audio data toward the network address of the speaker device through a second communication session; and routing the video data toward the network address of the display device through the second communication session or a third communication session.
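The separation step described in paragraph [0047] can be sketched as a simple demultiplexer: packets of a combined stream are split by data type and forwarded over distinct sessions to the speaker and display devices. The packet layout is an illustrative assumption, not a real media framing format.

```python
def demux_and_route(packets, speaker_session, display_session):
    """Separate audio data from video data arriving in one stream and
    route each toward its destination session (modeled here as lists)."""
    for packet in packets:
        if packet["type"] == "audio":
            speaker_session.append(packet["payload"])
        elif packet["type"] == "video":
            display_session.append(packet["payload"])

# Combined stream from the requesting user terminal (first session).
incoming = [{"type": "audio", "payload": b"a1"},
            {"type": "video", "payload": b"v1"},
            {"type": "audio", "payload": b"a2"}]
to_speaker, to_display = [], []   # second and third sessions
demux_and_route(incoming, to_speaker, to_display)
```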
[0048] When a camera device is one of the I/O user devices in the set capable of outputting a camera stream, the operations may update the database 120 based on content of a registration message to identify a network address of the camera device and to identify a UI capability of the camera device as having a camera capability. The camera UI capabilities may identify a camera pixel count, image quality, light sensitivity, and/or other operational characteristics. The camera device may be identified as a member of the set of I/O user devices that are determined to be proximately located to the location of the user and is further determined, based on the UI capability identified by the database 120, to satisfy the UI capability rule for being used individually or combined with the other I/O user devices in the set to provide the combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service. Based on determining that the camera device satisfies the UI capability rule, further operations are performed to route the camera stream received from the camera device toward the requesting user terminal, e.g., via the network entity 150.
[0049] The operations for routing the microphone stream received from the microphone device and the camera stream received from the camera device toward the requesting user terminal, may include: receiving the microphone stream from the microphone device through a first communication session; receiving the camera stream from the camera device through the first communication session or a second communication session; combining the microphone stream and camera stream in a combined stream; and routing the combined stream toward the requesting user terminal through a third communication session, e.g., via the network entity 150.
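The combining step in paragraph [0049] can be sketched as interleaving the two incoming streams into one combined stream, tagging each chunk with its source so the far end can demultiplex. The chunk-by-chunk interleaving is a simplifying assumption; a real implementation would interleave by media timestamps.

```python
def combine_streams(mic_chunks, camera_chunks):
    """Merge microphone and camera chunks into one tagged combined stream
    routed toward the requesting user terminal."""
    combined = []
    for mic, cam in zip(mic_chunks, camera_chunks):
        combined.append(("mic", mic))   # tag identifies the source device
        combined.append(("cam", cam))
    return combined

out = combine_streams([b"m1", b"m2"], [b"c1", b"c2"])
```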
[0050] The example embodiment may include, when a keyboard device is one of the I/O user devices in the set capable of outputting key selection data responsive to key selections by a user among keys of the keyboard device, the operations can update the database 120 based on content of a registration message to identify a network address of the keyboard device and to identify a UI capability of the keyboard device as having a keyboard capability. The keyboard device capabilities may identify a key count, indication of whether the keyboard is a physical keyboard or a touch sensitive input device, and/or other keyboard capabilities. The keyboard device is further identified as a member of the set of I/O user devices that are determined to be proximately located to the location of the user and is further determined, based on the UI capability identified by the database 120, to satisfy the UI capability rule for being used individually or combined with the other I/O user devices in the set to provide the combined I/O UI for the user to interface with the user terminal emulation application 110 to provide the communication service. Based on determining that the keyboard device satisfies the UI capability rule, further operations are performed to identify commands formed by the key selection data received from the keyboard and to perform operations that have been predefined as being triggered based on receipt of the identified commands.
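The command identification step at the end of paragraph [0050] can be sketched as accumulating key selections into a buffer and dispatching to operations that were registered in advance. The command names and end-of-command convention are illustrative assumptions.

```python
class CommandDispatcher:
    """Identifies commands formed by key selection data and triggers the
    operations predefined for them."""
    def __init__(self):
        self.handlers = {}
        self.buffer = ""

    def register(self, command, handler):
        # Predefine an operation to be triggered on receipt of a command.
        self.handlers[command] = handler

    def on_key_selection(self, key):
        if key == "\n":                    # assumed end-of-command marker
            command, self.buffer = self.buffer, ""
            handler = self.handlers.get(command)
            return handler() if handler else None
        self.buffer += key
        return None

dispatcher = CommandDispatcher()
dispatcher.register("mute", lambda: "microphone muted")
result = None
for key in "mute\n":                       # key selections from the keyboard device
    outcome = dispatcher.on_key_selection(key)
    if outcome is not None:
        result = outcome
```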
[0051] The operations for routing the key selection data received from the keyboard device and microphone stream received from the microphone device, may include: receiving the key selection data from the keyboard device through a first communication session; receiving the microphone stream from the microphone device through the first communication session or a second communication session; combining the key selection data and the microphone stream in a combined stream; and routing the combined stream toward the requesting user terminal through a third communication session, e.g., via the network entity 150.
[0052] Figure 2 is a block diagram illustrating the user terminal emulation server 100 as an element of an operator service node 202 within a cellular system 200. Referring to Figure 2, the communication service function 140 of the network entity 150 (Fig. 1) may be provided by the operator service node 202 or may be reached through external infrastructure 240, e.g., the Internet. The server 100 may, for example, be implemented in the radio access network 220 to provide edge computing with faster responsiveness or may be implemented within another node of the cellular system 200. The user terminal emulation server 100 can include an I/O user device handler (IODH) 212, a control function (CF) 214, the instantiated user terminal emulation applications 110, and a service gateway (GW) 216. A user terminal emulation application 110 may perform one or more user applications of the kind provided by a smart phone, such as a Netflix application, Facebook application, Microsoft Teams application, Internet browser application, etc.
[0053] The IODH 212 may perform operations to manage the I/O user devices, such as to handle maintenance of the database 120, perform registration of the user terminal emulation applications 110, and/or control operational characteristics of the managed I/O user devices. For example, the IODH 212 may operate to register with a Microsoft Teams server the IP address of a Microsoft Teams application, which is run by or interfaced to the user terminal emulation application 110, and the user's Microsoft Teams name. The CF 214 may be responsible for assigning an IP address to each user terminal emulation application 110. The IP address to be assigned by the CF 214 may be received from core network 210 functionality such as a PDN-GW. The service GW 216 may interconnect the user terminal emulation server 100 to a PSTN network, packet data network gateway of a 3GPP (3rd Generation Partnership Project) system, etc. The cellular system 200 can include a Core Network 210 having a Home Subscriber Server (HSS), a Policy and Charging Rules Function (PCRF), gateway (GW) and Mobility Management Entity (MME) providing control signaling related to mobile terminal mobility and security for the radio access. The HSS contains subscriber-related information and provides support functionality for user authentication and user access to the system. The PCRF enables QoS control per data flow and radio bearer, by setting QoS rules for each data flow, based on operator set policies and subscriber information. The GW can include a Serving GW (S-GW) and a Packet Data Network GW (PDN-GW), where the S-GW interconnects the core network 210 with the radio access network 220 and routes incoming and outgoing packets for the I/O user devices 232 and/or 130 and the user terminals 230. The PDN-GW interconnects the core network 210 with external infrastructure 240, such as the Internet, and allocates IP-addresses and performs policy control and charging.
[0054] Some I/O user devices 232 having cellular communication capability can communicate with the operator service node 202 via, e.g., eNBs or other radio access nodes of a Radio Access Network 220 and the core network 210. In the system of Figure 2, the user terminal emulation server 100 may handle setup of a communication service between a selected set of the I/O user devices that are proximate to a user and a remote user terminal 230 (e.g., smart phone) via the cellular system 200.
[0055] Figure 3 is a block diagram illustrating the user terminal emulation server 100 communicating in a different manner with various elements of a cellular system 200, which may operate as the network entity 150 (Fig. 1), to provide communication services in accordance with some embodiments of the present disclosure. The system of Figure 3 differs from the system of Figure 2 by the user terminal emulation server 100 being an Internet service within external infrastructure 240 outside of the cellular system 200. In the system of Figure 3, the CF 214 may determine the IP address to be assigned to different ones of the user terminal emulation applications 110 based on signaling from the Internet service within the external infrastructure 240.
[0056] The above and other example operations will now be described in further detail in the context of two different example "use cases": 1) an incoming call scenario; and 2) an outgoing call scenario.
Use Case 1: Incoming Call Scenario
[0057] This use case involves a user, with a UserTag or other way of being identified, being proximately located to I/O user devices 130 having different UI capabilities when an incoming call is received by the user terminal emulation server. Although operations are explained below in the context of identifying a user through a physical UserTag carried by the user, these operations are not limited thereto and may be used with any other way of identifying a user, such as by the user entering identifying information, e.g., user credentials, through one of the I/O user devices 130 and/or by sensing biometric information that identifies the user.
[0058] A user terminal emulation application 110 may be instantiated or otherwise activated responsive to an incoming call (service, session) targeting the UserTag. The user terminal emulation application 110 can identify subscriptions associated with the UserTag (i.e., the physical user) and preferred methods of communication (e.g., audio not video, audio and video, etc.) that have been specified by the user, and determines the UI capabilities of the I/O user devices that will be needed to satisfy the UI capabilities which may be specified for the incoming communication session. The user terminal emulation application 110 may ask the IODH to identify which I/O user devices 130 are proximately located to the UserTag, and may further ask the IODH to determine, or may determine itself, whether the identified I/O user devices 130 are usable individually or combinable to satisfy the UI capabilities specified by the incoming communication session. The user terminal emulation application 110 and/or the IODH may receive an ACK or NACK back on whether a sufficient set of I/O user devices 130 can be used to provide the communication service. If ACK, then the IODH also sets the state of the I/O user devices 130 in the set to in-use to avoid another user terminal emulation application 110 attempting to utilize the same I/O user devices 130 that are presently in use. In case of NACK, the user terminal emulation application 110 and/or the IODH can take different actions to set up a reduced UI capability communication service with the user depending on user settings, e.g., only allow sound-based communications instead of a combination of sound and video when no display device is presently available for use.
An example of no display device being available may occur when the only display device that is proximately located to the user is presently being used by another user to receive information from another user terminal emulation application during an ongoing communication service or when no display device is proximately located to the user.
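The ACK/NACK handling above can be sketched as a two-step reservation: try the full device set first, marking each device in-use on success, otherwise fall back to a reduced set (e.g., sound only). The state values and set contents are illustrative assumptions.

```python
def reserve_devices(device_states, wanted, fallback):
    """Attempt to reserve the wanted device set; on NACK, retry with the
    reduced-capability fallback set. Marks reserved devices as in-use so no
    other user terminal emulation application can claim them."""
    def try_set(addresses):
        if all(device_states.get(a) == "free" for a in addresses):
            for a in addresses:
                device_states[a] = "in-use"
            return True   # ACK
        return False      # NACK: at least one device unavailable

    if try_set(wanted):
        return "full"
    if try_set(fallback):
        return "reduced"  # e.g. sound-only communication service
    return "denied"

# The only display device is already in use by another application.
states = {"spk": "free", "mic": "free", "disp": "in-use"}
outcome = reserve_devices(states, ["spk", "mic", "disp"], ["spk", "mic"])
```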
[0059] Further operations by UserTags, I/O user devices, and the user terminal emulation server are described in accordance with this example use case. A UserTag enters a room and signals its presence to any proximately located and capable I/O user device in the room using a discovery beacon signal. Alternatively, one or more of the I/O user devices determines presence of the UserTag by polling, such as by periodically transmitting discovery beacon signals that trigger responsive signaling by the UserTag. The I/O user devices that receive signaling indicating presence of the UserTag report to the IODH in the user terminal emulation server along with a network address of the I/O user device (e.g., IP address, port number, MAC address, FQDN, etc.). The user terminal emulation application corresponding to the specific user (i.e., the UserTag) is updated with respect to the detected user's presence. The IODH may operate to receive the notifications from the I/O user devices proximately located to the UserTag. Further UI capability discovery (synchronization) communications are performed between the user terminal emulation server and the I/O user devices. The I/O user devices are associated to the user in the database, along with associated indicated service subscriptions and combinable UI capabilities provided by the set of I/O user devices which are proximately located to the UserTag. One or more of the I/O user devices may be selected for default call reception ACK/NACK. The user via the UserTag is now known to be reachable within the system through an identified set of I/O user devices with identified UI capabilities (e.g., speakers yes/no, display yes/no, microphone yes/no, keyboard yes/no, etc.), thereby creating a logical virtualized user terminal through which the user may be provided a communication service.
The user may initiate a communication service through a touchscreen, voice command sensed by a microphone, performing a defined gesture observable by a camera, and/or other input provided to one of the proximately located I/O user devices.
[0060] An incoming session (e.g., video call) from a requesting user terminal which is directed to the user (UserTag) arrives at the user terminal emulation server for the user carrying the UserTag. The individual or combinable UI capabilities of the available I/O user devices are compared to the UI requirements of the incoming session. When the UI requirements of the incoming session are not satisfied by the UI capabilities of the I/O user devices, the user terminal emulation server may renegotiate the required UI capabilities (e.g., QoS) of the incoming session. In contrast, when the UI requirements of the incoming session are satisfied, the user terminal emulation server prompts, via one or more of the available I/O user devices (e.g., a pre-selected answer device), the user carrying the UserTag to provide a session request answer (ACK/NACK). The user responds through the pre-selected answer device to accept (ACK) or reject (NACK) the incoming session, to provide signaling to the user terminal emulation server. When an ACK is received, operations route an audio stream from the requesting user terminal to one of the I/O user devices in the set that has a speaker capability via one or more sessions, and route a video stream from the requesting user terminal to another one of the I/O user devices in the set that has a display capability via one or more sessions. A data stream that is received from one of the I/O user devices in the set through one or more sessions is routed toward the requesting user terminal. When two or more data streams are received through one or more sessions from the I/O user devices, they can be combined into a combined data stream that is routed toward the requesting user terminal.
[0061] The user terminal emulation server may perform operations to continuously monitor presence of the I/O user devices to determine when one or more of the I/O user devices is no longer proximately located to the user such that it can no longer be included as part of the combined UI to be provided during the ongoing communication session. The user terminal emulation server may substitute the UI capability of another I/O user device into the set being used by the user for the ongoing communication session responsive to a previous member of the set no longer having the required presence.
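The substitution operation can be sketched as removing the device that lost presence from the active set and scanning the remaining proximate devices for one with the same UI capability. Device names and the capability representation are illustrative assumptions.

```python
def substitute_device(active_set, lost_address, capability, candidates):
    """Replace a device that is no longer proximately located with another
    proximate device offering the same UI capability, if one exists."""
    active_set.discard(lost_address)
    for address, caps in candidates.items():
        if capability in caps and address not in active_set:
            active_set.add(address)
            return address
    return None  # no substitute; service may continue with a reduced UI

active = {"spk-1", "mic-1"}                       # devices in the ongoing session
nearby = {"spk-2": {"speaker"}, "cam-1": {"camera"}}
replacement = substitute_device(active, "spk-1", "speaker", nearby)
```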
Use Case 2: Outgoing Call Scenario
[0062] This use case involves a user, with a UserTag, being proximately located to I/O user devices 130 having different UI capabilities when an outgoing call (communication session) is initiated through the user terminal emulation server. The I/O user devices 130 are associated to the identified user via the user terminal emulation server 100, which handles all communication sessions for the user while the associated I/O user devices 130 are managed by an IODH.
[0063] A user terminal emulation application 110 may be instantiated or otherwise activated responsive to an outgoing call being requested by a user carrying the UserTag. The user may initiate an outgoing call through a touchscreen, voice command sensed by a microphone, performing a defined gesture observable by a camera, and/or other input provided to one of the proximately located I/O user devices.
[0064] The user terminal emulation application 110 can identify subscriptions associated with the UserTag and preferred methods of communication (e.g., audio only, audio not video, audio and video, etc.) that have been specified by the user, and determines the UI capabilities of the I/O user devices that will be needed to satisfy the UI capabilities which may be specified for the outgoing call. The user terminal emulation application 110 may ask the IODH to identify which I/O user devices 130 are proximately located to the UserTag, and may further ask the IODH to determine, or may determine itself, whether the identified I/O user devices 130 are individually usable or combinable to satisfy the UI capabilities specified by the outgoing call. The user terminal emulation application 110 and/or the IODH may receive an ACK or NACK back on whether one or a set of I/O user devices 130 can be used to provide the communication service. If ACK, then the IODH also sets the state of the one or more I/O user devices 130 in the set to in-use to avoid another user terminal emulation application 110 attempting to utilize the same I/O user device(s) 130 that are presently in use. In case of NACK, the user terminal emulation application 110 and/or the IODH can take different actions to set up a reduced UI capability communication service with the user depending on user settings, e.g., only allow sound instead of the preferred sound and video when no display device is presently available for use (e.g., when presently used by another user terminal emulation application 110 or when none is proximately located to the UserTag).
[0065] Example operations for an outgoing call and related data flows between UserTags, I/O user devices, and the user terminal emulation server are now described in the context of this use case. A UserTag enters a room and signals its presence to any proximately located and capable I/O user device in the room using a discovery beacon signal. Alternatively, one or more of the I/O user devices determines presence of the UserTag by polling, such as by periodically transmitting discovery beacon signals that trigger responsive signaling by the UserTag. The I/O user devices that receive signaling indicating presence of the UserTag report to the IODH in the user terminal emulation server along with a network address of the I/O user device (e.g., IP address, port number, MAC address, FQDN, etc.). The user terminal emulation application corresponding to the specific user (i.e., the UserTag) is updated with respect to the detected user's presence.
[0066] The IODH may operate to receive the notifications from the I/O user devices proximately located to the UserTag. Further UI capability discovery (synchronization) communications are performed between the user terminal emulation server and the I/O user devices. The I/O user devices are associated to the user in the database, along with associated indicated service subscriptions and combinable UI capabilities provided by the set of I/O user devices which are proximately located to the UserTag. One or more of the I/O user devices may be selected for default call reception ACK/NACK. The user via the UserTag is now known to be reachable within the system through an identified set of I/O user devices with identified UI capabilities (e.g., speakers yes/no, display yes/no, microphone yes/no, keyboard yes/no, etc.), thereby creating a logical virtualized user terminal through which the user may be provided a communication service. The user may initiate a communication service through a touchscreen, voice command sensed by a microphone, performing a defined gesture observable by a camera, and/or other input provided to one of the proximately located I/O user devices.
[0067] A user carrying the UserTag uses the UI of one of the I/O user devices to trigger an outgoing call (e.g., video call), which triggers signaling of the outgoing call to the user terminal emulation server. The IODH queries the user (e.g., displays a message, generates a sound, etc.) through one of the I/O user devices proximately located to the user to request the user to select among available types of communication methods that can be presently used for the outgoing call. One of the I/O user devices provides responsive signaling to the IODH indicating the user's selected type of communication method for the outgoing call.
The user terminal emulation server communicates an outgoing session stream request to the network entity 150, where the request may include an identifier of the calling user, identifier of the user terminal of the called user, and a quality of service for the communication session. The user terminal emulation server receives a communication session acceptance (ACK) or denial (NACK) from the network entity 150. When the communication session is denied, the user terminal emulation server may attempt to renegotiate the requested communication session such as at a lower quality of service.
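The renegotiation step above can be sketched as requesting the session at the preferred quality of service and, on denial (NACK), retrying at successively lower levels. The QoS level names and the accept function are illustrative assumptions standing in for signaling with the network entity 150.

```python
def request_session(network_entity_accepts, qos_levels):
    """Request an outgoing session at the highest QoS level first; on NACK,
    renegotiate at the next lower level. Returns the granted QoS, or None
    when every level is denied."""
    for qos in qos_levels:
        if network_entity_accepts(qos):   # ACK from the network entity
            return qos
    return None                           # all levels denied (NACK)

# Example: the network entity only accepts sessions up to "standard" QoS.
accepts = lambda qos: qos in {"standard", "low"}
granted = request_session(accepts, ["high", "standard", "low"])
```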
[0068] When the communication session is accepted (ACK), for each data type that is received as communication traffic from the requesting user terminal, the user terminal emulation server selects one of the I/O user devices from among the set of I/O user devices based on matching characteristics of the data type to the UI capabilities identified by the database for the one of the I/O user devices, and then routes the data of the data type toward the network address of the selected one of the I/O user devices. The data originating ones of the I/O user devices transmit data streams through one or more sessions to the user terminal emulation server, which may combine the data streams into a combined data stream that is routed toward the called user terminals via the network entity 150.
[0069] The user terminal emulation server may continuously monitor presence of the I/O user devices to determine when one or more of the I/O user devices is no longer proximately located to the user such that it can no longer be included as part of the combined UI being provided during the ongoing communication session. The user terminal emulation server may substitute the UI capability of another I/O user device into the set being used by the user for the ongoing communication session responsive to a previous member of the set no longer having required presence.
Cancelling Speaker Echo Component of Microphone Data
[0070] Figure 11 illustrates a circuit for acoustic echo cancellation using adaptive filtering. Referring to Figure 11, an audio signal, x(k), provided to the speaker is fed into a channel estimator filter, e.g., adaptive acoustic echo cancellation (AEC) filter, which produces the signal dA(k). Hence, dA(k) is the estimated signal of the original speaker data when it is affected by the estimated "impulse response" of the environment, referred to as "hA", which includes the speaker and a microphone. The environment may correspond to a room in which the speaker and microphone reside. Accordingly, for this example the environment impulse response is also referred to as a room impulse response, he.
[0071] The signal captured by the microphone, y(k), can include speech s(k), the room impulse response, and noise as follows: y(k) = s(k) + x(k)*he + “noise” , where he is the actual room impulse response.
[0072] Disregarding the noise term for simplicity of explanation, it is desired to cancel the echo by subtracting the estimated room echo, according to:

e(k) = y(k) - dA(k)

e(k) = s(k) + x(k)*he - x(k)*hA
[0073] When hA is close to he, the signal after echo cancellation, e(k), would be e(k) = s(k). Thus, what is left is the wanted sound, e.g., speech s(k).
[0074] If there were multiple speakers there would be several audio streams x(k) and associated room impulse responses he(k), and thus several estimated signals of the original speaker dA(k) to subtract to form the final e(k) signal.
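The subtraction described in paragraphs [0072]-[0073] can be sketched numerically. The following is a minimal illustration, not the disclosed implementation: it assumes a toy two-tap room impulse response and a perfect estimate hA = he, so the residual is exactly the wanted speech s(k).

```python
import numpy as np

def cancel_echo(y, x, h_hat):
    """Subtract the estimated speaker echo from the microphone signal,
    i.e. e(k) = y(k) - dA(k), where dA(k) = x(k) convolved with h_hat."""
    echo_estimate = np.convolve(x, h_hat)[: len(y)]  # dA(k) = x(k)*h_hat
    return y - echo_estimate

# Toy example: two-tap "room" response, perfect estimate hA = he.
rng = np.random.default_rng(0)
h_true = np.array([1.0, 0.5])          # actual room impulse response he
x = rng.standard_normal(1000)          # speaker signal x(k)
s = rng.standard_normal(1000)          # wanted near-end speech s(k)
y = s + np.convolve(x, h_true)[:1000]  # microphone capture y(k)

e = cancel_echo(y, x, h_true)
assert np.allclose(e, s)               # only the wanted speech remains
```

With multiple speakers, per paragraph [0074], the same subtraction would be repeated once per speaker stream before producing the final e(k).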
Measuring room/environment impulse response
[0075] The room/environment impulse response may be estimated as room impulse responses (RIRs), based on computing a cross correlation between a signal played out through a speaker at point A, and the signal received through a microphone at point B. Improved results may be obtained when the signal played out is white noise, or a maximum length sequence. No prior knowledge needs to be exploited when computing the RIR, which is assumed to be the cross correlation between played and received signals.
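The cross-correlation approach above can be illustrated with a short sketch. Because white noise has a delta-like autocorrelation, cross-correlating the received signal with the probe approximates the RIR. The probe length, the hypothetical impulse response, and the estimator normalization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
probe = rng.standard_normal(n)               # white-noise probe played at point A
h_true = np.array([0.0, 1.0, 0.6, 0.3, 0.1]) # hypothetical room impulse response
received = np.convolve(probe, h_true)[:n]    # signal captured at point B

# Cross-correlation of received signal and probe, one value per lag.
lags = len(h_true)
h_est = np.array([np.dot(received[k:], probe[: n - k]) for k in range(lags)]) / n

assert np.allclose(h_est, h_true, atol=0.1)  # estimate approximates the RIR
```

Longer probes reduce the estimation noise, since the cross terms average down roughly as 1/sqrt(n).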
[0076] In another approach, the RIR is estimated based on localization of reflective boundaries in an enclosed space. The approach infers room geometry based on the positions of loudspeakers and real or image microphones, which are computed using sets of times of arrival (TOAs) obtained from room impulse responses (RIRs).

Difficulties with the Existing Approaches
[0077] Prior existing approaches would not properly perform echo cancellation when used in the scenario of I/O user devices, e.g., an "IOD-mic" and a separate "IOD-speaker", which interface with a user terminal emulation server 100. In that scenario the individual IODs may have insufficient processing power to locally perform echo cancellation and, instead, the echo cancellation is performed by the user terminal emulation server 100, which can reside far away in a networked cloud computing resource.
[0078] Furthermore, in a typical setup and device deployment, the respective round-trip times RTT-mic and RTT-speaker may be different, and differently varying. The respective IOD-mic(s) and IOD-speaker(s) each have an internal clock with differing characteristics (e.g., drift, stability, etc.) such that the sound capture in the IOD-mic(s) may drift and cause separation in time between the played sound from the IOD-speaker(s) and the capture time in the IOD-mic(s).
[0079] Thus, in the equations above the respective time sequence step k in x(k) may be different from the time step in s(k). Hence, this can be expressed as x(ki) and s(k). Prior existing approaches do not consider ki ≠ k, which needs to be considered in the scenario of the user terminal emulation server 100 residing far away in a networked cloud computing resource solution.
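The consequence of ignoring ki ≠ k can be demonstrated with a short sketch: subtracting the echo estimate built from x(k) instead of x(k-n) leaves large residual echo, while applying the correct offset n cancels it completely. The offset value, room response, and signals are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
h = np.array([1.0, 0.4])                      # room impulse response
x = rng.standard_normal(5000)                 # speaker reference x(k)
n_offset = 7                                  # unknown device-side delay, in samples
x_delayed = np.concatenate([np.zeros(n_offset), x])[:5000]  # x(k - n)
y = np.convolve(x_delayed, h)[:5000]          # echo-only capture, s(k) = 0

misaligned = y - np.convolve(x, h)[:5000]         # ignores ki != k
aligned = y - np.convolve(x_delayed, h)[:5000]    # applies the correct offset n

# Residual echo energy: large when the offset is ignored, zero when applied.
assert np.sum(aligned**2) < 1e-12
assert np.sum(misaligned**2) > 100 * np.sum(aligned**2) + 1.0
```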
Echo Cancellation According to Some Embodiments
[0080] In some embodiments, it is assumed that one IOD-speaker and one IOD-microphone are available proximately located to the user. Hence, it is possible that the audio generated from the speaker may be captured by the microphone as undesirable echo.
[0081 ] Figure 4 illustrates the user terminal emulation server 100 executing two user terminal emulation applications 110 (i.e., App. #1 and App. #2) to provide concurrent communication services through two sets of I/O user devices (i.e., set 430a-430c and set 440a-440c). Figures 5, 6, 7, and 8 illustrate flowcharts of operations that may be performed by a user terminal emulation server in accordance with some embodiments of the present disclosure.
[0082] Referring to Figures 4 and 5, a first user terminal emulation App. #1 110 provides 500, via a network 420 (public and/or private network(s)), first data flows between itself and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation App. #1 110 to perform a communication service through a network entity providing communication services. The first data flows include a first microphone data flow received from a first microphone 430a in the first set and a first speaker data flow sent to a first speaker 430b in the first set. The first data flows may include other types of data flows, such as a first video data flow sent to a first display device 430c in the first set.
[0083] Similarly, a second user terminal emulation App. #2 110 provides 500 second data flows between itself and a second set of the I/O user devices that are proximately located to a location of a second user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the second user to interface with the second user terminal emulation App. #2 110 to perform a communication service through a network entity providing communication services. The second data flows include a second microphone data flow received from a second microphone 440a in the second set and a second speaker data flow sent to a second speaker 440b in the second set. The second data flows may include other types of data flows, such as a second video data flow sent to a second display device 440c in the second set.
[0084] In the example of Figure 4, the microphone 430a and speaker 430b are sufficiently close that speaker data sent through the first speaker data flow by the first user terminal emulation App. #1 110 is acoustically played out by the first speaker 430b and sensed by the first microphone 430a, and the first speaker output is undesirably fed-back as a speaker echo component of the microphone data sent by the first microphone 430a through the first microphone data flow to the first user terminal emulation App. #1 110. Similarly in the other set, the microphone 440a and speaker 440b are sufficiently close that speaker data sent through the second speaker data flow by the second user terminal emulation App. #2 110 is acoustically played out by the second speaker 440b and sensed by the second microphone 440a, and the second speaker output is undesirably fed-back as a speaker echo component of the microphone data sent by the second microphone 440a through the second microphone data flow to the second user terminal emulation App. #2 110. Still further, the two sets of I/O user devices may be sufficiently close to each other to create audio interference with each other, such that the output of the respective first and second speakers 430b and 440b is undesirably fed-back as an echo component in the microphone data of the respective second and first microphones 440a and 430a.
[0085] As further shown in the example of Figure 4, the user terminal emulation server 100 includes an echo cancellation circuit (or software module) 400 which is configured to obtain 502 a time offset indicating an elapsed time between when speaker data is sent through one of the speaker data flows and when the speaker data is played-out through one of the speakers, and is further configured to cancel 504 a speaker echo component of microphone data received in the microphone data flow based on time shifting and combining the speaker data and the microphone data using the time offset.
[0086] More particularly, with regard to the first set of I/O user devices, the echo cancellation circuit 400 operates to obtain 502 a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker 430b. The echo cancellation circuit 400 operates to cancel 504 a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset.
[0087] In a further embodiment illustrated in Figure 6, the operations determine 600 an impulse response of a spatial area which includes the first speaker 430b and the first microphone 430a. The operations to cancel 602 (also shown as 504 in Figure 5) the speaker echo component of the microphone data received in the first microphone data flow, then includes time shifting the speaker data using the first time offset, filtering the time shifted speaker data using the impulse response, and subtracting the filtered and time shifted speaker data from the microphone data.
[0088] In a similar manner, the echo cancellation circuit 400 operates to obtain 502 a second time offset indicating an elapsed time between when speaker data is sent through the second speaker data flow and when the speaker data is played-out through the second speaker 440b. The echo cancellation circuit 400 operates to cancel 504 a speaker echo component of microphone data received in the second microphone data flow, based on time shifting and combining the speaker data and the microphone data using the second time offset. In the context of Figure 6, the operations may further determine 600 a second impulse response of a spatial area which includes the second speaker 440b and the second microphone 440a. The operation to cancel 602 (also shown as 504 in Figure 5) the speaker echo component of the microphone data received in the second microphone data flow, then includes time shifting the speaker data using the second time offset, filtering the time shifted speaker data using the second impulse response, and subtracting the filtered and time shifted speaker data from the microphone data.
[0089] In some more particular embodiments, the user terminal emulation server 100 will generate (via one of the user terminal emulation applications) a (RTP) packet stream for the IOD-speaker, which the IOD-speaker will play out according to the information in the RTP header and RTP payload (audio frames).
[0090] As the generated sound is captured by the IOD-microphone, a (RTP) packet stream will be generated from the IOD-microphone to the user terminal emulation server 100.
[0091] Thus, carried by the RTP stream from the IOD-microphone is the audio signal: y(k) = s(k) + x(ki)*he, where ki can be expressed as ki = k-n, thus a previous audio (e.g., speech) sample ki. The term "n" represents a time offset from the common (current) time base k.

[0092] In the user terminal emulation server 100, the audio signal x(k) is processed through a filter, hA, that mimics the impulse response he of the room (environment) where the IOD-speaker and IOD-microphone reside. In further sections below, various operations are described for estimating the mimicked impulse response hA of the room (environment) and for reusing an earlier estimate, e.g., such as by inheriting echo cancellation parameters between different user sessions, which may not be occurring concurrently in time.
[0093] Thus, the echo canceller operations take the received signal y(k) and subtract the filtered signal x(?)*hA, where the time index of the samples to use is not directly known at the server. The audio signal after echo cancellation is represented by the following equation:

e(k) = s(k) + x(ki)*he - x(?)*hA
[0094] As seen in the equation above, effective echo cancellation depends upon knowing the time index represented by the symbol (?). Various operations of the present disclosure are therefore directed to obtaining ki, or essentially the offset n, at the user terminal emulation server 100 to be able to use the correct correlated samples of the audio signals in the echo cancellation function (illustrated as adaptive AEC filter in Figure 11).
[0095] In some embodiments, the IOD-speaker sends back information needed for the user terminal emulation server 100 to determine the offset n. The offset n can be similar for many played audio samples. Accordingly, information about the offset n can be carried in a control message from the IOD-speaker to the user terminal emulation server 100. The information may be carried in an RTCP (Real-time Transport Control Protocol) message, more specifically an RTCP-APP message.
[0096] In one embodiment, the offset n is encoded as an RTP timestamp and is interpreted as the offset from the RTP timestamp in the received RTP packet to when the speaker played the frame(s). Hence, the control message would also contain the sequence number of the referenced received RTP packet. Thus, the information can be an RTP timestamp and a sequence number. Using an RTCP protocol sender report, the IOD-speaker can deduce the relation between the NTP timestamp and the sender's RTP timestamp, and can thus send back a correct offset n expressed as an RTP timestamp offset.

[0097] Further, from the RTCP sender report of the IOD-microphone the user terminal emulation server 100 determines the relation between the RTP timestamp in the received RTP packets and the NTP time and, therefrom, determines the timing between sent audio to the IOD-speaker and received audio from the IOD-microphone.
[0098] The user terminal emulation server 100 can use the knowledge of the RTP timestamp of the referenced RTP packet (known via the RTCP-APP message) and the RTP offset timestamp (known via the RTCP-APP message) to calculate the value n, and thus determine the offset n in time to apply to x(k) prior to filtering it, thus filtering the signal x(k-n) = x(ki) with the filter hA. Then, rewriting

e(k) = s(k) + x(ki)*he - x(k-n)*hA

as

e(k) = s(k) + x(k-n)*he - x(k-n)*hA

the operations obtain e(k) ~= s(k), which can be output as an almost echo free signal.
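The arithmetic for recovering n from the two timestamps carried in the RTCP-APP message can be sketched as below. For audio payloads the RTP timestamp advances one unit per sample, so the timestamp difference is directly the sample offset; the modulo handles 32-bit RTP timestamp wrap-around. The function name and example values are assumptions for illustration:

```python
def playout_offset_samples(ref_rtp_ts, playout_rtp_ts):
    """Offset n in samples between when speaker data was sent (RTP timestamp
    of the referenced packet) and when it was played out (reported back as an
    RTP-timestamp-encoded offset). Handles 32-bit timestamp wrap-around."""
    return (playout_rtp_ts - ref_rtp_ts) % (1 << 32)

# e.g. a packet stamped 160000 reported played out at RTP time 160960:
assert playout_offset_samples(160_000, 160_960) == 960   # 20 ms at 48 kHz
# wrap-around near the 32-bit boundary still yields the same offset:
assert playout_offset_samples((1 << 32) - 100, 860) == 960
```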
[0099] In some embodiments and further to the discussion above, the operation to obtain 502 the first time offset may include to receive a Real-time Transport Control Protocol, RTCP, message from the first speaker. The RTCP message contains timing information indicating when an earlier speaker data was received through the first speaker data flow and played-out through the first speaker. The operations then determine the first time offset based on the timing information contained in the RTCP message.
[00100] In another embodiment, the operation to obtain 502 the time offset includes to receive a Real-time Transport Protocol, RTP, message from the first speaker. The RTP message contains a sequence number and an RTP timestamp indicating when earlier speaker data was received through the first speaker data flow and played-out through the first speaker. The operations then determine the first time offset based on the sequence number and the RTP timestamp contained in the RTP message.
[00101] In another embodiment, the time offset further indicates elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server 100 (e.g., user terminal emulation App. #1 110) in the first microphone data flow.

[00102] In another embodiment, the operations determine the elapsed time between when the first microphone 430a senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server 100 (e.g., user terminal emulation App. #1 110) in the first microphone data flow, based on timing information contained in a Real-time Transport Control Protocol, RTCP, message received from the first microphone 430a.
[00103] In another embodiment, the operation to obtain 502 the time offset includes to determine the elapsed time between when the first microphone 430a senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server 100 (e.g., user terminal emulation App. #1 110) in the first microphone data flow, based on a sequence number and a Real-time Transport Protocol, RTP, timestamp carried in an RTP message received from the first microphone 430a.
[00104] The user terminal emulation applications may request the timing information from the speakers and/or the microphones. For example, the operations may include determining when a timing re-calibration event has occurred. The timing re-calibration event may be determined based on at least one of packet jitter, lost packets, and/or latency measured in the first speaker data flow and/or the first microphone data flow. Responsive to determining occurrence of the timing re-calibration event, the operations send a time offset information request to the speaker(s) and/or the microphone(s) associated with the timing recalibration event, e.g., which microphone has experienced excessive packet jitter and/or latency.
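The re-calibration decision described above can be sketched as a simple threshold check over per-flow transport statistics. The threshold values and the statistics dictionary layout are illustrative assumptions, not values from the source:

```python
def needs_time_recalibration(stats, jitter_ms_max=30.0, loss_max=0.02,
                             latency_ms_max=150.0):
    """Return True when a timing re-calibration event has occurred for a
    speaker or microphone data flow, based on measured packet jitter,
    packet loss, and/or latency exceeding (assumed) thresholds."""
    return (stats["jitter_ms"] > jitter_ms_max
            or stats["loss_ratio"] > loss_max
            or stats["latency_ms"] > latency_ms_max)

# Excessive jitter triggers a time offset information request to the device:
assert needs_time_recalibration(
    {"jitter_ms": 45.0, "loss_ratio": 0.0, "latency_ms": 80.0})
# A healthy flow does not:
assert not needs_time_recalibration(
    {"jitter_ms": 5.0, "loss_ratio": 0.001, "latency_ms": 60.0})
```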
Further embodiment - inheriting echo cancellation parameters between different user sessions (not occurring parallel in time)
[00105] In a further embodiment, the offset value(s) obtained for a particular I/O user device or for a set, group or a cluster of I/O user devices is saved by the user terminal emulation server 100 (e.g., by a managing node thereof) for later use in upcoming communication sessions.
[00106] The upcoming communication session may be related to the same user terminal emulation server or may be related to a different user terminal emulation server which use the same or part of the set of previous I/O user devices.
[00107] Thus, in a situation where the “managing node” of a user terminal emulation server determines that a second user has become proximately located to a certain set of I/O user devices, the managing node may apply echo cancelling parameters from a previous first user session that used the same or a similarly situated set of I/O user devices as the second user is to start using for a communication service.
[00108] In the context of Figure 4, the user terminal emulation server 100 may include a repository 410 of echo cancellation parameters. In the example of Figure 7, the operations store 700 the impulse response, i.e., data characterizing the determined impulse response, in a data structure (repository) 410 in at least one memory with a logical association to an identifier of the first speaker 430b and/or an identifier of the first microphone 430a. The operations later retrieve 702 the impulse response from the data structure 410 in the at least one memory using the identifier of the first speaker 430b and/or the identifier of the first microphone 430a. The operation to cancel 602 the speaker echo component of the microphone data received in the first microphone data flow, can include using the impulse response retrieved from the data structure 410 in the at least one memory to filter 704 the time shifted speaker data.
[00109] Similarly, for the second set of I/O user devices, the operations store 700 the second impulse response in the data structure (repository) 410 with a logical association to an identifier of the second speaker 440b and/or an identifier of the second microphone 440a. The operations later retrieve 702 the impulse response from the data structure 410 using the identifier of the second speaker 440b and/or the identifier of the second microphone 440a. The operation to cancel 602 the speaker echo component of the microphone data received in the second microphone data flow, can include using the second impulse response retrieved from the data structure 410 to filter 704 the time shifted speaker data.
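The repository 410 behavior described in paragraphs [00108]-[00109] — storing an impulse response with a logical association to speaker and/or microphone identifiers and retrieving it later by those identifiers — can be sketched minimally as below. The class and identifier names are illustrative assumptions:

```python
class EchoParamRepository:
    """Minimal sketch of repository 410: impulse responses keyed by a
    logical association to speaker and/or microphone identifiers, so a
    later session using the same devices can reuse them."""

    def __init__(self):
        self._store = {}

    def store(self, impulse_response, speaker_id=None, mic_id=None):
        self._store[(speaker_id, mic_id)] = impulse_response

    def retrieve(self, speaker_id=None, mic_id=None):
        # Returns None when no parameters were saved for this device pairing.
        return self._store.get((speaker_id, mic_id))

repo = EchoParamRepository()
repo.store([1.0, 0.5, 0.2], speaker_id="spk-430b", mic_id="mic-430a")
assert repo.retrieve(speaker_id="spk-430b", mic_id="mic-430a") == [1.0, 0.5, 0.2]
assert repo.retrieve(speaker_id="spk-440b", mic_id="mic-440a") is None
```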
[00110] Operations for determining and re-using an impulse response of a spatial area that includes the two sets of I/O user devices in Figure 4 are now explained in the context of the flowchart of Figure 8 which is performed along with the operations discussed above for Figure 5.
[00111] Referring to Figures 4 and 8, the operations determine 800 an impulse response of a spatial area which includes the first speaker 430b, the first microphone 430a, the second speaker 440b, and the second microphone 440a. The operations store 802 the impulse response in a data structure 410 in the at least one memory with a logical association to an identifier of the second speaker 440b and/or an identifier of the second microphone 440a. The operations provide 804 second data flows between the second user terminal emulation App. #2 110 and the second set of the I/O user devices that are proximately located to a location of the second user and satisfy the combined capability rule for being combinable to provide a combined I/O user interface for the second user to interface with the second user terminal emulation App. #2 110 to perform a communication service through the network entity providing communication service. The second data flows include a second microphone data flow received from the second microphone 440a in the second set of the I/O user devices and a second speaker data flow sent to the second speaker 440b in the second set of the I/O user devices. The operations obtain 806 a second time offset indicating an elapsed time between when a second speaker data is sent through the second speaker data flow and when the second speaker data is played-out through the second speaker 440b. The operations cancel 808 a speaker echo component of a second microphone data received in the second microphone data flow, based on: retrieving 810 the impulse response from the data structure 410 in the at least one memory using the identifier of the second speaker 440b and/or the identifier of the second microphone 440a; and time shifting the second speaker data using the second time offset, filtering the time shifted second speaker data using the impulse response, and subtracting 812 the filtered and time shifted second speaker data from the second microphone data.
Further embodiment - echo cancelling parameter management for input-output devices simultaneously shared by two (or more) users
[00112] In a further embodiment, the operations limit reuse of the echo cancelling parameters discussed above to during a predefined or determined period of time during which the set of I/O user devices may be assumed to have sufficiently non-varying (stable) offset values.
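The time-limited reuse described above can be sketched as a simple age check on saved parameters. The maximum-age value is an illustrative assumption, standing in for the "predefined or determined period" of the source:

```python
import time

def reuse_allowed(saved_at, now=None, max_age_s=600.0):
    """Allow reuse of saved echo cancelling parameters only within a period
    during which the device set is assumed to have stable offset values.
    The 600 s window is an illustrative assumption."""
    now = time.time() if now is None else now
    return (now - saved_at) <= max_age_s

assert reuse_allowed(saved_at=1000.0, now=1300.0)       # 5 minutes old: reuse
assert not reuse_allowed(saved_at=1000.0, now=2000.0)   # too old: re-measure
```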
[00113] In the scenario where two different users share the same set of (or part of) I/O user devices, the managing node (e.g., of the user terminal emulation server) applies echo-cancellation settings derived from a previously obtained set of offset values to both users using the various I/O user devices.
Further embodiment - obtaining 'hA' (estimated impulse response) of the room/environment
[00114] In some embodiments, the managing node (e.g., of the user terminal emulation server) causes a first IOD-speaker to acoustically play out defined impulse sounding data, also referred to as "impulse data", which may be a defined tone, white noise, etc. The acoustic speaker output from the impulse data is sensed by an IOD-microphone and fed-back to the managing node as a component of microphone data. Sending of the impulse data may be triggered based on the managing node determining that no person(s) are present in the room/environment for which the impulse response is to be determined.

[00115] Upon reception of the acoustic play out of the impulse data as a component of the microphone data, and after a respective time offset between the sending of the impulse signal to the IOD-speaker and receipt of the microphone data from the IOD-microphone, the managing node derives the estimated room/environment response (hA).
[00116] In a scenario where a plurality of IOD-speakers and/or a plurality of IOD-microphones are present in the room/environment, the playout-listen operations for determining the impulse response of the room/environment are repeated for each respective individual speaker-to-microphone pairing.
[00117] In some embodiments, operations to determine the impulse response are inhibited until a determination is made that the subject room is empty. The empty room determination may be based on at least one of: determining absence of NFC, Bluetooth, and/or other RF transmitters associated with users in the room; determining no person(s) are present in video from a camera viewing the room; determining no motion has been detected by a motion detector; and/or determining absence of UserTags in the room. Avoiding determination of the impulse response while person(s) are present may improve accuracy of the determination by avoiding sound reflective/absorptive effects of the person(s) on the sounding characteristics of the room/environment.
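The empty-room gate above combines several presence indicators; a minimal sketch, with indicator names as assumptions, requires every indicator to report absence before measurement proceeds:

```python
def room_is_empty(rf_devices_present, persons_in_video,
                  motion_detected, usertags_present):
    """Inhibit impulse-response measurement unless every presence
    indicator from paragraph [00117] reports the room empty."""
    return not (rf_devices_present or persons_in_video
                or motion_detected or usertags_present)

assert room_is_empty(False, False, False, False)     # measurement may proceed
assert not room_is_empty(False, True, False, False)  # person seen on camera
```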
[00118] The operations may determine when an impulse response re-calibration event has occurred, which triggers re-determination of the impulse response of the room/environment. The impulse response re-calibration event may be determined based on at least one of: detecting a threshold change in distance between the first speaker and the first microphone; determining that another speaker and/or microphone has been added or removed from the first set of I/O user devices used to provide the combined I/O user interface for the first user; and determining that another communication service has started being provided or has ceased being provided through a second set of I/O user devices that are proximately located to the first set of I/O user devices. Responsive to determining that the impulse response re-calibration event has occurred, the operations determine the impulse response of the spatial area which includes the first speaker and the first microphone.
[00119] The corresponding operations by the echo cancellation circuit 400 of the user terminal emulation server 100 may include, for each combination of speaker and microphone among the first set of the I/O user devices, the operations: send impulse data in a packet including a timestamp to the speaker for play out; receive a response packet including a play out time delay indicating elapsed time between when the impulse data in the packet was sent and when the impulse data was acoustically played-out through the speaker; and receive from the microphone impulse microphone data in a packet including a microphone timestamp indicating when the packet was sent by the microphone. The operations determine an impulse response of a spatial area which includes the first set of the I/O user devices based on using the play out time delays and the microphone timestamps received from each combination of speaker and microphone among the first set of the I/O user devices.
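The per-pairing sweep described in paragraphs [00116] and [00119] can be sketched as iterating the Cartesian product of speakers and microphones; `measure_pair` stands in for the playout-listen probe of one pairing and is a caller-supplied assumption:

```python
import itertools

def measure_all_pairs(speakers, microphones, measure_pair):
    """Repeat the playout-listen impulse-response measurement for every
    speaker-to-microphone pairing in the device set, returning one
    measured response per pairing."""
    return {(spk, mic): measure_pair(spk, mic)
            for spk, mic in itertools.product(speakers, microphones)}

result = measure_all_pairs(["spk-A", "spk-B"], ["mic-1"],
                           lambda s, m: f"h({s}->{m})")
assert result == {("spk-A", "mic-1"): "h(spk-A->mic-1)",
                  ("spk-B", "mic-1"): "h(spk-B->mic-1)"}
```

The same sweep extends to the cross-coupled case by running it over the union of both device sets.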
[00120] Similarly, in the context of Figure 4 where the second set of I/O user devices can be located in the same room/environment as the first set of I/O user devices such that cross-coupling of echo feedback occurs, the above operations can be repeated for each pairing of speakers and microphones in each of the first and second sets of I/O user devices.
Example I/O User Device and User Terminal Emulation Server
[00121 ] Figure 9 is a block diagram of hardware circuit components of an I/O user device 130 which are configured to operate in accordance with some embodiments. The I/O user device 130 can include a wired/wireless network interface circuit 902, a near field communication circuit 920, at least one processor circuit 900 (processor), and at least one memory circuit 910 (memory). The processor 900 is connected to communicate with the other components. The memory 910 stores program code (e.g., user terminal emulation application(s) 110) that is executed by the processor 900 to perform operations disclosed herein. The processor 900 may include one or more data processing circuits (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across one or more data networks. The processor 900 is configured to execute the program code in the memory 910, described below as a non-transitory computer readable medium, to perform some or all of the operations and methods for one or more of the embodiments disclosed herein for a mobile electronic device. The I/O user device 130 can include one or more UI component devices, including without limitation, microphone(s) 940, speaker(s) 950, camera(s) 930, display device(s) 960, and other user input interface(s) 970.
[00122] Figure 10 is a block diagram of hardware circuit components of a user terminal emulation server 100 which are configured to operate in accordance with some embodiments. The user terminal emulation server 100 can include a wired/wireless network interface circuit 1050, a database 1060 (e.g., any one or more of a listing I/O user devices, UI capabilities of the I/O user devices, communication protocols used to communicate with the I/O user devices, known proximities to user identifiers, identifiers of user tags, and/or time offsets stored in data structures associated with identified I/O user devices), a display device 1030, a user input interface 1040 (e.g., keyboard or touch sensitive display), at least one processor circuit 1000 (processor), and at least one memory circuit 1010 (memory). The processor 1000 is connected to communicate with the other components. The memory 1010 stores user terminal emulation application(s) 110 and an echo cancellation module that is executed by the processor 1000 to perform operations disclosed herein. The processor 1000 may include one or more data processing circuits (e.g., microprocessor and/or digital signal processor), which may be collocated or distributed across one or more data networks. The processor 1000 is configured to execute computer program instructions in the memory 1010, described below as a non-transitory computer readable medium, to perform some or all of the operations and methods for one or more of the embodiments disclosed herein for a mobile electronic device.
Cloud Implementation
[00123] Some or all operations described above as being performed by the user terminal emulation server 100 and the I/O user devices 130 may alternatively be performed by the other one, and/or by another node that is part of a cloud computing resource. For example, those operations can be performed as a network function that is close to the edge, such as in a cloud server or a cloud resource of a telecommunications network operator, e.g., in a CloudRAN or a core network, and/or may be performed by a cloud server or a cloud resource of a media provider, e.g., iTunes service provider or Spotify service provider.
Abbreviations:
3GPP 3rd Generation Partnership Project
App Application, i.e., program
eNB Evolved Node B (a.k.a. RBS, Radio Base Station)
GW Gateway
ICMP Internet Control Message Protocol
IOD Input-Output Device
ITU International Telecommunication Union
RTP Real-time Transport Protocol
RTCP Real-time Transport Control Protocol
IODH Input Output Device Handler
NTP Network Time Protocol
SDP Session Description Protocol
SR Sender Report
UE User equipment
Further Definitions and Embodiments:
[00124] In the above-description of various embodiments of present inventive concepts, it is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of present inventive concepts. Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which present inventive concepts belong. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of this specification and the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
[00125] When an element is referred to as being "connected", "coupled", "responsive", or variants thereof to another element, it can be directly connected, coupled, or responsive to the other element or intervening elements may be present. In contrast, when an element is referred to as being "directly connected", "directly coupled", "directly responsive", or variants thereof to another element, there are no intervening elements present. Like numbers refer to like elements throughout. Furthermore, "coupled", "connected", "responsive", or variants thereof as used herein may include wirelessly coupled, connected, or responsive. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. Well-known functions or constructions may not be described in detail for brevity and/or clarity. The term "and/or" includes any and all combinations of one or more of the associated listed items.
[00126] It will be understood that although the terms first, second, third, etc. may be used herein to describe various elements/operations, these elements/operations should not be limited by these terms. These terms are only used to distinguish one element/operation from another element/operation. Thus, a first element/operation in some embodiments could be termed a second element/operation in other embodiments without departing from the teachings of present inventive concepts. The same reference numerals or the same reference designators denote the same or similar elements throughout the specification.
[00127] As used herein, the terms "comprise", "comprising", "comprises", "include", "including", "includes", "have", "has", "having", or variants thereof are open-ended, and include one or more stated features, integers, elements, steps, components or functions but do not preclude the presence or addition of one or more other features, integers, elements, steps, components, functions or groups thereof. Furthermore, as used herein, the common abbreviation "e.g.", which derives from the Latin phrase "exempli gratia," may be used to introduce or specify a general example or examples of a previously mentioned item, and is not intended to be limiting of such item. The common abbreviation "i.e.", which derives from the Latin phrase "id est," may be used to specify a particular item from a more general recitation.
[00128] Example embodiments are described herein with reference to block diagrams and/or flowchart illustrations of computer-implemented methods, apparatus (systems and/or devices) and/or computer program products. It is understood that a block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by computer program instructions that are performed by one or more computer circuits. These computer program instructions may be provided to a processor circuit of a general purpose computer circuit, special purpose computer circuit, and/or other programmable data processing circuit to produce a machine, such that the instructions, which execute via the processor of the computer and/or other programmable data processing apparatus, transform and control transistors, values stored in memory locations, and other hardware components within such circuitry to implement the functions/acts specified in the block diagrams and/or flowchart block or blocks, and thereby create means (functionality) and/or structure for implementing the functions/acts specified in the block diagrams and/or flowchart block(s).
[00129] These computer program instructions may also be stored in a tangible computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the functions/acts specified in the block diagrams and/or flowchart block or blocks. Accordingly, embodiments of present inventive concepts may be embodied in hardware and/or in software (including firmware, resident software, micro-code, etc.) that runs on a processor such as a digital signal processor, which may collectively be referred to as "circuitry," "a module" or variants thereof.
[00130] It should also be noted that in some alternate implementations, the functions/acts noted in the blocks may occur out of the order noted in the flowcharts. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Moreover, the functionality of a given block of the flowcharts and/or block diagrams may be separated into multiple blocks and/or the functionality of two or more blocks of the flowcharts and/or block diagrams may be at least partially integrated. Finally, other blocks may be added/inserted between the blocks that are illustrated, and/or blocks/operations may be omitted without departing from the scope of inventive concepts. Moreover, although some of the diagrams include arrows on communication paths to show a primary direction of communication, it is to be understood that communication may occur in the opposite direction to the depicted arrows.
[00131] Many variations and modifications can be made to the embodiments without substantially departing from the principles of the present inventive concepts. All such variations and modifications are intended to be included herein within the scope of present inventive concepts. Accordingly, the above disclosed subject matter is to be considered illustrative, and not restrictive, and the appended examples of embodiments are intended to cover all such modifications, enhancements, and other embodiments, which fall within the spirit and scope of present inventive concepts. Thus, to the maximum extent allowed by law, the scope of present inventive concepts is to be determined by the broadest permissible interpretation of the present disclosure including the following examples of embodiments and their equivalents, and shall not be restricted or limited by the foregoing detailed description.


CLAIMS:
1. A user terminal emulation server (100) for providing communication services using sets of input and/or output, I/O, user devices (130), the user terminal emulation server (100) comprising: at least one processor (1000); and at least one memory (1010) storing program code that is executable by the at least one processor to perform operations comprising: providing first data flows between a first user terminal emulation application and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation application to perform a communication service through a network entity providing communication services, wherein the first data flows comprise a first microphone data flow received from a first microphone in the first set of the I/O user devices and a first speaker data flow sent to a first speaker in the first set of the I/O user devices; obtaining a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker; and cancelling a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset.
2. The user terminal emulation server (100) of Claim 1, wherein the operations further comprise: determining an impulse response of a spatial area which includes the first speaker and the first microphone, wherein the cancelling of the speaker echo component of the microphone data received in the first microphone data flow, comprises time shifting the speaker data using the first time offset, filtering the time shifted speaker data using the impulse response, and subtracting the filtered and time shifted speaker data from the microphone data.
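For illustration only, and not as part of the claimed subject matter, the cancelling operation recited in Claim 2 may be sketched as follows. The sample-domain representation, the fixed FIR impulse response, and the function and parameter names are assumptions made solely for this sketch:

```python
def cancel_speaker_echo(mic, speaker, impulse_response, offset):
    """Cancel the speaker echo component of microphone data.

    mic              -- microphone samples received in the microphone data flow
    speaker          -- speaker samples as sent in the speaker data flow
    impulse_response -- FIR estimate h[k] of the spatial area between the
                        speaker and the microphone
    offset           -- the first time offset, expressed in samples
    """
    # Time shift the speaker data using the first time offset ...
    shifted = [0.0] * offset + list(speaker)
    # ... filter the time shifted speaker data using the impulse response ...
    echo_estimate = []
    for n in range(len(mic)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if 0 <= n - k < len(shifted):
                acc += h * shifted[n - k]
        echo_estimate.append(acc)
    # ... and subtract the filtered and time shifted speaker data from the
    # microphone data.
    return [m - e for m, e in zip(mic, echo_estimate)]
```

With an accurate offset and impulse response the echo component cancels exactly; in practice the impulse response estimate would be refreshed on the re-calibration events described in Claim 4, or tracked with an adaptive filter.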
3. The user terminal emulation server (100) of Claim 2, wherein the operations further comprise: storing the impulse response in a data structure in the at least one memory with a logical association to an identifier of the first speaker and/or an identifier of the first microphone, wherein the cancelling of the speaker echo component of the microphone data received in the first microphone data flow, comprises retrieving the impulse response from the data structure in the at least one memory using the identifier of the first speaker and/or the identifier of the first microphone.
4. The user terminal emulation server (100) of any of Claims 2 to 3, wherein the operations further comprise: determining when an impulse response re-calibration event has occurred, wherein the impulse response re-calibration event is based on at least one of: detecting a threshold change in distance between the first speaker and the first microphone, determining that another speaker and/or microphone has been added or removed from the first set of I/O user devices used to provide the combined I/O user interface for the first user, and determining that another communication service has started being provided or has ceased being provided through a second set of I/O user devices that are proximately located to the first set of I/O user devices; and responsive to determining that the impulse response re-calibration event has occurred, determining the impulse response of the spatial area which includes the first speaker and the first microphone.
5. The user terminal emulation server (100) of any of Claims 1 to 4, wherein the operation to obtain the first time offset comprises: receiving a Real-time Transport Control Protocol, RTCP, message from the first speaker, the RTCP message containing timing information indicating when earlier speaker data was received through the first speaker data flow and played-out through the first speaker; and determining the first time offset based on the timing information contained in the RTCP message.
6. The user terminal emulation server (100) of any of Claims 1 to 5, wherein the operation to obtain the time offset comprises: receiving a Real-time Transport Protocol, RTP, message from the first speaker, the RTP message containing a sequence number and a RTP timestamp indicating when earlier speaker data was received through the first speaker data flow and played-out through the first speaker; and determining the first time offset based on the sequence number and the RTP timestamp contained in the RTP message.
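For illustration only, the offset determination recited in Claims 5 and 6 may be sketched as follows. The class and method names are hypothetical, and a real implementation would parse RTP/RTCP messages per RFC 3550 and align the endpoint clocks (e.g., via NTP):

```python
class OffsetEstimator:
    """Track speaker packets by sequence number and derive the first time
    offset from a reported playout time (hypothetical sketch)."""

    def __init__(self):
        self.sent = {}  # sequence number -> send time, in seconds

    def record_send(self, seq, send_time):
        # Called when speaker data is sent through the speaker data flow.
        self.sent[seq] = send_time

    def on_playout_report(self, seq, playout_time):
        """Return the first time offset in seconds for packet `seq`: the
        elapsed time between sending the speaker data and its playout
        through the speaker, as reported back with the sequence number
        and timestamp. Returns None for an unknown sequence number."""
        send_time = self.sent.pop(seq, None)
        if send_time is None:
            return None
        return playout_time - send_time
```

The same bookkeeping applies whether the playout time is carried in an RTCP report (Claim 5) or in an RTP message with a sequence number and RTP timestamp (Claim 6); only the parsing differs.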
7. The user terminal emulation server (100) of any of Claims 1 to 6, wherein: the time offset further indicates elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server in the first microphone data flow.
8. The user terminal emulation server (100) of Claim 7, wherein the operation to obtain the time offset comprises: determining the elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server in the first microphone data flow, based on timing information contained in a Real-time Transport Control Protocol, RTCP, message received from the first microphone.
9. The user terminal emulation server (100) of any of Claims 7 to 8, wherein the operation to obtain the time offset comprises: determining the elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server in the first microphone data flow, based on a sequence number and a Real-time Transport Protocol, RTP, timestamp carried in a RTP message received from the first microphone.
10. The user terminal emulation server (100) of any of Claims 1 to 9, wherein the operations further comprise: determining when a timing re-calibration event has occurred, wherein the timing re-calibration event is based on at least one of packet jitter, lost packets, and/or latency measured in the first speaker data flow and/or the first microphone data flow; and responsive to determining occurrence of the timing re-calibration event, sending a time offset information request to the first speaker and/or the first microphone.
11. The user terminal emulation server (100) of any of Claims 1 to 10, wherein the operations further comprise: storing the time offset in a data structure in the at least one memory with a logical association to an identifier of the first speaker and/or an identifier of the first microphone, wherein the operation to obtain the time offset comprises retrieving the time offset from the data structure in the at least one memory using the identifier of the first speaker and/or the identifier of the first microphone.
12. The user terminal emulation server (100) of any of Claims 1 to 11, wherein the operations further comprise: determining an impulse response of a spatial area which includes the first speaker, the first microphone, a second speaker, and a second microphone; storing the impulse response in a data structure in the at least one memory with a logical association to an identifier of the second speaker and/or an identifier of the second microphone; providing second data flows between a second user terminal emulation application and a second set of the I/O user devices that are proximately located to a location of a second user and satisfy the combined capability rule for being combinable to provide a combined I/O user interface for the second user to interface with the second user terminal emulation application to perform a communication service through the network entity providing communication services, wherein the second data flows comprise a second microphone data flow received from the second microphone in the second set of the I/O user devices and a second speaker data flow sent to the second speaker in the second set of the I/O user devices; obtaining a second time offset indicating an elapsed time between when a second speaker signal is sent through the second speaker data flow and when the second speaker signal is played-out through the second speaker; and cancelling a speaker echo component of a second microphone data received in the second microphone data flow, based on retrieving the impulse response from the data structure in the at least one memory using the identifier of the second speaker and/or the identifier of the second microphone, and time shifting the second speaker signal using the second time offset, filtering the time shifted second speaker signal using the impulse response, and subtracting the filtered and time shifted second speaker signal from the second microphone data.
13. The user terminal emulation server (100) of any of Claims 1 to 12, wherein the operations further comprise: for each combination of speaker and microphone among the first set of the I/O user devices, sending impulse data in a packet including a timestamp to the speaker for playout, receiving a response packet including a playout time delay indicating elapsed time between when the impulse data in the packet was sent and when the impulse data was acoustically played-out through the speaker, and receiving from the microphone an impulse microphone data in a packet including a microphone timestamp indicating when the packet was sent by the microphone; and determining an impulse response of a spatial area which includes the first set of the I/O user devices based on using the playout time delays and the microphone timestamps received from each combination of speaker and microphone among the first set of the I/O user devices.
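For illustration only, the per-pair timing arithmetic underlying Claim 13 may be sketched as follows. The function name and the assumption that all values share a common (e.g., NTP-synchronized) clock are simplifications made for this sketch:

```python
def pair_acoustic_delay(send_time, playout_delay, mic_timestamp,
                        mic_network_delay=0.0):
    """Estimate the acoustic delay from one speaker to one microphone.

    send_time         -- when the impulse data packet was sent to the speaker
    playout_delay     -- reported elapsed time until the impulse data was
                         acoustically played-out through the speaker
    mic_timestamp     -- when the microphone sent back the captured impulse
                         microphone data
    mic_network_delay -- assumed microphone-to-server transit time
    All values are in seconds on a common clock.
    """
    playout_time = send_time + playout_delay          # acoustic emission time
    capture_time = mic_timestamp - mic_network_delay  # acoustic capture time
    return capture_time - playout_time
```

Repeating this measurement for each combination of speaker and microphone among the first set of I/O user devices yields the set of delays from which an impulse response of the spatial area can be assembled.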
14. A method by a user terminal emulation server for providing communication services using sets of input and/or output, I/O, user devices, the method comprising: providing (500) first data flows between a first user terminal emulation application and a first set of the I/O user devices that are proximately located to a location of a first user and satisfy a combined capability rule for being combinable to provide a combined I/O user interface for the first user to interface with the first user terminal emulation application to perform a communication service through a network entity providing communication services, wherein the first data flows comprise a first microphone data flow received from a first microphone in the first set of the I/O user devices and a first speaker data flow sent to a first speaker in the first set of the I/O user devices; obtaining (502) a first time offset indicating an elapsed time between when speaker data is sent through the first speaker data flow and when the speaker data is played-out through the first speaker; and cancelling (504) a speaker echo component of microphone data received in the first microphone data flow, based on time shifting and combining the speaker data and the microphone data using the first time offset.
15. The method of Claim 14, further comprising: determining (600) an impulse response of a spatial area which includes the first speaker and the first microphone, wherein the cancelling (504) of the speaker echo component of the microphone data received in the first microphone data flow, comprises time shifting the speaker data using the first time offset, filtering the time shifted speaker data using the impulse response, and subtracting (602) the filtered and time shifted speaker data from the microphone data.
16. The method of Claim 15, further comprising: storing (700) the impulse response in a data structure in the at least one memory with a logical association to an identifier of the first speaker and/or an identifier of the first microphone; and retrieving (702) the impulse response from the data structure in the at least one memory using the identifier of the first speaker and/or the identifier of the first microphone; and wherein the cancelling (602) of the speaker echo component of the microphone data received in the first microphone data flow, comprises using the impulse response retrieved from the data structure in the at least one memory to filter (704) the time shifted speaker data.
17. The method of any of Claims 15 to 16, further comprising: determining when an impulse response re-calibration event has occurred, wherein the impulse response re-calibration event is based on at least one of: detecting a threshold change in distance between the first speaker and the first microphone, determining that another speaker and/or microphone has been added or removed from the first set of I/O user devices used to provide the combined I/O user interface for the first user, and determining that another communication service has started being provided or has ceased being provided through a second set of I/O user devices that are proximately located to the first set of I/O user devices; and responsive to determining that the impulse response re-calibration event has occurred, determining the impulse response of the spatial area which includes the first speaker and the first microphone.
18. The method of any of Claims 14 to 17, wherein the obtaining (502) of the first time offset comprises: receiving a Real-time Transport Control Protocol, RTCP, message from the first speaker, the RTCP message containing timing information indicating when earlier speaker data was received through the first speaker data flow and played-out through the first speaker; and determining the first time offset based on the timing information contained in the RTCP message.
19. The method of any of Claims 14 to 18, wherein the obtaining (502) of the time offset comprises: receiving a Real-time Transport Protocol, RTP, message from the first speaker, the RTP message containing a sequence number and a RTP timestamp indicating when earlier speaker data was received through the first speaker data flow and played-out through the first speaker; and determining the first time offset based on the sequence number and the RTP timestamp contained in the RTP message.
20. The method of any of Claims 14 to 19, wherein: the time offset further indicates elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server in the first microphone data flow.
21. The method of Claim 20, further comprising: determining the elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server in the first microphone data flow, based on timing information contained in a Real-time Transport Control Protocol, RTCP, message received from the first microphone.
22. The method of any of Claims 20 to 21, wherein the obtaining (502) of the time offset comprises: determining the elapsed time between when the first microphone senses the acoustic output of the speaker, responsive to the speaker data, and when the microphone data is received by the user terminal emulation server in the first microphone data flow, based on a sequence number and a Real-time Transport Protocol, RTP, timestamp carried in a RTP message received from the first microphone.
23. The method of any of Claims 14 to 22, further comprising: determining when a timing re-calibration event has occurred, wherein the timing re-calibration event is based on at least one of packet jitter, lost packets, and/or latency measured in the first speaker data flow and/or the first microphone data flow; and responsive to determining occurrence of the timing re-calibration event, sending a time offset information request to the first speaker and/or the first microphone.
24. The method of any of Claims 14 to 23, further comprising: storing (700) the time offset in a data structure in the at least one memory with a logical association to an identifier of the first speaker and/or an identifier of the first microphone, wherein the obtaining (502) of the time offset comprises retrieving (702) the time offset from the data structure in the at least one memory using the identifier of the first speaker and/or the identifier of the first microphone.
25. The method of any of Claims 14 to 24, further comprising: determining (800) an impulse response of a spatial area which includes the first speaker, the first microphone, a second speaker, and a second microphone; storing (802) the impulse response in a data structure in the at least one memory with a logical association to an identifier of the second speaker and/or an identifier of the second microphone; providing (804) second data flows between a second user terminal emulation application and a second set of the I/O user devices that are proximately located to a location of a second user and satisfy the combined capability rule for being combinable to provide a combined I/O user interface for the second user to interface with the second user terminal emulation application to perform a communication service through the network entity providing communication services, wherein the second data flows comprise a second microphone data flow received from the second microphone in the second set of the I/O user devices and a second speaker data flow sent to the second speaker in the second set of the I/O user devices; obtaining (806) a second time offset indicating an elapsed time between when a second speaker signal is sent through the second speaker data flow and when the second speaker signal is played-out through the second speaker; and cancelling (808) a speaker echo component of a second microphone data received in the second microphone data flow, based on retrieving (810) the impulse response from the data structure in the at least one memory using the identifier of the second speaker and/or the identifier of the second microphone, and time shifting the second speaker signal using the second time offset, filtering the time shifted second speaker signal using the impulse response, and subtracting (812) the filtered and time shifted second speaker signal from the second microphone data.
26. The method of any of Claims 14 to 25, further comprising: for each combination of speaker and microphone among the first set of the I/O user devices, sending impulse data in a packet including a timestamp to the speaker for playout, receiving a response packet including a playout time delay indicating elapsed time between when the impulse data in the packet was sent and when the impulse data was acoustically played-out through the speaker, and receiving from the microphone an impulse microphone data in a packet including a microphone timestamp indicating when the packet was sent by the microphone; and determining an impulse response of a spatial area which includes the first set of the I/O user devices based on using the playout time delays and the microphone timestamps received from each combination of speaker and microphone among the first set of the I/O user devices.
PCT/EP2022/062065 2022-05-04 2022-05-04 Echo cancellation for i/o user devices performing user terminal emulation as a cloud computing service WO2023213395A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/062065 WO2023213395A1 (en) 2022-05-04 2022-05-04 Echo cancellation for i/o user devices performing user terminal emulation as a cloud computing service

Publications (1)

Publication Number Publication Date
WO2023213395A1 true WO2023213395A1 (en) 2023-11-09


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160014373A1 (en) * 2014-07-11 2016-01-14 Biba Systems, Inc. Dynamic locale based aggregation of full duplex media streams
US20190349471A1 (en) * 2018-05-09 2019-11-14 Nureva, Inc. Method, apparatus, and computer-readable media utilizing residual echo estimate information to derive secondary echo reduction parameters
EP3796647A1 (en) * 2018-05-14 2021-03-24 Uprism Co., Ltd. Video conference server capable of providing video conference by using plurality of terminals for video conference, and method for removing audio echo therefor


