WO2020081614A1

WO2020081614A1 - Systems and method for control of telephone calls over cellular networks

Info

Publication number: WO2020081614A1
Application number: PCT/US2019/056400
Authority: WO
Inventors: Richard Laurence HEAP; Christopher Douglas Blair
Original assignee: Heap Richard Laurence; Christopher Douglas Blair
Priority date: 2018-10-14
Filing date: 2019-10-15
Publication date: 2020-04-23

Abstract

The present invention relates to a method and system for the controlled use of personally owned mobile phones in a business environment. It is particularly applicable to companies with a multi-national presence and existing infrastructure for handling fixed line calls made by and to their employees. It exploits said infrastructure by routing at least those calls that should or must be recorded via a "Mobile Access Point", preferably in the country or region the phone is currently being used within. Thus only the local leg of the call traverses the mobile network with the call being recorded as required and routed to its endpoint within or via the existing corporate infrastructure and least cost routing plan.

Description

SYSTEMS AND METHOD FOR CONTROL OF TELEPHONE CALLS OVER CELLULAR NETWORKS

Field of the Art

[001] This invention relates to a means of controlling and, optionally, recording telephone calls made to or from business mobile phone numbers.

Discussion of the State of the Art

[002] Many businesses do (or would like to) allow employees to select, purchase and use their own mobile phone with which to conduct business calls as well as personal calls. Such “Bring Your Own Device” or “BYOD” policies are now common across many industries.

[003] However, some sectors - such as financial services - must comply with regulations such as the European Union’s Markets in Financial Instruments Directive (“MiFID II”) and Dodd-Frank in the U.S.A., amongst others. These can require that, for at least a subset of the calls, details about the call and, in some cases, the content of the call are recorded, archived and easily accessible to regulators.

[004] Conversely, increasingly complex privacy regulations, such as the European Union’s General Data Protection Regulations (“GDPR”), mean that blanket recording of all details and voice - including private calls made on the same device - is not permitted.

[005] There is therefore a requirement in at least some businesses for visibility, control and access to the content of the business calls made on the device. A number of approaches are already well established but all have significant disadvantages when end users are in many countries where the business already has a fixed telephony presence.

[006] Let us call our hypothetical, multinational, at least partially regulated company wanting to implement a global BYOD smartphone policy “GLOBALCO”.

[007] Some phones allow two SIMs to be inserted - giving them two phone numbers, one business and one personal. More recently, the eSIM specification allows this without the need for a physical SIM. This provides separation of business and personal calls, each using a separate phone number - often over different Mobile Network Operators (MNOs).

[008] Some MNOs provide recording services “in the network”, allowing calls over the business number to be recorded there. However, this may be sub-optimal when the user travels overseas, as it either requires a similar recording capability in the MNO on which the user is roaming, or the call has to be routed to, or a fork of it taken to, the recording service in the user’s home network.

[009] As there are no truly global MNOs, GLOBALCO cannot do a single deal with one provider for such a service. As operators come and go, signing up to vaguely similar services from several MNOs is not attractive. One or other will be bought, leave a territory, cease trading or change this offering soon.

[010] GLOBALCO typically will already have at least one recording system and would rather the call details and content were delivered into that than to a separate, MNO-specific system. Some regulations require that all communications associated with a given transaction are presented together - quickly and easily on demand. This drives GLOBALCO to seek solutions that feed its existing recording systems instead.

[011] Alternative solutions - say from hypothetical company “SMALLCO” - use an application on the phone to route business calls via one or more switching hubs (owned or at least operated by SMALLCO) before they are connected on to the far end, i.e. the destination phone number. In this way, the audio can be recorded at said hub and/or forwarded to a separate recording service. However, existing such solutions then route the call from said hub either directly to the number dialled or do so via an overlay network of such switching hubs. The latter approach can avoid international phone call charges by routing the call via the Internet (typically using SIPS) to the country in which the destination phone number is present. The final leg of the call is then placed over that destination country’s telephone network.

[012] This approach requires that GLOBALCO pays SMALLCO for the calls its hubs place over the various fixed and mobile networks that SMALLCO has chosen. GLOBALCO is likely to already have deals in place with telecom operators worldwide and a sophisticated least cost routing system to exploit these optimally. GLOBALCO may not believe that SMALLCO can obtain rates as competitive as GLOBALCO already has in place for its fixed infrastructure. Nor will GLOBALCO like being entirely reliant on SMALLCO for the ongoing provision of all its staff’s mobile calls worldwide.

[013] An alternative approach has been adopted by at least one company - referred to hereafter as “MultiMVNO”. Such a company operates as a Mobile Virtual Network Operator (“MVNO”) in many countries. This means it does not need to have its own network in any country - just an agreement with a local MNO in each - though achieving that is non-trivial. A single business SIM can therefore be used in all of these countries without incurring roaming charges and with calls still visible to and under the control of MultiMVNO.

[014] However, GLOBALCO will already have contracts for mobile service with its preferred MNO in each territory it operates in. It will also be unwilling to risk all of its staff’s mobile service being dependent on the ongoing operation of MultiMVNO - a company that is typically much smaller, and less stable than any one MNO it already deals with.

[015] A further approach is more popular in the United States - where (unlike in Europe) it is impossible to tell whether a given phone number is a fixed or mobile number. There, employees typically have a single number on their business card. This is typically terminated in a data centre but is often then forwarded to or otherwise linked to their cellular phone - which is actually itself a completely different number. However, this ensures that they can be reached with that one number whether they are at their desk or on the road.

[016] This approach is offered by at least one of the major business telephony system providers and does exploit the least cost routing, resilience and recording systems already present for fixed telephony lines within GLOBALCO. However, it does not work well for GLOBALCO’s employees in countries such as the UK and much of Europe where business cards show fixed and mobile numbers - and people choose which to dial.

[017] None of the existing solutions satisfactorily exploits the considerable investment and infrastructure GLOBALCO will already have in place for its fixed line telephony and data traffic. As more and more interaction is done over the data network, GLOBALCO is likely to have spare capacity in their telephony network - which is typically highly reliable and fault tolerant and has to be maintained for fixed office phone and call centres.

[018] Many of the solutions above introduce an additional, global overlay of MNO agreements and infrastructure controlled and managed by SMALLCO. This would rapidly fall apart in the event of SMALLCO failing and would take a huge effort and considerable time to replace. They also require SMALLCO to pass through the (considerable) call charges. These are likely to be less favourable than GLOBALCO’s existing tariff agreements, are out of its control and usually subject to a mark-up.

[019] None of the present solutions outshines the others - as evidenced by the lack of a dominant approach and/or vendors in this market. There is therefore scope for an alternative and novel system architecture that overcomes all of the objections described above.

[020] All of the current solutions are more complex than they need be. A simpler approach that is cheaper, more reliable and does not require massive investment should be possible. This is particularly important when one realises that the phone network is a decreasing part of our real-time communications. It is now an anachronistic hangover that needs to be accommodated but gracefully phased out rather than driving yet another fragile, mission-critical global headache.

SUMMARY OF THE INVENTION

[021] Accordingly, the inventor has conceived and reduced to practice, in a preferred embodiment of the invention, a system and methods for control of business telephone calls over cellular networks, hands-free advanced control of real-time data stream interactions, exploiting teleconference participants’ computing devices, and real-time user/counterparty verification. The following non-limiting summary of the invention is provided for clarity, and should be construed consistently with embodiments described in the detailed description below.

[022] The present invention consists of a novel telecommunications system architecture and method for the integration of mobile phone calls into the existing infrastructure that supports the fixed line telephone and/or data networks of a multi-national business.

[023] This consists of a plurality of access points, ideally at least one per country, through which are routed at least a subset of business mobile calls made by or to employees currently present in that country. These “Mobile Access Points” (“MAP”s) route said calls via said existing infrastructure and/or the internet so as to leverage the existing infrastructure, fault tolerance, least cost routing, archiving and transmission capacity available therein.

[024] A system consisting of an application running on a plurality of communications devices, each configured with the addresses of a plurality of mobile access points, each of which may be contacted via a telephone call over the public telephone network using one or more public network telephone numbers and over the internet by one or more addresses wherein all calls handled by said application are connected with desired counterparty address(es) such that the associated media stream(s) pass via at least one of said mobile access points and wherein said mobile access point also controls the onward routing of said media stream(s).

[025] A system providing real-time data stream exchange or interaction between a plurality of parties in which control over aspects of said interactions is achieved by the deliberate insertion of and subsequent analysis and identification of one or more pre-determined phrases and/or visual cues within one or more of said real-time data streams.

[026] A system providing audio connectivity between a plurality of individuals within earshot of each other and at least one remote participant characterised in that audio is received via microphones in a plurality of said individuals’ smartphones, smartwatches, tablet computers, laptops, personal computing devices and selectively merged by a single controller to form a single resultant audio stream that is transmitted to the remote participant(s).

[027] A system consisting of an application running on a plurality of communications devices, each configured with the addresses of one or more mobile access points via which communication sessions containing at least one audio stream are established with one or more counterparty devices via one or more network connections and wherein a time-bounded sample of at least one of said audio streams is analysed so as to determine a set of characteristics of said audio stream and where said characteristics are compared against a previously measured reference set of characteristics in order to test the hypothesis that the person speaking in said audio stream is the same individual from whose speech said reference set of characteristics were obtained.

[028] A method of providing realtime data stream exchanges between a plurality of parties in which control over aspects of said interactions is achieved by the deliberate insertion of one or more pre-determined audio and/or visual cues within one or more of said realtime data streams.

BRIEF DESCRIPTION OF THE DRAWING FIGURES

[029] The accompanying drawings illustrate several aspects and, together with the description, serve to explain the principles of the invention according to the aspects. It will be appreciated by one skilled in the art that the particular arrangements illustrated in the drawings are merely exemplary, and are not to be considered as limiting of the scope of the invention or the claims herein in any way.

[030] Fig. 1 shows the top level telecommunications architecture that a typical multi-national company (“GLOBALCO”) will have servicing their operations in a given country. Note that many of the elements may be hosted or “in the cloud”, or running as virtual machines on a shared physical host. The physical arrangement and location of the components is largely irrelevant to this invention. What is important is that, with the exception of the MAPs (24), this is existing infrastructure that the company already uses and manages for fixed line, VoIP and existing business mobile numbers.

[031] Fig. 2 shows a multinational system in three countries. Note that large countries such as the U.S. may be divided into smaller “regions” - each acting as a separate country - if required to localise traffic. For example, this may help to avoid time-zone issues, congestion, slow long distance setup or satellite hops and associated delays.

[032] The detail of Fig. 1 has been condensed into simplified functional blocks that the MAPs (24) interact with - regardless of the detail within them or the vendor/MNO/Telco providing them. In many cases, the voice network is arranged hierarchically, with a hub and spoke topology and will include disaster recovery (DR) capability. However the architecture is organised, there is existing capability for at least a subset of employees in a plurality of countries to each have a personally assigned public telephone number and/or a personally assigned mobile phone number. These can be used to place and receive calls from anywhere in the world using the numbers advertised on the employee’s business cards, corporate directories, email signatures, blogs, websites and so forth.

[033] Fig. 3 shows the key internal objects within a MAP. These interact with the existing telecommunications elements shown in Figures 1 and 2.

[034] Fig. 4 shows a flow chart of how an outbound business call from a mobile phone is handled in the case of an employee using a single business number for all business calls.

[035] Fig. 5 shows a flow chart of how an inbound business call to a mobile phone is routed.

[036] Fig. 6 shows a flow chart of the “PathSelector” process running on the mobile phone. This determines how a voice call should be established.

[037] Fig. 7 shows a flow chart of a “PathEvaluator” process running on the mobile phone. This analyses the characteristics of a potentially available speech path.

[038] Fig. 8 shows a flow chart of the “PathFinder” process running on the mobile phone. This attempts to find alternative speech paths that may be available nearby.

[039] Fig. 9 shows a flow chart of the “PathManager” process running on the mobile phone. This establishes and manages the most appropriate call paths from those currently available.

[040] Fig. 10 shows the major components (101, 114) of an exemplary system and the networks between them and infrastructure around them.

[041] Fig. 11 shows the relevant functional components within the mobile phone (101) and the service with which it interacts (114).

[042] Fig. 12 shows how outgoing calls are made from the mobile phone (101) and a subset of this (starting at 1222) is also used for inbound calls.

[043] Fig. 13 shows how inbound calls to the service are handled - and as an outbound call from the mobile becomes a special case of an inbound call at the service, also covers that scenario.

[044] Fig. 14 shows the major components of an exemplary system.

[045] Fig. 15 shows an exemplary network infrastructure allowing the invention to be deployed in a business setting.

[046] Fig. 16 shows an exemplary “Home” screen of the app that is in communication with the MAP.

[047] Fig. 17 shows an exemplary “Contact History” screen of the app that is in communication with the MAP.

[048] Fig. 18 shows the data structures used on the end user device and within the MAP to minimize the risks of personal data leakage via the end user device.

DETAILED DESCRIPTION

[049] The inventor has conceived, and reduced to practice, systems and methods for ...

[050] One or more different aspects may be described in the present application. Further, for one or more of the aspects described herein, numerous alternative arrangements may be described; it should be appreciated that these are presented for illustrative purposes only and are not limiting of the aspects contained herein or the claims presented herein in any way. One or more of the arrangements may be widely applicable to numerous aspects, as may be readily apparent from the disclosure. In general, arrangements are described in sufficient detail to enable those skilled in the art to practice one or more of the aspects, and it should be appreciated that other arrangements may be utilized and that structural, logical, software, electrical and other changes may be made without departing from the scope of the particular aspects. Particular features of one or more of the aspects described herein may be described with reference to one or more particular aspects or figures that form a part of the present disclosure, and in which are shown, by way of illustration, specific arrangements of one or more of the aspects. It should be appreciated, however, that such features are not limited to usage in the one or more particular aspects or figures with reference to which they are described. The present disclosure is neither a literal description of all arrangements of one or more of the aspects nor a listing of features of one or more of the aspects that must be present in all arrangements.

[051] Headings of sections provided in this patent application and the title of this patent application are for convenience only, and are not to be taken as limiting the disclosure in any way.

[052] Devices that are in communication with each other need not be in continuous communication with each other, unless expressly specified otherwise. In addition, devices that are in communication with each other may communicate directly or indirectly through one or more communication means or intermediaries, logical or physical.

[053] A description of an aspect with several components in communication with each other does not imply that all such components are required. To the contrary, a variety of optional components may be described to illustrate a wide variety of possible aspects and in order to more fully illustrate one or more aspects. Similarly, although process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may generally be configured to work in alternate orders, unless specifically stated to the contrary. In other words, any sequence or order of steps that may be described in this patent application does not, in and of itself indicate a requirement that the steps be performed in that order. The steps of described processes may be performed in any order practical. Further, some steps may be performed simultaneously despite being described or implied as occurring non-simultaneously (e.g., because one step is described after the other step). Moreover, the illustration of a process by its depiction in a drawing does not imply that the illustrated process is exclusive of other variations and modifications thereto, does not imply that the illustrated process or any of its steps are necessary to one or more of the aspects, and does not imply that the illustrated process is preferred. Also, steps are generally described once per aspect, but this does not mean they must occur once, or that they may only occur once each time a process, method, or algorithm is carried out or executed. Some steps may be omitted in some aspects or some occurrences, or some steps may be executed more than once in a given aspect or occurrence.

[054] When a single device or article is described herein, it will be readily apparent that more than one device or article may be used in place of a single device or article. Similarly, where more than one device or article is described herein, it will be readily apparent that a single device or article may be used in place of the more than one device or article.

[055] The functionality or the features of a device may be alternatively embodied by one or more other devices that are not explicitly described as having such functionality or features. Thus, other aspects need not include the device itself.

[056] Techniques and mechanisms described or referenced herein will sometimes be described in singular form for clarity. However, it should be appreciated that particular aspects may include multiple iterations of a technique or multiple instantiations of a mechanism unless noted otherwise. Process descriptions or blocks in figures should be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps in the process. Alternate implementations are included within the scope of various aspects in which, for example, functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those having ordinary skill in the art.

[057] Consider a large, established company “GLOBALCO” with presence in many, perhaps most countries around the world. It will have built up a complex network of telephony equipment servicing its offices around the world. More recently, it will probably have tried to rationalise this, reducing the number of independent switches and moving traditional Time Division Multiplexed (TDM) telephony circuits to Voice over IP (VoIP), typically using Session Initiation Protocol (SIP) for vendor independence. Traditional telephone calls are slowly being usurped by other real-time interaction services.

[058] However, there is usually still some significant “traditional” telephony infrastructure in place. This may be from a single, strategic vendor or, more likely, a mixture of multiple vendors’ equipment as a result of company acquisitions, changes in strategy and/or vendors falling out of favour or going out of business.

[059] Interactions are classified as “realtime” (telephony, video calling, desktop sharing, conference calls etc.) or “messaging” (SMS, Instant Messaging, Email etc.).

[060] The key differentiator is that a real-time interaction spans a contiguous period of time - within which the “called party” is expected to respond and engage in a (normally) two-way interaction with the initiator or “calling party”. The actual exchange of information typically does not begin until the called party “answers”.

[061] A messaging interaction, by contrast, consists of one or more discrete exchanges of information, the transmission time of each being unaffected by actions on the part of the recipient.
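
The two classes of interaction defined above can be illustrated with a minimal sketch. The channel names and the function `classify_interaction` are hypothetical and are not taken from the application; only the grouping of examples into “realtime” and “messaging” follows the text.

```python
from enum import Enum


class InteractionKind(Enum):
    REALTIME = "realtime"
    MESSAGING = "messaging"


# Hypothetical channel identifiers; the grouping mirrors the examples in the text.
_REALTIME_CHANNELS = {"telephony", "video_call", "desktop_sharing", "conference_call"}
_MESSAGING_CHANNELS = {"sms", "instant_message", "email"}


def classify_interaction(channel: str) -> InteractionKind:
    """Return the class of interaction for a (hypothetical) channel name."""
    key = channel.strip().lower()
    if key in _REALTIME_CHANNELS:
        return InteractionKind.REALTIME
    if key in _MESSAGING_CHANNELS:
        return InteractionKind.MESSAGING
    raise ValueError(f"unknown channel: {channel!r}")
```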

[062] The term “counterparty” is used to identify an entity or group of entities with whom interactions take place. A counterparty may be a person, a business, an application (such as a “bot”) or a plurality of any combination of these.

[063] The term “nickname” is used for the (preferably shortened) name by which the app user identifies a counterparty. For those they interact with frequently, these are normally short and of little meaning to someone who does not know the individual. Should someone access the app without their permission, nicknames such as “kids”, “direct reports”, “team”, “office”, “Dad”, “JB”, “Bank” or “Credit Card” are far less enlightening than the full names (as stored in an address book or “Contacts”) would be.

[064] Encouraging the use of such shortened names not only minimizes the space required on screen, it also reduces data leakage that can occur by someone else looking at, or photographing the phone’s display.

[065] The term “decoration” refers to any supplementary shape, text or image associated with a display element. Typical examples are the small red circles containing a number that are added to the corner of an icon to indicate how many messages are waiting.

[066] The term “display attribute” refers to any visual aspect of how text or an image is displayed. This may include but is not limited to colour, size, shape, shading, texture, opacity, font, font-weight, italicization, underlining, strikethrough, motion, vibration, skew, speed of movement, flashing, rotation, reflection, scaling or arbitrary transform, decoration and so forth.

[067] The term “notification attribute” refers to any combination of display attributes, sounds and/or haptic (touch/vibration) signals.

Conceptual Architecture

[068] Fig. 1 shows the relevant major components of an exemplary telecommunications infrastructure that GLOBALCO will typically already have in a particular country - plus the additional one or more Mobile Access Points (“MAPs”) (24) provided as part of this invention.

[069] Mobile phones (1) owned or leased by employees will be using at least one and often all of the mobile networks (2, 3) available in the country. Historically, and often still today, the company would have provided mobile phones (4) with numbers that it owns and are connected via a preferred MNO (5) with whom GLOBALCO has negotiated an overall contract.

[070] A Private Branch Exchange (PBX) typically manages telephony within the business. This typically contains a controller (6); zero or more TDM gateways (7) that connect to the PSTN (11) and/or private TDM circuits; and zero or more IP gateways, such as a Session Border Controller (SBC) (8), that connect to SIP trunks or other IP routes (12) over which Voice over IP (VoIP) calls may be made. Typically, a Computer Telephony Integration (CTI) server or service (9) is available to allow other devices to observe what calls are occurring within the PBX and/or to control them.

[071] Connections to each of the above external data and/or voice “pipes” may be via a wide variety of interface standards - each of which is terminated on the appropriate equipment. These may include but are not limited to: satellite, micro-wave, radio, fibre-optic, co-axial and copper cables.

[072] When employees need to speak to someone outside the business (13), using that counterparty’s public phone number, the call must normally route via the PSTN of the country in which that number is located. This call could be routed via a MNO if that phone number is a mobile or it could be routed via the internet (14) if they are on a VoIP capable device.

[073] The corporate network (10) is typically connected to the internet (14) via router(s) and firewall(s) (15) - allowing direct (and often cheaper and faster) connectivity to any customer that can be reached via VoIP rather than having to use their PSTN number.

[074] A company operating in a country typically has a physical presence in one or more buildings (16) and these are normally provided with a wired and/or wireless Local Area Network (LAN) (17) over which data and (commonly now) voice traffic flows to any IP telephone sets (19), mobile devices such as laptops or tablets (20) and desktop computers (21). Note that any of these latter two (20, 21) may be running a “softphone” application providing similar telephony services as would be provided by having a physical phone (19). Also note that the mobile devices (20) can usually operate across multiple buildings and “on the road” via internet and/or MNO connections.

[075] The corporate network (10) within the country will normally be connected to the global, corporate Wide Area Network (22) via router(s) and (often) firewall (23). This allows data - and often voice - traffic to flow between national corporate network(s) (10). Sometimes voice is carried over a parallel network to the data.

[076] Note that this is only an example. There are many variations on the exact mechanisms by which the elements are connected. For example, router and firewall (23) could be a VPN connection over the internet.

[077] This invention adds into this existing infrastructure one or more “Mobile Access Points” (MAPs) (24). Each MAP is a software process that may run on a physical server, on a virtual “slice” of a server or on a “cloud server”, so long as it has an IP path to and from the corporate network (10).

[078] Also note that many of the components may be logically and/or physically remote from the country. For example, several countries (or the whole world) may share a single PBX controller though TDM Gateways (7) tend to be distributed in-country.

[079] These details do not impact the architecture of this invention - merely the configuration of the MAP (24). For example, it may have to connect to a CTI feed (9) at corporate headquarters to observe the calls happening in the country it serves.

[080] Fig. 2 shows a higher level, logical view of the relevant parts of the global corporate communications infrastructure. Two countries are shown but there are often many. In each country there may be employees with their own smartphones (1) connected over a variety of MNOs. The company’s PBX system - including control (6), gateways (7), SBCs (8) and CTI (9) of Fig. 1 - connects the business to the PSTN (11) and often at least one MNO directly. Typically each country will have some form of local processing (203) allowing calls to be made even if isolated from the rest of the world.

[081] Phones (or softphones) (201) within the corporate infrastructure in a given country (202) running on analogue, digital and/or IP phone sets (19), mobile devices (20) or workstations (21) within the business typically have an internal number that is their unique address within the corporate network. Some also have a publicly available number that can be dialled from the public network using Direct Dial In (DDI) (also known as Direct Inward Dial or DID). This is often, but not necessarily, the internal number prefixed with a fixed set of digits representing a contiguous block of national PSTN numbers owned/rented by the company.
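
For the common case described above, where the public DDI number is the internal number prefixed with a fixed set of digits, the mapping can be sketched as follows. This is an illustration only: the function names and the example prefix are hypothetical, and, as the text notes, a real numbering plan need not follow this pattern.

```python
from typing import Optional


def internal_to_ddi(prefix: str, extension: str) -> str:
    """Build a public DDI number by prefixing the internal extension.

    `prefix` is the fixed leading digits of the company's number block,
    optionally starting with "+" (an assumed convention for this sketch).
    """
    digits = prefix[1:] if prefix.startswith("+") else prefix
    if not digits.isdigit() or not extension.isdigit():
        raise ValueError("prefix and extension must contain digits only")
    return prefix + extension


def ddi_to_internal(prefix: str, public_number: str) -> Optional[str]:
    """Recover the internal extension, or None if the number is outside the block."""
    if public_number.startswith(prefix) and len(public_number) > len(prefix):
        return public_number[len(prefix):]
    return None
```

With a hypothetical block prefix of `+442079460`, extension `123` maps to `+442079460123`, and a number outside the block yields `None`.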

[082] Note that by using the existing PBX capability provided (though not necessarily physically present) in country (203) the MAP (24) serving (though not necessarily located in) that country can make and take phone calls anywhere in the world via that country’s PBX (203). This typically includes sophisticated security and cost optimisation algorithms to prevent fraud and to minimise call costs respectively. The MAP (24) also has access to the corporate network (10) and hence the corporate WAN (22) and internet (14) - allowing it to place VoIP calls worldwide direct to end users who have VoIP capability or via other MAPs (24) to “break out” via the PBX (203) in another country in the case of an international phone number being dialled. This is a common technique used to avoid international call charges. In many cases, the PBX controller (6) will already be configured to do this, avoiding the need for the MAP (24) to be concerned with how the call should be routed.
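
The onward routing options described above can be sketched as a simple decision function. This is a minimal illustration under stated assumptions, not the routing logic of the invention: the names `Route` and `choose_route`, the method labels and the use of country codes are all hypothetical, and in many deployments the PBX controller (6) would make this decision itself.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Route:
    method: str                        # "voip_direct", "remote_breakout" or "local_pbx"
    via_country: Optional[str] = None  # country whose PBX places the final leg


def choose_route(local_country: str,
                 dest_country: str,
                 dest_has_voip: bool,
                 map_countries: Set[str]) -> Route:
    """Pick an onward route for a call handled by the MAP in `local_country`."""
    if dest_has_voip:
        # Counterparty is reachable over IP, so no PSTN leg is needed.
        return Route("voip_direct")
    if dest_country != local_country and dest_country in map_countries:
        # Carry the call over the corporate WAN and break out in-country.
        return Route("remote_breakout", via_country=dest_country)
    # Otherwise hand the call to the local PBX and its least cost routing.
    return Route("local_pbx", via_country=local_country)
```

For example, a call from a MAP serving "GB" to a non-VoIP number in "US", with MAPs available in both countries, would be sent to the US MAP for break-out there.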

[083] In (or accessible to) each country is one or more MAP processes (24). Each handles a certain load and typically at least one more MAP (24) is provided than is required to handle the overall load in that country. This provides fault tolerance in each territory.
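As a worked illustration of the sizing rule in this paragraph (enough MAPs to carry the load, plus at least one spare), the following sketch computes a pool size. The capacity figures and the function name are hypothetical.

```python
import math


def maps_required(peak_calls: int, calls_per_map: int, spares: int = 1) -> int:
    """Number of MAP instances needed to carry peak_calls, plus spares
    held in reserve for fault tolerance."""
    if calls_per_map <= 0:
        raise ValueError("calls_per_map must be positive")
    return math.ceil(peak_calls / calls_per_map) + spares


# Hypothetical figures: 250 concurrent calls, 100 calls per MAP.
print(maps_required(250, 100))
```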

[084] The MAPs (24) are typically administered by an IT manager in each territory but this is largely done via a user interface onto a central controller “MAPCON” (205) located in the corporate data centre (204). Configuration data is stored in a fault tolerant database (206) and the appropriate subset passed to each MAP (24) when it is changed and cached there - so the MAP can operate on the most recent configuration it knows even if isolated from the MAPCON (205) and/or database (206).

[085] Typically, a disaster recovery site (209) mirrors the configuration of the data centre (204) allowing it (209) to take over in the event of major failures of the latter (204).

[086] Note that there is no real-time central control of the MAPs (24) and hence no single point of failure or bottleneck. Each can operate independently.

[087] Typically the corporation will also have an existing communications recording/archiving infrastructure in place. This may be a consolidated system or separate silos but can include elements for recording of voice (208), text messages (210) and call details (207). This latter data (207) is used for billing and cost allocation but is also the metadata that makes the former stores (208, 210) easily searchable. As with the other elements, these may be in country, centralised, hub and spoke or “in the cloud”. The important point is that any logical recording service(s) already present are accessible directly to the MAP (24) and/or via the PBX (203). For example, if the MAP (24) places a call via PBX (203), that call may well be recorded automatically without the MAP (24) having to do anything itself.

[088] Fig. 3 shows the key logical components within each MAP (24). Standard elements common to any server application - such as logging, audit trail, login security, alarming - are present but not relevant to the detail of this invention hence are omitted for clarity.

[089] Note that “process” below may be replaced by “thread” in many cases - the entire MAP service typically runs as a single multi-threaded process but could be distributed if required.

[090] A persistent store holds a local copy of the configuration (301) of this MAP (24). Config tracker (310) is advised by or polls the MAPCON (205) or central database (206) directly for changes. This (301) is a cached snapshot of the subset of the overall configuration (206) that relates to the employees, mobile phones and business numbers in the country this MAP (24) serves. In addition to the definitions of these, it also includes addresses and credentials of the CTI feed(s) (9) that it needs to observe and utilise the PBX (203) facilities in country. It also contains details of the other MAPs (24) that this one shares the load in that country with.

[091] Note that each MAP (24) may interact with more than one PBX (203) - which may be in other countries - but for simplicity, a single one is shown here.

[092] Each MAP instantiates one or more pools of softphones (302, 303, 304), each member of which appears to the PBX (203) as an internal phone - that can make and/or take phone calls and access its advanced features in the same way that a physical phone set (19) would. This normally includes the ability to place one call on hold while making a second (or third) then conferencing or transferring them.

[093] This is typically done via a first party call control or softphone interface (304) such as Telephony Application Programming Interface (TAPI), H.323, SIP, SIPS or a proprietary protocol. The interface and audio paths subsequently established with each endpoint (not shown) are preferably encrypted - typically using SRTP. This is under the control of the PBX (203).

[094] Three separate pools of internal endpoints are shown:

[095] Business Number Endpoints (302): Each of these has its own internal number which may be assigned an external DDI number hence can be reached directly from the public network thanks to the DDI routing within the PBX;

[096] Shadow Endpoints (303): Only available on some PBXs, these softphones operate in “shared control” or “dependent” mode. Each has a corresponding physical or softphone with the same internal number as itself. That “controlling” softphone is used by someone in the business as their “business phone”. These endpoints (303) allow the MAP (24) to track that user’s actions on their phone (19) and/or softphone (20, 21); and

[097] Worker Endpoints (304): Each has an internal address but these numbers are not made public outside the MAP system. They are used to place calls which are, typically, then transferred to other numbers when ready.

[098] Optionally, a further pool of telephony endpoints (306) may be instantiated. These do not appear as internal numbers within the PBX (203). Instead, they are standard SIPS endpoints capable of placing and taking VoIP calls directly. They therefore act as a fall-back calling route should the PBX (203) be overloaded, too slow, too expensive or dead.

[100] In many cases it is also necessary to learn more about phone calls than can be observed from the events provided to the internal endpoints. Most PBXs (203) support one or more 3rd party CTI feeds (307) - such as Telephony Services Application Programming Interface (TSAPI) or similar. Via this feed, the Call Tracker (308) process or thread observes at least a subset of the calls passing through the PBX (203). It builds an in-memory model of the state of calls currently in progress (309).

[101] In many cases, this same CTI feed (307) allows control of calls occurring on endpoints - whether those are locally instantiated (302, 303, 304) or not.

[102] Where systems already exist to store voice, messages and/or call details (207, 208, 210), a process or thread manages the interactions with each of these (311, 312, 313 respectively).

[103] One or more App Interface (314) processes/threads interacts with a plurality of “Corporate Dialler” applications running on the employees’ mobile phones. This typically presents a RESTful interface over HTTPS and is accessible from the internet. Commands from and messages to the employees using their mobile phones (1) for business calls pass through this process (314). However, to alert the employee to an incoming call, this process (314) also interfaces to the push notifications services of the operating systems supported on the smartphones.

[104] At the heart of, and interacting with, all of these components is the Call Controller (315) process/thread. Each of the other processes tells this controller (315) about relevant events and actions. It (315) then instructs the relevant component(s) to act. It (315) is also in direct contact with the Call Controllers (315) in other MAPs (24) within the same country/region and around the world. Heart-beating and sanity checks between Call Controllers (315) allow each to track the health and current loading of the others to facilitate load balancing and fault tolerant failover procedures, letting a pool of MAPs (24) act as an N+n fault tolerant, load balancing pool.

[105] However, there is a danger that although a given pair of MAPs (24) may be able to communicate and neither sees a problem, it is possible that the telephony system they are connected to is not functioning as a single system. Such “split brain” modes can be detected by observing the state changes of one or more unique endpoints (302, 304) allocated to each of the other MAP(s) (24).

[106] If MAP (24) “A” sees telephony activity - even just an “off hook” event - via CTI feed (307) on an endpoint that it knows has been dedicated to MAP “B” it can deduce that MAP B is able to control it and that they are part of the same telephony system. Each MAP (24) therefore performs at least a basic telephony operation on a dedicated port (302 or 304) at least once every N seconds. Failure to see such an event within 3 x N seconds is a clear indication to other MAPs (24) that there is a problem with that MAP (24) and/or the PBX (203) serving it.
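The 3 x N second rule above might be tracked by each MAP along the following lines. This is a sketch under stated assumptions: the class name, the peer identifiers and the use of plain timestamps are illustrative, and how CTI events are actually delivered is left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PeerWatchdog:
    """Records when each peer MAP's dedicated endpoint last showed
    telephony activity on the CTI feed (hypothetical helper)."""
    interval_s: float  # N: each peer exercises its port at least this often
    last_seen: Dict[str, float] = field(default_factory=dict)

    def register_peer(self, peer_id: str, now: float) -> None:
        self.last_seen[peer_id] = now

    def on_cti_event(self, peer_id: str, now: float) -> None:
        """Call when activity is observed on a peer's dedicated endpoint."""
        if peer_id in self.last_seen:
            self.last_seen[peer_id] = now

    def suspect_peers(self, now: float) -> List[str]:
        """Peers with no observed activity for more than 3 x N seconds."""
        limit = 3 * self.interval_s
        return sorted(p for p, t in self.last_seen.items() if now - t > limit)


wd = PeerWatchdog(interval_s=10)
wd.register_peer("MAP-A", now=0)
wd.register_peer("MAP-B", now=0)
wd.on_cti_event("MAP-B", now=25)   # off-hook seen on B's dedicated port
print(wd.suspect_peers(now=31))    # only MAP-A has been silent for > 30 s
```

A peer reported here may be unhealthy itself or may sit behind a PBX that has split from this one; the check alone does not distinguish the two.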

[107] Turning to the use cases next. There are four major variants on how employers want to use employees’ own mobile phones: two regional differences and whether or not the mobile phone being used supports only a single SIM or multiple (dual and/or eSIM).

[108] Within the set of employees there will be those for whom no call recording or routing is needed - but least cost routing may be required; those for whom call recording is optional or sometimes desirable and others for whom recording is mandatory and calls must not be completed without recording.

[109] In the United States, mobile phone numbers are indistinguishable from landlines. The “area code” making up the first 3 digits of a 10 digit national number normally relates to a specific geographic region but number portability - including between landline and mobile networks - has broken this strict mapping. It is common for employees in the U.S.A. to have a single business phone number. This needs to reach them when they are in the office or on the road. Those contacting them are unaware of whether the employee will answer on an office phone or a cellular phone - the external party always rings the same number (as do colleagues around the world).

[110] Case 1: Single Business Number, Single-SIM Smartphone

[111] The employee has a personal service contract with one MNO - associating their “personal” number (EMP1) with their own phone (1). Calls ordinarily made from this phone (1) normally display their personal number (EMP1) to the called party and to call the phone someone must know this personal number (EMP1). Calls to and from the phone are not recorded - at least not by GLOBALCO.

[112] It is important that the phone user (employee) uses his personal number (EMP1) for personal calls - without the business seeing details or content of those calls - but appears to be using his allotted business number (BUSPUB1) and is able to take calls via this business number when working.

[113] To achieve these goals, a “corporate dialler” application is installed on the employee’s phone (1) - typically under the control of a Mobile Device Management (MDM) suite that separates business and personal data and applications within the phone (1) - allowing remote control, auditing and wiping of business data if needed.

[114] The employee either selects from personal or business lines - or the appropriate line is selected for them according to who they are dialling or messaging.

[115] Consider outbound calling first. Employee John Doe, normally resident in country “C0”, is currently in country “C1” (may or may not be the same as C0). He wants to make a business call to a customer at a specific phone number - which we will refer to as “EXT1”.

[116] The flowchart of Fig. 4 shows the process.

[117] The user opens the corporate dialler app (401) and selects the business contact with (or enters) phone number EXT1 (402). Instead of dialling EXT1, the dialler calls (403) one of a set of pre-configured phone numbers known to route to the MAPs (24) in country C1 - we will call this “BUSPRIV1”.

[118] BUSPRIV1 is configured in the public network to route (404) to a corporate PBX (203) in country C1. Therefore an inbound, national call arrives at the appropriate PBX (203) - either directly from the mobile network or via an interlink to the PSTN.

[119] Typically a range of numbers around BUSPRIV1 are configured and the DDI/DID/DNIS number presented to the PBX in the DNIS field is used by the PBX (203) to route the call to a specific internal endpoint - “BNE1C1”. This will be one of the Business Number Endpoints (302) on a MAP (24) in country C1.

[120] This endpoint alerts (407) as the incoming call is presented - raising an event within the Business Number Endpoint thread (302) and prompting it to go “off-hook”, answering the call (409) and completing a bi-directional audio path between itself and the employee’s smartphone (1) via the PBX (203).

[121] This action typically also triggers at least one event (408) that is reported by CTI feed (307), acted on by the Call Tracker (308) and hence the Realtime Call Model (309) is updated with at least the Calling Line Identifier (CLI) - also known as Automatic Number Identification (ANI) - and the DDI number dialled. Note this latter number may not map to a single specific endpoint if hunt groups are used to pool multiple such endpoints. A further event (not shown) typically reports the call has been established between EMP1 and BNE1C1.

[122] Thus the Call Controller (315) becomes aware of the call, which (personal) mobile (1) number (EMP1) the call is coming from and that the corporate dialler application dialled BUSPUB1.

[123] In parallel with this call setup via the MNO/PSTN, the corporate dialler application initiates a data interaction (405) via, for example, a RESTful interface to App Interface (314). This can occur over Wi-fi or the MNO (if it supports data).

[124] One challenge with all systems such as this is that call setup is, on average, slower than calling the end customer directly - as the call goes via the corporate server (24) in between. To mitigate this, the data request may be sent over both MNO and Wi-fi IP paths if available. Whichever reaches the App Interface (314) first can trigger other actions - even if the cellular call has not yet reached the PBX (203). The small amount of data wasted is normally immaterial.
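Sending the same data request over both IP paths implies that the receiving side acts on the first copy and ignores the second. One way to sketch that is shown below. It assumes the dialler tags both copies with a shared request identifier, which the description does not specify; the class and parameter names are likewise hypothetical.

```python
from typing import Callable, Dict


class FirstArrivalGate:
    """Runs an action for the first copy of a request and drops duplicates
    that arrive later over another path (hypothetical helper)."""

    def __init__(self, action: Callable[[str, str], None]) -> None:
        self._action = action
        self.winning_path: Dict[str, str] = {}  # request_id -> path that arrived first

    def on_request(self, request_id: str, number: str, path: str) -> bool:
        """Return True if this copy triggered the action, False if duplicate."""
        if request_id in self.winning_path:
            return False
        self.winning_path[request_id] = path
        self._action(request_id, number)
        return True


dial_requests = []
gate = FirstArrivalGate(lambda rid, num: dial_requests.append(num))
gate.on_request("req-1", "EXT1", path="wifi")   # first copy triggers dialling
gate.on_request("req-1", "EXT1", path="mno")    # late copy is ignored
print(dial_requests, gate.winning_path)
```

In a multi-threaded MAP the membership check and insert would need to be guarded by a lock; that is omitted here for brevity.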

[125] Further milliseconds can be shaved off call connect time by acting on the first, incoming call event (408) rather than waiting for the audio path to be established (409).

[126] As soon as it becomes aware of the number required (EXT1), the Call Controller (315) allocates (406) a Worker Endpoint (WE1C1) and instructs that endpoint or (often quicker) tells the PBX via 3rd party CTI feed (307) to dial (410) public number EXT1. This typically requires that it prefixes EXT1 with an access code indicating that the following digits are a public rather than internal number.

[127] If this MAP (24) is not in the same country (C1) as the originating mobile phone (1) (e.g. is supporting several small countries from one regional hub or acting as fall-back or overflow for another MAP), the country code for C1 would be added if EXT1 is a national rather than international number. Alternatively, an endpoint in the destination country may be used if that results in quicker connection than relying on the PBX’s (203) call setup.
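The digit manipulation described in the two preceding paragraphs (an access code for public numbers, and a country code when the MAP sits outside the caller's country) can be illustrated as follows. All prefixes here are assumptions chosen for the example: the access code "9", the international prefix "00", the trunk prefix "0" and the country-code table vary by PBX and by country.

```python
from typing import Dict


def build_dial_string(
    ext_number: str,
    origin_country: str,
    map_country: str,
    country_codes: Dict[str, str],
    access_code: str = "9",     # assumed PBX "outside line" code
    intl_prefix: str = "00",    # assumed international dialling prefix
    trunk_prefix: str = "0",    # assumed national trunk prefix
) -> str:
    """Return the digits a Worker Endpoint would dial to reach ext_number."""
    if ext_number.startswith("+"):
        # Already international: swap "+" for the dialling prefix.
        public = intl_prefix + ext_number[1:]
    elif origin_country == map_country:
        # National number dialled from a MAP in the same country.
        public = ext_number
    else:
        # National number, but the MAP is elsewhere: add the country code.
        national = ext_number
        if trunk_prefix and national.startswith(trunk_prefix):
            national = national[len(trunk_prefix):]
        public = intl_prefix + country_codes[origin_country] + national
    return access_code + public


codes = {"C1": "44"}  # illustrative mapping only
print(build_dial_string("02079460000", "C1", "C1", codes))
print(build_dial_string("02079460000", "C1", "HUB", codes))
```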

[128] As soon as the Worker Endpoint WE1C1 has been instructed to call EXT1, that outbound call can be conferenced (411) into the call that has just been answered by the Business Number Endpoint BNE1C1. Again, this can be achieved by first party commands issued via WE1C1 but is often more quickly achieved with 3rd party control via the CTI feed (307).

[129] The outbound leg of the call proceeds and, hopefully, rings EXT1 (412) who then answers (413) and the call is connected.

[130] During this period, the Business Number Endpoint can be sending tones and/or announcements to the employee advising them of the progress of the call; whether or not this will be recorded and so forth.

[131] Also in parallel with call setup, Call Controller (315) determines if the call is to be recorded (414) (as determined by the configuration database (301)). If so, the Call Recording Interface provides end point details (415) so that by the time the call is answered, a copy of the data streams to and from the business endpoint BNE1C1 may be forked (416) and directed to either file storage or to ports on the recording system (208).

[132] The call thus established continues until either party hangs up (417), at which point the call is torn down (418) and the recording terminated. Ports are freed for subsequent calls.

[133] Alternative approaches to connecting the original MNO call, the external party and (optionally) the recorder port(s) are possible - depending on the capabilities of the PBX and the need for separate streams of audio for each party. For example:

[134] Multiple line appearances on the Business Number Endpoint (302) can be used to set up the outbound call rather than a separate worker endpoint (304);

[135] “Single-step” or “fast” Conferencing and/or transfer features of the CTI link (307) may allow the additional external party to be added without a second call being created; and

[136] Conference bridging can occur within the MAP (24) itself rather than using a bridge within the PBX. This approach, keeping what the PBX sees as two independent calls active (one on the Business Number Endpoint and one on a Worker Endpoint) gives greater control over what each party hears - for example, prior to the far end ringing. For example, should the far end not answer for 15s, the MAP could play an announcement to the employee offering them some options (“Do you wish to continue waiting; record a message to be played when they answer or let me try again later and call you back?”). Should the external party answer during that period, the MAP could play to them “One second please, John Doe is just coming” whilst interrupting the announcement to John Doe with “They just answered. I’ll connect you.” Voice control is also easier in this approach. For example, if John Doe says “let them know I tried” while EXT1 is still alerting, MAP could respond to John Doe (only) “OK, I’ll let them know” - and hang up on John Doe. Should EXT1 then answer, MAP may say “John Doe was trying to reach you”. Such interactions are already well handled by various Interactive Voice Response (“IVR”) systems that could be conferenced into the call to perform these functions.

[137] In this scenario, the call routing is handled by the PBX (203). It also typically provides the ability to present the calling party as a specific business number - so EXT1 sees the call as coming from John Doe’s business number rather than his personal number - which is never disclosed outside John Doe’s employer (where it is needed to match incoming call to employee).

[138] Where this ANI cannot be programmatically set, the call has to actually come from John Doe’s assigned business phone in order to show his number to the far end. If John Doe never actually uses a phone or softphone on the PBX (always works via his mobile) then a Worker Endpoint (304) can be registered with John Doe’s internal business number and all calls transferred to or conferenced into it.

[139] Some PBXs offer a “dependent mode” softphone registration. This allows a “Shadow Endpoint” (303) to be registered for such a purpose without stopping John Doe from using the physical phone (19) and/or softphone(s) (20, 21) with that number should he need to do so.

[140] During the call, the user interface shown on John Doe’s phone (1) is that of the corporate dialler. This will show the EXT1 number as the far end rather than BUSPUB1 (which means nothing to and does not need to be known by John Doe). Further interaction with the MAP (24) via App Interface (314) can provide call status, duration, recording status and so forth which can be presented to John Doe via the Corporate Dialler app on his phone (1).

[141] Optionally, a speech recognition engine may be connected to the data streams, providing transcription of calls in near real-time. This and all call details can be sent to the archiving mechanism via Call Detail Archive Interface (313).

[142] Optionally, the Call Controller (315) may monitor the success/failure, responsiveness and call setup times of the PBX (203) and PSTN. Call Tracker (308) can also gauge how busy the system is from the rate of events being received from CTI (307). These metrics may be used to change the call routing to use External Endpoints (306) instead of or even in parallel with the internal endpoints (302, 304).

[143] When roaming overseas, the dialler app uses the phone’s (1) current location and/or MNO identity to determine which country it is in. The number it dials is selected to be that of the MAP(s) (24) in that country. Hence call charges on the MNO are minimized.
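The dialler's choice of access number when roaming can be sketched as a lookup keyed on the detected country. The table contents, the function name and the fall-back to the home country when no local MAP number exists are assumptions made for illustration; the description only states that a number in the current country is selected.

```python
from typing import Dict, List


def select_access_number(
    current_country: str,
    home_country: str,
    access_numbers: Dict[str, List[str]],
) -> str:
    """Pick a MAP access number local to the phone's current country,
    falling back (as an assumption) to the home country's number."""
    numbers = access_numbers.get(current_country) or access_numbers.get(home_country)
    if not numbers:
        raise LookupError("no MAP access number configured")
    return numbers[0]


# Placeholder identifiers in the style used in this description.
table = {"C0": ["BUSPRIV0"], "C1": ["BUSPRIV1"]}
print(select_access_number("C1", "C0", table))
```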

[144] Consider now an incoming call from external PSTN number EXT2 (in country C2) to John Doe’s (sole) business number (BUSPUB1) which is a public number in country C0 (regardless of where John Doe currently is).

[145] Fig. 5 shows the process - beginning with the external party dialling John Doe’s business number (501). The PSTN routes his call to country C0 (if not already there) and thence (502) to the PBX (203) with which it is currently associated - regardless of where in the world John Doe currently is.

[146] As above, this business number, BUSPUB1, is associated internally with either a dedicated Worker Endpoint (302) (if not used by a real phone or softphone) or a Shadow Endpoint (303) otherwise, in a MAP in country C0. For this example, we will use a Shadow Endpoint, “SEP1C0”. The PBX (204) thus routes the call to SEP1C0 (503), alerts SEP1C0 (504) and raises a CTI event to this effect (505). SEP1C0 goes off hook to answer the call (506).

[147] Alternatively, or additionally, the MAP (24) may express an interest in the employee’s phone number (the internal number for which BUSPUB1 is the public, external number) and hence receive events via CTI feed (307).

[148] Via at least one of these means, the Call Controller (315) in a MAP (24) becomes aware (505) of an incoming call to John Doe’s (internal) business number. It allocates (506) an available Worker Endpoint (304) WE1C0 or, as above, in the case of overload, fall-back etc. an External Endpoint (306) to call (507) John Doe’s personal mobile number (EMP1) (SEP1C0). Using the same mechanism(s) described previously, where possible the ANI is set to that of the calling party so that the incoming call shows the original calling party.

[149] As with outbound calls, the Call Controller (315) causes the inbound and outbound legs to merge into a single call (508) - via PBX (203) conference bridge or internal bridging as for outbound calls.

[150] As with outbound calls, additional rules within the Call Controller (315) can, for example, announce to the calling party “John Doe is not available, would you like to leave a message; have him call you back or shall I try his other number for you?”

[151] As with outbound, in parallel with call setup, the Call Controller (315) determines whether or not recording is needed (510) and, if so, determines appropriate end-point(s) (511) such that when EMP1 answers (512) the call is forked (513) to said end-point(s).

[152] When either party hangs up (514) the call is torn down and recording terminated (515).

[153] As with outbound, the MAP (24) has no need for international least cost routing algorithms - as it uses the PBX (203) to make the outbound call to John Doe’s mobile (1). If John is known to be in a different country (as reported to a MAP (24) in that country by his Corporate Dialler app), the call can be routed via that MAP and break out there rather than incur MNO international forwarding charges.

[154] However, even a national call outbound from landline to mobile can be expensive in itself. Where this is the case, a push notification from App Interface (314) to John Doe’s smartphone can be sent (516) before or at the same time as the outbound PSTN call (507) to his smartphone is attempted. If the corporate dialler app on the phone is alerted and responds before the outbound PSTN call reaches the phone, it can initiate a call (517) to the PBX (203) using a number (BUSPRIV0) passed in the push notification (516).

[155] The PBX (203) and hence allocated Worker Endpoint (WE1C0) will thus receive “busy” for the outbound call but an inbound call appears almost immediately (518, 519). The business endpoint mapped to BUSPRIV0 in the PBX answers the call (520) and this inbound call, rather than the (failed) outbound one (via WE1C0) is conferenced into the original call. Thus the call comes out of John Doe’s contracted minutes (often unlimited) rather than a charged landline to mobile (1) call from the PBX (203).

[156] The above parallel calling approach results in increased traffic on the PBX (203) so, where speed is not as important as loading, the outbound PABX call (506) may be delayed by a pre-determined time and cancelled before it happens if the expected inbound call appears (519) during that period.

[157] When roaming overseas, the dialler app regularly reports its current country to the App Interface of a MAP in that country - which alerts the other MAPs so that they can decide whether any inbound calls to John Doe should be routed via the corporate PBX to the country he is in and handled by the MAP there if this is cheaper than using the MAP where the inbound call originated.
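The delayed-outbound variant of paragraph [156], in which the MAP holds back its own call for a pre-determined time and skips it if the dialler app calls in first, can be sketched with a timed wait. This is an illustration only: the use of asyncio, the function names and the delay values are assumptions, and `place_outbound` stands in for whatever mechanism actually dials the employee's mobile.

```python
import asyncio
from typing import Awaitable, Callable


async def reach_employee(
    inbound_arrived: asyncio.Event,
    place_outbound: Callable[[], Awaitable[None]],
    delay_s: float,
) -> str:
    """Wait up to delay_s for the app-initiated inbound call; only place
    the charged outbound call if it has not appeared by then."""
    try:
        await asyncio.wait_for(inbound_arrived.wait(), timeout=delay_s)
        return "inbound"          # outbound call never placed
    except asyncio.TimeoutError:
        await place_outbound()
        return "outbound"


async def demo(app_responds: bool) -> str:
    inbound_arrived = asyncio.Event()
    placed = []

    async def place_outbound() -> None:
        placed.append("EMP1")     # stand-in for dialling the mobile

    if app_responds:
        # Simulate the dialler app calling in shortly after the push notification.
        asyncio.get_running_loop().call_later(0.01, inbound_arrived.set)
    return await reach_employee(inbound_arrived, place_outbound, delay_s=0.2)


print(asyncio.run(demo(app_responds=True)))
print(asyncio.run(demo(app_responds=False)))
```

The delay trades connection speed against PBX load and call cost, so in practice it would be a configurable value rather than a constant.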

[158] Case 2: Single Business Number, Dual or eSIM Smartphone

[159] This case has the same business requirements as case 1 above but the employee’s smartphone (1) is able to operate with at least two MNO numbers at the same time. This gives the opportunity to further separate business and personal calls. In case 1, business calls were, necessarily, being made over the employee’s personal mobile service contract (EMP1) and thus eating into his available minutes. Typically, a business would pay for an unlimited minutes contract - or at least refund the difference between that and the contract they would need without the business calls.

[160] In Case 2, however, many businesses simply provide the 2nd contract/SIM on the phone and that’s sufficient. The requirement here, though, includes recording of at least some business calls and/or use of GLOBALCO’s least cost routing to minimise call charges. For the employees to whom that applies, this system overcomes the limitations of other systems as discussed under “Background” above.

[161] Consider employee Jane Doe who has a phone (1) with eSIM capability but must have many of her business calls recorded. As before, the detail of the user interface on the phone that encourages/enforces use of the business or personal SIM is outside the scope of this patent but the requirement is to ensure that all calls to or from Jane Doe’s business number go via the MAP and hence can be logged and recorded if needed.

[162] As with Case 1, there will be a Corporate Dialler application installed and running on the phone (1).

[163] The eSIM (preferably, 2nd physical SIM otherwise) will NOT be for the business phone number that Jane Doe uses for all business calls (BUSPUB2). It will, instead, be from a block of unpublished numbers - though still owned by the business - say BUSPRIV1.

[164] Preferably the Corporate Dialler on the smartphone prevents her from dialling any number manually and will, itself, only dial one of GLOBALCO’s own numbers in the country she is presently in - all of which route to a MAP (24) in that country.

[165] Thus Jane Doe cannot phone anyone directly and have her published business number presented as the calling party. Nor can anyone reach her mobile by calling the number she publishes as her work number.

[166] Outbound calls work identically to case 1 but are all made over the business SIM - avoiding any impact on Jane Doe’s personal contract. They therefore present the unpublished number BUSPRIV1 to the PBX (203) instead of Jane Doe’s personal number - removing the need for the business to even know her personal number for calls outbound from the mobile phone (1).

[167] Inbound calls, similarly, are very like Case 1 but, again, are all made over the business SIM - avoiding any impact on Jane Doe’s personal contract (inbound calls do attract charges on some mobile networks). Hence the number dialled to ring Jane Doe’s phone when routing an inbound call through to her is BUSPRIV1 - removing the need to know her private number at all.

[168] Case 3: Separate “Office” and “Mobile” Business Numbers, single SIM Smartphone

[169] This case is the norm in the UK and much of Europe. Phone numbers are inherently either mobile (in the UK starting “07”) or not. Both are published on business cards and email signatures and people consciously choose which to ring.

[170] John Smith has a smartphone on his personal contract that uses his personal number (EMP3). His business card shows his “Office” number (BUSPUB3) and his “Mobile” number (BUSMOB3).

[171] He has a phone on his desk - reachable via the PSTN by dialling BUSPUB3 or internally using (normally) a trailing subset of the digits of BUSPUB3 (BUSINT3).

[172] He may also have a softphone app on his laptop that lets him appear to be using this same office phone (BUSPUB3 externally, BUSINT3 internally) wherever he’s connected to Wi-fi.

[173] Consider outbound calls first.

[174] Those made from the “real” office phone - or a softphone of that number - obviously appear to come from that number and can be recorded by the existing infrastructure designed to record the PBX.

[175] Calls made from his personal mobile (1) direct to a customer (EXT3), however, won’t show his business mobile number (BUSMOB3) - and can’t be intercepted even if recording is mandatory. Nor will calls to BUSMOB3 ring his mobile phone (1) as that is EMP3.

[176] So the business deploys a Corporate Dialler app to John Smith’s smartphone - just as in Cases 1 and 2. As previously, John Smith is encouraged/cajoled/forced/instructed to make all his business calls via the Corporate Dialler app.

[177] As with John Doe in Case 1, this Corporate Dialler app does not call the person John Smith wishes to speak to on EXT3 directly but, rather, calls the in country MAP (24) via a non-published but public network number BUSPRIV3.

[178] This scenario is the same as outbound for Case 1 with the exception of the ANI presented to the far end. John Smith would expect this to be his business mobile number BUSMOB3 rather than his office number BUSPUB3.

[179] If the PBX (203) allows this to be set programmatically to the required mobile number (BUSMOB3) that can be done. Otherwise, that has to be achieved by the business having a block of mobile numbers (including John Smith’s business mobile number BUSMOB3) and a contract with an MNO that allows them to make calls over that number. As there is not actually a phone with a SIM assigned to that number, it needs to be diverted to a (normally) SIP endpoint (306) in the MAP (24) - which can then be used to place national calls using that number (BUSMOB3).

[180] However, if (as will be shown below) incoming calls to either John Smith’s office number (BUSPUB3) or his business mobile number (BUSMOB3) both reach him when he’s on his personal mobile (EMP3), does it matter whether the ANI shown on outbound calls is that of his mobile number (BUSMOB3)? Could it not be that of his office number (BUSPUB3) - which the PBX definitely can present? This can avoid the need to make any outbound calls over this mobile number (BUSMOB3) - opening up the option of using a “Pay as You Go” SIM for it with zero monthly charges instead of a monthly contract.

[181] Now consider inbound calls - first to John Smith’s “office” number (BUSPUB3). These calls ring, as they have always done, on his desk phone (19) in his office and any softphone (20, 21) he has running on a laptop - or even his Wi-fi connected smartphone (1).

[182] With a MAP (24), as in Case 1 with John Doe, a Shadow Endpoint (303) and/or CTI (307) observer can monitor the internal office phone number BUSINT3 for incoming calls and choose (optionally - potentially on schedule or personal preferences) whether or not his (personal) mobile phone (EMP3) should also ring.

[183] The outbound leg to his smartphone (1) is established exactly as in Case 1.

[184] Now consider inbound calls to John Smith’s “mobile” business number (BUSMOB3). Assume that this is a real mobile number and is therefore associated with a service contract from an MNO. As this number is not present on John’s (or anyone else’s) phone, he cannot undo a simple “divert” placed on it - which can be to a MAP’s External Endpoint (302) in the country John’s (personal) phone (1) EMP3 is currently in. Alternatively, it can be diverted to an unpublicised public network number BUSPRIV3 that routes via PBX (203) to an Endpoint (302) with number BUSINT3A on the MAP (24) in that country. In either case the call initially alerts and can be answered by an Endpoint (302, 306) in a MAP (24) in the country John Smith is currently using his phone in.

[185] Inbound calls to John Smith’s mobile business number BUSMOB3 are therefore visible to the MAP (24) in the country he is currently in. As with Case 1, the MAP can place an outbound call (and/or try to beat it with a notification to the corporate dialler app to make an inbound call to it).

[186] As with Case 1, business calls to and from John Smith’s mobile phone (1) necessarily go via his personal mobile contract (EMP3). The incremental cost of this needs to be borne by the business.

[187] Case 4: Separate “Office” and “Mobile” Business Numbers, Dual/eSIM Smartphone

[188] Jane Smith has a smartphone (1) with a personal number (EMP4) and contract on it. The business pays for a separate contract and places a second number on the eSIM (or physical SIM). As with Case 2, if no record of Jane’s business calls or their content is needed, Jane uses the phone number of this contract as her “Mobile” business number (BUSMOB4) and calls route directly to and from her phone.

[189] If details of her business calls (over and above what is obtained from the MNO’s billing records) are required, then the Corporate Dialler app can be deployed to log call details but there is no need to change the routing of calls.

[190] Similarly, the corporate dialler app may be used selectively to reduce call charges. For example, it may allow national calls (within the MNO contract) but divert international calls via the MAP (24) and hence use the corporate least cost routing plan to break out in the destination country or wherever is most cost effective.
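By way of non-limiting illustration, the selective routing rule of the preceding paragraph might be sketched as follows in Python. The function name choose_route, the route labels and the use of E.164 country-code prefixes to distinguish national from international calls are assumptions made for this sketch only; the description does not specify how the dialler app classifies a number.

```python
def choose_route(dialled_e164: str, home_country_code: str) -> str:
    """Return 'direct' for national calls and 'via_map' for international ones.

    Hypothetical rule: a dialled number that starts with the country code of
    the SIM's MNO contract is treated as national and left on the contract.
    """
    if not dialled_e164.startswith("+"):
        raise ValueError("expected an E.164 number starting with '+'")
    if dialled_e164.startswith("+" + home_country_code):
        return "direct"    # within the MNO contract
    return "via_map"       # break out via the corporate least cost routing plan
```

For example, with a UK contract (country code 44), a call to a UK number would stay on the contract while a call to a US number would be diverted via the MAP.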

[191] If, however, Jane’s calls can only be allowed to complete if they are being recorded, then the same approach as in Case 2 can be used. The eSIM contract on Jane’s (personal) phone does NOT correspond to her mobile business number BUSMOB4 - but, rather, is a private business number BUSPRIV4.

[192] Her public mobile business number BUSMOB4 is on a separate contract diverted to and thus terminating on a MAP in the country she is currently in (C4) as in Case 3.

[193] Operation is as for Case 3 but, as with Case 2, Jane Smith’s private mobile number (EMP4) need not be visible at all to the business and neither inbound nor outbound business calls impact her personal mobile contract.

[194] Intermediate cases include those where recording of some calls or some parts of calls is required but there is no need to force all calls via the corporate infrastructure. In this case only calls that need recording or least cost routing will be routed via the MAP (24).

[195] Note that in all four cases, once the initial call has been established, advanced features such as hold, transfer, conference and so forth may be offered by the dialler app.

[196] The above discussion has assumed that all voice calls are carried over the PSTN or MNO networks at the employee’s mobile (1) end. That is no longer optimal in many cases. As Wi-fi and (at least some) 4G networks support Voice over IP connections, this alternative path for the voice channel becomes attractive. Where these networks are available, of sufficient speed and not overloaded, such calls can be established more quickly; they can use higher quality voice codecs and are typically cheaper (often free) than the PSTN/MNO route. There is therefore a great incentive to use them when available.

[197] However, unlike traditional telephone calls, such packet voice paths are more susceptible to quality problems as their bandwidth is not guaranteed for the duration of the call. Dropped packets, out of order packets, duplicated packets, shorter range and lack of hand-off and variable delivery time all cause issues if present to excess.

[198] Wi-fi connections tend to be unmetered and hence effectively zero cost to the mobile phone user. Data usage over an MNO’s (for example) 4G network is normally restricted and can be expensive - so should not be wasted.

[199] In cases where there is no, or very poor, cellular network capability, it makes sense to try connection over Wi-fi as it cannot be worse. This “Wi-fi calling” capability is already present in mobile phone settings - though it is not supported by all carriers and is optional on others.

[200] Whenever a cellular and Wi-fi connection are both available, there is at least an opportunity to try and speed up connection, improve voice quality and/or reduce costs - but a danger of providing a worse user experience if the call is routed over a poorly performing network.

[201] Fig. 6 shows how the invention tracks the available voice and data networks to gain the optimum call experience - based on configuration data that may include but is not limited to settings that indicate the relative importance of factors such as:

[202] That the connection is established and remains established;

[203] That the audio quality is as high as possible;

[204] That the audio quality is as consistent as possible;

[205] That the cost of the call is minimized as far as possible;

[206] That the speed of connection is maximized (delays minimized); and

[207] That the call holds up while the phone (1) moves.

[208] A further setting or algorithm indicates whether the call need not be recorded, should be recorded or must be recorded. In the latter case, the call cannot be allowed to proceed unless it is definitely being recorded.
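The three-way recording setting just described can be illustrated with a minimal Python sketch. The names RecordingRequirement and call_may_proceed are hypothetical and are introduced only for this example; how recording is actually confirmed is outside its scope.

```python
from enum import Enum


class RecordingRequirement(Enum):
    NOT_NEEDED = "need not be recorded"
    SHOULD = "should be recorded"
    MUST = "must be recorded"


def call_may_proceed(requirement: RecordingRequirement,
                     recording_confirmed: bool) -> bool:
    """Only a 'must be recorded' call is blocked when recording is unconfirmed."""
    if requirement is RecordingRequirement.MUST:
        return recording_confirmed
    # 'Should' and 'need not' calls are allowed to proceed either way.
    return True
```

In this sketch a call that should be recorded still proceeds if recording is unavailable; only the "must" case acts as a hard gate.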

[209] Where a dual or eSIM phone is used, further settings indicate which are personal or business connections and whether a personal call can ever, never, or under specific circumstances (e.g. only the other network is available) be carried across the business line and/or vice versa.

[210] This algorithm is started (601) whenever a call is needed now, may be needed very shortly or if the phone (1) must always be ready to make the best choice of voice path. The sooner the process is started, the better the decisions it can make. Thus, it is ideally started at boot time. Failing that, it is started as soon as the user of the mobile phone accesses the corporate dialler (or any other app exploiting this algorithm) - or, failing that, “brings the app to the foreground” in mobile operating systems terms. The intent is that this process is already as informed as possible by the time the specific number or contact has been selected. Obviously a short-code button with a pre-defined counterparty will not give an early warning like opening the Corporate Dialler does. Nor will an incoming call - especially if that is being routed via a MAP (24). Where connection speed is critical, this process may be running continuously as a background task (at the expense of battery and bandwidth on the phone).

[211] However the process (or thread) is started (601), the state of the network interfaces is checked (602), looking for any that may allow a voice path to be established. The process also declares an interest in any network connectivity changes so as to receive a call-back event on change of connectivity where the operating system supports this.

[212] Acceptable networks could include:

[213] MNO voice connectivity - on any of the available SIM/eSIM contracts;

[214] MNO Voice over the data channel (VoIP) - for example a high-speed data network (such as a 4G connection);

[215] Wi-fi connection(s); and

[216] Any other network interface appearing to provide IP connectivity.

[217] Thus a set of potential voice paths is determined (602). Any parameters visible to the process that could indicate likely quality are harvested. These will include but are not limited to: signal strength; BSSID; error rates; data and/or packet volumes transmitted and/or received.

[218] If not already running, a separate Pathfinder process/thread (603) may be spawned to try and identify alternative paths.

[219] Preferably the parameters driving the choice of network paths are based on previous experience of said networks. This may be collected on this phone using the outcomes of the calls set up by the network selection process. Advantageously these include the phone’s location, time of day and day of week - allowing it to gradually learn which networks work best, where and when thus improving the accuracy of its decisions over time.

[220] Furthermore, as staff from a given company typically frequent many of the same locations, pooling such data gathered across all or a subset of the company’s phones provides a much better basis on which to decide the likely outcome of using a particular network. Preferably such data is anonymized so as not to provide a privacy leak. To reduce the chances of it being linked back to the originator, time data should be blurred (e.g. hour of day, day of week and month but not which day of the month for example).
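A minimal Python sketch of the time blurring suggested above is given below. The function name blur_timestamp and the dictionary layout are assumptions for illustration; the only point carried over from the description is that hour of day, day of week and month are retained while the day of the month is discarded.

```python
from datetime import datetime


def blur_timestamp(moment: datetime) -> dict:
    """Keep hour of day, day of week and month; drop day of month, year and minutes."""
    return {
        "hour": moment.hour,
        "weekday": moment.weekday(),  # Monday is 0, Sunday is 6
        "month": moment.month,
    }
```

Two calls made at the same hour on different Tuesdays of the same month therefore produce identical blurred records, which makes it harder to link a pooled record back to one specific call.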

[221] Advantageously, a shared service may be provided over the data network. This collects path evaluation and actual call outcome data from these mobiles. It makes aggregated, anonymised summaries available to those from other companies and/or the public. Thus a mobile phone can report its current location and its assessment of the networks around it. The central service then responds with information about the likely connection and voice quality of those networks and the locations and identities of alternatives nearby that the user may wish to switch to and/or physically move to should they have problems.

[222] If any of the possible voice paths does not already have an associated PathEvaluator task running, one is started (604).

[223] The set of voice paths is counted (605). If only a single voice path is available, the decision is easy. The sole available path is selected (606) and will be used for the next voice call.

[224] However, phones move, signals change. Should a network connectivity status change occur, or a pre-determined, repeating “recheck timer” expire (607), the process again considers the available paths so as to provide an updated decision. Note that a PathEvaluator may trigger such a network availability call-back should a sudden change occur, forcing a re-evaluation of the paths immediately.

[225] The evaluation results from the PathEvaluators for all potential VoIP paths are gathered (608) - including an initial (hence rough and ready) one from those just started (607). These evaluations may include but are not limited to: data rate, signal strength, network type, hops to specific destination(s), round trip delay to specific destination(s), packet loss rates, jitter, historical experience. Each element may be more than a single figure - for example, it may contain a range, error bars, standard deviations, confidence levels, outliers and/or data covering specific time intervals.

[226] From the available information about each possible speech path and the configuration data the optimal set of paths is determined for each of a plurality of call categories - such as “personal”, “business”, “confidential”, “MiFID II regulated”. This may be a set of rules or a machine learning algorithm trained on the outcome of previous calls and parameters from PathEvaluators and/or shared historical data from other sources.

[227] The process then sleeps for a predetermined period or until a network availability event occurs (607).
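The counting and ranking steps of the PathSelector (605 to 608) can be illustrated with a simple rule-based sketch in Python. Everything named here is hypothetical: the PathEvaluation fields, the WEIGHTS table, its numeric values and the linear penalty are assumptions chosen to show how configuration weights could order paths differently per call category. The description leaves the actual rules, or a trained model, open.

```python
from dataclasses import dataclass


@dataclass
class PathEvaluation:
    name: str
    round_trip_ms: float
    packet_loss: float       # fraction between 0 and 1
    cost_per_minute: float   # arbitrary currency units


# Hypothetical per-category weights: business favours quality, personal favours cost.
WEIGHTS = {
    "business": {"delay": 1.0, "loss": 500.0, "cost": 10.0},
    "personal": {"delay": 0.5, "loss": 200.0, "cost": 100.0},
}


def penalty(ev: PathEvaluation, w: dict) -> float:
    """Lower is better: a weighted sum of delay, loss and cost."""
    return (w["delay"] * ev.round_trip_ms
            + w["loss"] * ev.packet_loss
            + w["cost"] * ev.cost_per_minute)


def rank_paths(evaluations: list, category: str) -> list:
    """Return path names, best first, for the given call category."""
    if len(evaluations) <= 1:
        # Zero or one path: nothing to decide (steps 605/606).
        return [ev.name for ev in evaluations]
    w = WEIGHTS[category]
    return [ev.name for ev in sorted(evaluations, key=lambda ev: penalty(ev, w))]
```

With a lossy but free Wi-fi path and a clean but charged MNO voice path, this sketch ranks the MNO path first for a business call and the Wi-fi path first for a personal call.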

[228] A PathEvaluator process or thread is shown in Fig. 7. One of these is started (701) for each potential voice path - whether that is identified by the PathSelector process of Fig. 6 or the Pathfinder process of Fig. 8.

[229] The current characteristics of the network are noted (702). If the path supports a data channel, one or more exploratory “probing” data transmissions are initiated (703). These may include but are not limited to:

[230] Simple ping messages to check connectivity and round trip delay to specific IP addresses (corporate and/or third party). Other protocols that elicit a response from systems that are not deliberately assisting can be used too. For example, ICMP;

[231] “Path Reporting” packets to one or more “Pathfinder” services. These packets optionally report the current network characteristics and, preferably, the phone’s location.

The presence or absence of and time taken for responses to appear are noted as indicators of the speed and reliability of the network path used. The response may contain information regarding previous experience with this path from this and/or other locations nearby and/or alternative paths;

[232] Path probing packets or, preferably, bursts of packets at known intervals. For example, a burst of UDP packets may be sent to one or more pre-configured endpoints. These may be at pre-determined IP addresses or a DNS lookup of hostname (the latter being slower).

These are representative of a burst of RTP carrying voice but the payload provides data regarding the mobile phone (1) and its path evaluation and selection status.

A process at the destination end (such as MAP (24)’s App Interface (314)) echoes these back to the sender (1) - but with the payload now containing data regarding how well the stream was received. Typically the MAP (24) is capable of more accurate and consistent measurement of jitter than the PathEvaluator process which is at the mercy of the mobile operating system’s task scheduler; and

[233] Path reliability probes. These can include the establishment of persistent connections that require repeated activity to maintain them. For example, a TCP/IP socket with each end sending a message or burst of messages once every N milliseconds. Should the socket tear down, a message fail to send, or none be received for, say, 2 x N milliseconds, that can be used as an indication of inability to maintain the connection. Preferably a SIP/TCP connection is initiated as this can then also be used to establish a media path.
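The 2 x N millisecond rule for path reliability probes can be sketched in Python as below. The class HeartbeatMonitor and its method names are hypothetical, and the sketch covers only the silence-detection logic; socket handling and the sending side are omitted.

```python
class HeartbeatMonitor:
    """Tracks the last message heard on a persistent connection."""

    def __init__(self, interval_ms: int, started_ms: int = 0):
        self.interval_ms = interval_ms      # N: the expected gap between messages
        self.last_heard_ms = started_ms

    def message_received(self, at_ms: int) -> None:
        self.last_heard_ms = at_ms

    def is_alive(self, now_ms: int) -> bool:
        # Silence for longer than 2 x N is taken as inability to maintain the path.
        return (now_ms - self.last_heard_ms) <= 2 * self.interval_ms
```

With N = 100 ms and the last message heard at 100 ms, the connection is still treated as alive at 300 ms and as lost from 301 ms onwards.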

[234] Each of these exploratory exercises (703) is typically performed on its own thread and results in a call-back when a response is received - or a timeout firing after a pre-determined interval deemed unacceptable for a voice path. In parallel with this, however, a timer is used to update the results prior to that. This ensures that a path which responds in a usable time (say 250ms) is marked after (say) 100ms as NOT being a sub-100ms candidate.
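The idea of updating a probe's result before its response arrives can be illustrated as follows. This Python sketch is an assumption-laden simplification: the names probe_state and is_candidate, the tuple representation and the default 400 ms timeout are all hypothetical. It shows how an outstanding probe yields a growing lower bound on round trip time that can already rule a path out for a given latency target.

```python
def probe_state(sent_ms, now_ms, response_ms=None, timeout_ms=400):
    """Describe what is known about one probe at time now_ms."""
    if response_ms is not None:
        return ("measured", response_ms - sent_ms)
    elapsed = now_ms - sent_ms
    if elapsed >= timeout_ms:
        return ("timed_out", timeout_ms)
    # No response yet: the round trip time is at least the time waited so far.
    return ("at_least", elapsed)


def is_candidate(state, target_ms):
    """Could this path still meet a round trip target of target_ms?"""
    kind, value = state
    if kind == "measured":
        return value <= target_ms
    if kind == "timed_out":
        return False
    return value < target_ms
```

A probe still unanswered after 100 ms is already excluded as a sub-100ms candidate, yet remains a candidate for a 250 ms target until it either responds or times out.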

[235] As responses are received or time out, the time they took, jitter between them (in the case of bursts) and the data within them (where proactively populated by the far end) are analysed and the performance characteristics recorded (704).
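The description does not say how jitter between the packets of a burst is computed. One widely used choice, shown here purely as an illustrative assumption, is the running interarrival jitter estimator defined for RTP in RFC 3550, which smooths the absolute change in transit time with a gain of 1/16. The function name and the millisecond units are hypothetical.

```python
def interarrival_jitter(send_times_ms, recv_times_ms):
    """RFC 3550 style running jitter estimate over one burst of packets."""
    if len(send_times_ms) != len(recv_times_ms):
        raise ValueError("need one receive time per send time")
    jitter = 0.0
    for i in range(1, len(send_times_ms)):
        # Change in transit time between consecutive packets.
        delta = ((recv_times_ms[i] - recv_times_ms[i - 1])
                 - (send_times_ms[i] - send_times_ms[i - 1]))
        jitter += (abs(delta) - jitter) / 16.0
    return jitter
```

A burst that arrives with exactly the spacing it was sent with gives an estimate of zero; any variation in spacing raises the estimate gradually rather than abruptly.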

[236] As long as the path remains of interest (705) the process repeats (706) after a specified interval or on a change in network connectivity state or change in call requirements. Note that the intervals for the different exploratory methods may vary - resulting in separate threads with different repeat times for each.

[237] The data gathered by a PathEvaluator is accessible to the PathSelector and other processes at any time. They can also initiate a call-back (706) forcing an immediate re-evaluation and hence fresh exploratory exercises.

[238] The PathFinder process is shown in Fig. 8. This process may be started (801) when the mobile phone boots, when the phone enters a geo-fenced location, on a schedule, on opening the corporate dialler application (as the PathSelector starts) or on command from an external application (e.g. via an appropriate “intent” being issued by another process).

[239] Two independent threads are started. One examines the Wi-fi networks that are visible (802) and what can be learned about their security, data rates, SSID and BSSID addresses, signal strength etc. The exact information available varies between operating systems but is that shown in many Wi-fi network locating apps that are readily available to help you find and access Wi-fi networks.

[240] This process keeps an up to date view of the Wi-fi environment around it - and stores the historical data - allowing PathSelector to use said data as part of its decision criteria regarding the likely suitability of any of these networks as a voice path.

[241] By comparing the currently available networks against pre-existing configuration and/or learned data (e.g. from noting which networks this user has used before) it may decide to wake the PathSelector (803) so that it can reassess the available paths and, potentially, suggest to the user that they may wish to switch to a specific network that is more likely to give them the quality of voice call they need.

[242] This process typically repeats its checks on any significant network availability event, a background timer and/or movement of more than a few metres (804).

[243] A second process or thread attempts to exchange information (805) with one or more shared Pathfinder services over the network. Preferably this utilises the same data packets sent as probes by the PathEvaluator. These may be corporate or shared services that gather data about voice path attempts and actual experiences and accept this mobile’s current view of its paths - adding that to their database - and respond with information regarding prior experience of those paths - ideally from that location at that time of day/day of week and/or predicted performance based on that prior knowledge.

[244] The Pathfinder analyses the response(s) (806), applying any corporate rules or policies to thin out the potential candidates (or the initial request may have included a corporate identifier if the business has an agreement with the service provider - in which case the results may be pre-processed accordingly by the service provider and only approved ones returned).

[245] This process too, may choose to alert the PathSelector to alternative networks (807) and/or concerns regarding the networks currently available that may justify a reassessment of which to use. Again, this repeats (808) on a schedule, on moving or on another process requesting that it refresh immediately.

[246] When a voice path is actually required the PathManager process controls which path or paths are established - as shown in Fig. 9. Note that, advantageously, a SIP or SIPS connection will, preferably, already have been established as part of a PathEvaluator’s role and is also used here.

[247] A PathManager is created (901) when a path is (or is likely to be) needed.

Configuration criteria determine whether the potential costs (money, battery, data...) of establishing path(s) before they are definitely needed and/or maintaining more paths than are needed outweigh the benefit of having at least part of the voice path to the end user already established and/or a backup path available.

[248] In the case of a MAP (24) being “in circuit” this allows the path to it to be prepared in advance - reducing the impact of the extra call leg required. In this case, the voice path required is not to the eventual phone number (EXT1 in the previous examples) but to one of the business numbers (such as BUSPRIV1) which is known before the user selects who they wish to talk to (e.g. EXT1). This process therefore is typically initiated as soon as the user accesses the Corporate Dialler.

[249] At some point, the endpoint (EXT1) becomes known (and any potential alternative numbers that may be used, for example, to reach the individual over an alternative route such as Skype™, WhatsApp™ or similar).

[250] The PathManager first determines (902) the set of potential voice paths that are available now - from the PathSelector. From the current state of these and the configuration data that specify the balance between speed, reliability, cost and bandwidth, a subset of some or all of these are selected for path establishment (903).

[251] Connection(s) is/are then initiated (904) over each of these selected voice paths. Note that this set may include both a circuit-switched voice call over a mobile network and a VoIP connection over the same network (e.g. 4G). Furthermore, it may include more than one possible connection over a single path. For example, a direct connection to the end user may be attempted over SIP and/or any of the alternative addresses known to the application - all via the same Wi-fi connection.

[252] Typically, each connection is handled on a separate thread and/or using asynchronous call-backs that allow each to proceed independently. Each connection attempt results in subsequent connection progress events and/or timeouts (905) should the expected progress fail to occur. Each of these updates an in-memory view of the currently available connections and outstanding attempts.
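A rough Python sketch of independent, concurrent connection attempts using asynchronous tasks follows. It is a deliberate simplification for a speed-first configuration: it keeps only the first attempt to succeed and cancels the rest, whereas the PathManager described here may keep several connections. The helper attempt merely simulates a connection with a delay, and both function names are hypothetical.

```python
import asyncio


async def attempt(name, delay_s, succeeds):
    """Stand-in for a real connection attempt (simulated with a sleep)."""
    await asyncio.sleep(delay_s)
    if not succeeds:
        raise ConnectionError(name)
    return name


async def first_established(attempts, timeout_s):
    """Run all attempts concurrently; return the first to succeed, or None."""
    tasks = [asyncio.ensure_future(a) for a in attempts]
    winner = None
    try:
        for next_done in asyncio.as_completed(tasks, timeout=timeout_s):
            try:
                winner = await next_done
                break
            except ConnectionError:
                continue  # this attempt failed; wait for the next to finish
    except asyncio.TimeoutError:
        pass  # nothing connected within the allowed time
    finally:
        for task in tasks:
            task.cancel()  # has no effect on tasks that already finished
    return winner
```

It could be driven with, for example, asyncio.run(first_established([attempt("mno_voice", 0.05, True), attempt("wifi_sip", 0.01, True)], 1.0)), where the path names and delays are invented for the example.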

[253] As connection states change and/or on a refresh timeout the process evaluates (906) the performance and state of each connection and hence the desirability of maintaining it. If this has changed the optimum desired connection state then (907) the connection is started/stopped/demoted/promoted/has media started/has media stopped as appropriate.

[254] Connections deemed appropriate for the voice call to use are made available to the other processes that are collecting the user’s speech and extracting audio from the received data to be played.

[255] Note that more than one such connection may be active and transmitting and/or receiving data at the same time.

[256] For example, if speed and reliability of connection are paramount - and cost is less of a concern, then if the first connection to become established is an MNO voice channel and that network is “showing 4 bars” (i.e. good signal) the call is likely to be reliable - so alternative channels may be dropped.

[257] As with the PathSelector, these decisions are typically based on simple rules initially, then more complex rules and ultimately on machine learning figuring out the best answer once sufficient training data has been accumulated and a model trained.

[258] Although the PathManager can measure incoming packet quality metrics, it cannot determine how well the outbound path is performing. Information on this can be measured by the MAP (24) end and transmitted back to the PathManager via RTCP and/or proprietary protocols.

[259] The process may therefore decide to modify the state of a connection (907) and/or to switch which is being used for transmission and/or reception. This modification can be more than simply dropping a connection. For example, a SIP connection can be established but the media stream(s) not flowing. In the case of extreme reliability being required, a call started as a normal voice call over a mobile network may have a second (or more) channel(s) established over VoIP.

[260] If data usage and/or cost are deemed more important than a completely uninterrupted call then the SIP channel may be kept open but no media flows until problems are experienced with the voice channel. In this case, that may not be possible for the application to determine - but the corporate dialler app may present the user with a button such as “Switch to Wi-fi” that he can press if the cellular signal falters and he is unhappy with the call quality.

[261] Advantageously, even if a VoIP channel is not being used for the actual voice, short exploratory or “probing” bursts of data may be exchanged over it so as to have ongoing and growing confidence in the application’s understanding of the quality of the channel. This can then be reflected in whether or not the “Switch to Wi-fi” button is presented and/or an indication of the expected quality of that as an alternative channel is shown.

[262] In cases where reliability is important, more than one voice channel may be maintained at the same time. At the MAP (24) these multiple connections - typically one via the mobile network, PSTN and PBX and another via direct SIP connection - are known to be from the same mobile phone (1). They are therefore treated as such within the conference bridging process. Audio from only the currently selected “better” channel is added into the conference audio. The other channel(s) is/are treated as “listen only” participants. This ensures the same data is sent to the mobile over all such channels and it can play the audio from whichever it is receiving better (which may not be the same channel that the MAP has determined has “best” incoming audio).
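The treatment of several channels from one phone within the bridge can be sketched in Python as below. The function assign_bridge_roles, the role labels and the single numeric quality score per channel are assumptions for illustration; how the MAP actually judges which incoming channel is "better" is not specified here.

```python
def assign_bridge_roles(incoming_quality: dict) -> dict:
    """Map each channel from one phone to 'talker' or 'listen_only'.

    incoming_quality holds a hypothetical score per channel (higher is better)
    for the audio the MAP is receiving on that channel.
    """
    if not incoming_quality:
        return {}
    best = max(incoming_quality, key=incoming_quality.get)
    # Only the best channel contributes audio; every channel still receives
    # the same conference mix, so the phone can play whichever arrives better.
    return {name: ("talker" if name == best else "listen_only")
            for name in incoming_quality}
```

Re-running the function whenever the scores change switches the talker role between channels without altering what is sent back to the phone.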

[263] In other configurations, the MAP (24) may preferentially use the voice data received over VoIP as this may be of higher quality than that received over the mobile network’s voice circuit. Even though the jitter and/or packet loss rate may make that a worse channel from the point of view of injecting audio into the call at the MAP (24), it may still be a better option to feed analysis tools such as speech recognition engines.

[264] As long as the path is still of interest (908) the process regularly refreshes its connections - or does so immediately it is notified of a change of state (909).

[265] Note that in the case of the MAP (24) being used, it is advantageous to maintain a SIP or SIPS connection even between calls - and at least burst test its media capabilities if not keeping it open as a full voice channel (or use a codec with silence suppression and let that drop the data rate down until needed). So long as the Corporate Dialler app is in foreground or there is a call in progress, a SIPS connection to the MAP is beneficial and should be maintained.

[266] It will be appreciated that the voice path selection and switching processes of Figures 6-9 have applicability outside of the multinational corporate environment that is the primary focus of this patent application. Small businesses and individuals can also benefit from the ability to find and use the most appropriate voice path that is available to them. This may be as simple as choosing between two SIMs and VoIP on the basis of “best” versus “cheapest” option.

[267] Little mention has been made of SMS messaging above. This short text messaging service to and from mobile phone numbers can be accommodated within the MAP (24) architecture described. Intra-company messages can be routed via a central service and their details and content logged to the appropriate archival system. Messages to and from the public are received at or sent from the same endpoint that voice to or from that mobile number uses. This MAP Endpoint can then send a copy of the data to the archiving service if required before forwarding the message on via the same redirection algorithms it uses to reach the mobile phone for voice calls.

[268] Although the description has concentrated on “voice calls”, the same approaches and techniques apply if the call includes other real-time streams such as one or more video stream(s).

[269] Fig. 10 shows the main elements of an exemplary implementation of the invention.

[270] Mobile phone (101) (assumed to be a “smartphone”) supports voice connections to telephones (104) on the public switched telephone network (103) and voice and (in some cases) video connections to others on mobile phones, tablets, laptops or other computers which are connected via one or more mobile networks (102) and/or data networks such as a local Wi-fi network (105), the Internet (106) or a private network (107).

[271] Fig. 10 shows a number of components that may be present within such a network. These may be physically present in a building; distributed around the world; physical or virtual machines; owned by the company; hosted or “in the cloud”.

[272] The network joining them may be, for example, a Local Area Network (LAN), Wide Area Network (WAN), a Virtual Private Network (VPN) or directly on the internet.

Functional units discussed may be provided as physical servers or as services running elsewhere. The important factor is that the required components can communicate with each other and are configured and permissioned to do so.

[273] Where the user of mobile phone (101) is an employee or otherwise works for a business, private network (107) represents that business’s corporate I.T. infrastructure. This typically includes a Private Branch Exchange (PBX) (108), a plurality of internal phone numbers which may be mapped to physical phone sets (113) and/or applications running on laptops, tablets or desktop computers. Frequently, there is an Interactive Voice Response (IVR) system (109) and a corporate voicemail service (110). Optionally, voice recording capability (112) is present and, increasingly, a Speech Analysis server/service (111). These typically perform phonetic analysis, speech recognition, emotion detection and/or biometric analysis of live speech and/or recordings. Any of these servers or services may be combined into systems supporting multiple functions and/or exist as one or more separate systems/services.

[274] One or more Mobile Access Points (114) is provided as part of this invention. Patent application GB1816697.5 describes how these are used to allow control of mobile phone calls by routing the call via said MAP (114) - allowing it to access the audio (and video if present) passing between the parties on the call and to manage each leg of the call - ideally with media stream processing - including bridging as needed - occurring in the MAP (114) rather than an external conference bridge - giving it access to the audio from each party separately and being able to combine, fork, block and inject audio to and from each party as needed for this invention. Hence MAP (114) can be considered to be a “stream management node” that controls how the call is handled.

[275] Where the user of mobile phone (101) is not an employee - or is making a personal call that is not related to their work for the aforementioned business, then (107) may be the infrastructure of a publicly available service to which the user may subscribe. This allows individuals to access many of the same features that were previously only available to business users.

[276] Also note that many mobile phones have a speech recognition capability. Some use a remote service (1015) and hence require a data path to it in order to perform speech recognition. Others include a local capability (1017) allowing them to perform at least some speech recognition when offline.

[277] Fig. 10 does not represent the user interface presented on mobile phone (101) but rather the presence of services and applications. In addition to the (optional) speech recognition services (117) there is typically a voice assistant service (118) - which may be always listening for key phrases (such as “Hey xxxx!”) which trigger it to try and respond to spoken commands.

[278] Fig. 11 shows the components involved in managing a call so as to allow the provision of advanced calling functions optionally controlled by spoken command during the call.

[279] Mobile (smart) phone (101) has the “LetMeJust” application (16) installed on it. This includes an overall CallManager (1106) component that takes user commands from the touch display (1110) and, optionally, headset or other peripherals such as a keyboard. It also displays call status information and optionally tips, hints and instructions on said display (1110).

[280] Audio from the microphone(s) (1101) and, optionally, video from camera(s) (1108) is received by TXHandler (1104). This may also invoke speech recognition and/or keyword/phrase detection services on this audio stream and thus receive notification of what is being said and can identify one or more spoken commands. It can also fork a copy of the audio to one or more local or remote voice assistant (1122), voice recording (1119) and/or archiving services - or any arbitrary service that needs to use the stream.

[281] Preferably, the TXHandler (1104) transmits audio and, optionally, video out to the connected party(ies) over one or more networks (1103). This is not always what it receives from the microphone (1101). It also has access to additional audio sources such as prerecorded audio; internally generated tones; text to speech and other incoming streams. What is transmitted over the network connection(s) can therefore be any combination of these, each processed, modified, supplemented or filtered and/or mixed at a specified volume. For example a recording tone may be mixed into the outgoing audio; the microphone may be muted; audio may be fed to a translation service and the output of that transmitted instead.
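Mixing several audio sources at specified volumes can be illustrated with a short Python sketch. It assumes, purely for the example, that each source is a list of signed 16-bit PCM samples at a common sample rate and that each has a linear gain; the function name mix_sources is hypothetical. Muting a source is modelled as a gain of zero.

```python
def mix_sources(sources, length):
    """Mix (samples, gain) pairs into one list of 16-bit PCM samples."""
    mixed = []
    for i in range(length):
        total = sum(gain * samples[i]
                    for samples, gain in sources
                    if i < len(samples))
        # Clip to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, int(round(total)))))
    return mixed
```

For instance, mixing the microphone at full volume with a recording tone at half volume produces the outgoing stream, and setting the microphone gain to zero transmits the tone alone.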

[282] In some cases, particularly where a “traditional” phone call is being made over a mobile network, it may not be possible to intercept the audio from the microphone that is being transmitted over the network. In this case, a call may be routed via a Media Access Point (14) so that the above functions can be carried out there instead.

[283] The received data stream(s) are handled by the RXHandler (1105). This also has the same suites of mixing, blocking, forking, injection, processing and analysis capabilities available to it as the TXHandler (1104) does. One analysis performed here that is not required in the TXHandler (1104) is tone detection (such as Dual Tone Multi Frequency (DTMF) detection) to detect in-band signalling arriving in the received stream.

[284] An RXHandler may, in the general case, receive one or more media streams and one or more signalling or control streams. It may also process any of said media streams to extract signalling/control information from them - such as DTMF tones or spoken commands. There is therefore a call-back mechanism whereby the RXHandler (1105) can notify the CallManager (1106) of control information - whether received via an “out of band” signalling path or “in band” within the media flowing.

[285] This latter mechanism is also used for metadata passed within some media coding schemes (MP3, MP4 for example). Problems with the received or transmitted media stream can also generate such call-backs. For example, packet loss rates exceeding a threshold may result in a call-back to the CallManager (1106) warning it of deteriorating connection quality; RTCP packets received may trigger a call-back warning of problems in the opposite direction.
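A minimal sketch of such a call-back arrangement is given below. The class name `RXHandlerSketch`, the event names and the 5% loss threshold are hypothetical and chosen only to make the example concrete.

```python
from typing import Callable, Dict, List, Tuple


class RXHandlerSketch:
    """Illustrative receive handler that reports control events via a call-back."""

    def __init__(self, notify: Callable[[str, Dict], None], loss_threshold: float = 0.05):
        self.notify = notify                  # call-back into the call manager
        self.loss_threshold = loss_threshold  # assumed threshold, not from the text
        self.expected = 0
        self.received = 0

    def on_packets(self, expected: int, received: int) -> None:
        """Update running packet counts and warn if cumulative loss is too high."""
        self.expected += expected
        self.received += received
        loss = 1.0 - self.received / self.expected if self.expected else 0.0
        if loss > self.loss_threshold:
            self.notify("quality_warning", {"loss_rate": loss})

    def on_dtmf(self, digit: str) -> None:
        """Report an in-band DTMF digit detected in the media stream."""
        self.notify("dtmf", {"digit": digit})


events: List[Tuple[str, Dict]] = []
handler = RXHandlerSketch(lambda kind, detail: events.append((kind, detail)))
handler.on_packets(expected=100, received=99)  # 1% loss: no call-back
handler.on_packets(expected=100, received=80)  # cumulative loss above 5%: call-back
handler.on_dtmf("5")
```

The same call-back carries both quality warnings and in-band signalling, so the call manager needs only one entry point for control information.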

[286] It will be appreciated that this same architecture can be applied to an application running on a tablet, laptop or desktop computer - either as a standalone application, a web browser plug-in or a remote service accessed via a browser.

[287] Communication with the other party or parties (1115) on the call may occur directly via one or more networks (1103) or be routed via a MAP (14).

[288] Where connections are direct to endpoints (1115), the mobile phone (101) may need the more sophisticated multi-party TX/RX handler approach of the MAP (14) and/or there may be an additional connection to a MAP (14) allowing it to provide a subset of services on the call even if the audio/video stream(s) do not all pass through it.

[289] Connections may be via VoIP (typically using SIP/SIPS and RTP/SRTP over a data connection) or virtual circuits over telephony networks. However, many phones only support one telephony network call at a time - and sometimes block data networks during such a call. The MAP (14) thus allows complex call scenarios with multiple counterparties to be established via it even if only one connection to the phone (101) is possible.

[290] Each MAP (14) hosts a number of concurrent calls. Each of these is controlled by a MAPCallManager (1110). This communicates with the CallManager (1106) on the mobile phone (101) preferably via a data network, typically using HTTPS and the native Push Notification mechanism of a given mobile phone (101).

[291] For each party on the call (apart from the MAP (14) itself) the MAP (14) instantiates a TXHandler and RXHandler with the same capabilities as those on the phone (1104, 1105). The mobile phone (101) is hereafter referred to as Party 0 on the call - so TX0Handler (1111) and RX0Handler (1112) process the streams to and from it respectively. Additional handlers (1113, 1114) are created for each additional party added to the call - resulting in handlers TX0...TXN and RX0...RXN if the call has had N+1 connections to date. Hereafter an arbitrary handler is referred to as TXnHandler or RXnHandler where 0 <= n <= N. As with the RXHandler (1105) in the phone, these RX Handlers not only process their respective incoming media stream(s) but also alert the MAPCallManager (1110) to signalling/control events they detect in-band or out of band.

[292] Within the MAP (14), each TX Handler (1111, 1113) has access to the incoming streams from all of the RX Handlers (1112, 1114) should it need them. This allows each to construct the required stream for transmission to the specific party it handles - regardless of what is being sent to any of the others.
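One simple policy that this arrangement permits is for each party to be sent the sum of what every other party is saying. The sketch below assumes that policy, and the function name `build_outgoing` is hypothetical; the system described allows any per-party combination, of which this is only one instance.

```python
from typing import Dict, List


def build_outgoing(rx_frames: Dict[int, List[float]]) -> Dict[int, List[float]]:
    """For each party n, sum the frames received from every other party.

    `rx_frames` maps a party index n (0 for the mobile phone) to the frame
    most recently received by that party's RX handler.
    """
    outgoing: Dict[int, List[float]] = {}
    for n in rx_frames:
        others = [frame for m, frame in rx_frames.items() if m != n]
        outgoing[n] = [sum(samples) for samples in zip(*others)] if others else []
    return outgoing


rx = {0: [1.0, 2.0], 1: [10.0, 20.0], 2: [100.0, 200.0]}
tx = build_outgoing(rx)
```

Because each outgoing stream is built independently, a response intended for one party only (for example a voice assistant prompt) could be added to that party's stream without affecting the others.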

[293] To allow some enhanced functionality even when the mobile phone cannot access a data path, basic, low bandwidth signalling between the MAPCallManager (1110) and CallManager (1106) can be achieved by instructing the TXHandler at either end (1104, 1111) to inject a sequence of DTMF tones. These can be identified by the corresponding RXHandler (1112, 1105 respectively) and thus used to convey basic instructions between the CallManager (1106) and MAPCallManager (1110).
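The injection and detection of DTMF tones might be sketched as follows, using the standard DTMF frequency pairs and the Goertzel algorithm for detection. The 8 kHz sample rate, 50 ms burst length, signal level and detection threshold are assumptions for the example; the description does not specify them.

```python
import math
from typing import List, Optional

SAMPLE_RATE = 8000                  # assumed telephony sample rate
ROWS = [697, 770, 852, 941]         # standard DTMF low-group frequencies (Hz)
COLS = [1209, 1336, 1477, 1633]     # standard DTMF high-group frequencies (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"]


def dtmf_tone(key: str, duration: float = 0.05, level: float = 0.1) -> List[float]:
    """Generate a low-level burst of the two sinusoids that encode `key`."""
    for r, row in enumerate(KEYS):
        if len(key) == 1 and key in row:
            f_low, f_high = ROWS[r], COLS[row.index(key)]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    count = int(SAMPLE_RATE * duration)
    return [
        level * (math.sin(2 * math.pi * f_low * i / SAMPLE_RATE)
                 + math.sin(2 * math.pi * f_high * i / SAMPLE_RATE))
        for i in range(count)
    ]


def goertzel_power(samples: List[float], freq: float) -> float:
    """Return the signal power near `freq` using the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2


def detect_dtmf(samples: List[float], min_power: float = 1.0) -> Optional[str]:
    """Return the key whose row and column tones dominate, or None if too weak."""
    row_power = [goertzel_power(samples, f) for f in ROWS]
    col_power = [goertzel_power(samples, f) for f in COLS]
    r = row_power.index(max(row_power))
    c = col_power.index(max(col_power))
    if min(row_power[r], col_power[c]) < min_power:
        return None
    return KEYS[r][c]
```

A sequence of such bursts, separated by short gaps, could then carry a basic instruction from one call manager to the other when no data path is available.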

[294] Advantageously, the DTMF tones are transmitted at a low level and the RXHandlers (1112, 1114) suppress them further if the incoming audio does not contain significant other content during these bursts of tones. Preferably said suppression consists of injecting a signal similar to the background noise level on the call rather than complete silence.

[295] MAPCallManager (1110) has access to a wide range of services that can be used to enhance the interaction established between mobile phone (101) and the remote endpoint(s) (1115). For example, speech analysis services may be available within the business (1117) and/or externally (1121). These latter may include Voice Assistant services (1122) that not only recognise the words, they interpret commands - typically involving spoken responses, confirmations and further clarification. They can therefore be thought of as yet another participant in all or part of the call - and TX and RX handlers established to route commands to them and receive responses from them. Note that said responses can be injected into the stream being sent to the mobile phone (101) without necessarily being injected into the stream to any other party (1115) on the call.

[296] Telephony Services (1116) include corporate PBX services for internal calls which can be used to exploit the corporate telephony network which may include sophisticated “least cost routing” schemes. These services also include SIP/SIPS or similar connectivity allowing VoIP calls to be established to anywhere via the internet and/or corporate network. As additional connections are established, so these data streams are connected to newly instantiated TX and RX handlers.

[297] Many telephone connections provide user to user signalling - allowing arbitrary data to be passed in the signalling channel as part of the call. This can be received via the Telephony Services (1116) and passed to that party’s RX Handler (1114).

[298] IVR systems (9) are also typically accessed via these telephony services (1116). An IVR port typically appears as an internal telephone number (or pool of ports behind a shared number) and can be accessed by, for example, calling that number. Thus the IVR port becomes another party on the call and a TX and RX Handler are instantiated for it - allowing audio to be passed to it for automated handling and audio from the assigned IVR port - such as prompts, confirmation and dialog - to be injected to any or all of the other parties on the call as required.

[299] There is typically a data connection to the IVR (9) as well - allowing the MAPCallManager (1110) to direct the interaction and to receive the results of the interaction (choices made, digits entered etc.). A common use for this is when processing credit card payments over the phone. The IVR (9) interacts with one party on the call only and the others do not hear and do not record that interaction.

[300] Telephone calls are gradually being replaced by calls via instant messaging which typically use VoIP based services. This component (1118) allows the MAPCallManager (1110) to use the Application Programming Interfaces (APIs) of these services to establish connections with counterparties via alternatives to the PSTN.

[301] Recording Services (1119) may be within the MAP (14) - writing to files locally or on a file-share - or streaming in real-time to a separate recording service on the corporate network, via the internet and/or VPN or “in the cloud”. Again, in the latter case, the recording service becomes a party on the call and one or more TX Handlers (1113) are instantiated to feed the appropriate audio to it. In this case the RX Handler (1114) is largely redundant (though can pass on events from the recording system - such as “unable to record” or “pause recording”) - but there may be multiple TX and/or RX Handlers (1113, 1114) - one for each separate media stream where these are to be recorded separately (or half of a stereo pair of channels in a file). This also allows recording to be paused and resumed, stopped and started during the call - as may be required for regulatory compliance.

[302] Similarly, announcement services (1120) are used to play specific audio under the control of the MAPCallManager (1110). Thus the MAPCallManager (1110) can use a combination of Speech Services (1117), tone detection within the RX Handlers (1113) and Announcement Services (1120) as an alternative to IVR ports.

[303] Using the Announcement services (1120) results in an RX Handler (1113) being instantiated. This typically receives its audio (and/or video) stream from a file, an internal announcement service or speech to text rather than as a live stream over the network. The TX Handler in this case is normally redundant as nothing is transmitted back to the announcement service. However, a text to speech service may use a TX Handler (1113) to manage the flow of text to it, for example.

[304] A further service that may result in an additional party joining the call is that of a concierge or private assistant service (1123) who may be provided with a copy of some or all of the call content and/or metadata and some instructions - spoken or otherwise - during and/or after the call.

[305] Thus the MAPCallManager (1110) can control complex and sophisticated call scenarios under the control of in band signalling such as spoken commands from potentially any party; DTMF signals; out of band messages from IVRs (9), PBX (8), remote services (1122) and so forth.

[306] Fig. 12 shows a flowchart of how an outbound call through this system is managed.

[307] The user opens or selects (1201) the application (16) - possibly via voice command (“Hey xxx! Make a business call” for example). Preferably this application (16) replaces or at least sits alongside the in-built telephony dialler application on the phone - encouraging or enforcing the use of said application (16).

[308] A background task immediately uses the phone’s location, schedule and/or other preferences/history that are available to determine the most appropriate MAP (14) to use from those known (1203). Alternatively, a locator service may be used. This responds with details of the MAP (14) to be used and, preferably, one or more fall-back alternatives.
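As an illustration of the location-based part of this selection only, the following sketch ranks known MAPs by great-circle distance from the phone, so that the first entry is the preferred MAP and the rest serve as fall-backs. The MAP names and coordinates are invented for the example, and schedule, preferences and history are deliberately ignored.

```python
import math
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coordinate, b: Coordinate) -> float:
    """Great-circle distance between two points, in kilometres."""
    earth_radius_km = 6371.0
    lat1, lat2 = math.radians(a[0]), math.radians(b[0])
    dlat = lat2 - lat1
    dlon = math.radians(b[1] - a[1])
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(h))


def rank_maps(phone: Coordinate, known_maps: Dict[str, Coordinate]) -> List[str]:
    """Return MAP names ordered nearest first; later entries act as fall-backs."""
    return sorted(known_maps, key=lambda name: haversine_km(phone, known_maps[name]))


# Hypothetical MAP sites and a phone located in Paris.
known_maps = {
    "tokyo": (35.68, 139.69),
    "london": (51.50, -0.12),
    "new_york": (40.71, -74.00),
}
ranking = rank_maps((48.85, 2.35), known_maps)
```

A locator service, where used, could return the same kind of ordered list so that the phone does not need to hold the coordinates of every MAP.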

[309] Communication is established (1204) with a MAP (14) via an available data network (e.g. 4G or Wi-Fi) allowing it to allocate resources ready for a call. A VoIP channel is established (1208), typically using SIP/SIPS but media need not flow immediately. This thread continues to pass user actions and commands interpreted from the media streams to the MAP (14) and acts on commands incoming from the MAP (14) until the call ends. Typically this thread will actually maintain its connection with the MAP (14) for as long as a call is in progress or the application (16) is in foreground.

[310] Meanwhile, on the user interface thread(s), the user selects (via touch or speech command) an existing contact or group of contacts or enters a phone number/address of a party (1115) they wish to call. Selection may imply immediate connection (e.g. heavy/long press or press a phone icon or “Call Now” button or spoken command) or may simply select the entry, allowing others to be added. In the latter case, the MAP (14) is advised of the selection (1205) and may choose to initiate connection while others are being added to the call - as this implies a conference call, which will almost certainly go via the MAP, it may, for example, start or extend its probing of the VoIP channel (1208) so as to understand the quality of that as the potential voice path to/from the MAP (14).

[311] Having selected the set of initial participant(s) (1115) of the call, this set is examined to determine whether or not the call should be placed directly (1207) (normally only an option for a single counterparty) or via the MAP (14) (for example: multiple parties; recording required; business call; international call requiring least cost routing; and/or advanced features may be required). In this latter case, a voice path is established (1215) to the MAP (14) - preferably over the VoIP channel (1208) if viable. If the MAP (14) is providing speech recognition services, there may be no need to do so at the phone (101) as well - so, optionally, a speech recognition service taps (1218) into the audio from the microphone (1101).
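The decision between placing a call directly and routing it via the MAP might be sketched as below. The `CallRequest` fields mirror the example criteria listed above; the class and function names are hypothetical, and a real deployment would presumably apply corporate policy rather than a fixed rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallRequest:
    """Illustrative description of a call the user has asked to place."""
    parties: List[str]
    recording_required: bool = False
    business_call: bool = False
    international: bool = False
    advanced_features: bool = False


def route_via_map(request: CallRequest) -> bool:
    """Return True if the call should go via the MAP rather than directly."""
    return (
        len(request.parties) != 1      # direct calls are only an option for one counterparty
        or request.recording_required
        or request.business_call
        or request.international       # e.g. to benefit from least cost routing
        or request.advanced_features
    )


personal = CallRequest(parties=["+441632960001"])
conference = CallRequest(parties=["+441632960001", "+441632960002"])
recorded = CallRequest(parties=["+441632960001"], recording_required=True)
```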

[312] Other analyses of media streams can also be added. For example, image processing such as supported by OpenCV or similar may be used to, for example, detect visual commands (such as a wave or putting one’s hand up). This can be done on inputs that may not even be transmitted. For example, image processing may be applied to a stream from the camera (1108) even if the call is voice only - in which case the TXHandler (1104) is not passing that stream on, merely analysing it.

[313] Where a call is placed directly (1207) to the end party (1115), the MAP (14) is advised of the identity of the called party (1209) and a speech recognition service (17) is tapped into the microphone stream on the phone. This allows commands to be recognised and acted on from this point forwards - including before the call has been answered (e.g. “Let me just leave a message” should the user give up waiting for the call to be answered). If the far end answers, the audio paths are connected but the system continues to listen (1213) for spoken commands from the user and/or instructions from the touch screen, headset or other peripherals.

[314] If the call is not answered after a timeout (or earlier if a spoken or UI command to abandon the call attempt is given) the call is abandoned (1212) and torn down. As with all state changes, the MAP (14) is advised (1220).

[315] Not shown is an optional scenario whereby a call originally placed directly (1207) subsequently requires services only available at the MAP (14). In this case, the phone (101) establishes a second voice path to the MAP (14) over a network that allows the existing connection to remain in place. By selectively conferencing and/or forking these, some (but not all) features of the MAP (14) can be provided. For example, a copy of the audio can be streamed to the MAP (14) for recording and/or remote analysis.

[316] Fig. 13 shows the preferred inbound call handling approach. Someone places a call to this mobile (101). Note that they may not have dialled (or even know) the actual mobile phone’s (101) public number. They may dial a number printed on the phone’s owner’s business card - which his employer associates with this mobile (101) via a MAP (14) or other redirection mechanism.

[317] Inbound calls to the mobile phone (101) are therefore preferably arranged to route to a unique phone number that terminates on a MAP (14) rather than taken directly on the mobile phone’s (101) own PSTN number. This can be done, for example, by applying a “divert all calls” feature or by advertising a different number in the first place (as described above).

[318] Whether the call is routed directly to a MAP (14) or via a PBX (8), a call alerts (1302) on the MAP (14) - causing (1303) a TX0Handler (1111) and RX0Handler (1112) to be instantiated ready for communication with the phone (101); a further TX and RX Handler (1113, 1114) to be instantiated in preparation for terminating the stream and a MAPCallManager (1110) to control the call.

[319] Thereafter, the MAPCallManager (1110) starts by advising (1304) the mobile (101) of the call details via a data network.

[320] A media task tries (1305) to connect the TX0Handler (1111) and RX0Handler (1112) to the mobile phone (101). In many cases a second pair of handlers is created in parallel - allowing PSTN and VoIP call attempts to be made in parallel - with the first one to succeed being used (as long as it appears to be of adequate quality). The other channel may be dropped or maintained in case of fall-back.
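The parallel connection attempts might be sketched with Python's asyncio as follows. The `attempt` coroutine merely simulates a connection that succeeds or fails after a delay; its name, the delays and the choice to drop the losing channel (rather than keep it for fall-back) are assumptions, and the quality check mentioned above is omitted.

```python
import asyncio
from typing import Awaitable, List, Optional


async def attempt(name: str, delay: float, succeeds: bool) -> str:
    """Simulate a connection attempt that completes after `delay` seconds."""
    await asyncio.sleep(delay)
    if not succeeds:
        raise ConnectionError(name)
    return name


async def first_successful(attempts: List[Awaitable[str]]) -> Optional[str]:
    """Run all attempts concurrently and return the first that succeeds."""
    tasks = [asyncio.ensure_future(a) for a in attempts]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except ConnectionError:
                continue  # this attempt failed; wait for the next one
        return None       # every attempt failed
    finally:
        for task in tasks:
            task.cancel()  # drop any attempt still in progress


winner = asyncio.run(first_successful([
    attempt("pstn", 0.05, True),
    attempt("voip", 0.01, True),
]))
```

To retain the slower channel for fall-back instead, the cancellation step could be skipped for attempts that are still pending or have also succeeded.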

[321] Optionally, a further TX and RX Handler pair (1113, 1114) may be instantiated and a call initiated to an IVR (9) port or pool. This is needed if the IVR (9) is to be used in either assistive mode (e.g. playing pre-recorded messages) and/or to take control of the call at any point (e.g. take credit card details).

[322] Optionally (not shown) a further TX and RX Handler (1113, 1114) may be assigned, ready to play an announcement (1120) or read some text via text to speech. Likewise, for recording services if required.

[323] The original call may or may not be answered immediately (1303). There are reasons to adopt each approach. For example, as soon as the call is answered, the far end may incur charges. The likelihood of a charge being incurred may be easily inferred in some cases. For example, a normal UK geographic number answering a call from a landline (first two digits not “07”) is very likely going to result in the caller incurring a call setup charge - which can be significant. This call may therefore be allowed to continue alerting until the call to the mobile (101) has been answered or a decision is taken to respond with answering machine or voicemail (10) capability at which point the MAP (14) may provide such capability internally and/or route the call to an existing service (10). Other calls, determined likely to incur zero or minimal charges, or other rules applied to the call, destination, source, time or other parameters may be answered immediately (1303) so that progress notification tones and/or announcements can be played to the caller rather than basic ring tone.
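A deliberately simplified sketch of the “07” prefix rule is shown below. It assumes the caller's number is presented in UK national format (leading “0”); the function name `defer_answer` is hypothetical, and treating every other caller as safe to answer immediately is an assumption of the example rather than something the description states.

```python
def defer_answer(caller_number: str) -> bool:
    """Return True if answering should be deferred to avoid a likely setup charge.

    Assumes UK national number format. A number starting with "0" but not
    "07" is taken to be a landline, whose caller is likely to incur a charge
    as soon as the call is answered.
    """
    digits = "".join(ch for ch in caller_number if ch.isdigit())
    return digits.startswith("0") and not digits.startswith("07")


landline_deferred = defer_answer("020 7946 0000")  # UK landline-style number
mobile_deferred = defer_answer("07700 900123")     # UK mobile-style number
```

In practice such a test would be one rule among several, alongside the destination, source, time and other parameters mentioned above.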

[324] Before the connection to the mobile (101) has been established, if speech commands are to be supported from either party on the call, speech recognition services (1117) are tapped (1307, 1308) into the appropriate RX Handler (1114). Note that these may have different language, speaker and/or vocabulary models for the caller and called party.

[325] The various TX and RX Handlers (1111, 1112, 1113, 1114) route, fork, mute, mix, process, filter, analyse and generate audio, video or other data stream content (1309) as instructed by the MAPCallManager (1110) throughout the call until it is terminated. Any events they detect - from signalling or in-band analysis of the media stream - are passed to the MAPCallManager (1110) for processing. These may result in changes to how media is flowing and/or connection/disconnection of streams.

[326] Processing at the mobile for this incoming call scenario is essentially a subset of that for outbound calling. Whether a push notification or an inbound call from a MAP (14) via the mobile network occurs first, the user is alerted to an incoming (enhanced) call as normal.

[327] The voice connection to the MAP (14) is completed immediately or when the user chooses to answer the call (depending on tariff details and user preferences) - joining the flowchart of Fig. 12 at 1222. As at the MAP, the TX/RX Handlers route media as instructed and advise of signalling/control events while the CallManager task (1106) is advised of and acts on events coming in from the MAP (14) and the mobile phone’s (101) User Interface (1110) and/or peripherals.

[328] Note that the processing at the MAP (14) end for an outbound call from the mobile (101) is very similar - as this also results in an inbound call from the mobile (101) to the MAP (14). When creating the handlers, the MAPCallManager (1110) recognizes the calling party as a supported mobile phone (101) (preferably it has recently been alerted to that by a data message over a data network (1203) and has already started preparing for the call). In this case, the inbound call from the mobile (101) is answered immediately and the counterparty(ies) (1115) is/are called instead of the mobile (101).

[329] A subset of the features described can be provided to callers from regular phones or mobile phones that do not have the application (16) present.

[330] With the various media streams established and appropriate analysers processing them, we now turn to the enhanced feature set that can be provided to the user within this system.

[331] The overall goal is to provide easy, non-intrusive access to at least the features that users of dedicated phone terminals (19) and “agent desktop” interfaces make frequent use of in advanced call centres. These typically require a complex user interface, in the form of a business telephone set, a “softphone” or “agent desktop” and often ancillary controls such as agent initiated recording controls, an auto-dialler user interface and so forth.

[332] Voice assistant devices are now in many homes and their speech interface, triggered typically by a “wake” word or phrase, is well proven and understood. Furthermore, these devices have accompanying Software Development Kits (SDKs) and Application Programming Interfaces (APIs) making it easy to construct sets of commands - with varying degrees of complexity of dialog beyond the initial voice utterance that triggers a command sequence.

[333] Mobile phones can access these services directly. Most run as services over the internet with audio being transmitted to them and the response coming back. They can therefore be connected into a call via a TXHandler/RXHandler (1113, 1114) pair as described above.

[334] It is now commonplace to initiate phone calls via voice commands - especially using hands-free mode when driving. However, there is scope for significant extra functionality that can easily be added given the architecture described above.

[335] This needs to be done with minimal disruption to the flow of the call. Luckily, there are some very common phrases used ahead of most telephony operations - because as soon as these are performed, the audio path to the customer is often lost. Today’s voice assistants are designed to have a command immediately following a wake word or phrase. This is ideal for use on a call. For example (using British English phrasing):

[336] “Let me just put you on hold for a minute or two.”

[337] “Let me just transfer you to sales.”

[338] “Let me just conference in my supervisor.”

[339] In all these cases, the intent is clearly stated after a common phrase “Let me just” that does not sound at all out of place or deliberately aimed at a voice assistant - even though it can be. Other regions and languages have similar phrases that can be used there.
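Assuming a speech recognition service has already produced a text transcript, the extraction of the command following the wake phrase might be sketched as follows. The function name `extract_command` and the use of a regular expression are illustrative choices only.

```python
import re
from typing import Optional

# The wake phrase from the examples above, matched case-insensitively.
WAKE_PHRASE = re.compile(r"\blet me just\b\s*(.*)", re.IGNORECASE)


def extract_command(transcript: str) -> Optional[str]:
    """Return the words following the wake phrase, or None if there are none."""
    match = WAKE_PHRASE.search(transcript)
    if not match:
        return None
    command = match.group(1).strip().rstrip(".!?").strip()
    return command or None


command = extract_command("OK. Let me just transfer you to sales.")
```

The returned text would then be matched against the set of commands enabled for the current phase of the call.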

[340] There is also a natural pause (at least on the part of this speaker on the call) giving a clear demarcation at the end of the command. The other party often expects to hear silence - if only briefly - at this point. This is an ideal opportunity for this speaker to conduct the remaining dialog in private.

[341] In this system, it is straightforward to route the responses from the voice assistant that is tapped into the caller’s audio stream back to that caller only. The other party therefore does not hear “Sorry, I don’t know that one”. Preferably, this phrase is replaced by a short but easily recognisable tone so as to distract the caller less. A brief rising tone indicating “huh?” (or, literally, “Huh?”) is all that is required should a command go unrecognised or the wake phrase be used in more general conversation without a valid telephony command following it.

[342] Some commands have serious consequences and hence should be confirmed before they are acted upon. Again, the beauty of the existing telephony system is that the other party already expects the line to go dead after many of these sentences. It is easy to play the confirmation or follow-up dialog only to the user giving the command. For example, “Are you sure you want to hang up?” - to which a rarely mistaken “Yes” or “No” is given - and that response goes solely to the voice assistant, not to the other party on the phone call - who may already be listening to an announcement or music on hold.

[343] In the example dialogs below, the spoken responses and questions are deliberately terse. Preferably at least two options are provided to the end user from sets of utterances that could be characterised as “terse”, “polite” and “verbose”.

[344] Users typically start with “verbose” - which can include tips and hints explaining commands while waiting (e.g. while listening to ring tone). Users can then move to “polite” or “terse” if their time is more valuable than how they are seen to converse with their telephony voice assistant.

[345] Different command sets are used during the three phases of: call setup; once the call has connected and after the call has ended. Within these stages of the call (or sub-call such as a consultation call within the overall interaction), each command may be enabled or disabled through corporate and/or personal preferences. Those occurring during the call may optionally be provided to the other party or parties on the call as well as to the user of the mobile phone (101).
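One way of holding per-phase command sets together with corporate and personal preferences is sketched below. The `Phase` enumeration, the `COMMANDS` table and the `is_enabled` function are hypothetical; the commands listed are a small selection of the phrases used in the examples in this description, and the post-call set is left empty for the purposes of the sketch.

```python
from enum import Enum
from typing import Dict, FrozenSet, Set


class Phase(Enum):
    SETUP = "setup"          # alerting, before the call is answered
    CONNECTED = "connected"  # call established
    ENDED = "ended"          # after the call has ended


COMMANDS: Dict[Phase, Set[str]] = {
    Phase.SETUP: {"hang up", "leave a message", "try again later"},
    Phase.CONNECTED: {"put you on hold", "retrieve that call", "transfer you to"},
    Phase.ENDED: set(),      # left empty in this sketch
}


def is_enabled(
    phase: Phase,
    command: str,
    corporate_disabled: FrozenSet[str] = frozenset(),
    personal_disabled: FrozenSet[str] = frozenset(),
) -> bool:
    """A command is usable if it belongs to the phase and nobody has disabled it."""
    return (
        command in COMMANDS[phase]
        and command not in corporate_disabled
        and command not in personal_disabled
    )


hold_allowed = is_enabled(Phase.CONNECTED, "put you on hold")
hold_during_setup = is_enabled(Phase.SETUP, "put you on hold")
transfer_blocked = is_enabled(
    Phase.CONNECTED, "transfer you to", corporate_disabled=frozenset({"transfer you to"})
)
```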

[346] In the dialog examples below, the words spoken by the user of mobile phone (101) (“John Doe”) are shown in normal typeface and those of the voice assistant’s responses in italics. The counterparty is referred to as “Jane Smith”.

[347] Synonyms and alternative phrasing may obviously be added to improve command recognition accuracy.

[348] Note that initial responses/confirmations are, preferably, unique for each command. This allows very short phrases to be used but the user can still be confident that the appropriate command has been understood.

[349] Many of these will be designed and configured by the business or even the individual user. Catalogues of command dialogs may be made available for users to pick and choose functionality from and, if they wish, assign to command phrases of their choice.

[350] Each user may choose from a wide range of tasks that the business can accumulate and share between employees. As with the in-call commands, buttons, text fields etc. may be presented on the user interface of his phone as an alternative to spoken responses - or just to let him correct any errors in what has been interpreted from his spoken responses.

[351] Note that call status and recording state (on, off, paused...) is also shown continuously on John Doe’s screen (1110) throughout the call and can be controlled by pressing buttons thereon. This provides a fall-back mechanism for the (increasingly rare) cases where the voice assistant cannot interpret his commands correctly.

[352] During call setup, assuming the call reaches a valid endpoint (1115) (has not been misdialled or wrong number/address used), the call will ring the far end (1115) until it is either answered or the caller gives up (“abandons”). During this period of alerting, a number of commands, including but not limited to the following, are supported. In this phase, responses are mixed with (a reduced volume copy of) the ongoing ring tone so that if the call is answered before a valid selection is made, the user is immediately aware; the call is connected and the partially completed action abandoned.

[353] “Let me just hang up.”

[354] Confirmation question asked (“End Call?”) played over ongoing ring tone. Call terminates after confirmation. If call is answered before positive confirmation, action is abandoned, call is connected.

[355] “Let me just leave a message.”

[356] Dependent on the contact details available for the counterparty (1115) being called, user is asked to choose from appropriate set of options (for example “Record an email for Jane Smith, have me transcribe a spoken message or record a message for me to call her with later?”). If a valid option is not heard, assistant plays “Sorry, I didn’t recognise that option” or “I didn’t hear your choice” and abandons the action.

[357] “Let me just try again later.”

[358] This then uses standard scheduling/alarm/reminder dialog patterns to determine when to try again (“In an hour”, “At 3pm tomorrow”, “daily at 9 AM”, “Noon on 31st” ...). The dialog can include options such as “Shall I call you first and then Jane Smith or only call you once I have her on the line?”. In the latter case, a scheduled call is made to the counterparty (1115) who, on answering, are then played “John Doe has been trying to reach you. I’m just trying to connect him into the call now.”

[359] “Let me just ask Fred Bloggs to call them instead.”

[360] “Fred Bloggs” is looked up in the user’s contacts list with the same or similar dialog and search approach used for voice controlled calling. Assuming a candidate party is found, an appropriate set of options is built from the contact details available. For example “Shall I email, text or call Fred Bloggs asking them to call Jane Smith?” then, on a valid option being selected, “Would you like to record a message to go with that request?”

[361] “Let me just give up on them.”

[362] Acts as for hang up (and requires confirmation) but also removes this counterparty (1115) from a pre-specified list of parties (as may be done with an auto-dialler dialling list).

[363] Once the call has been established between mobile phone (101) and at least one counterparty (1115) or backend service such as a transcription or concierge service (1123), several more commonly used phrases can be listened for - and acted on much as a business phone would do on having the corresponding button pressed. For example:

[364] “Let me just put you on hold.”

[365] Call immediately placed on hold with only a very brief positive “doodle-doop” tone or “OK”. This action is reversible so no need for explicit confirmation. Actually, there is no need to do anything in the PBX (1103) unless call or data costs can be saved by stopping media streaming within other networks. Typically the MAPCallManager simply stops John Doe’s voice stream from being added into the call, optionally replacing it with music or announcement(s). Normally this will also stop their voice being added to recording streams (though there are exceptions).

[366] “Let me just retrieve that call.”

[367] If there is a call on hold, it is retrieved after a brief positive confirmation tone or acknowledgment word/phrase as above. Again, this is reversible so no need for explicit confirmation. If, as noted above, the hold did not involve actions outside the MAP (14) then neither does this action as it simply reverses the actions taken when putting the call on hold.

[368] “Let me just transfer you to ABC.”

[369] A phrase such as this is used to request an (implicitly) “blind” transfer - in which the original call will very shortly hear the new party ringing (or announcements played on alerting) until they answer.

[370] The call is placed on hold and a confirmation played and responded to (“Blind transfer to ABC?” “Yes”) to ensure the destination has been understood correctly. In this case, not only the user’s Contacts should be consulted for a match but also the corporate directory.

[371] On confirmation, the call leg with the counterparty (1115) hears ABC alerting immediately, ending the call as far as the MAP (14) is concerned. Depending on the PBX (8) used, this may require the call to be maintained via the MAP (14) or the call on which the counterparty (1115) is connected to the MAP (14) may be transferred off to the new party, allowing this MAPCallManager (1110) to terminate. Note that if the counterparty (1115) is to be offered any of the in-call services offered by the MAP (14) then the transferred call must actually be made as two separate legs through the PBX (8) and the MAP (14) remain in circuit.

[372] Note that the contact details held for ABC may include addresses on systems and networks other than the telephone network. The connection to this new party may therefore be attempted by whatever instant messaging, VoIP or telephony service(s) are available and for which an address is known or can be determined for that party. This just influences the type of TX and RX Handler (1113, 1114) instantiated and the service used to establish the connections to and/or from them.
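The lookup of a spoken destination across both the user’s Contacts and the corporate directory, with each entry possibly holding addresses of several kinds, might be sketched as below. This is a minimal sketch under stated assumptions: the function name, the dictionary layout and the placeholder addresses are all hypothetical, and exact case-insensitive name matching stands in for whatever matching the voice assistant actually performs.

```python
def resolve_destination(spoken_name, contacts, corporate_directory):
    """Return (source, kind, address) candidates, personal contacts first."""
    key = spoken_name.strip().lower()
    candidates = []
    for source, directory in (("contacts", contacts),
                              ("corporate", corporate_directory)):
        for name, addresses in directory.items():
            if name.lower() == key:
                for kind, address in addresses.items():
                    candidates.append((source, kind, address))
    return candidates


# Placeholder data; none of these addresses are real.
contacts = {"ABC": {"telephone": "tel-placeholder"}}
corporate = {"abc": {"voip": "sip:abc@example.invalid",
                     "telephone": "ext-placeholder"}}
matches = resolve_destination(" abc ", contacts, corporate)
```

The kind of each candidate address (telephone, VoIP and so on) is what would then determine which type of TX and RX Handler is instantiated.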

[373] “Let me just hand you over to ABC.”

[374] This phrase may be configured to invoke an implicit “consultative transfer”. After confirmation that the destination has been correctly identified (“Consult ABC ahead of transfer?” “Yes”), a call is initiated to ABC (new TX and RX Handler (1113, 1114) pair). John Doe hears that line ringing - during which period a subset of the pre-connection commands applies.

[375] Once connected to ABC, John Doe may leave the call (see below), in which case ABC is then connected to Jane Smith. Alternatively, John Doe may retrieve the original call (see above) resulting in a 3-way conference.

[376] “Let me just consult with ABC.”

[377] Original call is “held” within the MAP. After confirmation (“Consult ABC?” “Yes”), an additional connection to ABC is added as for consultative transfer but, in this case, should John Doe leave the consultation call, the original call may remain on hold until explicitly retrieved (often personal preference).

[378] “Let me just conference in ABC.”

[379] This is a “single-step conference” or “fast conference” so, after confirmation that the destination has been correctly identified, a connection to ABC (new TX and RX Handler (1113, 1114) pair) is added to the call immediately. John Doe and Jane Smith hear that line ringing - during which period a subset of the pre-connection commands applies.

[380] Once connected to ABC, John Doe may leave the call (see below), in which case ABC remains connected to Jane Smith.

[381] “Let me just ask you to give your card details in secret”

[382] A brief but distinct confirmation (“Taking payment details. OK?”) is played to John Doe (who gave the command). For efficiency’s sake, this may be played concurrently with a “How would you like to pay?” dialog starting with Jane Smith. Should the command be misinterpreted, no harm is done in the few seconds it takes to cancel it with a “No”.

[383] Given the credit card industry standards, there are several IVR approaches to this. The call may therefore be temporarily connected to an existing such service or the dialog may all occur within the voice assistant framework that is handling the “Let me just...” commands.

[384] “Let me just get someone to help with that”

[385] Can conference in an internal or external resource pool - such as a transcription or concierge service (1123) or an automated personal assistant. The response may request selection from a range of pre-configured resource options. This may or may not be audible to the other party (Jane Smith).

[386] “Let me just send you a recording of this call.”

[387] Response may request confirmation of which party(ies) on the call are to be sent the recording. May include the above resource pool.

[388] “Let me just send you a transcription.”

[389] Response may request confirmation of which party(ies) on the call are to be sent the transcription and/or options of how to transcribe it (automatically, manually internally, externally...).

[390] “Let me just tell you XYZ”

[391] For example “our terms for this offer”. Results in a pre-recorded announcement or text-to-speech output being played to the counterparty. As this is non-destructive, it can start playing immediately if “XYZ” is recognised as a valid choice. Otherwise a “huh?” tone/word can be played to John Doe (only).

[392] “Let me just stop that”

[393] Aborts anything being played as the result of “Let me just tell you...”. Used if “XYZ” was misheard, is no longer relevant or can be truncated.

[394] “Let me just mark this call as XXX”

[395] Adds metadata “XXX” to the call (same as “tags it with XXX”). May be confirmed or not according to how important the tag is.

[396] “Let me just set the XXX to YYY”

[397] Sets field with key XXX to value YYY as is commonly done with “user defined fields” in call recording systems. For example, “Set the priority to high.” May be confirmed or not according to how important the tag is. In this case, the confirmation may be played to the counterparty as well - giving both parties confidence that the action has been confirmed.
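As a purely illustrative sketch, a command of this form could be recognised in the text output of a speech recogniser as shown below. The function name and regular expression are assumptions made for this example; a practical voice assistant would more likely use a grammar or intent model than a single pattern.

```python
import re

WAKE_PHRASE = "let me just"  # the example wake phrase used in the text
SET_FIELD = re.compile(r"^set the (?P<key>.+?) to (?P<value>.+?)\.?$",
                       re.IGNORECASE)


def parse_set_field(utterance):
    """Return (key, value) for a 'set the XXX to YYY' command, else None."""
    text = utterance.strip()
    if not text.lower().startswith(WAKE_PHRASE):
        return None  # no wake phrase, so not a command
    command = text[len(WAKE_PHRASE):].strip()
    match = SET_FIELD.match(command)
    if match is None:
        return None  # some other command
    return match.group("key").lower(), match.group("value")


parsed = parse_set_field("Let me just set the priority to high.")
```

The resulting key and value pair would then be stored against the call in the same way as a user defined field entered by other means.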

[398] “Let me just add some details to the call”

[399] Results in “Go ahead” - and subsequent speech from the user is not transmitted to the counterparty (1115) but recorded for internal purposes. May or may not be sent to the main recording system.

[400] “Let me just add some notes to the call”

[401] Results in “Recording note” - and subsequent speech from the user is not transmitted to the counterparty (1115) but recorded for internal purposes. May or may not be sent to the main recording system. Terminates on another wake phrase, as in “Let me just return to the call”.

[402] “Let me just start recording”

[403] Appropriate after John Doe (or pre-recorded announcement) has explained to Jane Smith that this portion of the call needs to be recorded for contractual reasons (or other GDPR compliant reason).

[404] A short positive confirmation tone played to John Doe (only) gives him immediate confidence that recording has started. A background recording beep tone is typically then injected into the audio stream played to Jane Smith (varies according to local regulations).

[405] “Let me just pause recording”

[406] Appropriate ahead of sensitive information being disclosed. Stops or masks recording with a fixed tone. A brief distinctive tone gives John Doe immediate confidence that recording has been paused.

[407] “Let me just resume recording”

[408] Ends the masking caused by the pause command above. A brief distinctive tone (different from the paused tone) gives John Doe immediate confidence that recording has been resumed.

[409] There are many other mid-call commands that could be useful in specific scenarios. For example: “Let me just flag this call to ABC”, “Let me add a voice memo to this call” etc. There are several commands that result in John Doe leaving the call:

[410] “Let me just leave you with ABC”

[411] The parameter ABC is (unusually) irrelevant, this is merely an instruction to the system to clear John Doe’s connection but explicitly not to proactively clear the remaining parties.

[412] “Let me just drop off the call then.”

[413] Will clear John Doe’s connection and, if there is only one remaining real party, the entire call.

[414] “Let me wrap up the call then”

[415] Will force clear all parties from the call even if there are two or more other parties who could otherwise continue to converse.
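The difference between these three leave commands can be summarised in a short sketch. The command labels and function name below are hypothetical, chosen only to mirror the three phrases above; the sketch returns the parties that remain connected once the speaker has left.

```python
def parties_after_leave(command, parties, leaver):
    """Return the parties still connected after `leaver` issues `command`."""
    remaining = [p for p in parties if p != leaver]
    if command == "leave_with":
        # Clear only the speaker; never proactively clear the others.
        return remaining
    if command == "drop_off":
        # A single remaining party has nobody to talk to: clear the call.
        return remaining if len(remaining) > 1 else []
    if command == "wrap_up":
        # Force clear everyone, however many could continue.
        return []
    raise ValueError(f"unknown leave command: {command}")


two_party = ["john", "jane"]
three_party = ["john", "jane", "abc"]
after_drop_off = parties_after_leave("drop_off", three_party, "john")
```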

[416] Should the counterparty, Jane Smith, leave the call first, this should be announced to John Doe rather than simply tearing the call down - as the latter action is difficult to distinguish from a failed connection between John Doe’s phone (101) and the MAP (14).

[417] Optionally, a connection to John Doe may be maintained after the call to the counterparty (1115) has dropped. “Completion codes” are a common requirement. For example, on clearing the counterparty from the call, the voice assistant could ask “Please state the outcome of this call”; “Shall I add Jane Smith to your prospect list?”; “Shall I schedule a follow-up call with Jane Smith?” etc.

[418] Some commands may also be offered to the counterparty (1115) during all or part of the call.

[419] For example, while John Doe has Jane Smith “on hold”, this is an ideal opportunity to play her some instructions - such as “John Doe has placed this call on hold. If you need me to attract his attention, just say ‘Excuse Me’. If you need to leave the call please tell me. You can leave him a message if you need to drop off the call.”

[420] Note that this does not disclose the wake phrase (“Let me just” in this example). This is partly because “Excuse me” is easier to remember and partly because the system does not necessarily want to draw the counterparty’s attention to the wake phrase.

[421] Other commands can be listened for without explicit instructions having been given. For example, “Excuse me!” said while a pre-recorded announcement is being played may terminate it with a “Sorry, can I help?” - preferably in John Doe’s voice rather than that of an assistant that the customer may not be aware of having been “present” on the call.

[422] Advantageously, the system may “scrape” the configuration of the company PBX (8) via, for example, an administration interface or API. By reading all of the phone numbers/handles and other addresses and their associated names/descriptions where present, it can stay up to date with the names and numbers that may be mentioned in voice commands.

[423] Similarly, if the employees are configured in a human resource system, workforce optimisation suite, recording system, private branch exchange system, corporate directory, public directory, domain server, active directory, reading the current configuration and subsequent updates from there can also reduce the need for ongoing detailed configuration of the possible numbers/addresses and associated job functions and employee names.

[424] A plurality of individuals within a room (1401) need to participate in a conference call, that includes at least an audio path, with at least one externally connected person and/or service such as a speech recognition (1416) or transcription service.

[425] A number of computing devices with audio capabilities (microphone and/or loudspeaker) are also present in the room (1401). These may be brought into the room by participants or be part of the room’s infrastructure. They typically include smartphones (1402, 1403, 1404), tablet computers (1405), laptops or personal computers (1406).

[426] There is also typically at least one desk phone (1408) and/or conference phone present in the room. Optionally, one or more loudspeaker devices such as a Bluetooth speaker (1409) may be present. Audio may, optionally, be sent to said speaker (1409) from at least one of the devices - for example, smartphone (1402) - in the room (1401).

[427] Devices within the room are typically able to communicate with each other and with those outside the room via a Wi-Fi (1407) and/or wired (typically Ethernet) network. Those with mobile network access may also be able to use a public cellular network for data communication as well as voice. Even in the absence of any pre-existing network (1407) in the room, many such devices are able to use peer-to-peer wireless networking to communicate with each other.

[428] Prior to this invention, one individual would typically use the desk or conference phone (1408) to dial whichever conference provider is hosting the overall conference. This results in a phone call (over the WAN, LAN, PSTN, mobile network(s) and/or Internet) to a conference bridge (1410). This is frequently routed via the company’s Private Branch Exchange (PBX) (1413). Other participants (1411) outside the room also dial in or connect via their browser or other application to this conference bridge (1410).

[429] Note that this invention also works when there is a single remote party (1411) and no external conference bridge (1410). The issues around managing the audio pickup within the room (1401) are the same regardless.

[430] The conference bridge (1410) may be providing video and/or screen-sharing, whiteboarding, chat and other data sharing as well as recording facilities. Alternatively, these interaction mechanisms may be provided by a completely separate conferencing service to which the participants connect independently of this audio connection.

[431] In either case, users within the room (1401) will typically also connect via their browsers or dedicated applications to view and contribute to the visual streams if these are available. Although this connection method usually also offers an audio path, this is not a good option. If there is more than one audio path between the room (1401) and the conference bridge (1410) echo typically occurs as the microphone on one device picks up the audio output by another device in the same room. Even if users wear headsets, their microphones typically still pick up one another’s speech.

[432] Furthermore, it is usually easier to position the desk or conference phone (1408) centrally than to use any one of the personal devices. This phone (1408) often has better speakers and, especially if it is a dedicated conference phone also has better audio pickup than the personal devices.

[433] However, the volume and quality of audio picked up from each participant depends on their location in the room and the quality and orientation of the microphone(s) in or connected to the (single) device being used for the audio path.

[434] Should the participants instead choose to (or have to in the absence of phone (1408)) use one of their smartphones (1402, 1403, 1404) to connect to the conference bridge (1410) this can incur significant extra costs. Whereas PBX (1413) typically has access to low cost routing even if conference bridge (1410) is overseas, any time a mobile phone is used to call a foreign number or is used while roaming outside its home network, costs can be exorbitant. Conference calls often last for hours - incurring huge costs.

[435] This invention builds on the system described in UK patent application GB 1816697.5 - which describes, in detail, a system by which an application on each employee’s smartphone interacts with the company’s telephony infrastructure via a “Mobile Access Point”. The conferencing features described in this patent application can be provided as additional functionality within that framework. In this case, the Conference Room Process (CRP) (1415) runs inside the Mobile Access Point and the Conference Participant Application (CPA) (1417) is part of the overall application running on the employees’ smartphones.

[436] Naturally, the scenario described applies not just where at least some of the participants are employees of one organisation but also to any set of individuals, at least some of whom have this (or a compatible) application (1417) installed on their devices and access to a common server process (1415) functionally equivalent to the MAP described above for use within a single business. In such cases, the central process (1415) is typically“in the cloud” - a service accessed via the Internet.

[437] Using this invention, any or all of the devices in the room that are capable of running at least the Conference Participant Application (CPA) (1417) should have this installed and run it to participate in the audio part of the conference - even if another application is providing the video, chat and other data streams. Alternatively, this CPA functionality may be embedded in such multimedia conferencing applications.

[438] For the invention to work, at least one of the devices in the room must be able to communicate with a CRP (1415). This may be located within the company’s network or in the cloud/internet. This component will receive audio streams from the personal devices (1402, 1403, 1404, 1405, 1406) and, optionally, the desk/conference phone (1408) if present. It typically accesses the latter via an internal phone call using the company’s PBX (1413).

[439] The CRP (1415) is responsible for establishing and maintaining a single audio connection to the conference bridge (1410) - thus appearing as a single (audio) participant in the overall conference.

[440] The CRP (1415) also receives the (single) audio stream from the conference bridge (1410) and routes it either directly to the desk/conference phone (1408) or, optionally, to one or more participating devices (1402, 1403, 1404, 1405, 1406). The device receiving this stream may output the audio directly via its own loudspeaker(s) or via a paired Bluetooth speaker or physically connected speaker.

[441] The CRP (1415) receives audio streams from all of the personal devices that have joined the conference. It processes and compares these audio streams with each other and with the incoming audio stream from the conference bridge (1410) to determine which streams, if any, it will mix into the single audio stream it transmits to the conference bridge (1410).

[442] This processing may include squelch level (do not send if level below a threshold); noise reduction, echo cancellation (within the room and with the remote parties) and/or automatic speech recognition algorithms.
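The squelch step, combined with a simple choice of which streams to mix, might look like the following sketch. It is offered under explicit assumptions: the function names are hypothetical, level is measured as the root mean square of one frame of samples, and picking the loudest streams is only one possible selection rule (the description leaves the rule open and also mentions noise reduction, echo cancellation and speech recognition, none of which are modelled here).

```python
import math


def rms(frame):
    """Root mean square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0


def select_streams(frames, squelch_level, max_streams=1):
    """Return ids of the loudest frames at or above the squelch level."""
    levels = {device: rms(frame) for device, frame in frames.items()}
    audible = [d for d, level in levels.items() if level >= squelch_level]
    audible.sort(key=lambda d: levels[d], reverse=True)
    return audible[:max_streams]


frames = {
    "1402": [0.5, -0.5, 0.5, -0.5],      # close to the current speaker
    "1403": [0.01, -0.01, 0.01, -0.01],  # background noise only
    "1404": [0.2, -0.2, 0.2, -0.2],      # further from the speaker
}
chosen = select_streams(frames, squelch_level=0.05)
```

An empty result corresponds to the case in which no stream is mixed into the audio sent to the conference bridge.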

[443] The process by which devices join the conference is described below.

[444] Each device running the CPA establishes a data communication path to the CRP (1415) and, preferably, reports its location and other information it can provide that will assist the CRP (1415) to determine where it is and hence which conference it is most likely to be involved in. This information may include, but is not limited to: wireless network characteristics (such as base station address; signal strength; other networks visible; Wi-Fi SSID, BSSID and signal strength). Peer-to-peer networking can also be used to determine whether any other devices running this application are within range.

[445] One of the users of the CPA (1417) application will initiate a conference by selecting the“Start a Conference Call” button. They are then presented with any or all of:

[446] Dialpad for manual (or from clipboard) entry of a phone number or

[447] Contacts list(s)

[448] List of“well known” conference bridges (including typically those the company itself uses)

[449] They are also typically shown a set of options for“Audio Output Device”. This set may include a list of meeting rooms and their phone numbers - preferably filtered and ranked based on their current location such that the phone (1408) in the room they are in is at the top of the list.

[450] If they select one of these phones, the CRP (1415) immediately calls that phone (1408), typically via the corporate PBX (1413), preferably with a high quality connection (sampled at 16 kHz or above and uncompressed or compressed with a higher quality codec than is normally used for PSTN or cellular calls), and someone in the room should answer it. This will serve as the default audio output in the room and, assuming it has speakerphone capability, one possible audio input stream that is now being received by the CRP (1415).

[451] Once the initial audio path has been established, it is beneficial to identify the other participants in the room and, if possible to ensure that their phones are silenced with regard to other incoming calls but their microphones are available to assist with the conference. To this end, the conference initiator’s phone (1402) advertises, (preferably via peer-to-peer Wi-Fi and/or Bluetooth) a specific service associated with this conferencing application.

[452] Where the operating system of the other phones in the room permits service detection from background mode, this is programmed to awake the application (1417). Where this is not possible, if the application (1417) has been advising the central process (1415) of each phone’s location, a “push” notification may be sent to the devices that are potentially within earshot of this conference - waking the application (1417) and thus allowing it to scan for the presence of this service.

[453] This alerts other phones near the initiator’s to the presence of a conference - the set of devices belonging to potential participants. However, not all of these may be in the room as the radio signals may be picked up in nearby rooms and/or the reported locations may not be accurate enough to determine which room each phone is in.

[454] The challenge is therefore to identify the set of devices that are present in the room and should be part of the conference. If a conference is being held in an open space where others may overhear the content, that is also of interest.

[455] The system therefore attempts to identify which devices are within earshot of the conference. It does this by alerting the previously identified said set of devices that a test audio signal is to be transmitted. This test signal typically contains a spoken component (such as “Checking for potential participants”) and, optionally, a variable identifier that is easily recognisable in a received audio stream (such as a few DTMF digits or a sequence of single tones). This variable identifier is sufficiently complex that it cannot be guessed or spoofed by an attacker but does not need to be overly complex as it only needs to be uniquely identifiable during the brief period of participant discovery. In other words, if a neighbouring table were setting up a similar conference at the same time, it is important to be able to distinguish between the two conferences.

[456] The application (1417) on each of the devices that has been alerted to said discovery phase listens via any available microphone. For security reasons and to preserve bandwidth, each device analyses the audio it hears locally in preference to sending it to the central process (1415).

[457] Knowing the generic signal to expect (the speech) each can report to the central process (1415) whether or not it heard this in the few seconds following the alert and, if so, what volume level and signal to noise ratio it picked up. It can also report the identifying signal (DTMF or tones) that accompanied the spoken signal.

[458] The Conference Room Process (1415) may be managing many conferences but staggers the discovery phase transmissions so that only one is in progress at a time - so as to avoid any possibility of confusion across conferences. This typically provides sufficient security that the use of identifying tones is unnecessary. There is a window of, typically, less than a second in which a report of hearing the discovery signal could be valid. Using several variants of the wording and/or speaker further enhances the level of security, making it very difficult for someone not in the room to know what audio to spoof, and exactly when, to fool the system into thinking they are in earshot of a specific conference.
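A minimal sketch of how the central process might validate a device’s report of hearing the discovery signal is given below. The class and field names are hypothetical, the one second window is taken from the “less than a second” figure above purely as an illustrative default, and both timestamps are assumed to be on the central process’s own clock.

```python
from dataclasses import dataclass


@dataclass
class Discovery:
    conference_id: str
    identifier: str      # e.g. DTMF digits accompanying the spoken prompt
    sent_at: float       # seconds, on the central process's clock
    window: float = 1.0  # assumed validity window, in seconds


def report_is_valid(discovery, heard_identifier, reported_at):
    """Accept a report only if it is timely and carries the right identifier."""
    delay = reported_at - discovery.sent_at
    in_window = 0.0 <= delay <= discovery.window
    return in_window and heard_identifier == discovery.identifier


probe = Discovery("conference-1", identifier="4719", sent_at=100.0)
timely = report_is_valid(probe, "4719", reported_at=100.4)
late = report_is_valid(probe, "4719", reported_at=101.5)
```

Where discovery transmissions are staggered so that only one is in progress at a time, the timing check alone may suffice and the identifier comparison becomes an additional safeguard.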

[459] Those devices that picked up the audio signal are therefore a subset of the aforementioned potential devices. This “within earshot” subset is preferably shown to the conference initiator - who may accept the full set of participants with a single confirmatory touch or other action.

[460] Conversely, the initiator may reject any of the set of devices shown - but has at least been alerted to the fact that these devices (and presumably, therefore, their owners) can hear what’s going to be said in the conference.

[461] Should the“within earshot” subset of devices not include all of the people that the initiator can see in the room, this is an indication that not everyone will be able to hear the conference - given their current location relative to the audio output device and the volume it is producing.

[462] The initiator is therefore offered the options of increasing the volume of the output device and/or testing other devices as potential output devices. In the former case, they adjust the volume and the discovery signal is repeated. In the latter case, a discovery signal is played out of each of the“within earshot” set of devices in the hope that others beyond them pick it up. Thus additional devices may be added to the“within earshot” set - but their reception characteristics are noted relative to the device(s) that were playing the sound they detected.

[463] Having identified a potential participant and the initiator accepting them into the conference, a further audio signal (for example “Alex, joining”) may also be sent in the opposite direction - being played by the new participant’s device (1402) and hence picked up by the desk-phone (1408) and/or the other participants.

[464] Optionally, more sophisticated signalling can be employed to assess the distances between devices. For example, a“background” signal consisting of some music or tones may be played via, say fixed phone (1408) at (nominally) the same time as playing a greeting (“Alex joining”) at a specific phone (1402). The actual time at which each of these two audio signals plays will vary because of jitter and delays in the system.

[465] However, all microphones picking up the resultant audio in the room will be hearing the same thing - albeit from slightly different locations within the room. Again, the timestamps that they report back will not be directly comparable - because of jitter and delays in their own audio path. Within each received audio stream, however, the relative volumes of the two components of the received signal and, crucially, their relative offset from each other can be measured precisely.

[466] In a simple example, suppose the audio played by device A contains a single pure tone at N Hertz for 100ms and the second, played by device B, contains a single pure tone at M Hertz for 100ms. Each of the devices in the room can report the relative volumes at which it heard the two tones and the time difference between the centre of the burst of N Hertz and the centre of the burst of M Hertz. So, for example, if device C hears the N Hertz tone t milliseconds before the M Hertz tone but device D hears it t+2 ms later, one can infer that (D to A) - (D to B) is approximately 0.7m more than (C to A) - (C to B).
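The 0.7m figure follows from the speed of sound: in 2 ms, sound travels roughly 0.69 m. The sketch below makes the arithmetic explicit. It assumes a speed of sound of 343 m/s (dry air at about 20 degrees C), and the function names are hypothetical. The unknown offset between the two playback times is common to every microphone, so it cancels when the gaps measured at two devices are subtracted. Which pair of distances the result is attributed to depends on the sign convention chosen for the gap, so the sketch returns only the signed change in path difference.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # assumed: dry air at about 20 degrees C


def path_difference_m(gap_ms):
    """Distance sound covers in gap_ms milliseconds."""
    return SPEED_OF_SOUND_M_PER_S * gap_ms / 1000.0


def relative_path_difference_m(gap_at_c_ms, gap_at_d_ms):
    """Change in path difference between devices C and D.

    Each gap is the time between the centres of the two tone bursts as
    heard at one device; the playback offset cancels in the subtraction.
    """
    return path_difference_m(gap_at_d_ms - gap_at_c_ms)


# Device C measures a gap of t = 5 ms; device D measures t + 2 ms.
extra_m = relative_path_difference_m(5.0, 7.0)
```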

[467] By disabling echo suppression during this phase, each device also hears its own audio output thus providing further contributions to the overall set of simultaneous equations that have to be solved to deduce the relative positions of each device.

[468] By transmitting simultaneously or even in quick succession from more than two devices, this discovery phase can be reduced to a second or two even with many participants.

[469] Optionally, repeating the above test with the tones reversed (M Hertz played at device A and N Hertz at device B) allows any variation in the frequency response of the audio paths to be eliminated - by taking the average of the volume ratios across the two tests.
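One way to see why averaging over the swapped-tone tests removes the frequency response is to work in decibels, where ratios become differences. The sketch below is an illustration under stated assumptions rather than the method of the invention: levels are in dB, the response at each frequency is assumed to be the same whichever device plays the tone, and all numbers and names are invented for the example. Averaging the dB differences is equivalent to taking the geometric mean of the linear volume ratios.

```python
def level_difference_db(level_from_a_db, level_from_b_db):
    """Level heard from device A minus level heard from device B, in dB."""
    return level_from_a_db - level_from_b_db


def path_gain_difference_db(test1_diff_db, test2_diff_db):
    """Average the two tests so the frequency response terms cancel."""
    return (test1_diff_db + test2_diff_db) / 2.0


# Illustrative numbers only (all in dB).
gain_a, gain_b = -6.0, -12.0        # path gains from A and B to the microphone
response_n, response_m = 3.0, -2.0  # response at N Hertz and at M Hertz

# Test 1: A plays N Hertz, B plays M Hertz.
test1 = level_difference_db(gain_a + response_n, gain_b + response_m)
# Test 2: tones swapped between the two devices.
test2 = level_difference_db(gain_a + response_m, gain_b + response_n)
estimate = path_gain_difference_db(test1, test2)
```

Here the two tests individually give 11 dB and 1 dB, while their average recovers the 6 dB difference between the two paths.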

[470] By repeating this test for each pair of devices, a map of the locations of each device can be determined. The relative levels detected can also be used to infer how effective each microphone is at receiving audio from each of the other devices and hence a model built of which microphone(s) to use and what delay to apply to each in order to“beam form” the audio - to pick out individual speakers wherever they are situated in the room.

[471] The audio level and time-delay between the audio transmitted from a particular device and that received at each other device can be used to infer characteristics of the two devices and their distance from each other. A complete lack of correlation between transmitted and received sound is used to infer that the devices are not close enough to each other to be part of the same conference. This is a useful security measure that can help stop unauthorised listening in by those not in the room.

[472] The locally received audio during these exchanges (that picked up by desk-phone (1408) while it is playing a predetermined sound and that picked up by the smartphone (1402) while it is playing a predetermined sound) may also be analysed to determine how good the local echo-suppression capability is at each device.

[473] In parallel with this, data packets are exchanged between each participating device (1402) and CRP (1415) to monitor and measure the suitability of the network path between them. If this is poor, the user may be prompted to select a phone (1408) in the room for the audio path rather than or as a backup to that via his device.

[474] Should the user not wish to - or not be able to - use a desk-phone (1408), the user may select “This device” as their audio output - in which case the CRP will stream the audio from the remote party (1411) or conference bridge (1410) to this device and the CPA (1417) will play it via the device’s loudspeaker(s).

[475] Alternatively, the user may select a paired “Bluetooth Speaker” (1409) as their audio output path - in which case the CRP (1415) will stream audio to their smartphone (1402) but this will be played via a paired Bluetooth speaker (1409) rather than the internal loudspeaker.

[476] Each of the newly invited participants’ devices also prompts them at this time to silence their devices (or does so automatically where the operating system permits this).

[477] During the interaction, automated speech recognition (ASR) may be performed at any or all of the devices running the CPA (1417) and/or the CRP (1415). The latter preferably analyses each received media stream separately and may also analyse the differences between pairs of said audio streams. Preferably, said differences are calculated having first time-shifted one of the signals so as to maximize the correlation between the two - hence identifying and compensating for any time lag caused by the physical distance between the two microphones and the dominant sound source and the network links between the two devices and the CRP (1415).
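The time-shifting step can be illustrated with a brute-force search over candidate lags. This is a simplified sketch: the function names are hypothetical, the correlation is not normalised, and a practical implementation would normally use an FFT-based correlation over much longer frames.

```python
def best_lag(reference, delayed, max_lag):
    """Return the shift, in samples, that best aligns `delayed` with `reference`."""
    def score(lag):
        # Unnormalised correlation over the overlapping samples.
        return sum(reference[i] * delayed[i + lag]
                   for i in range(len(reference))
                   if 0 <= i + lag < len(delayed))
    return max(range(-max_lag, max_lag + 1), key=score)


def aligned_difference(reference, delayed, lag):
    """Sample-by-sample difference after compensating for the lag."""
    return [reference[i] - delayed[i + lag]
            for i in range(len(reference))
            if 0 <= i + lag < len(delayed)]


near = [0, 0, 1, 3, -2, 0, 0, 0]
far = [0, 0, 0, 0, 1, 3, -2, 0]  # same sound, arriving two samples later
lag = best_lag(near, far, max_lag=3)
residual = aligned_difference(near, far, lag)
```

With the lag compensated, the residual between the two streams is zero in this idealised case; in practice the residual is what remains to be analysed.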

[478] The output of said ASR - including the confidence level it assigns the transcript - can be used to infer which audio stream has the “clearest” audio signal (steady stream of transcript with high confidence level) and preferentially transmit that stream to the remote bridge (1410).
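As a hedged illustration, the choice of the “clearest” stream could be reduced to a score per stream over a recent time window. The scoring rule used here, the sum of per-word confidences, is an assumption made for this sketch: it rewards both a steady flow of words and high confidence, but it is not stated in the description. The function name and data layout are likewise hypothetical.

```python
def clearest_stream(asr_results):
    """Pick the stream whose recent transcript scores highest.

    asr_results maps a stream id to a list of (word, confidence) pairs
    recognised within the current time window.
    """
    if not asr_results:
        return None
    return max(asr_results,
               key=lambda stream: sum(conf for _, conf in asr_results[stream]))


window = {
    "1402": [("hello", 0.9), ("everyone", 0.8)],  # steady, confident
    "1403": [("hollow", 0.4)],                    # sparse, low confidence
    "1408": [],                                   # nothing recognised
}
preferred = clearest_stream(window)
```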

[479] Optionally, if the interaction is to be recorded, not only the audio stream transmitted to the remote bridge (1410) is recorded but also some or all of the individual input streams from the various devices running the CPA (1417). This multi-channel recording can be made available at replay time - to users and/or further ASR/transcription applications. By altering the gain of each channel and offering differences between channels, preferably automatically adjusting for time lag between each (estimated by time-shifted correlations), the listener/application can hear more clearly what individual speakers were saying even if the resultant audio transmitted to the far end included multiple speakers talking over each other or had muted them as not coming from the strongest signal.

[480] In addition to the actual audio from each source being recorded, preferably a reduced bandwidth “summary” track is also recorded. This, for example, will typically include the volume (actually often “energy level” - proportional to volume squared) every 50ms or so; the output of any ASR; signal to noise ratio within that period.

[481] An overall merged summary “track” can also be derived from these - showing who was speaking in a given time window and their transcript.
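The energy component of such a summary track might be computed as in the sketch below. The function name is hypothetical and the sketch assumes mono samples held as floating point values; only the energy level is shown, not the ASR output or signal to noise ratio that the summary may also carry.

```python
def energy_summary(samples, sample_rate, window_ms=50):
    """Return the mean energy (mean of squared samples) for each window."""
    size = max(1, sample_rate * window_ms // 1000)
    summary = []
    for start in range(0, len(samples), size):
        window = samples[start:start + size]
        summary.append(sum(s * s for s in window) / len(window))
    return summary


# 1 kHz sampling keeps the example small: 50 samples per 50 ms window.
audio = [1.0] * 50 + [0.5] * 50 + [0.0] * 25
track = energy_summary(audio, sample_rate=1000)
```

One value per 50 ms window is a small fraction of the bandwidth of the audio itself, which is what makes it practical to keep such a track for every source.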

[482] Preferably, interruptions are minimized by (where the operating system allows it and/or calls can be routed via the CPA (1417) or CRP (1415)) the suppression of audio alerts from other incoming calls and other notifications on the users’ devices. Where this cannot be achieved automatically, the user is reminded via the screen that they should mute or block such interruptions.

[483] Fig. 15 shows an exemplary architecture within which the invention may be deployed on a smartphone, tablet computer, laptop, desktop computer or similar (1501) whose user communicates with a plurality of counterparties via their devices (3, 4) - which may or may not be running this same application. Typically there are many different ways in which this communication occurs: via telephone, chat, email, text messaging, instant messaging and so on.

[484] Note that the discussion below primarily addresses a corporate employee using device (1501) and for whom the application provides their business phone number, via the corporate’s MAP (1502). The same architecture may be deployed by a company offering such services to members of the public. In this case, a subscriber or customer of this company can be provided with the same capabilities over their personal phone number.

[485] This example network includes a “Mobile Access Point” (MAP) (1502) as described in UK Patent Application GB 1816697.5. Whilst many of the features that follow may be deployed with an application entirely running within the end user’s device (1501), the routing of calls via said MAP (1502) rather than direct to counterparties (3, 4) with whom the user is communicating allows a number of additional benefits - namely the ability to analyse and act on the media streams to and from device (1501) and to reduce the amount of information held on device (1501).

[486] Device (1501) communicates with MAP (1502) via one or more network paths. For example, via a cellular base station (1510) and cellular (voice) network (1507); via a 4G cellular data network into the internet (1507); via public (1512) or corporate (1511) Wi-Fi.

[487] Mobile Access Point (1502) may be a server within the corporate network (1505) or hosted in a public data centre or “in the cloud” accessed via the internet.

[488] Note that the end user’s device (1501) has no need to be aware of the actual addresses of counterparties. So long as it can ask the MAP (1502) to establish a connection to a specific counterparty (1503), it merely needs to be able to identify counterparty (1503) to the MAP - allowing the MAP to handle the onward routing of the connection.

[489] The dominant smartphone platforms offer integral telephone dialler and messaging apps that have changed little in the last decade. These typically keep separate logs of phone calls and messages despite the fact that the same phone number is used for both SMS messaging and actually speaking to people.

[490] However, there is typically a single “Phone” app - which handles calls to and from all mobile phone numbers (possible where the phone has dual- or e-SIM capability). Where the multiple contracts (and hence phone numbers) are provided in order to let the user appear as multiple personas (for example, private individual on one number; company employee on the other) a clear separation of calls for each number is required.

[491] It thus makes more sense to have one (messaging capable) “Phone” application instance per phone number present on the device than it does to have a single (voice only but all phone numbers) “Phone” and separate (message only but all phone numbers) “Messages” application.

[492] Most users also have one or more E-mail accounts accessible via their phone (1501). Although the phone’s integral “Contacts” app will associate an email address and phone number(s) with an individual, the emails exchanged with that individual are usually only visible in the email app.

[493] Unified Communications apps and instant messaging apps abound - providing further channels and associated identifiers for the people and businesses one interacts with. Knowing which route your friends, colleagues, business contacts or groups thereof are using at any one time is increasingly difficult - and changes frequently.

[494] Many “Unified Communications” applications have been developed but most are overkill for the basic telephone calls and text messages that still make up a substantial portion of the interactions with those outside one’s own company. This invention therefore remains “phone centric” but allows gradual integration of other communications channels as the user becomes familiar with the application.

[495] Rather than a separate app for each of the plethora of communications services, this invention can provide a single “hub” app that brings together as many as possible of the interaction channels to show a combined interaction history for each counterparty. This typically replaces the “Phone”, “Contacts” and “Messages” functions which are normally considered as separate apps.

[496] Where a single smartphone or tablet is used for both business and personal communications, it is increasingly important to keep data from the two domains separate - even if (thanks to the use of the MAP) the phone is reachable via more than one public and/or private phone number. This is achieved by having two (or more) instances of the app on the device and, preferably, having two separate telephone numbers - one (often private, internal rather than publicly dialable (“DDI” or “DID”)) business number, owned by the employer, and the other, personal number, owned by the individual. Each instance interacts with a specific MAP and presents the user as having a particular telephone number, keeping the interactions it enables separate from those on other instances.

[497] The app on user device (1501) operates as an extremely “thin” client with as much as possible of the business intelligence, routing, call control and - most importantly - personally identifiable information stored in a secure server - the “Mobile Access Point” or MAP (1502) rather than on the mobile device (1501) itself. Doing so allows businesses to deploy this app without requiring an MDM platform to be installed.

[498] Preferably, the interaction between the app and the MAP is restricted to a single data stream, using a single User Datagram Protocol (UDP) socket for signalling, administrative and real-time communications. However, separate channels for these may also be used. For example, signalling may be via a Session Initiation Protocol (SIP) channel while audio is carried over Real-time Transport Protocol (RTP).
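One way in which signalling, administrative and real-time traffic could share a single UDP socket is to prefix each datagram with a small header identifying the logical channel. The sketch below is illustrative only: the three-byte header layout, the `Channel` values and the function names are assumptions and are not taken from the specification.

```python
import struct
from enum import IntEnum


class Channel(IntEnum):
    """Hypothetical logical channels carried over the one socket."""
    SIGNALLING = 0
    ADMIN = 1
    MEDIA = 2


# One byte of channel tag followed by a 16-bit sequence number, network byte order.
_HEADER = struct.Struct("!BH")


def encode(channel, sequence, payload):
    """Build one datagram: header followed by the opaque payload bytes."""
    return _HEADER.pack(channel, sequence & 0xFFFF) + payload


def decode(datagram):
    """Split a received datagram back into (channel, sequence, payload)."""
    tag, sequence = _HEADER.unpack_from(datagram)
    return Channel(tag), sequence, datagram[_HEADER.size:]


if __name__ == "__main__":
    datagram = encode(Channel.MEDIA, 7, b"20ms of audio")
    print(decode(datagram))
```

The resulting bytes would be handed to a single `socket.sendto` call per datagram, so only one port needs to traverse firewalls and address translators between device and MAP.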

[499] When used in a personal capacity, some or all of the functionality of the MAP (1502) may be running on the mobile device itself or the individual may subscribe to a publicly available MAP service. This affords the individual many of the same features and data security that businesses achieve using the MAP approach. For example, minimal data loss and exposure should the mobile device be lost, stolen or compromised.

[500] Note that although this specification refers to “mobile devices”, the app can be run on a wide range of platforms including but not limited to smartphones, tablet computers, laptops, smart TVs, desktop computers, browsers on any device and so on. The app may run “native” or via a browser or cross-platform tool. If the device on which it runs does not have an integral telephone capability or it needs to use a telephone number other than the one (or more) by which the device identifies itself on the public telephone network, this is achieved via the MAP (1502).

[501] Fig. 16 shows an exemplary “home” screen from the app - such as would be presented immediately upon accessing or opening the app.

[502] This app is likely to be the most used app on the device and hence very likely to be permanently “pinned” to the bar of most commonly used apps at the bottom of the screen.

[503] The most commonly used controls are therefore also placed at the bottom of the app’s home page - within easy reach of the (typically) thumb that just opened the app.

[504] The top bar allows access to user preferences (1601) and settings (1602) when needed.

[505] The body of the page is broken into three regions.

[506] At the bottom, always in the same position and easily reached with the same finger that just selected the app (assuming the app is pinned to the bottom bar as is usually the case), are controls that are used very frequently.

[507] If the user knows all or part of the number they wish to dial, touching the Dialpad icon (1603) brings up the numeric telephone dial-pad. It also shows the numbers most recently dialled. As the user dials digits, recently dialled numbers that match and entries in the app’s Contact list that match are shown - allowing the user to select one of those rather than continue dialling the full number.

[508] Increasingly, however, users are identifying who they want to contact by name rather than telephone number. Touching the search box (1604) brings up an alphanumeric keypad (and the search box (1604) automatically moves up above it to avoid being obscured by it) allowing the user to start entering all or part of the desired counterparty’s name.

[509] Note the microphone icon (1605) present in the search box (1604). Touching this enables speech recognition capabilities allowing the user to speak the name of the counterparty they are looking for.

[510] Note that text entry and speech may be combined - for example typing enough characters to identify a subset of matching counterparties against which a spoken utterance is then matched (e.g. typing “ch” brings up several “Chris”s and you then touch the microphone icon (1605) and say “Blair” to pick one with that surname).
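The combination of typed and spoken input might be sketched as a two-stage filter, as below. The speech recogniser is assumed to have already returned the utterance as text, and the function names and sample contact names are hypothetical.

```python
def match_typed(contacts, typed):
    """Keep contacts in which any word of the name starts with the typed characters."""
    prefix = typed.casefold()
    return [
        name for name in contacts
        if any(part.casefold().startswith(prefix) for part in name.split())
    ]


def refine_spoken(candidates, utterance):
    """Narrow an existing candidate list to names containing the spoken word."""
    word = utterance.casefold()
    return [
        name for name in candidates
        if word in (part.casefold() for part in name.split())
    ]


if __name__ == "__main__":
    contacts = ["Chris Blair", "Chris Jones", "Christine Smith", "Richard Heap"]
    shortlist = match_typed(contacts, "ch")
    print(shortlist)
    print(refine_spoken(shortlist, "Blair"))
```

Matching the utterance against a short typed-in shortlist rather than the whole contact list is what makes the recognition task easier, since the recogniser only has to discriminate between a handful of names.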

[511] The bulk of the home page is split into two parts - and the division between them may be varied - for example by dragging divider bar (1606) up or down. Each of the two regions is itself scrollable, holding more information than is visible in Fig. 16.

[512] The lower section (1607), which is most easily reached with the same finger that (typically) just touched the bottom of the screen to open the app, shows the user’s “Favourite” counterparties (hereafter the “favourites pane”). This contains a number of touchable areas (hereafter “chips”) labelled with the nickname or name of a counterparty (individual or group). There is also a “voicemail” chip (1609) - preferably in a fixed position. The others represent individuals or groups of people, businesses or apps with whom the user most frequently interacts.

[513] The shape, colour, texture, decoration and/or other visual attributes of each chip may be used to indicate specific attributes of that individual. For example, the colour may indicate the state of the most recent interaction with that party (sent a message, received but not read, read, deleted...) while the shape of the button may indicate whether or not this user initiated the contact; the size of the button may indicate its age; the border colour may indicate colleagues/customers/suppliers; a superimposed number may indicate the number of unread messages or missed calls - and so on.

[514] Preferably, users are gradually introduced to additional attribute mappings and are able to select which display attribute(s) are used to convey the values of which attributes of the interactions.

[515] With the exception of the voicemail button (1609), each button in this region (1607) actually has at least four touch functions.

[516] Tap in the centre to bring up that counterparty’s contact history (as shown in Fig. 17).

[517] (Optionally) Tap at the left to initiate a real-time connection to that counterparty - and bring up their contact history.

[518] (Optionally) Tap at the right to start composing a message to that counterparty - and bring up their contact history.

[519] Long press (or heavy pressure where available) to bring up a dialog allowing you to change the definition of the favourites button. This may include:

[520] Removing the button from favourites.

[521] Modifying the nickname shown on the button.

[522] Changing the preferred means of real-time and/or messaging interactions with that counterparty.

[523] Adding that individual to or removing them from group(s) of which they are or should become a member.

[524] Block contact from this counterparty.

[525] Report counterparty for junk/nuisance calls.

[526] Note the vertical bar (1610). In this diagram the user is using English language settings so the chips flow left to right and then flow down the pane. So chips to the left of and above the bar (1610) are “before” it while those to the right or below the bar (1610) are “after” it.

[527] Chips before the bar (1610) are proactively placed there by the user and stay in the position assigned. The user can rearrange the order of these chips by dragging and dropping them within the favourites pane (1607) - to bring the ones they need most often into easiest reach; to organise them (for example, bringing colleagues, customers and suppliers into contiguous regions).

[528] Chips after the bar (1610) are dynamically generated by the app according to the recent communications history. These represent counterparties that the user has interacted with most in the recent past - but who are not yet “pinned” to the left of the bar (1610). The duration of “recent past” is preferably a user preference setting.

[529] These chips are ordered by frequency and/or nature of contact (may be weighted according to how recently each contact was). For example, those nearest the bar (1610) may be counterparties whose most recent incoming calls have been missed. The following chips may then be ordered by “density” of contacts. Note that the upper portion (1608) of the screen shows interactions in chronological order - hence the most recent interactions are visible there. These dynamic buttons are therefore preferably assigned alternative ranking criteria.
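One possible ranking of the dynamically generated chips is sketched below: counterparties whose most recent incoming call was missed come first, followed by the remainder in descending order of a recency-weighted contact “density”. The exponential weighting, the 24 hour half-life, the one week default window and all names are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    name: str
    ages_hours: list  # age of each logged interaction, in hours
    last_incoming_missed: bool = False


def density(ages_hours, half_life_hours=24.0):
    """Recency-weighted count: an interaction's weight halves every half-life."""
    return sum(0.5 ** (age / half_life_hours) for age in ages_hours)


def rank_dynamic_chips(contacts, pinned, window_hours=168.0):
    """Order unpinned counterparties seen within the 'recent past' window."""
    scored = []
    for contact in contacts:
        if contact.name in pinned:
            continue  # already placed by the user before the bar
        recent = [age for age in contact.ages_hours if age <= window_hours]
        if recent:
            scored.append((not contact.last_incoming_missed, -density(recent), contact.name))
    return [name for _, _, name in sorted(scored)]


if __name__ == "__main__":
    contacts = [
        Contact("A", [1, 2, 3]),
        Contact("B", [100], last_incoming_missed=True),
        Contact("C", [200]),  # outside the one-week window
        Contact("D", [1]),    # pinned, so excluded
        Contact("E", [48]),
    ]
    print(rank_dynamic_chips(contacts, pinned={"D"}))
```

Here `window_hours` plays the role of the user preference for the duration of “recent past”.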

[530] By dragging divider bar (1606) up or down, the user expands or contracts this “dynamics” region. Any available space is automatically filled with more counterparties. Dragging it up includes another row or two - showing those contacted less frequently or recently than the ones already showing.

[531] The user may drag a chip currently positioned after the bar (1610) to a position before the bar (1610). Doing so implicitly indicates to the app that this counterparty should now be treated as a “favourite”. A dialog is then shown which encourages the user to select a nickname for this counterparty - suggesting common combinations (initials, first name only, first name + initial of surname and so forth). This not only reduces the space needed for the label on the chip, it reduces the data leakage from any screenshot or over-the-shoulder sight of the app.

[532] Use of shortened names is further encouraged by restricting the maximum width of these chips - truncating the name if the user has not already provided a shortened nickname that will fit within the available space. If the user wants permanent, instant, one-touch access to this counterparty then they must accept that limited information can be permanently visible on screen.

[533] Preferably, the application will not allow the user to label a favourites chip with the counterparty’s full name (surname and first name in either order, with or without spacing/separator character(s)).
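A check of this kind might, for example, compare the proposed label with both orderings of the full name after stripping spacing and separator characters. The function names below are hypothetical and the normalisation rule is one plausible choice rather than a prescribed one.

```python
import re


def _normalise(text):
    """Lower-case the text and drop spaces, punctuation and underscores."""
    return re.sub(r"[\W_]", "", text.casefold())


def is_allowed_nickname(nickname, first_name, surname):
    """Reject empty labels and labels that amount to the counterparty's full name."""
    compact = _normalise(nickname)
    forbidden = {_normalise(first_name + surname), _normalise(surname + first_name)}
    return bool(compact) and compact not in forbidden


if __name__ == "__main__":
    for label in ["John Doe", "Doe, John", "doe_john", "John D", "JD"]:
        print(label, is_allowed_nickname(label, "John", "Doe"))
```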

[534] The user is also prompted to choose or confirm the default real-time and messaging mechanisms by which they will contact the counterparty. These become the tap actions for the left and right-hand ends respectively of the resultant favourites chip.

[535] The “+” button (1611) allows the user to add a favourite from the overall Contacts list even if they are not shown in the dynamics region.

[536] The remainder of the home page (the “interactions” pane) (1608) shows a primarily chronological record of this user’s interactions with others. As with the favourites area (1607), counterparties are labelled by nickname where one has been assigned.

[537] The user can drag a row from the upper portion of the screen (1608) into the favourites area to add a chip for that counterparty to their favourites pane.

[538] User preferences (typically set on a slide-in drawer accessed from the top left button (1601)) determine:

[539] Whether the most recent interaction is at the top or bottom of this “Interactions” pane (1608).

[540] Whether any “pinned” counterparty entries are at the top or bottom of the pane (1608).

[541] Whether counterparties appear only once in the list (according to the time of the most recent interaction) or in the chronological list every time an interaction with them occurred.

[542] Whether interactions from each “channel” (phone, SMS, voicemail, Email...) are shown or not.

[543] Whether all interactions in a channel are shown or just those in specific states - for example: undelivered, missed.

[544] Each entry in this pane (1608) represents the interactions between the user and a specific counterparty (individual or group). Note that it does NOT show the counterparty’s phone number or contact details as this (frequently used) screen is often visible to onlookers. It does show:

[545] The nickname (where available) of the counterparty, otherwise their full name.

[546] Time or relative time of last contact. This deliberately shows only the time of day (not today’s date) for yesterday’s calls. Only if you scroll down to much older entries will you find fully described dates (and even those will not include the year until looking at earlier than “Last Year”). This ensures that any screenshot or photograph of the screen has as little value as possible if leaked.

[547] The left-most icon (1612) shows the preferred real-time communications channel (most commonly the telephone). The display attributes tell the user about the most recent real-time interaction with this user. For example, a red phone icon may indicate that the counterparty called but the user missed their call; a grey translucent icon may indicate that they have not used this channel yet.

[548] If the counterparty has left any messages via the real-time channel (such as a voicemail for example) an icon (1613) showing that real-time channel’s messaging service is shown. The display attributes tell the user about the most recent message. For example red or not according to whether or not they have already read it; size indicating how recent it is; number of unread messages in a small circle at one corner of the icon.

[549] Right-most icon (1614) shows the preferred messaging communications channel. The display attributes tell the user about the most recent messaging interaction with this user. For example, a red message bubble may indicate that there is an unread message. Note that the relative size of this icon (1614) and the two at the left (1612, 1613) can indicate which was the most recent.

[550] Note the orientation of the phone (1612) and message (1614) icons. Traditionally, the integral telephone app on both major platforms has used a handset icon oriented North-west to South-east. This is therefore used as the “outgoing” or “I used that app” icon while one reflected so as to be oriented North-east to South-west or (as shown here, rotated 180 degrees) is used to indicate a call placed by the counterparty TO me. The same applies to the message icon (1614) - shown here with the tail bottom left (message from me) or top-right (message to me).

[551] A graphical “thumbnail” (1615) of the recent interactions with the counterparty is shown beneath their name (or, where available, nickname). This represents a chronological timeline of interactions using the icons associated with the various channels used to identify each contact. Their display attributes show, for example, who called whom. Spacing symbols show intervals of time (a dot per hour during today, thereafter a thin vertical bar for a day boundary, a thicker vertical line for a week boundary and so on). To make it clear that the leftmost icon is most recent, the size of the icons decreases for older contacts shown to the right of the most recent one.

[552] Optionally, a brief (typically one line, truncated) summary of the content of the last message exchanged may be shown (not shown in Fig. 16).
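The deliberately partial timestamps described above for each entry might be produced along the following lines. The exact thresholds and labels (time of day only for today, a “Yesterday” prefix, weekday names within a week, and the year appearing only for entries older than last year) are assumptions made for this sketch.

```python
from datetime import datetime


def partial_timestamp(when, now):
    """Return the least specific label that still places `when` for the user."""
    days = (now.date() - when.date()).days
    if days <= 0:
        return when.strftime("%H:%M")                 # today: time of day only
    if days == 1:
        return "Yesterday " + when.strftime("%H:%M")  # no calendar date shown
    if days < 7:
        return when.strftime("%a")                    # weekday name only
    if when.year >= now.year - 1:
        return when.strftime("%d %b")                 # day and month, no year
    return when.strftime("%d %b %Y")                  # older than last year


if __name__ == "__main__":
    now = datetime(2020, 4, 23, 12, 0)
    for when in [datetime(2020, 4, 23, 9, 5), datetime(2020, 4, 22, 9, 5),
                 datetime(2019, 3, 5, 9, 5), datetime(2017, 3, 5, 9, 5)]:
        print(partial_timestamp(when, now))
```

A screenshot of entries labelled in this way cannot be dated from its content alone unless it includes entries more than a year old.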

[553] Actions available by interacting with this pane (touching or mouse-click for example) include:

[554] Tap or click the icon (1612) on the left to initiate a real-time interaction with this counterparty via the preferred real-time channel (or most recently used if no preferred channel yet established).

[555] Tap or click the real-time channel messaging icon (1613) to access the most recent message left and also see it in the context of that counterparty’s interaction history (Fig. 17).

[556] Tap or click the icon on the right (1614) to start composing a message to this counterparty via the preferred messaging channel (or most recently used if no preferred channel yet established).

[557] Tap or click the voicemail icon (1613) to view voicemail messages left by this counterparty and (optionally) automatically start playing the most recent.

[558] Tap anywhere else on the entry to drill into that counterparty (see Fig. 17).

[559] Swipe left to remove the entry from the pane. Used on any interaction that does not require further action.

[560] Long press (or heavy pressure where available) to bring up a dialog allowing you to:

[561] Pin this counterparty’s entry to the interactions pane (1608).

[562] Add counterparty to favourites pane (1607).

[563] Block contact from this counterparty.

[564] Report counterparty for junk/nuisance calls.

[565] Block further incoming communications from this counterparty.

[566] Report this counterparty for cold-calling; abusive calls etc.

[567] Note that messaging (1614) and real-time channel controls (1612) are deliberately kept as far apart as possible to avoid accidentally phoning someone when you only meant to message them.

[568] Fig. 17 shows the Contact History screen for an exemplary counterparty - whose nickname (1701) is shown in the top bar (1702). No unnecessary contact details or personal information are shown. If these are required, the information button (1703) pulls up the full Contact details entry showing the usual number, address and preference details and the ability to add, edit and delete these. User preferences relating to this screen are accessed and modified via menu button (1704).

[569] Each of the interactions shown as an icon in “narrative” image (1615) in the interactions pane (1608) of the home screen (Fig. 16) would typically be expanded here as a separate item in the chronological list.

[570] Where the counterparty’s time-zone and/or work hours are explicitly set or can be inferred (for example, from their telephone number’s country code) the time in that region is shown (1705). Preferably the display attributes of this time indicator (and/or associated icon (1706)) show whether this is in normal business hours; outside business hours or an anti-social time (such as 3AM). Preferably this timestamp and warning are shown close to the button (1707) that, if pressed, will attempt a real-time connection to the counterparty.

[571] Alternatively, a pop-up dialog alerting the user to this anti-social time may be presented and the user given the option to cancel. This latter option should apply to any other buttons on screen that also trigger a real-time connection attempt (1708 for example). Alternative options of leaving a voice message, sending a text or email may be offered instead. Public holidays may also be taken into account in this process - ensuring people are not disturbed unnecessarily.
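The classification of the counterparty’s local time might be sketched as follows. A fixed UTC offset stands in for a full time-zone lookup, weekends and public holidays are ignored, and the working and quiet hours shown are arbitrary defaults; all of these, together with the function name, are assumptions.

```python
from datetime import datetime, time, timedelta, timezone


def classify_local_time(now_utc, utc_offset_hours,
                        work_start=time(9, 0), work_end=time(17, 30),
                        quiet_start=time(22, 0), quiet_end=time(7, 0)):
    """Return (local clock time, status) for display beside the call button."""
    local = now_utc.astimezone(timezone(timedelta(hours=utc_offset_hours)))
    clock = local.time()
    if clock >= quiet_start or clock < quiet_end:
        status = "anti-social"            # would trigger the warning or pop-up
    elif work_start <= clock < work_end:
        status = "business hours"
    else:
        status = "outside business hours"
    return local.strftime("%H:%M"), status


if __name__ == "__main__":
    now = datetime(2020, 4, 23, 12, 0, tzinfo=timezone.utc)
    for offset in (0, 9, -9):
        print(offset, classify_local_time(now, offset))
```

A status of “anti-social” is the case in which the app would offer the alternatives of a voice message, text or email instead of placing the call.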

[572] Buttons (1707, 1709) at the bottom of the screen allow new contact with this party to be initiated via any of the channels through which they can be contacted.

[573] Options on the user preferences dialog, accessed by touching menu button (1704), allow:

[574] The entries to be filtered to just those from specific channels.

[575] The entries to be filtered by state. For example, just show unread or missed calls; urgent or “flagged” messages only.

[576] How far back to keep contacts

[577] Shows which groups this individual is part of and allows them to be added to other existing groups or create a new group with them in and then add others.

[578] Each real-time interaction or message shown in the body of this pane shows some or all of:

[579] Time since interaction and/or date/time of interaction.

[580] Direction of the interaction (via text, icon or nominated display attribute)

[581] Status of the interaction (via text, icon or nominated display attribute)

[582] Duration or size of the interaction (real-time and messaging respectively).

[583] Time taken to answer real-time interactions.

[584] Summary of content or full content if possible of a messaging interaction. Typically just the Subject line of emails; first line or two of a text. Keywords identified from automatic speech recognition and/or natural language processing.

[585] Whether or not a recording exists of the real-time interaction and, if so, play (from start) button and audio waveform allowing replay from any position.

[586] If speech recognition has been applied during the call or to a recording of it, any keywords or icons representing them.

[587] Actions available by interacting with an individual contact on this pane (touching or mouse-click for example) include:

[588] Tap to expand the content in situ where possible or to open the content within the appropriate app to read/view/play the message or (if available) recording of the interaction.

[589] Icon to initiate a call with or compose a message of the same type to this counterparty. Preferably (not shown in Fig. 17) icons resulting in real-time interactions being initiated are placed as far as possible from those creating messages. This helps avoid disturbing counterparties by accident when a message was the intended touch action.

[590] Swipe left to delete interaction.

[591] Long press (or heavy press where available) to bring up a dialog allowing:

[592] Pin this message to top or bottom of pane (out of chronological order)

[593] Flag this message as important

[594] Send this message for transcription, further processing.

[595] Send details of this interaction to other application (such as a CRM application).

[596] Forward, Reply or Reply All

[597] Escalate to management - sending contact history and recordings if available.

[598] The above features significantly increase the power - and hence, inevitably, the complexity - of the user interface beyond the simple “dialler” and messaging applications typically present on a smartphone. Preferably, therefore, these advanced features may be initially suppressed and only introduced one at a time as the user becomes familiar with the app. “Tip” notifications may suggest that the user tries adding a further feature every so often.

[599] Privacy concerns - and associated regulations such as the EU’s GDPR - dictate that the amount of personally identifiable information present on screen and within the (vulnerable) end user device should be minimized.

[600] The data structures that achieve this goal are shown in Fig. 18.

[601] The end user device (1501) has a globally unique DeviceID (1805) associated with the hardware it is running on and a globally unique application ID (1806) representing this instance of the application (for example, a GUID created on first running the application). These are communicated to the MAP (1502) allowing the device (1501) to store as little as possible - namely:

[602] A local party identifier and nickname (1801) by which the user is able to recognize each counterparty to which it refers - but which is of little use to a thief without access to the mappings (1802) between these local IDs, the device’s ID (1805) and the Application instance ID (1806) and the contact’s globally unique identifier. The full contact details (1807) identified by the global identifier are stored securely, remote from the (vulnerable) end user device.

[603] For each message that it needs to show to the user (a small subset of the total) a local message ID and minimal information such as would be shown when the message is presented in a scrollable list (date/time, channel via which the message or call flowed) and a short summary. As with contact details, this information is of little value without the mapping table (1804) that associates it with a specific party and content.

[604] Full details of a contact are only provided to the end user device (1501) if the user explicitly requests them and are not persisted there.

[605] Full details of a message are not provided to the end user device (1501) unless the user explicitly views the detail of a message - and are not persisted there.

[606] Note that although the data is shown inside MAP (1502), it is typically persisted in a database remote from the MAP and only a subset is held in memory within the MAP at any given time. The important point is that the information present on the end user’s device (1501) is pseudonymized with no easy route back to the actual content or message detail without access to the separately stored and tightly secured mapping tables (1802, 1804) and the underlying data (1807, 1808). For added security, the mappings (1802, 1804) may be stored and secured in a separate database from the contact and message information (1807, 1808).
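The division of data between device and MAP described with reference to Fig. 18 might be modelled, purely for illustration, as below. The class and attribute names, the in-memory dictionaries standing in for the separately secured databases, and the 0 to 9999 range of local IDs are all assumptions.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class MapServer:
    """Server side: full details (1807) and the mapping table (1802)."""
    contacts: dict = field(default_factory=dict)  # global ID -> full contact details
    mappings: dict = field(default_factory=dict)  # (device ID, app ID, local ID) -> global ID

    def assign_local_id(self, device_id, app_id, global_id):
        """Pick an unused random local ID for this app instance."""
        while True:
            local_id = secrets.randbelow(10_000)
            key = (device_id, app_id, local_id)
            if key not in self.mappings:
                self.mappings[key] = global_id
                return local_id

    def resolve(self, device_id, app_id, local_id):
        """Only the MAP can turn a local ID back into real contact details."""
        return self.contacts[self.mappings[(device_id, app_id, local_id)]]


@dataclass
class DeviceStore:
    """Device side (1801): local IDs and nicknames only."""
    device_id: str
    app_id: str
    parties: dict = field(default_factory=dict)  # local ID -> nickname


if __name__ == "__main__":
    server = MapServer(contacts={"g-1": {"name": "John Doe", "phone": "+1 555 0100"}})
    device = DeviceStore(device_id="dev-A", app_id="app-1")
    local_id = server.assign_local_id(device.device_id, device.app_id, "g-1")
    device.parties[local_id] = "JD"
    print(device.parties)  # all that a thief would find on the handset
    print(server.resolve(device.device_id, device.app_id, local_id))
```

Because the mapping key includes both the device ID and the application instance ID, the same small integer means something different on every handset.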

[607] The app running on User Device (1501) therefore minimizes data leakage and potential loss as follows:

[608] Displays nicknames rather than full names where practical.

[609] Displays partial timestamps not full date/time combinations where practical.

[610] Displays icons for usable communications channels but only shows the counterparty’s address on each when explicitly accessed.

[611] The MAP assigns a locally unique identifier to each counterparty that this individual interacts with via device (1501). These could be GUIDs but, to reduce space and bandwidth requirements, given that each device is uniquely identifiable (normally through its hardware identity (1805)) and each instance of the app registers uniquely with the MAP using a GUID (1806), it can simply be a sequential (or, preferably random) integer. Only the MAP knows, for example, that this app’s counterparty 1745 is actually individual John Doe.

[612] Assigns a locally unique identifier by which this, and only this instance of the app refers to a specific interaction with a counterparty. This, in combination with the unique counterparty ID makes it harder for data seized from multiple instances of the app to be cross referenced.

[613] Communication with the MAP therefore references the app’s local counterparty reference rather than the underlying identity.

[614] When showing potential matches to a Contact lookup, those counterparties that have not yet communicated with this instance of the app can be assigned a temporary ID which is released if they are not selected or are no longer included in the search results. Only on actually contacting or attempting to contact the counterparty is a local identifier assigned permanently.

[615] The mapping of counterparty attributes to chip/contact history display attributes is not stored in the phone or communicated with it except when a new attribute is being assigned. For example, chips appearing as square buttons may indicate that the counterparty is a customer - but once defined, this is never again explicit in the data passing between device and MAP and is not even stored on the device. Hence loss of the device, even if the data and source code within the app are accessed, tells the hacker no more than “counterparty with nickname XY appears in square buttons” - which they could have seen by looking over your shoulder.

[616] Local party and message IDs may be reused rapidly. They only need to be unique within the device (1501) so if, for example, a set of messages is shown and then deleted, the local IDs used for those can be immediately reused for the next batch of messages. This further obfuscates the true identity of the messages. The same goes for counterparties. For example, 20 possible matches to a search are temporarily assigned local IDs but when that search completes, all the unused ones are destroyed allowing their IDs to be reused on the next search.

[617] Local IDs can actually be assigned and persisted in the MAP. An instruction from the app that a user wishes to “pin” a particular counterparty to their favourites pane may signal that a particular ID should now persist rather than potentially be recycled within the current session.

[618] Local IDs may be cleared and reset to new, random values on successive uses of the application. For example, the initial handshake between app and MAP may include a refresh of some or all local ID data. Thus any IDs stolen are of very little use as they have a very short lifespan as well as very little information associated with them.
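The life cycle of local IDs described above (temporary assignment during a search, immediate reuse, persistence on contact or pinning, and reset on a later handshake) might be sketched as a small pool held by the MAP for each app instance. The class, its method names and the pool size are hypothetical.

```python
import secrets


class LocalIdPool:
    """Pool of small integer IDs for one instance of the app."""

    def __init__(self, size=1000):
        self._free = list(range(size))
        self._temporary = {}  # local ID -> global ID, valid for the current search
        self._permanent = {}  # local ID -> global ID, kept once contacted or pinned

    def lease(self, global_id):
        """Hand out a random free ID on a temporary basis."""
        local_id = self._free.pop(secrets.randbelow(len(self._free)))
        self._temporary[local_id] = global_id
        return local_id

    def keep(self, local_id):
        """Make a temporary ID permanent (counterparty contacted or pinned)."""
        self._permanent[local_id] = self._temporary.pop(local_id)

    def release_temporary(self):
        """End of search: unused IDs return to the pool for immediate reuse."""
        self._free.extend(self._temporary)
        self._temporary.clear()

    def reset(self):
        """Re-issue fresh random IDs for every kept counterparty."""
        kept = list(self._permanent.values())
        self._free.extend(self._permanent)
        self._permanent.clear()
        fresh = {}
        for global_id in kept:
            fresh[global_id] = self.lease(global_id)
            self.keep(fresh[global_id])
        return fresh

    def resolve(self, local_id):
        """Return the global ID behind a local ID, or None if it is not in use."""
        return self._permanent.get(local_id, self._temporary.get(local_id))

    def free_count(self):
        return len(self._free)


if __name__ == "__main__":
    pool = LocalIdPool(size=30)
    matches = [pool.lease(f"g{i}") for i in range(20)]  # 20 search results
    pool.keep(matches[0])                               # the one actually contacted
    pool.release_temporary()                            # the other 19 are recycled
    print(pool.free_count(), pool.reset())
```

The sketch does not handle an exhausted pool; a real allocator would need to size the ID range to the largest expected result set.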

[619] The rate at which data can be extracted from the MAP may be throttled to prevent malicious code pretending to be an app (should someone crack the encryption and protocol by which a user would look up one or more contacts, for example). The MAP may respond with a “Proof of Work” puzzle before accepting further requests for information. The server (MAP) sends the client (putative app on device) a puzzle (of variable complexity depending on how threatened the server is feeling - for example, it may omit the puzzle on the first request and include it on a second within a set timespan). The client has to solve the puzzle and send the solution with its next message. The puzzle is a trap-door algorithm - easy to set, difficult to solve, easy to check answer (without having to maintain state). Note that this same mechanism can be used as part of the initial handshake to protect against Denial of Service attacks.
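The specification does not name a particular puzzle. One well-known construction with the stated properties is a hash-based partial pre-image search, sketched below as an assumption rather than as the method intended. The server authenticates each puzzle with a keyed hash so that it can verify a returned solution without storing anything per client. The key handling, field names and difficulty values are all hypothetical.

```python
import hashlib
import hmac
import secrets

SERVER_KEY = secrets.token_bytes(32)  # known only to the MAP


def _tag(seed, bits):
    """Keyed hash binding the seed to the difficulty the server chose."""
    message = f"{seed}:{bits}".encode()
    return hmac.new(SERVER_KEY, message, hashlib.sha256).hexdigest()


def issue_puzzle(difficulty_bits):
    """Easy to set: a random seed plus a tag that proves the server issued it."""
    seed = secrets.token_hex(8)
    return {"seed": seed, "bits": difficulty_bits, "tag": _tag(seed, difficulty_bits)}


def _meets_target(seed, counter, bits):
    """True if SHA-256(seed:counter) starts with `bits` zero bits."""
    digest = hashlib.sha256(f"{seed}:{counter}".encode()).digest()
    return int.from_bytes(digest, "big") >> (256 - bits) == 0


def solve(puzzle):
    """Difficult to solve: about 2**bits hash attempts on average."""
    counter = 0
    while not _meets_target(puzzle["seed"], counter, puzzle["bits"]):
        counter += 1
    return counter


def verify(puzzle, counter):
    """Easy to check: one keyed hash and one plain hash, with no stored state."""
    genuine = hmac.compare_digest(_tag(puzzle["seed"], puzzle["bits"]), puzzle["tag"])
    return genuine and _meets_target(puzzle["seed"], counter, puzzle["bits"])


if __name__ == "__main__":
    puzzle = issue_puzzle(difficulty_bits=12)
    print(verify(puzzle, solve(puzzle)))
```

Raising `difficulty_bits` by one doubles the client’s expected work, which gives the server a simple dial for how “threatened” it is feeling. This sketch has no expiry, so a fuller version would also bind a timestamp into the tag to stop old solutions being replayed.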

[620] During contact searches, when more than one match is presented on partial entry of a name or address/number, the search results are removed from view after a much shorter inactivity time than the normal screen dimming or idle timeout. This is typically five seconds, which is long enough to take the next step in a search but quick enough to hide the results promptly if you abandon the search. On resuming the search, by focusing on the text entry field again, the previous results are shown allowing further refinement of the search and/or scrolling through the results. This reduces the chances of the information being leaked by screengrabs or photographs of the phone. It also reduces the incidence of “pocket dialling” - since the contacts’ names and details are not shown on screen for long, there is less chance that they are accidentally touched and dialled as the phone is placed in a pocket or randomly touched.

[621] A key security concern is that it is relatively easy to “spoof” any calling line identification - especially on a PSTN phone. So a call may not actually be from who it appears to be from: your bank, for example. It is also relatively easy to fool people into checking your identity via a telephone number that is itself fake, or intercepted somehow.

[622] Furthermore, this application makes telephony available via mobile phones that can be equipped with this application and hence appear to be calls from a legitimate business number.

[623] It is therefore beneficial that the system:

[624] Ensures that calls made via the system are made by the authorized individual.

[625] Provides a means of verifying that the counterparty is who they purport to be.

[626] Adding multi-factor authentication - whereby confirmation codes are exchanged via another channel, for example - is cumbersome and not practical for many phone calls. There is therefore a need to enhance the security of these real-time communications channels - particularly the public switched telephone network but also online streaming audio where it is much easier to hide your true identity behind some stolen images and a voice-only channel than it would be if you had to show your face on video.

[627] By routing all voice calls via the MAP (1502), this invention allows the easy and secure deployment of speaker authentication algorithms - making them immediately and transparently available for the analysis of either or both sides of any voice call. Such algorithms are already widely used - for example, in telephone banking lines where a few seconds of audio at the start of the (typically unscripted and arbitrary) conversation is sufficient to produce a parameterized model of that speaker’s voice suitable for comparison with a previously enrolled sample of a positively confirmed individual.

[628] This has two primary use cases:

[629] Ensuring that only the appropriately authorised employee is able to make or take calls on a specific end user device (1501) over a particular business owned phone number.

[630] Letting an individual receiving a call on a device (1501) verify that the call is over a genuine business line owned by the company they are led to believe it is from and that the person they are speaking to is the appropriately authorised person entitled to be calling over that line.

[631] The methods employed in these two cases are detailed below.

[632] For case 1 above, an initial enrolment procedure requires that employees use a secure corporate website form where they provide the personal phone number (if any) that they will be using to place calls appearing to come from their business number.

[633] They are then provided with a unique QR Code (as this is more secure than having any human readable configuration info) and details of where to download the app. The app will preferably NOT be downloaded via public app stores - only by individual invitation through a corporate route so as to allow embedding of company specific information and reduce the chance of this being obtained by non-employees. Accessing this download point will preferably require the employee to be signed in to their corporate active directory or Windows domain account.

[634] On first running the app:

[635] The employee must provide necessary permissions (microphone, camera, [location]). Specifically, this must include access to the incoming SMS messages received by the phone.

[636] The employee is prompted to point the device’s camera at their QR Code.

[637] The app interacts with the MAP (1502) identified in the QR code - to validate the QR code (identify employee, cell number and that it has not been revoked).

[638] An activation code is sent to the cell-phone number associated with the employee.

[639] If this is, indeed, the cell-phone that it is purported to be, the incoming text message will be received by the app and the embedded security code accepted if received within the (very short) time window allowed. This time window is such that it would be very difficult for someone to transcribe the code from another device. Preferably said code includes invisible characters and/or non-standard characters (such as obscure emojis) to further hamper such efforts.

[640] The employee is instructed to speak and repeat several phrases or read a paragraph as required for enrolment with the voice verification system.
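The time-limited, single-use activation code in the steps above might be handled along the following lines. This is a sketch only: the ten-second window, the class name `Activation` and the use of `secrets.token_urlsafe` are assumptions, as the original says only that the window is "very short". Delivery by SMS is outside the sketch.

```python
import hmac
import secrets
import time

WINDOW_SECONDS = 10.0  # assumed value; the text specifies only "very short"


class Activation:
    """Issues one-time activation codes and accepts them only within a short window."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._pending = {}  # cell number -> (code, time the code was sent)

    def send(self, cell_number):
        """Create a fresh code; a real system would deliver it by SMS."""
        code = secrets.token_urlsafe(16)
        self._pending[cell_number] = (code, self._clock())
        return code

    def accept(self, cell_number, code):
        """True only for the right code, first use, inside the time window."""
        entry = self._pending.pop(cell_number, None)  # any attempt consumes the code
        if entry is None:
            return False
        expected, sent_at = entry
        in_time = self._clock() - sent_at <= WINDOW_SECONDS
        return in_time and hmac.compare_digest(expected.encode(), code.encode())
```

Passing the clock in as a parameter is a design choice that makes the expiry behaviour easy to exercise without waiting in real time.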

[641] Whenever the app is then used by an employee to make or take real-time calls that include a voice element (telephone, video conference, live-streaming) or voice messaging applications such as leaving a voicemail, their speech - which passes via MAP (1502) - may be analysed on-the-fly and compared against the sample obtained when they enrolled with the application or provided via their employer.

[642] Where the comparison does not provide a strong match, a check may be made by sending the employee an email asking them to confirm that this was indeed them - and, if not, block their number from making further calls. A strong rejection, on the other hand (very likely not the authorised speaker), leads to the number being blocked immediately until reinstated manually.
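The three outcomes described in this paragraph can be expressed as a simple mapping from a match score to an action. The 0.0 to 1.0 scale, both threshold values and the names used are assumptions made for illustration; the original gives no numeric values.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    CONFIRM_BY_EMAIL = "confirm_by_email"
    BLOCK_UNTIL_REINSTATED = "block_until_reinstated"


# Assumed thresholds on a 0.0 (very unlikely match) to 1.0 (very likely match) scale.
STRONG_MATCH = 0.80
STRONG_REJECTION = 0.20


def action_for(score):
    """Map a voiceprint comparison score to the follow-up action."""
    if score >= STRONG_MATCH:
        return Action.ALLOW
    if score <= STRONG_REJECTION:
        return Action.BLOCK_UNTIL_REINSTATED
    # Neither a strong match nor a strong rejection: ask the employee to confirm.
    return Action.CONFIRM_BY_EMAIL
```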

[643] For the other use case above, where a user of the application receives a call from someone they do not know personally or whose voice they do not recognize for sure, they can also use the service to take a voiceprint of the counterparty. This may require the counterparty to be notified and/or give consent - according to the appropriate local, national or supra-national regulations. The application can assist with this - for example, by playing spoken disclaimers, explanations and capturing consent via a recording. Alternatively, and preferentially, the identity of a caller - say, purporting to be John Doe from BigCorp - can be verified by emailing them the explanation and legal terms and a random security code. If the counterparty is then able to recite said security code, this proves they are able to receive emails sent to that address.

[644] However, for added security, BigCorp can subscribe to this invention’s publicly available “Voice Verification as a Service” (WaaS) offering as described below.

[645] A highly secure exchange of information between the WaaS provider and each subscribing company results in a trusted, secure communications channel being permanently established between the two.

[646] The subscribing company maintains a live list of current employees - for each of whom is stored:

[647] Business E-mail address

[648] Business phone number - of a phone running the app in this invention

[649] a code obtained from the WaaS uniquely identifying their phone handset (and not visible to anyone in the organization, including the employee being enrolled).

[650] (optional) schedule of hours/days this employee will be verified.

[651] (optional) geofence of location(s) within which this employee will be verified.

[652] Security level: whether voice-print only is sufficient or whether automatic verification via email (or other routes) is required.

[653] Who/how to alert on failed verification attempts.

[654] Expiry date - before which the entry must be refreshed or will be invalidated.
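The per-employee entry listed above could be represented roughly as follows. All field and method names are hypothetical, the schedule format is an assumption, and the optional geofence is omitted for brevity. The entry is treated as invalid from its expiry date onwards, which is one reading of the expiry rule.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, time
from typing import Optional


@dataclass
class EmployeeEntry:
    email: str
    phone_number: str
    handset_code: str                     # opaque code issued by the service
    expiry: date                          # entry is invalid from this date
    voiceprint_only: bool = True          # security level
    alert_contact: Optional[str] = None   # who to alert on failed attempts
    # Optional schedule: weekday (0 = Monday) -> (start, end) local times.
    schedule: dict = field(default_factory=dict)

    def may_verify(self, when: datetime) -> bool:
        """True if verification is permitted for this entry at 'when'."""
        if when.date() >= self.expiry:
            return False
        if not self.schedule:
            return True                   # no schedule means no time restriction
        window = self.schedule.get(when.weekday())
        if window is None:
            return False
        start, end = window
        return start <= when.time() <= end
```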

[655] On adding an employee to this list, the WaaS provides a one-time access code, with short lifetime, preferably in non-human-readable form (such as a unique QR code).

[656] The employee contacts the WaaS via the public switched network and scans said QR code into the application.

[657] The combination of unused, unexpired, not revoked QR code being received via the expected business telephone number allows enrolment to begin. The user may be asked to repeat some specific phrases to generate their initial “reference” voiceprint which is stored for future comparison.

[658] In parallel, the WaaS sends an email to the corresponding Email address. This contains a further, very short-lived QR code that the employee is instructed to scan immediately in order to complete the enrolment process. This ensures that the user has current access to the email account, not just a photograph of the original QR code taken from the real employee’s screen.

[659] Calls received by individual users of the app that appear to come from phone numbers thus registered with the WaaS are flagged as such - for example, by a distinctive ring-tone allowing them to be easily differentiated from cold calls from unknown and untraceable sources.

[660] The WaaS constructs a verification request meaning “does this voiceprint VVV match that of the authorized employee for your phone number XXX and are they currently on a call to phone number or other address YYY?”.

[661] The above database may be held by the WaaS and updated by the subscribing company over said secure communications channel. In this case, the verification request is handled within the WaaS by querying the database for the stored voiceprint associated with the business number and an API call to the subscribing company merely queries whether business number XXX is currently calling YYY.

[662] Alternatively, the data may be held by the subscribing company and the full API call with parameters VVV, XXX and YYY passed over said secure communications channel to obtain the answer.

[663] Note that comparison of voiceprints results in a confidence level of how likely it is that the two originated from the same speaker. This can be represented on a continuous numerical scale, the ends of which represent “very likely match” and “very unlikely match”.

[664] However, the presence or absence of a call between XXX and YYY at the present time is a Boolean result. Combining this with the voiceprint match allows a high degree of confidence that the speaker is who they claim to be even if the voiceprint match is not as strong as would be required to allow a user to access their bank account in the absence of other confirmatory factors.
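A minimal sketch of combining the two factors follows. The threshold values and function name are assumptions; the original states only that a weaker voiceprint match can suffice when the call between XXX and YYY is confirmed.

```python
# Assumed values on a 0.0 to 1.0 confidence scale.
STANDALONE_THRESHOLD = 0.90    # what a voiceprint alone might need (for comparison)
CORROBORATED_THRESHOLD = 0.60  # relaxed level once the call itself is confirmed


def answer_request(voice_score, call_confirmed):
    """Answer the verification request with a single yes/no.

    voice_score: confidence that the voiceprint matches the enrolled reference.
    call_confirmed: whether number XXX is currently on a call to YYY.
    """
    return bool(call_confirmed) and voice_score >= CORROBORATED_THRESHOLD
```

For example, `answer_request(0.7, True)` yields a positive answer even though 0.7 is below the assumed standalone threshold, while `answer_request(0.99, False)` does not. Returning only a single Boolean means a negative answer does not reveal which factor failed.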

[665] Fraudsters may attempt to fool such a system by using a recorded fragment of the authorised speaker’s voice during the call - especially the initial greeting during which it may be expected that any analysis will be performed.

[666] There is also increasing use of pre-recorded greetings and fragments of speech recorded previously by the call centre agent and played on command to allow them to do other work while the recording is being played. It is important that the system identifies that the speech it hears is not all from the same speaker - as would be the case if someone fraudulently logged in to a genuine call centre agent’s desktop application. They would be able to place calls appearing to come from said agent and play these recordings - but would still have to use their own voice when interacting with the customer.

[667] Also, it is common to have music-on-hold or other pre-recorded announcements played while the call is placed on hold or is queuing for resources - such as during a transfer. Background noise during pauses could also trigger false negatives if a nearby speaker is detected and determined not to be the purported caller.

[668] These issues can be countered by, for example:

[669] Repeating the analysis at (preferably random) intervals throughout the call.

[670] Waiting till the second turn of speech before analysing that of the caller (in other words, ignore the first contiguous utterance from the caller as this could well be a pre-recorded greeting or announcement). Wait until both parties have spoken at least once, then analyse the subsequent speech.

[671] Using speaker separation algorithms (as widely used in speech recognition applications for tagging who is speaking on auto-generated subtitles for example) to identify changes to the speaker and retesting the new speaker’s voice.

[672] Identifying exact repetitions of sections of audio that have been heard before and not using those fragments to verify the live individual on the call.

[673] Subscribing companies may provide copies of the recorded announcements (or instructions of how to call their call centre in such a way as to hear such an announcement). These are used in advance to generate reference voiceprints for the (typically small number of) announcers whose voices are used in these. These can then be compared against the current speech in a call and, rather than flag a “strong rejection” (since this speaker is definitely not the individual that is purported to be calling), they actually provide a slightly increased degree of confidence that the call is from the source it purports to be.

[674] Music can be detected and excluded by various means. For example, monitoring the confidence levels of continuous speech transcription output will show a significant drop in recognition confidence during music. Alternatively, analysis of the frequencies present will result in voiceprints that lie outside the scope of those that can be generated from human speech alone.

[675] Repeated fragments of audio - such as music on hold, announcements and pre-recorded fragments of conversation - can be detected and excluded from verification matching. This can be done, for example, by summarising the energy (amplitude squared) in a finite time window - generating an “energy envelope” pattern that can be compared against known common patterns using a sliding window to identify a match where there is high correlation. Typically a moving window automatic gain control algorithm is also employed to normalise the level of the reference and sample energy envelopes prior to comparison.

[676] A “squelch” level can be applied. Following the first utterances from each side, the amplitude range of each direction of audio can be ascertained. Subsequent utterances may be ignored if these are suddenly significantly quieter than the previous interaction.

[677] Assuming there are (short) gaps between words, the minimum amplitude levels can be measured. This gives a signal to noise ratio which can be used to modify the threshold required for a voice verification match. For example, poorer signal to noise ratio means the voiceprint is unlikely to be as good a match as if it were taken from a cleaner signal.

[678] The initial reference levels for amplitude and signal-to-noise ratio may be modified over time by calculation of a moving average to allow for gradual changes such as may occur when walking around a building.

[679] CTI integration with the subscriber’s telephony system may provide explicit information regarding the call that could influence the analysis of the speech that is being transmitted and/or received. In this case an application observes CTI events on the subscriber’s telephony system and alerts the WaaS via events each time a significant change occurs during the call. These may include but are not limited to: transfer to individual X; call on hold; call muted; announcement (preferably indicating which one) playing; conferenced in individual Z; recording state changed (often leads to tones being injected to indicate recording present or paused).

[680] As audio is typically being transmitted and received in packets of 20 or 30ms duration, the preferred mechanism of energy envelope determination is simply to sum the squares of the audio amplitude. This gives a low bandwidth “summary” of the audio levels much as you would see on a typical user interface for an audio system - where the individual words and gaps between them appear as peaks and troughs respectively.
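The per-packet energy summary described here, together with the sliding-window comparison against known patterns mentioned earlier, can be sketched as below. Function names and the 0.95 threshold are assumptions. Pearson correlation is used as a stand-in for the level normalisation that the text attributes to automatic gain control, since it is insensitive to overall scale; this is a substitution, not the method stated in the original.

```python
import math


def packet_energy(samples):
    """Sum of squared amplitudes for one packet (e.g. 20 ms of audio)."""
    return sum(s * s for s in samples)


def energy_envelope(samples, packet_len):
    """One energy value per whole packet; a trailing partial packet is dropped."""
    return [packet_energy(samples[i:i + packet_len])
            for i in range(0, len(samples) - packet_len + 1, packet_len)]


def _correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        return 0.0
    return sum(x * y for x, y in zip(da, db)) / denom


def find_known_fragment(envelope, reference, threshold=0.95):
    """Slide 'reference' along 'envelope'; return the best-matching offset or None."""
    best_offset, best_score = None, threshold
    for offset in range(len(envelope) - len(reference) + 1):
        window = envelope[offset:offset + len(reference)]
        score = _correlation(window, reference)
        if score >= best_score:
            best_offset, best_score = offset, score
    return best_offset
```

A returned offset marks a stretch of the call that resembles a known announcement or hold pattern, which can then be excluded from voice verification.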

[681] Note that an overall determination “NO” (this is unlikely to be who is claimed calling) does not expose any significant information about the validity or otherwise of the three independent input parameters to the query - VVV, XXX or YYY.

[682] The parameters of this request are so difficult to generate unless you are actually on a call to that person that getting a positive response provides a very high degree of confidence that the call and caller are who they purport to be.

[683] If the recipient of the call wants further reassurance, they can ask the counterparty their name and submit that for verification too. A further level of assurance is available for highly sensitive calls by requesting that a security code number or word(s) be sent via email to the registered user of that business number. This could be typed in by the end user or randomly generated by the WaaS. On hearing the counterparty read that code or word(s) back before proceeding with the call, the user is assured that the counterparty is very likely who they claim to be.

Claims

What is claimed is:

1. A system consisting of an application running on a plurality of communications devices, each configured with the addresses of a plurality of mobile access points, each of which may be contacted via a telephone call over the public telephone network using one or more public network telephone numbers and over the internet by one or more addresses wherein all calls handled by said application are connected with desired counterparty address(es) such that the associated media stream(s) pass via at least one of said mobile access points and wherein said mobile access point also controls the onward routing of said media stream(s).

2. A system of claim 1 further characterised in that said public network telephone numbers are routed from the public switched telephone network to a private branch exchange in communication with said mobile access point over a company’s internal network.

3. A system of claim 1 in which voice and/or video connections from said mobile phone to said mobile access point are initiated concurrently via said public telephone number and said internet address.

4. A system of claim 1 in which said routing is via the corporate network of the company deploying said mobile access points.

5. A system of claim 4 in which said routing utilises the same least cost routing plan as used by said company for calls between its internal telephones and external parties.

6. A system of claim 1 in which a plurality of said mobile access points monitor each other’s state by means of network connections between them.

7. A system of claim 1 in which a plurality of said mobile access points monitor the integrity of the internal telephony system to which they and another mobile access point are connected by observing state changes occurring on at least one address assigned to said other mobile access point.

8. A system of claim 1 in which one or more copies of the audio and/or video flowing to and from said mobile phone are transmitted to a recording and/or file storage or archival system.

9. A system of claim 1 in which only calls made via a subset of the mobile network connections available on said mobile phone are routed via said mobile access points.

10. A system of claim 1 in which calls originating from said mobile phone present calling party information identifying an alternative number.

11. A system of claim 1 in which said end address is communicated to said mobile access point via a data network in parallel with the establishment of the telephone call to said mobile access device and that said mobile device initiates a telephone call to said end address on receipt of said data message.

12. A system of claim 1 in which, prior to said end address answering the telephone call made to them the originating user is presented with a plurality of options each of which results in a different set of actions being taken with regard to how long the call rings and what is done afterwards.

13. A system of claim 1 in which more than one audio or video path is established for at least part of said call.

14. A system of claim 13 in which one or more performance characteristics of each of said paths is measured and used to determine which of said multiple paths is used for the transmission and/or reception of data.

15. A system of claim 13 in which a different audio stream is used for automated analysis than that which is used for transmission to and/or receipt of data from the participants in the call.

16. A system of claim 1 in which the state and/or performance of multiple networks is monitored before and/or during said calls to determine which one or more of said networks should be used for said call.

17. A system of claim 1 in which said mobile phone is instructed via a data path to initiate a telephone call to a specified number thus establishing a call to a point to which the audio from an inbound call has also been routed and hence audio can be exchanged between said inbound call and said mobile phone without having to place a call to said mobile phone.

18. A system of claim 16 in which said network state and/or performance and location of said mobile phone is transmitted to a shared repository from which aggregated such data previously submitted by others and/or recommendations derived therefrom are retrieved.

19. A system of claim 16 in which said performance is determined at least partially by the transmission of known bursts of packets.

20. A system of claim 19 in which said bursts are triggered by starting media flowing over an RTP or SRTP channel previously established via SIP or SIPS.

21. A system of claim 19 in which said bursts mimic or are the result of using a codec supporting a silence suppression during short periods of sound interspersed with sustained periods of silence.

22. A system of claim 14 in which the connection used to transmit and/or receive voice or audio is changed during the call in response to user input and/or quality metrics taken of the data reception locally and/or data transmission quality as reported back by the recipient.

23. A system of claim 1 in which voice and/or video streams are transmitted and/or received over more than one path at the same time.

24. A system providing real-time data stream exchange or interaction between a plurality of parties in which control over aspects of said interactions is achieved by the deliberate insertion of and subsequent analysis and identification of one or more pre-determined phrases and/or visual cues within one or more of said real-time data streams.

25. A system of claim 24 further characterised in that said pre-determined phrases begin with a pre-determined word or phrase.

26. A system of claim 24 further characterised in that said control varies according to the state of the connections to the parties on the call.

27. A system of claim 26 further characterised in that said control affects the establishment of an as yet unconnected data stream and/or the alternative actions to be performed should said stream not connect.

28. A system of claim 24 further characterised in that at least some of said data streams are routed not directly between said parties but via a stream management node that variably derives the data transmitted onwards in any of said data streams from the data streams arriving at it from each party and/or derived from sources within or connected to said management node.

29. A system of claim 24 further characterised in that at least some of said control is performed by a private branch exchange controlled by commands from said stream management node.

30. A system of claim 24 further characterised in that at least some of said phrases are chosen from the set of those commonly used within telephone calls to advise the counterparty of an imminent action likely to affect said call.

31. A system of claim 28 further characterised in that data regarding said interaction is transmitted across a second network connection to said node before or in parallel with the establishment of said data stream between the initiator of said interaction and said node.

32. A system of claim 28 further characterised in that in the absence of a viable data connection between the initiating party and said stream management node that data is transmitted in-band via the real-time data stream established between said initiator and said node.

33. A system of claim 24 further characterised in that said analysis is performed by an Interactive Voice Response system accessed via a telephony system.

34. A system of claim 28 further characterised in that connection of two or more real-time data streams is attempted and optionally maintained between said stream management node and one or more of said parties.

35. A system of claim 24 further characterised in that inbound calls intended for a specific party are instead routed to a stream management node that then establishes a real-time data stream connection to said specific party.

36. A system of claim 35 further characterised in that said real-time data stream connection to said specific party is a circuit switched call over a telephony network and is initiated by said specific party in response to an instruction from said stream management node sent over a packet data network.

37. A system of claim 24 further characterised in that said phrases initiate a spoken dialog response between the speaker of said phrase and the server or service providing said analysis in which elements of said dialog are selectively transmitted to or concealed from one or more of the other parties in the interaction.

38. A system of claim 24 further characterised in that said control includes one or more of the standard telephony functions: hold, retrieve, blind-transfer, consultative-transfer, conferenced transfer, consult, conference, abandon, hang-up, call.

39. A system of claim 24 further characterised in that said control includes one or more standard auto-dialler functions including but not limited to: reschedule, reassign, change next contact attempt method, leave a message, store call outcome code.

40. A system of claim 24 further characterised in that said control includes one or more standard recording control functions including but not limited to: start, stop, pause, resume, tag, transcribe.

41. A system of claim 24 further characterised in that said control includes one or more standard agent assistive functions including but not limited to: play pre-recorded announcement, play text to speech output, take payment or other details in secret.

42. A system of claim 24 further characterised in that said control includes connection to and/or transfer to one of the following services while interaction continues: transcription, concierge, automated personal assistant.

43. A system of claim 28 further characterised in that a subset of said controls may be performed by a party on the call other than the party that resulted in said stream management node being inserted into the interaction.

44. A system of claim 43 further characterised in that a further subset of said subset of controls may be given and acted upon whilst the content of the data stream or streams transmitted by said party are not being forwarded to any of the other parties in the interaction.

45. A system of claim 42 further characterised in that said subset of controls includes but is not limited to any of: attract other party’s attention, advise other party of this party’s intended departure from the interaction, record a message for the other party ahead of disconnecting.

46. A system of claim 24 further characterised in that one or more existing databases containing employee, counterparty and/or contact details are read in order to populate the parameters for control dialogs involving employees, job functions and their associations with telephone numbers or other messaging service addresses.

47. A system of claim 46 further characterised in that said databases include those in one or more of: human resource system, workforce optimisation suite, recording system, private branch exchange system, corporate directory, public directory, domain server, active directory.

48. A method of providing real-time data stream exchanges between a plurality of parties in which control over aspects of said interactions is achieved by the deliberate insertion of one or more pre-determined audio and/or visual cues within one or more of said real-time data streams.

49. A system providing audio connectivity between a plurality of individuals within earshot of each other and at least one remote participant characterised in that audio is received via microphones in a plurality of said individuals’ smartphones, smartwatches, tablet computers, laptops, personal computing devices and selectively merged by a single controller to form a single resultant audio stream that is transmitted to the remote participant(s).

50. A system of Claim 49 in which sound generated by at least one of said devices is received via at least one other of said devices in order to determine their ability to participate in a shared audio interaction and/or to determine their relative positions.

51. A system of Claim 49 further characterised in that the audio stream from the remote participant(s) is output via a desk telephone, a conference phone, one or more of said personal computing devices or one or more loudspeakers connected physically or wirelessly to any of these devices.

52. A system of Claim 49 in which the contribution to said resultant audio stream is determined by the audio level and/or quality received at each such microphone.

53. A system of Claim 49 in which any of the orientation, gain, quality of and/or relative distances between said microphones and individual currently speaking are inferred from any of the relative volume, spectrum, signal-to-noise ratio and time shift of audio received by them.

54. A system of Claim 49 in which any of the orientation, gain, quality of and/or distance of said microphones from any of said audio output devices is inferred from any of the volume, spectrum, signal-to-noise ratio and/or time shift of audio received at said microphone in response to a specific audio signal being played at one or more of said output devices.

55. A system of Claim 49 in which the audio from each microphone and/or the differences between pairs of audio streams are processed via automatic speech recognition algorithms so as to output a putative transcript and associated confidence level for each word or phrase.

56. A system of Claim 55 in which said differences are calculated from derivatives of the received audio streams which include time-shifts that maximise correlation between said streams.

57. A system of Claim 55 in which said transcript and/or confidence levels are used to determine which of said audio streams or differences between said streams is added to said resultant audio stream.

58. A system of Claim 55 in which said transcript is used to access and show related information in real time on a shared display and/or on the individuals’ personal computing devices.

59. A system of Claim 49 in which one or more of the individual microphones’ audio streams is recorded along with the resultant audio stream such that each such stream, combinations of and/or differences between such streams can be replayed as required with each stream optionally automatically time-shifted so as to minimise echo.

60. A system of Claim 55 in which the strength of audio received at each microphone and/or the outputs of said speech recognition is noted and used to determine which individual was speaking at a given time.

61. A system of Claim 49 in which said audio connectivity is part of a multimedia conference between said individuals and said remote participant(s).

62. A system of Claim 49 in which said personal computing devices, on joining said audio connection, automatically mute and/or prompt the user to mute their call alerts and/or block other calls from interrupting said shared audio connection.

63. A system of Claim 49 in which said personal computing devices join said shared audio connection via peer-to-peer data messages and/or audio signals sent between them that, on being received, identify at least a subset of the potential participants and the shared connection which they wish to join.

64. A system consisting of an application running on a plurality of communications devices, each configured with the addresses of one or more mobile access points via which communication sessions containing at least one audio stream are established with one or more counterparty devices via one or more network connections and wherein a time-bounded sample of at least one of said audio streams is analysed so as to determine a set of characteristics of said audio stream and where said characteristics are compared against a previously measured reference set of characteristics in order to test the hypothesis that the person speaking in said audio stream is the same individual from whose speech said reference set of characteristics were obtained.

65. A system of claim 64 wherein said hypothesis is tested repeatedly throughout said communication session.

66. A system of claim 64 wherein said hypothesis is tested repeatedly at random intervals during said communication session.

67. A system of claim 64 wherein said audio streams are transmitted in discrete network packets and the level of audio transmitted in each direction of a stream in a finite period is measured as the sum of the audio amplitude squared throughout a single packet of audio and the time sequence of said audio levels is stored as a representation of the energy envelope of said audio stream.
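
By way of illustration only, the per-packet audio level of Claim 67 might be computed as sketched below. The names `packet_energy` and `energy_envelope`, and the representation of each packet as a list of signed sample amplitudes, are assumptions made for this sketch.

```python
def packet_energy(samples):
    """Audio level of one packet: the sum of the squared sample amplitudes."""
    return sum(s * s for s in samples)


def energy_envelope(packets):
    """Time sequence of per-packet levels, kept as the stream's energy envelope."""
    return [packet_energy(p) for p in packets]


# Hypothetical decoded packets for one direction of a stream.
packets = [[1, -1, 2], [0, 0, 0], [3, 4]]
envelope = energy_envelope(packets)
```

Only one number per packet is retained, so the envelope is far smaller than the audio itself and can be stored for each direction of the stream throughout the session.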

68. A system of claim 64 wherein said hypothesis is tested after the energy envelope of each direction of audio exceeds pre-determined level and duration thresholds in each direction of said audio stream.
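
By way of illustration only, the gating condition of Claim 68 might be sketched as follows. The claim does not state how duration is measured; this sketch assumes it is the cumulative number of packets whose energy exceeds the level threshold, and the names `exceeds_thresholds` and `ready_to_test` are hypothetical.

```python
def exceeds_thresholds(envelope, level, min_packets):
    """True once at least min_packets packets in the envelope exceed level.

    Assumption: duration is counted cumulatively in packets, not as one
    unbroken run of loud packets.
    """
    return sum(1 for e in envelope if e > level) >= min_packets


def ready_to_test(outbound, inbound, level, min_packets):
    """True only when both directions of the stream meet the thresholds."""
    return (exceeds_thresholds(outbound, level, min_packets)
            and exceeds_thresholds(inbound, level, min_packets))


# Hypothetical envelopes: the outbound direction is active, the inbound is near silent.
outbound = [40, 55, 3, 60, 70]
inbound = [2, 1, 45, 0, 3]
early = ready_to_test(outbound, inbound, level=30, min_packets=3)
later = ready_to_test(outbound, inbound + [50, 65], level=30, min_packets=3)
```

Under these assumptions the test is deferred while only one party has spoken and proceeds once sufficient audio has been observed in both directions.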

69. A system of claim 64 wherein said sample is selected from periods whose energy envelope does not correlate with any of those in a predetermined library of energy envelopes.

70. A system of claim 64 wherein said audio stream is also subject to continuous speech recognition analysis resulting in transcription, confidence levels and speaker enumeration outputs.

71. A system of claim 70 in which output indicating a change of speaker triggers a further test of said hypothesis.

72. A system of claim 64 wherein the result of said test falling outside a pre-determined range is used to terminate said communication session in the event of said hypothesis being strongly rejected.

73. A system of claim 64 in which said test results are considered in combination with the current state of the telephony system from which said individual is purported to be calling, said state being inferred from the receipt of computer telephony integration events.

74. A system of claim 73 in which said events include but are not limited to transfer; hold; muted; announcement playing; conferenced; recording state changed.