WO2023144724A1

WO2023144724A1 - Method and system for providing post-interaction assistance to users

Info

Publication number: WO2023144724A1
Application number: PCT/IB2023/050635
Authority: WO
Inventors: Abhishek S N; Apurva MANDALIKA; Sitaram Ramachandrula; Addepalli Srisai Venkata Kamal MOHAN; Arjun Harish; Gattigorla NAGENDAR; Bandari MANOJ
Original assignee: [24]7.ai, Inc.
Priority date: 2022-01-26
Filing date: 2023-01-25
Publication date: 2023-08-03

Abstract

A method and system for providing post-interaction assistance to users by intelligent recording and timely retrieval of assistance is disclosed. The method includes receiving an assistance request, the assistance request includes an input corresponding to an object. Further, the method includes accessing one or more augmented content records from a database associated with the system based, at least in part, on the object. The one or more augmented content records include at least one or more assistances corresponding to the object. Further, the method includes generating an augmented visual session based, at least in part, on the one or more augmented content records. Further, the method includes facilitating a display of the augmented visual session on an electronic device associated with a user.

Description

METHOD AND SYSTEM FOR PROVIDING POST-INTERACTION ASSISTANCE TO USERS

Cross-reference to related applications

This application claims priority from Indian provisional patent application 202241004339, filed on 26th January 2022, which is incorporated herein in its entirety by this reference thereto.

The present technology generally relates to user service mechanisms and, more particularly, to a method and system for providing post-interaction assistance to users by intelligent recording and timely retrieval of assistance.

Background

Existing and potential users of an enterprise contact a customer service center or a user support center of the enterprise to seek assistance with a product or service. For example, a user may visit a website of a banking enterprise and connect with a user service representative such as a customer service representative to seek assistance in filling out a housing loan form. In another illustrative example, a user may connect with a customer service representative to seek assistance with different modes of operating an appliance such as an oven for example.

Conventionally, customer service representatives efficiently provide desired assistance to users in form of voice support, visual support, and the like which may include a sequence of instructions or steps to resolve the query and/or to troubleshoot the issues. For example, a user may seek assistance from a customer service representative to change his shipping address for a shipment ordered on an enterprise website. As such, the customer service representative may provide the required visual support, for example, by screen sharing to navigate options on the enterprise website to change the shipping address. Most often, the user acts on the solution after the interaction with the customer service representative and as such, makes a mental note of the instructions. However, the user may forget the sequence of steps to change the shipping address or have some queries at a later point in time while changing the shipping address and may not be able to navigate through options on the enterprise website. As the user requires further support, the user may reach out to the customer service representative again. In many scenarios, the users may have to wait a long time to interact with a customer service representative as all available customer service representatives may be engaged in serving other users. The prolonged wait to receive assistance from a customer service representative may be very frustrating for the user and can lead to ruining the interaction experience of the user while wasting their time as well. Moreover, even if the user interaction between the customer service representative and the user is recorded for further reference as an interaction record, the user may have to listen to the interaction record or read through the entire transcript of the interaction record to resolve the issue which may be cumbersome and time intensive for the user. In one example scenario, the user may have navigated through options on the enterprise website to track the shipping order but may face problems with locating an option to request a change of the shipping address. In such scenarios, the user may require assistance only with the final steps and not the entire process of changing the shipping address. As such, going through the entire interaction record may be a time-consuming process which may be frustrating for the user.

In light of the foregoing, there is a need to improve user service mechanisms such that the user does not face the hassle of going through an entire recorded user interaction for accessing some portions of assistance provided earlier by a customer service representative. More specifically, there is a need to facilitate intelligent storage of user interactions with customer service representatives to retrieve the relevant portions of the assistance provided to the user by the customer service representative. Further, it would be advantageous to utilize user interactions to create automated agents that provide an independent interactive mode of conversation to serve prospective users seeking assistance with the same or similar query thereby reducing the resources and efficiently serving the users.

A computer-implemented method is disclosed. The method performed by the system includes receiving an assistance request, the assistance request includes an input corresponding to an object. Further, the method includes accessing one or more augmented content records from a database associated with the system based, at least in part, on the object. The one or more augmented content records include at least one or more assistances corresponding to the object. Further, the method includes generating an augmented visual session based, at least in part, on the one or more augmented content records. Further, the method includes facilitating a display of the augmented visual session on an electronic device associated with a user.

A system including at least one processor and a memory having stored therein machine-executable instructions is disclosed. The machine-executable instructions, executed by the at least one processor, cause the system to receive an assistance request, the assistance request includes an input corresponding to an object. Further, the system is caused to access one or more augmented content records from a database associated with the system based, at least in part, on the object. The one or more augmented content records include at least one or more assistances corresponding to the object. Further, the system is caused to generate an augmented visual session based, at least in part, on the one or more augmented content records. Further, the system is caused to facilitate a display of the augmented visual session on an electronic device associated with a user.

A non-transitory computer-readable storage medium is disclosed. The non-transitory computer-readable storage medium includes computer-executable instructions that, when executed by at least a processor of a system, cause the system to perform a method. The method includes receiving an assistance request, the assistance request includes an input corresponding to an object. Further, the method includes accessing one or more augmented content records from a database associated with the system based, at least in part, on the object. The one or more augmented content records include at least one or more assistances corresponding to the object. Further, the method includes generating an augmented visual session based, at least in part, on the one or more augmented content records. Further, the method includes facilitating a display of the augmented visual session on an electronic device associated with a user.

A computer-implemented method is disclosed. The method performed by the system includes receiving an interactive session initiation request. The interactive session initiation request includes a two-dimensional input corresponding to a damaged object. Further, the method includes generating an augmented content record based, at least in part, on the input. The augmented content record indicating a three-dimensional model of the damaged object. Further, the method includes accessing one or more augmented content records corresponding to one or more users from a database associated with the system. The one or more augmented content records includes at least one or more assistances corresponding to the two-dimensional input. Further, the method includes generating an automated agent based, at least in part, on the one or more augmented content records. Further, the method includes determining via the automated agent, one or more damaged regions on the damaged object based, at least in part, on the augmented content record. Further, the method includes facilitating a display of one or more assistances on an electronic device of a user. The one or more assistances is generated by the automated agent based, at least in part, on analysing the one or more damaged regions.

The advantages and features of the invention will become better understood with reference to the detailed description taken in conjunction with the accompanying drawings, wherein like elements are identified with like symbols, and in which:

Fig.1

shows a representation for illustrating an example environment relevant to at least some embodiments of the present invention.

Fig.2

is a block diagram of the system configured to provide post-interaction assistance to users, in accordance with an embodiment of the invention.

Fig.3

depicts an example representation of a UI showing a medical insurance claim form displayed by a user to an agent to seek assistance during an interaction session, in accordance with an embodiment of the invention.

Fig.4

shows an example representation of a UI depicting an agent providing annotations to assist the user with filling up the medical insurance claim form shown in during the interaction session, in accordance with an embodiment of the invention.

Fig.5

depicts a user scanning a portion of a medical insurance claim form with a user device to retrieve relevant portions of agent assistance from an augmented content record, in accordance with an embodiment of the invention.

Fig.6

shows an example representation of a UI depicting options for accessing different portions of agent assistance provided for resolving a query from an augmented content record, in accordance with an embodiment of the invention.

Fig.7

shows a dialog tree created in relation to an interaction session between a user and an agent for building an automated agent, in accordance with an embodiment of the invention.

Fig.8

shows a flow diagram depicting a conversation flow between an automated agent and a user seeking assistance, in accordance with an embodiment of the invention.

Fig.9A, Fig.9B and Fig.9C

, and collectively, depict various graphical user interfaces representing steps for retrieving relevant portions of an agent’s assistance from a stored augmented content record to provide an augmented visual session, in accordance with various embodiments of the invention.

Fig.10A, Fig.10B and Fig.10C

, and collectively, depict example representations of the various stages of an interactive session with an automated agent for receiving assistance while processing an insurance claim, in accordance with various embodiments of the invention.

Fig.11

shows a flow diagram of a method for providing post-interaction assistance to users, in accordance with an embodiment of the invention.

Fig.12

shows a flow diagram of a method for providing assistance to users, in accordance with an embodiment of the invention.

Fig.13

shows a flow diagram of a method of generating augmented content records, in accordance with an embodiment of the invention.

Fig.14

shows a flow diagram of a method for generating an automated agent in response to the subsequent request received from the user, in accordance with an embodiment of the invention.

Fig.15

shows a flow diagram of a method for generating damage assessment result of a damaged vehicle, in accordance with an embodiment of the invention.

Detailed Description

The best and other modes for carrying out the present invention are presented in terms of the embodiments, herein depicted in FIGS. 1 to 15. The embodiments are described herein for illustrative purposes and are subject to many variations. It is understood that various omissions and substitutions of equivalents are contemplated as circumstances may suggest or render expedient but are intended to cover the application or implementation without departing from the scope of the invention. Further, it is to be understood that the phraseology and terminology employed herein are for the purpose of the description and should not be regarded as limiting. Any heading utilized within this description is for convenience only and has no legal or limiting effect.

The terms “a” and “an” herein do not denote a limitation of quantity, but rather denote the presence of at least one of the referenced items.

is an example representation 100 of an environment related to at least some example embodiments of the invention. The environment 100 is depicted to include a user of an enterprise, such as for example user 102. It is noted that the terms ‘user’ and ‘customer’ are used interchangeably hereinafter. The term ‘enterprise’ as used herein may refer to a corporation, an institution, a small/medium-sized company, or even a brick-and-mortar entity. For example, the enterprise may be a banking enterprise, an educational institution, a financial trading enterprise, an aviation company, a retail outlet, an e-commerce entity, or any such public or private sector enterprise. It shall be noted that the enterprise has not been shown in . Further, it is understood that many users may use products, services, and/or information offered by the enterprise. The user 102 is depicted to be a user of products/services offered by the enterprise for example purposes only. It is noted that the users of the enterprise offerings may not be limited to individuals. Indeed, in many example scenarios, groups of individuals or other enterprise entities may also be users of the enterprise offerings.

Most enterprises, nowadays, have set up dedicated Customer Service And Support (CSS) centres for providing desired assistance to the users or customers. A typical CSS centre includes a number of customer service representatives, such as agents, chat bots, self-assist systems, such as either Web or mobile digital self-service, and/or interactive voice response (IVR) systems. The customer service representatives are trained to interact with the users for providing information, sell to them, answer their queries, address their concerns, and/or resolve their issues. The environment 100 is depicted to include an example CSS centre 108. The CSS centre 108 is depicted to include two customer service representatives in form of a human agent 110 and a virtual agent 112 for illustration purposes. It is understood that the CSS centre 108 may include several human and virtual agents for assisting users/customers of an enterprise with their respective queries.

The environment 100 further depicts the user 102 accessing a website 106 associated with an enterprise using a user device such as an electronic device 104 over a communication network 114 shown in (referred to herein as ‘network 114’). Examples of the network 114 may include wired networks, wireless networks, or a combination thereof. Examples of wired networks may include the Ethernet, local area networks (LANs), fiber-optic cable networks, and the like. Examples of wireless networks may include cellular networks like GSM/3G/4G/CDMA based networks, wireless LANs, Bluetooth or Zigbee networks, and the like. An example of a combination of wired and wireless networks may include the Internet. It is noted that the user 102 may use any electronic device, such as but not limited to, a smartphone, a laptop, a tablet computer, a desktop, a mobile phone, a personal digital assistant, a web-enabled wearable device, and the like. Further, it is noted that the electronic device 104 may include necessary applications, such as for example, a web browser application to enable the user 102 to access the website 106 over the network 114.

In an example scenario, the user 102 may wish to interact with the enterprise in order to troubleshoot an equipment. For example, the user 102 may require assistance of a customer service representative to reset a router 118. In an illustrative example, the user 102 may access the website 106 and request an interaction session with a customer service representative for assistance with resetting the router 118. The term ‘interaction session’ or ‘interactive session initiation request’ as used herein refers to a temporary session of user interaction i.e., information exchange between the user 102 and an agent 110 in which the agent 110 may provide information (e.g., a sequence of steps) to assist the user 102 in troubleshooting a device/product or resolving a query. It should be noted that an interaction session between the user 102 and the agent 110 of an enterprise is initiated based, at least in part, on the input corresponding to an object received from the user 102. It is noted that the term ‘object’ may represent one or more objects as well. The input received from the user 102 comprises at least one of a visual including one of a video and image, an audio and audio-visual input.

The user 102 is connected with a customer service representative, for example, the agent 110, skilled at serving users with troubleshooting routers. The agent 110 may request the user 102 to display the router 118 to ascertain the router model, router type, etc. The user 102 may utilize an image capturing module configured in the electronic device 104 to capture the router 118 and display it for the agent 110. Accordingly, the agent 110 may provide assistance to the user 102 for resetting the router 118. In one illustrative example, the agent 110 may efficiently provide desired assistance for the user 102 in the form of voice support i.e., orally list out sequential steps to be followed to reset the router 118. In another illustrative example, the agent 110 may provide visual support such as, animated representations on how to locate a reset button and press the reset button to reset the router 118.

Most often, the user 102 acts on the solution after the interaction with the agent 110 and as such, makes a mental note of the instructions. However, the user 102 may forget a sequence of steps to reset the router 118 or have some queries at a later point of time while trying to reset the router 118. The user 102 may require further support and reaches out to the CSS centre 108 again. In many scenarios, the user 102 may have to wait a long time to interact with a user care representative as all available user care representatives may be engaged in serving other users. The prolonged wait to receive assistance from a customer service representative may be very frustrating for the user 102 and can lead to ruining the interaction experience of the user 102. Moreover, even if the user interaction between the agent 110 and the user 102 is recorded for further reference as an interaction record, the user 102 may have to listen to the entire interaction record or read through the entire transcript of the interaction record to resolve the issue (i.e., reset the router) which may be cumbersome for the user 102.

To overcome the aforementioned drawbacks and provide additional advantages, a system 150 is provided. The system 150 is configured to be in operative communication with the CSS centre 108 over the network 114. On account of being in operative communication with the CSS centre 108, the system 150 is configured to be notified when every user calls the CSS centre 108 to seek assistance from the agents. The term ‘call’ as used herein may refer to a Web Real-Time Communication (WebRTC) to connect to an agent 110 at the CSS centre 108, or any such form of communication enabled by technologies and infrastructure that function on a smart electronic device 104 and that the user can use to contact the CSS centre 108.

The system 150 is configured to intelligently record selected portions of agent assistance during an interaction session between the user 102 and the agent 110 and store such selected portions of the interaction session subsequent to completion of the user interaction with the agent. For example, the agent provides textual annotations and voice support for the user to fill up various fields of an insurance form and such textual annotations and voice support are stored as agent assistance. Moreover, such agent assistance is associated with objects (i.e., related and aligned to the real-world objects or virtual objects) detected from image frames displayed by the user to provide assistance for resolving an issue/troubleshooting a product as will be explained in detail later. As portions of the agent assistance between the user 102 and the agent 110 are stored, the user 102 may automatically access relevant portions of the stored agent assistance for reference at a later time by displaying the same object or scene using the electronic device 104 thereby, saving time and significantly improving a user service experience. Further, the agent assistance that was recorded may be utilized by the system 150 to build automated agents that provide an independent interactive mode of conversation to serve prospective users seeking assistance with the same or similar query thereby reducing the resources and efficiently serving the users. The system 150 is explained in further detail with reference to .

In one embodiment, the environment 100 has a database 116. In one example, the database 116 may be embodied within the system 150 or communicably coupled to the network 114. In one embodiment, the database 116 provides a storage location for the augmented content record and one or more assistance provided by the agent 110. In some embodiments, the database 116 provides a storage location for the conversational content, promotional content, user interaction data, metadata, JSON structural content, dynamic HTML, and any other data associated with the system 150.

is a block diagram of the system 150 configured to provide post-interaction assistance to users, in accordance with an embodiment of the invention. To that effect, the system 150 may be configured to generate an augmented content record in relation to an interaction session between a user and the agent 110 for providing post-interaction assistance for the same user or other users as will be described in detail later.

The term ‘augmented content record’ as used herein primarily refers to a record of select content from an interaction session that may be used to augment a subsequent interaction session for recreating assistance (i.e., the assistance provided during the interaction session) in relation to desired portions to help the user. More specifically, the augmented content record corresponds to portions of an agent-assistance (i.e., select content) during the interaction session between a user and an agent 110 that may be used to recreate portions of the agent assistance in future interaction sessions for assisting the same or different user. Accordingly, the augmented content record also includes extracted features of an object in relation to which the assistance is provided for the user. The features of the object refer to characteristics that define the object and aid in identifying the object from an image frame. In one illustrative example, if the agent 110 is providing assistance to a user to fix a sediment filter in a water purifier, features related to the water purifier and/or the sediment filter that help in identifying a product type, manufacturer, make/model of the sediment filter and/or the water purifier are referred to as features. In another illustrative example, if an agent 110 is assisting a user to fill up a form, features of the object extracted from the form refer to physical dimensions of the form, co-ordinates of the data fields in the form, and the like that help in identifying the form.

The agent assistance may be in form of voice support, or visual support and in some example scenarios, the visual/voice support is enriched with augmented information provided by the agent 110 in form of screen annotations. More specifically, the user 102 may display an object with which he may require assistance and the agent 110 may provide screen annotations during the interaction session in relation to the object displayed by the user (i.e., image or video of the object captured via image capturing module of a user device) for assisting the user to resolve a query. As such, the term ‘annotations’ as used herein may include a short explanation, markings, note or comment to explain any object or any part thereof in relation to a query of the user. Accordingly, any information provided by the agent 110 to assist the user to resolve a query such as, textual content, image content, video content, audio content (e.g., vocal instructions), graphical content, web links/URLs or any combination thereof are referred to as annotations, herein. Further, touch point based drawings (also referred to herein as touch co-ordinates) provided by the agent 110 for augmenting information in relation to objects to provide assistance (e.g., provide annotations) to the user is also referred to as annotation. For example, if the agent 110 is providing assistance to a user for replacing a sediment filter in a water purifier, the agent 110 may encircle the sediment filter and draw arrows indicating the sediment filter among different filters in the water purifier and further provide annotations (i.e., text) indicating how to detach the sediment filter.

In addition, the augmented content record may also include geometrical co-ordinates of the annotations and correlation data between extracted features and corresponding annotations. For example, if the agent 110 is providing assistance to a user to fix a sediment filter in a water purifier, features related to the water purifier and/or the sediment filter are extracted in relation to which the agent 110 provides annotations. The correlation data includes correlation information between extracted features and geometrical co-ordinates and may be utilized for associating and aligning annotations with objects. Additionally, the augmented content record may also include one or more image frames in relation to the object displayed by the user. In some example embodiments, the augmented content record may also include an intent corresponding to the interaction session as provided by the agent 110 in the CSS centre such as, the CSS centre 108 (shown in ). The term ‘intent’ as used herein refers to a main objective or purpose of the interaction session. Accordingly, the agent 110 provides an intent as a tag for each augmented content record. In one illustrative example, if the interaction session corresponds to user interaction with an agent 110 seeking assistance to troubleshoot a router for changing the password for WiFi, the agent 110 may provide a tag corresponding to the intent as ‘WiFi password update for a router’ for the augmented content record. In one implementation, the user may interact with a second agent (not shown) after the first interaction session is over. In such a scenario, the second agent is provided by an option for editing the augmented content record of the previous interaction session. In particular, the second agent may edit the various annotations in relation to the one or more portions of the previous interaction session.

The system 150 includes a processing module 152 and a memory module 154. In an embodiment, the memory module 154 is capable of storing machine executable instructions, also referred to as platform instructions 155. Further, the processing module 152 is capable of executing the stored machine executable instructions. In an embodiment, the processing module 152 may be embodied as a multi-core processor, a single core processor, or a combination of one or more multi-core processors and one or more single core processors. For example, the processing module 152 may be embodied as one or more of various processing devices, such as a coprocessor, a microprocessor, a controller, a digital signal processor (DSP), a processing circuitry with or without an accompanying DSP, or various other processing devices including integrated circuits such as, for example, an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), a microcontroller unit (MCU), a hardware accelerator, a special-purpose computer chip, or the like. In an embodiment, the processing module 152 may be configured to execute hard-coded functionality. In an embodiment, the processing module 152 is embodied as an executor of software instructions, wherein the instructions may specifically configure the processing module 152 to perform the algorithms and/or operations described herein when the instructions are executed.

The processing module 152 is further depicted to include a feature extraction module 162, a correlation module 164, an augmented content record generation module 166, an assistance recreation module 168, and an automated agent generation module 170. The modules of the processing module 152 may be implemented as software modules, hardware modules, firmware modules, or as a combination thereof.

The memory module 154 may be embodied as one or more volatile memory devices, one or more non-volatile memory devices, and/or a combination of one or more volatile memory devices and non-volatile memory devices. For example, the memory module 154 may be embodied as magnetic storage devices (such as hard disk drives, floppy disks, magnetic tapes, etc.), optical magnetic storage devices (e.g. magneto-optical disks), CD-ROM (compact disc read only memory), CD-R (compact disc recordable), CD-R/W (compact disc rewritable), DVD (Digital Versatile Disc), BD (Blu-ray® Disc), and semiconductor memories (such as mask ROM, PROM (programmable ROM), EPROM (erasable PROM), flash ROM, RAM (random access memory), etc.).

Thus, upon receiving an assistance request from the user 102 subsequent to the completion of the interaction session, an augmented visual session is generated based, at least in part, on the augmented content record. The generated augmented visual is then displayed on an electronic device 104 associated with the user 102. In an embodiment, one or more features from at least one image frame of the object are first extracted. Then, the extracted one or more features is compared with one or more features in the augmented content record. At least one match of the one or more features in the extracted one or more features and the augmented content record is determined and the augmented content record corresponding to at least one match is accessed. Based on the augmented content record the augmented visual session is generated and displayed to the user 102.

In at least some embodiments, the memory module 154 stores logic and/or instructions, which may be used by modules of the processing module 152 to: (1) facilitate an interaction session between a user 102 and an agent 110, (2) process video signals corresponding to an object displayed by the user during the interaction session to extract features in relation to the object, (3) correlate the extracted features with corresponding annotations to determine correlation data, (4) generate an augmented content record including annotations in relation to the one or more portions of the interaction session and the correlation data, (5) store augmented content record, (6) retrieve at least one portion of agent assistance from the augmented content record in response to an assistance request from the user, and (7) generate an automated agent based on the augmented content record. The augmented content record stored subsequent to the completion of the interaction session provides assistance to users from a specific portion in relation to which the user may seek assistance at a later time thereby, enhancing user experience substantially. In some example embodiments, the memory module 154 may store a linear dialog flow corresponding to the augmented content record that facilitates the automated agent in assisting users with the same or similar query/intent.

The system 150 also includes an input/output module 156 (hereinafter referred to as ‘I/O module 156’) and a communication module 158. The I/O module 156 is configured to facilitate provisioning of an output to an operator of the system 150. The I/O module 156 is configured to be in communication with the processing module 152 and the memory module 154. Examples of the I/O module 156 include, but are not limited to, an input interface and/or an output interface. Examples of the input interface may include, but are not limited to, a keyboard, a mouse, a joystick, a keypad, a touch screen, soft keys, a microphone, and the like. Examples of the output interface may include, but are not limited to, a display such as a light-emitting diode display, a thin-film transistor (TFT) display, a liquid crystal display, an active-matrix organic light-emitting diode (AMOLED) display, a microphone, a speaker, a ringer, a vibrator, and the like.

In an example embodiment, the processing module 152 may include I/O circuitry configured to control at least some functions of one or more elements of the I/O module 156, such as, for example, a speaker, a microphone, a display, and/or the like. The processing module 152 and/or the I/O circuitry may be configured to control one or more functions of the one or more elements of the I/O module 156 through computer program instructions, for example, software and/or firmware, stored on a memory, for example, the memory module 154, and/or the like, accessible to the processing module 152.

The communication module 158 may include communication circuitry such as for example, a transceiver circuitry including an antenna and other communication media interfaces to connect to a communication network, such as the network 114 shown in . The communication circuitry may, in at least some example embodiments enable reception of (1) a request for initiating an interaction session with the agent 110 of an enterprise, (2) video signal corresponding to an object displayed by the user during an interaction session between a user and an agent 110 from a CSS centre such as, the CSS centre 108 (shown in ), (3) one or more annotations from the agent 110 in relation to at least one image frame among a plurality of image frames in the video signal, (4) geometrical co-ordinates associated with the one or more annotations, and (5) an intent in relation to the interaction session. The communication circuitry may further be configured to provide one or more portions of an augmented content record in response to an assistance request received from users such as, the user 102.

The system 150 is further depicted to include a storage module 160. The storage module 160 is any computer-operated hardware suitable for storing and/or retrieving data. In one embodiment, the storage module 160 includes a repository, which is configured to store augmented content records associated with a plurality of interaction sessions in which agents provide assistance to different queries of users. As such, the repository may serve as the database 116 of augmented content records for a variety of intents and users may access any portion of an augmented content record for assistance to resolve a query. The storage module 160 may include multiple storage units such as hard drives and/or solid-state drives in a redundant array of inexpensive disks (RAID) configuration. In some embodiments, the storage module 160 may include a storage area network (SAN) and/or a network-attached storage (NAS) system. In one embodiment, the storage module 160 may correspond to a distributed storage system, wherein individual databases are configured to store custom information, such as user interaction logs.

In some embodiments, the processing module 152 and/or other components of the processing module 152 may access the storage module 160 using a storage interface (not shown in ). The storage interface may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing the processing module 152 and/or the modules of the processing module 152 with access to the storage module 160.

The various components of the system 150, such as the processing module 152, the memory module 154, the I/O module 156, the communication module 158, and the storage module 160 are configured to communicate with each other via or through a centralized circuit system 172. The centralized circuit system 172 may be various devices configured to, among other things, provide or enable communication between the components of the system 150. In certain embodiments, the centralized circuit system 172 may be a central printed circuit board (PCB) such as a motherboard, a main board, a system board, or a logic board. The centralized circuit system 172 may also, or alternatively, include other printed circuit assemblies (PCAs) or communication channel media.

In at least one example embodiment, the communication module 158 is configured to receive a request for initiating an interaction session with the agent 110 of an enterprise. The request for the interaction session may be signalled by the user 102 on the enterprise website, enterprise application, social media, and the like. In one illustrative example, the user 102 may initiate a chat interaction with an agent 110 of an enterprise on an enterprise website. In another illustrative example, the user may place a phone call to a user support number provided on the enterprise website for resolving a query. The system 150 may authenticate the personal identity of the user 102 utilizing automated authentication techniques such as, IVR authentication techniques, biometric authentication techniques, One Time Password (OTP), and the like and is not explained herein for the sake of brevity. The system 150 on authenticating identity of the user 102 may facilitate an interaction session between the agent 110 at a CSS facility such as, the agent 110 at the CSS centre 108 and the user 102. In one example embodiment, the system 150 is configured to connect the user 102 to an appropriate agent trained to address user 102 concern based on at least one of user selection of IVR options, user conveyed issue, or prediction of user concern. The system 150 is further configured to intelligently record the interaction session between the agent 110 and the user 102 for assisting users as will be explained in detail.

In an example scenario, the user 102 may require support with filling up different fields of a medical insurance claim form during an interaction session. As such, the user 102 may display an object (i.e., a real-world object or virtual object such as a display screen) with which he may require assistance to the agent 110. More specifically, the user 102 may utilize an image-capturing module configured on the electronic device 104 of the user 102 to display the object. For example, the user 102 may display the medical insurance claim form to the agent 110 during the interaction session. In another illustrative example, the user 102 may require the assistance of the agent 110 to navigate through options on an enterprise website for requesting refund of a returned product, and accordingly, displaying an object implies displaying on a display screen of the electronic device 104 corresponding to the enterprise website for the agent 110. In at least one example embodiment, a video signal corresponding to the object (i.e., a real-world object or virtual object) may be captured by the electronic device 104 of the user 102 for displaying to the agent 110. An example of the user 102 displaying a medical insurance claim form to seek the assistance of an agent 110 is explained next with reference to .

depicts an example representation of a UI 300 showing a medical insurance claim form 302 displayed by the user 102 to the agent 110 to seek assistance during an interaction session, in accordance with an embodiment of the invention.

In one example scenario, the user 102 may require assistance to fill in details of a medical insurance claim form 302 and accordingly, may connect with an agent 110 at a CSS facility such as, the CSS centre 108. The agent 110 provisions an option (not shown in ) for an Augmented Reality (AR) based video support supported on a video chat platform. In general, such AR based video support ensures assistance provided the agent 110 may be intelligently recorded and retrieved for further reference of the user 102 as will be explained in detail later. When the user 102 provides a click/touch input on the option for initiating the AR based video support, an interaction session with AR based video support may be facilitated between the agent 110 and the user 102 as shown in FIGS. 3 and 4. Further, the agent 110 may prompt the user 102 to display the object in relation to which the user 102 may require assistance from the agent 110. Accordingly, the user 102 may capture the medical insurance claim form 302 by focusing an image-capturing module configured on a user device (i.e., the electronic device 104 associated with the user 102) to display the medical insurance claim form 302 for the agent 110 on the UI 300. The UI 300 corresponds to an image frame among the plurality of image frames (i.e., video signal) displayed on a display screen of an agent device 304 associated with the agent 110.

The UI 300 includes an icon 306 associated with text ‘REC’. The agent 110 may choose to record the interaction session by providing a selection input on the icon 306. Accordingly, on receiving a click input to initiate recording of the interaction session on the video chat platform of the agent device (e.g., a smartphone of the user 102), the feature extraction module 162 of the system 150 initiates recording of the interaction session. In an example embodiment, the feature extraction module 162 records information from multiple digital channels associated with the interaction session. In one illustrative example, one or more image frames corresponding to an object (i.e., the medical insurance claim form 302) displayed by the user 102 with which he may require assistance is recorded from a digital channel. Similarly, the assistance provided by the agent 110 in form of voice support and annotations provided by the agent 110 in relation to objects in the image frame during the interaction session is recorded over a different digital channel. Subsequent to initiating the recording of the interaction session, the agent 110 may stop the recording by providing another click input on the icon 306. As such, one or more image frames corresponding to the medical insurance claim form 302 displayed by the user 102 and agent assistance (i.e., voice support and annotations) provided by the agent 110 in relation to the medical insurance claim form 302 displayed by the user 102 may be separately recorded by the system 150.

The UI 300 also includes selectable icons 308 and 310. The selectable icon 308 represents call termination and the selectable icon 310 represents annotation tools that may be used by the agent 110 for providing annotations. The agent 110 may provide a click/touch input on the selectable icon 308 to end the interaction session and disconnect the call with the user. When the agent 110 provides a selection input on the icon 310, a plurality of annotations tools may be provided for the agent 110. The annotation tools may include mark up, drawing (e.g., banner, shapes, arrows, etc.) or textual tools that provision an option to add information for assisting the user 102. Some examples of annotation tools include but not limited to, pencil tool, highlighter tool, bounding boxes, tools for adding text, images, videos, or web links, and the like. The agent 110 may utilize any of these tools for augmenting information on the medical insurance claim form 302 to provide assistance for the user 102.

It shall be noted that only a portion of the medical insurance claim form 302 is shown herein for example purposes and the user may display other portions/parts of the medical insurance claim form 302, other pages of the medical insurance claim form and/or additional claim forms to seek the assistance of the agent 110 during the interaction session and as such, the image frames corresponding to the agent assistance in relation to all the parts/pages/portions may be stored as one single record. Further, it shall be noted that the fields shown in the medical insurance claim form 302 are for illustration purposes only and may include fewer or lesser fields and/or additional sections/parts. Although, the UI 300 has been depicted as an agent interface, the UI displayed for a user on a display screen of the user device may depict the medical insurance claim form 302 with more or fewer options, for example, an option to switch the camera from a front camera to a back camera configured on the user device such as, the electronic device 104.

Referring now to , the feature extraction module 162 in conjunction with the instructions stored in the memory module 154 is configured to extract features in relation to the object displayed by the user during the interaction session. In an embodiment, features are extracted from image frames displayed by the user using computer vision techniques. For example, features such as Scale Invariant Feature Transform (SIFT) features, Speeded Up Robust Features (SURF), and the like may be extracted in relation to the object from the image frames displayed by the user. The extracted features are representative of the object with which the user requires assistance. In one illustrative example, the features extracted from the medical insurance claim form 302 may be edges, corners of the medical insurance claim form 302, and elements of the medical insurance claim form 302 such as data fields, titles, texts, and their respective co-ordinates (such as the position on the form). Such features extracted from an object (e.g., the medical insurance claim form 302) may be utilized to identify the object in future interaction sessions as will be explained in detail later. The extracted features are forwarded to the correlation module 164 which correlates annotations of agents with extracted features of objects. An example of the agent 110 providing annotations in relation to an object for assisting the user is shown and explained next with reference to .

shows an example representation of a UI 400 depicting an agent 110 providing annotations to assist the user with filling up the medical insurance claim form 302 shown in during the interaction session, in accordance with an embodiment of the invention. The UI 400 may be displayed on a display screen of the agent device 304 associated with the agent 110. More specifically, video signals of the objects i.e., the medical insurance claim form 302 are received from the user device and the agent 110 provides annotations corresponding to fields on the medical insurance claim form 302 in which the user requires assistance on the display screen of the agent device 304.

As already explained with reference to , the user may seek assistance to fill up the medical insurance claim form 302 and as such may connect with the agent 110 at the CSS facility. The user may be provided an option (not shown in ) to use AR based video support in which agent assistance (i.e., voice support, annotations, etc.) may be captured and recreated based on user request subsequent to completion of the interaction between the user and the agent 110. When the user requests the AR based video support by providing a touch/click input on the option, the user may be prompted to display parts of the medical insurance claim form 302 with which the user requires assistance. The medical insurance claim form 302 may be captured by the user using a user device (e.g., the electronic device 104) equipped with image capturing modules, and image frame/video signals are displayed for the agent (shown as the UI 400).

In one example scenario, the user may seek the agent's assistance to understand the difference between the employee code and the member ID on the medical insurance claim form 302. Accordingly, the agent 110 may provide annotations 418 and 420 to guide the user to fill up the medical insurance claim form 302. The annotations 418 and 420 include textual content in bounded boxes indicating data to be filled up by the user in respective fields. More specifically, the agent 110 may augment information in form of annotations 418 and 420 by drawing patterns on the display screen of the agent device 304 and writing instructions to assist the user fill up the medical insurance claim form 302. In general, touch co-ordinates provided by the agent 110 on the display screen of the agent device 304 corresponding to the medical insurance claim form 302 are captured as annotations and forwarded to the communication module 158.

The UI 400 also includes a field 410 which the agent 110 may use to provide additional information in relation to the user interaction with the agent 110. More specifically, the agent 110 may provide additional instructions post-facto recording of the interaction session. For example, the additional instructions may include information on what more to do, what has to be done if user does not have required information, etc. The agent 110 may type in the additional information as textual content in the field 410. Such additional information may also be provided in the field 410 in different forms as depicted by icons 412 and 414. For example, the agent 110 can add audio clips to provide assistance for the user by clicking on the icon 412. In such cases, voice signal of the agent 110 is recorded and stored along with other representations/information corresponding to the medical insurance claim form 302 as annotations. The agent 110 may also upload documents, web links, images, and the like that provide detailed information for assisting the user further in relation to filling up the medical insurance claim form 302 by clicking on the icon 414. In one illustrative example, the agent 110 may attach a URL that provides detailed stepwise guidance on how to submit the medical insurance claim form 302 via an online portal. In another illustrative example, if the query is related to assembling furniture, the agent 110 may provide instructions in form of a pictorial description and additionally add a video depicting the stepwise assembling of the same furniture by clicking on the icon 414. The field 410 includes a tab 416 associated with the text ‘SAVE’ and the agent 110 can save such additional information/instruction as annotations for providing assistance to the users. Such additional information is also forwarded to the communication module 158 as annotations.

When the agent 110 terminates the call and saves additional information in the field 410, the agent 110 may be prompted to save the interaction session with a tag (not shown in ) indicating the intent of the interaction session. Accordingly, the agent 110 may provide an intent of the interaction sessions as the tag. In one embodiment of the invention, the interaction session with the tag is recorded in the database 116.

Referring to , the one or more image frames recorded as a part of the interaction session, the features extracted in relation to detected objects (i.e., objects detected from the one or more image frames), and the intent provided by the agent 110 are received by the communication module 158. In addition, the communication module 158 may also receive the one or more annotations provided by the agent 110 for assisting the user during the interaction session along with corresponding geometrical co-ordinates. The communication module 158 is configured to forward the extracted features, annotations, and geometrical co-ordinates of the annotations to the correlation module 164.

The correlation module 164 in conjunction with the instructions stored in the memory module 154 is configured to correlate the geometrical co-ordinates of annotations with extracted features. More specifically, the annotations and the geometrical co-ordinates of the annotations are correlated with the extracted features to generate correlation data which provides information for aligning the recorded annotations with video signals related to an object displayed by the user (i.e., video signals of the object with which the user requires assistance) as will be explained in detail later. The correlation data is forwarded to the augmented content record generation module 166.

The augmented content record generation module 166 may be configured to receive annotations, intent, and image frames corresponding to the interaction session from the CSS facility and forward it to the augmented content record generation module 166 of the processing module 152. In addition, the augmented content record generation module 166 is configured to receive correlation data from the correlation module 164 and features from the feature extraction module 162. The augmented content record generation module 166 in conjunction with the instructions stored in the memory module 154 may be configured to process such inputs (i.e., annotations, intent, image frames, correlation data, and extracted features of objects) to generate an augmented content record. As such, the augmented content record may include extracted features of the object, the annotations, geometrical co-ordinates of each annotation, correlation data between each extracted feature and corresponding annotation, and intent of the interaction session.

In addition, the augmented content record may include one or more image frames corresponding to agent assistance. More specifically, the augmented content record generation module 166 is intelligent to identify portions of the interaction session that correspond to agent assistance. More specifically, only image frames in the video signal that corresponds to agent assistance (i.e., one or more image frames with annotations) and corresponding voice support provided by the agent 110 are segregated and stored in relation to the interaction session. In general, image frames depicting objects shown by the user, annotation provided by the agent 110, voice support or visual support provided by the agent 110 (e.g., assistance by providing screen sharing) alone are stored in the augmented content record. For example, one or more image frames of the interaction session in which the user displays the medical insurance claim form 302, image frame in which the agent 110 provides the annotation 418, image frame in which the agent 110 provides the annotation 420, voice support provided in relation to the annotation 418 by the agent 110, voice support provided in relation to the annotation 420 alone are stored in relation to the interaction session. Such recordings of portions of the interaction session i.e., the user interaction with the agent 110, is referred to herein as interaction content. The augmented content record may be utilized to guide or assist the user to resolve a query (e.g., troubleshooting a product) subsequent to the completion of the interaction session as will be explained in detail later.

The augmented content record may be stored in the storage module 160. Such intelligent storage of portions of the interaction session with annotations ensures that the user has seamless access to the assistance provided by the agent 110 during the interaction session i.e., the augmented content record multiple times, thereby precluding a need to connect with the agent 110 again to seek assistance. The user 102 may also retrieve one or more portions of the agent assistance from the augmented content record subsequent to the completion of the interaction session based on requirements as will be explained in detail later. In one embodiment, the agent 110 (for example, automated agent) is generated based, at least in part, on the one or more augmented content records. In one embodiment, the user is provided with one or more options to be displayed on the electronic device 104 corresponding to the one or more assistances. At least one assistance of the one or more assistances is rendered based, at least in part, on the user selecting at least one option of the one or more options.

The assistance recreation module 168 in conjunction with the instructions stored in the memory module 154 may be configured to process an assistance request from the user subsequent to the completion of the interaction session for viewing a portion of agent assistance. To that effect, the assistance recreation module 168 may include a learning algorithm trained to analyze image frames and interpret a required augmented content record or portions of the augmented content record for assisting the user 102. Accordingly, objects may be detected from the image frames using machine learning, image retrieval, or computer vision techniques known in the art. For example, the user may display a scene including an object such as, a product or any parts thereof (i.e., video signals corresponding to the object) in relation with which he requires assistance subsequent to completion of the interaction session. To that effect, the learning algorithm may be trained to analyze at least one image frame from the video signal for detecting features (i.e., boundaries, textual data, visual objects, or real-world objects) and map the detected features to at least one augmented content record storing same/similar features with annotations. It is noted that the term ‘map’ as used herein also includes comparing features of the image corresponding to the product with extracted features stored in a plurality of augmented content records in the storage module 160 for determining a possible match. When a match is found, the augmented content record or portion of the augmented content record that matches at least one image frame is retrieved from the storage module 160 and displayed for the user on the user device as will be explained with reference to .

depicts a user 502 capturing a portion of a medical insurance claim form 504 with a user device 506 to retrieve relevant portions of agent assistance from an augmented content record, in accordance with an embodiment of the invention. More specifically, video signals corresponding to the medical insurance claim form 504 are displayed on a UI 500 (i.e., a display screen of the user device 506) subsequent to the user 502 turning on an image capturing module configured in the user device 506 to capture the medical insurance claim form 504 shown in .

As already explained, the user 502 may have interacted with the agent 110 seeking assistance to fill up the medical insurance claim form 504 and the interaction session may be recorded and stored along with annotations provided by the agent 110 as the augmented content record as explained with reference to . In one example scenario, the user 502 may start filling out the medical insurance claim form 504 after the interaction session and may not be able to recall the difference between member ID number and employee code in the medical insurance claim form 504. As such, the user 502 may prefer to refer to the annotations provided by the agent 110 and as such would require access to portions of the augmented content record for assistance. Accordingly, when the user 502 logs into his account on the enterprise website and accesses ‘My Tickets’ Section (not shown in ), he may be provisioned with an option (not shown in ) to display object (e.g., the medical insurance claim form 504) with which he may require assistance. This option provides an opportunity for the user 502 to access specific portions of an agent assistance or entire agent assistance by displaying the object i.e., the medical insurance claim form 504. As shown in , the user 502 captures a video signal of the object i.e., the medical insurance claim form 504, using the image capturing module (not shown in ) configured in the user device 506. After capturing, the assistance recreation module 168 is configured to compare features extracted from at least one image frame of the video signal (i.e., video signal captured in relation to the document) with extracted features stored in the augmented content records. In other words, one or more features are determined based, at least in part, on the one or more image frames. More specifically, augmented content records that are stored in relation to a user profile associated with the user in the storage module 160 are compared with extracted features related to the object to determine at least one match. When a match is found for the medical insurance claim form 504, the corresponding augmented content record is retrieved from the storage module 160 and provided to the assistance recreation module 168. More specifically, when a portion of the document (i.e., the medical insurance claim form 504) captured by the user device 506 matches with any portion of an augmented content record stored in the storage module 160, the assistance recreation module 168 is configured to retrieve the annotations corresponding to the extracted features from the augmented content record and recreate the agent assistance (i.e., annotations) for the user 502 on an image frame 508 of the video signal corresponding to the medical insurance claim form 504 displayed on the user device 506. In general, the assistance recreation module 168 is configured to align the annotations 418 and 420 on corresponding points in the medical insurance claim form 504 for assisting the user using image registration techniques. In one embodiment one or more geometrical co-ordinates corresponding to the object based, at least in part, on the one or more assistances are determined. More specifically, geometrical co-ordinates of the annotations and correlation data between the extracted features and the annotations may be utilized for rendering the annotations on the UI 500 (i.e., video signals corresponding to the medical insurance claim form 504). In other words, the assistance recreation module 168 is configured to augment the annotations (i.e., superimpose the annotations) on the video signals corresponding to the object based on the correlation data and the geometrical co-ordinates of the annotations. To that effect, touch co-ordinates related to the annotations provided by the agent 110 are placed on corresponding anchor points in the image frame 508 using augmented-reality techniques.

It shall be noted that the object being captured i.e., the medical insurance claim form 504 may not be required to be in the same perspective as was displayed by the user during the interaction session. More specifically, the assistance recreation module 168 employs image registration techniques to estimate changes in the perspective of the object/product and accordingly, the annotations are also adapted to provide correct assistance for the user 502. As such, the annotations may be resized or orientations may be adapted to augment the annotations at appropriate places of the captured document displayed on the UI 500. Further, it shall be noted that the user 502 may seek assistance with specific portions of the medical insurance claim form 504 by capturing select portions and as such, portions in the augmented content record corresponding to those select portions alone are retrieved from the storage module 160 and aligned with the captured document displayed on the UI 500 of the user device 506 for assisting the user 502 subsequent to completion of the interaction session. Further, it shall be noted that the additional information provided by the agent 110 as annotation (see, field 410 in ) may also be displayed on the user device 506 along with other annotations in the augmented content record for assisting the user 502. Moreover, the user 502 may prefer to skip some portions of agent assistance and as such may scroll through snippets of agent assistance which is explained next with reference to .

shows an example representation of a UI 600 depicting options for accessing different portions of agent assistance provided for resolving a query from an augmented content record, in accordance with an embodiment of the invention. In an example scenario, the medical insurance claim form 504 may include multiple pages, for example, Page 1, Page 2, and Page 3. It shall be noted that each page may include multiple parts/sections and each part/section may include multiple fields of data. Moreover, the augmented content record may include agent assistance corresponding to all 3 pages and the user 102 may access any portion of the augmented content record for assistance. In an example scenario, the user may initially access agent assistance in relation to Page 1 of the medical insurance claim form 504. As such, the UI 600 represents a portion of augmented content record (i.e., agent assistance corresponding to Page 2) in which annotations are superimposed on the medical insurance claim form 302, for example, Page 2 of the medical insurance claim form to assist the user.

It shall be noted that although the UI 600 depicts rendering the annotations on one image frame corresponding to the medical insurance claim form, it is apparent that the annotations such as, mark ups, textual data, video content, and voice support are superimposed on a video signal corresponding to the medical insurance claim form displayed by the user 102. For example, the agent 110 may have provided additional information (see, field 410 in ) related to the medical insurance claim form i.e., online submission of the medical insurance claim form, by uploading a video illustrating sequential steps to be followed to submit the medical insurance claim form in an online portal. When the user seeks assistance to fill up and submit the medical insurance claim form, the annotations in form of marking/notes initially assist in filling up the medical insurance claim form 302, and the video provided as an annotation is played back after that to guide the user to submit the medical insurance claim form 302 in the online portal. It shall be noted that video or video links may also be embedded as annotations as part of agent assistance in an image frame and such assistance will be provided at an appropriate place in a timely manner during recreation.

As already explained with reference to , the annotations (see, annotations 410, 418, and 420) provided by the agent 110 to assist the user during the interaction session are aligned with video signal corresponding to the medical insurance claim form 302 displayed by the user. For example, the annotations 418 and 420 provided by the agent 110 to clearly indicate the different identifiers that need to be provided in each section are adapted (i.e., resized or re-oriented) based on correlation data to appropriately assist the user in filling the medical insurance claim form as shown in the UI 600. Further, the UI 600 also includes an option 604 where the agent 110 provides an indication of where the user has to sign in the medical insurance claim form 302. Further, the annotations may also include additional information shown in the field 602 which may be provided by the agent 110 subsequent to the completion of the interaction session. In this example representation, the agent 110 has provided additional information on documents (i.e., cover letter with contact details, insurance ID card, discharge summary, and original hospital final bill) to be submitted along with the claim form for reimbursement.

The UI 600 also includes two options 604 and 606 associated with text ‘PREVIOUS’ and ‘NEXT’, respectively. The options 604 and 606 provision an option for the user to explore the different portions of the agent assistance i.e., different portions of the augmented content record, provided by the agent 110 for assisting the user during an interaction session. For example, the user 102 may provide a touch/click input on the option 604 to access portions of the augmented content record corresponding to Page 1 and a touch/click input on the option 606 provides access to portions of the augmented content record corresponding to Page 3 of the medical insurance claim form. In some example scenarios, the user 102 may prefer to explore an entire sequence of agent assistance or skip some portions of agent assistance and in such cases, the options 604 and 606 may be utilized by the user 102 to navigate through the augmented content record to retrieve necessary assistance. It shall be noted that the user may also choose to capture portions of the medical insurance claim form 302 in relation to which he may require assistance and as such, relevant portions may be retrieved by the system 150 as already explained.

Referring now to , the augmented content record may be provided to the automated agent generation module 170. The automated agent generation module 170 in conjunction with instructions stored in the memory module 154 may be configured to process the augmented content record corresponding to an intent to build an automated agent. Such automated agents can assist new users seeking assistance for the same or similar intent. In other words, the automated agents utilize a linear dialog flow generated from the interaction session to assist new users with the same query without the need for the intervention of a human agent. In one illustrative example, an automated agent may be generated to automatically assist users who seek assistance to fill up medical insurance claim forms such as, the medical insurance claim form 302 (shown in ) based on the augmented content record generated in relation to the interaction session between the user and the agent 110. The automated agents such as interactive voice response (IVR) systems and chat bots, are deployed in user service centres are trained to understand user queries or user intent and to provide desired assistance to the users based on the linear dialog flow.

In at least one example embodiment, the automated agent generation module 170 on receiving the augmented content record may be configured to generate a dialog tree with nodes and edges. More specifically, the automated agent generation module 170 is configured to determine a sequential order for default dialog flow based on the intent, one or more image frames, extracted features, and annotations in the augmented content record. The dialog tree is usually generated in relation to one intent and in some cases, the dialog tree may include multiple intents interpreted from the interaction session between the user and the agent 110 of an enterprise. Alternatively, a single intent may be decimated to form two different sub-intents and these sub-intents may be addressed by two different trees which may be utilized to build two different automated agents. The generation of a dialog tree based on augmented content record is explained next with reference to .

shows a dialog tree 700 created in relation to an interaction session between a user and an agent 110 for generating an automated agent, in accordance with an embodiment of the invention. The dialog tree 700 is a non-linear and hierarchical structure but does not store data in a sequential form.

In one illustrative example, an augmented content record in relation to assistance provided by an agent 110 for filling up details in a medical insurance claim form such as, the medical insurance claim form 302 and submitting the medical insurance claim form 302 via an online portal is stored in the storage module 160. The augmented content record may be stored with an intent of ‘filling up a medical insurance claim form and online submission’. In this example representation, the dialog tree 700 corresponds to two different intents i.e., an intent to fill up the medical insurance claim form and another intent to submit the medical insurance claim form via an online portal. These two different intents recorded during a single interaction session may be used to generate two different automated agents that assist users for each of these intents separately. As such, one or more objects may be detected from each image frame corresponding to agent assistance and each node in the dialog tree 700 may be associated with one such object (also referred to herein as ‘reference object’).

As shown in , a root node 702 may be associated with text ‘MEDICAL INSURANCE CLAIM FORM’. The node 702 may be associated with one or more image frames and/or features associated with a medical insurance claim form to identify the object when displayed to an automated agent. The root node 702 may have directed edges that branch out to two nodes 704 and 706 associated with text ‘FILLING THE MEDICAL INSURANCE CLAIM FORM’ and ‘ONLINE SUBMISSION OF MEDICAL INSURANCE CLAIM FORM’. The node 704 may be associated with image frames or features corresponding to medical insurance claim form, such as, a page in the medical insurance claim form or part/section/field of the insurance claim form and the node 706 may be associated with image frames or features corresponding to an online portal for submitting the medical insurance claim form. The node 704 corresponding to ‘FILLING THE MEDICAL INSURANCE CLAIM FORM’ may again branch out to 3 different nodes 708, 710, and 712 associated with data such as, ‘PAGE 1’, ‘PAGE 2’, and ‘PAGE 3’, respectively. Each of these 3 different nodes 708, 710, and 712 branch out to respective leaf nodes 718, 720, and 722 that provide assistance on filling up the different pages of the medical insurance claim form. For example, the leaf node 718 corresponding to node ‘PAGE 1’ may have annotations that assist the user to fill up a first page (i.e., page 1) of the medical insurance claim form 302. For example, the annotations may include markings, textual comments, and voice support corresponding to each field in the first page of the medical insurance claim form 302.

The node 706 may branch out to two different nodes 714 and 716 associated with data ‘USER LOGIN’ and ‘UPLOAD MEDICAL INSURANCE CLAIM FORM’, respectively. For example, the node 714 may be associated with image frames or features corresponding to the online portal for logging into user account and the node 716 may be associated with image frames or features in the online portal for uploading the medical insurance claim form. Each of these nodes (i.e., the nodes 714 and 716) branch out to respective leaf nodes 724, and 726 that provide assistance on accessing the different pages on the online portal. For example, the leaf node 726 includes annotations that may assist the user to upload the medical insurance claim form on the online portal.

It shall be noted that the nodes 708, 710, and 712 may branch out to more nodes that may be associated with different sections or fields in a page. For example, a node associated with page 1 may branch out to 2 different nodes associated with patient details and hospitalization details. Further, it shall be noted that all these data associated with nodes 702 – 726 may be gleaned from an interaction session between the agent 110 and the user. More specifically, the annotations provide by the agent 110 during the interaction session constitute the instructions or steps in the leaf nodes 718, 720, 722, 724, and 726 to be followed for assistance. As such, the automated agent generation module 170 is configured to construct the dialog tree 700 which may be used to build/generate the automated agent corresponding to the intent (i.e., filling up a medical insurance claim form and online submission’) based on the data corresponding to different nodes 702 – 726. In addition, the nodes may also store metadata related to the annotation such as, geometrical co-ordinates, position, font color, size, depth and all other information related to the annotation along with the reference object.

Such storage of annotations enables recreation of the instructions provided as assistance by the agent 110 during the interaction session that was recorded and used for building the automated agent. However, it shall be noted that such automated agents have a default linear flow and may follow a linear sequence corresponding to the sequence in which it was recorded i.e., the augmented content record, which is different from automated agents that have different possibilities in each node. For example, a user may have required assistance with Page 2 of the medical insurance claim form and then Page 1 and as such, the nodes in the dialog tree 700 also follow the same sequence in which Page 2 is addressed first and then Page 1. However, it shall be noted that the automated agent is configured to skip between different nodes to provide desired assistance for the user 102. Accordingly, if the user seeks assistance by presenting Page 1 of the medical insurance claim form, the automated agent will recreate assistance corresponding to Page 1 initially, and then if the user presents Page 2, agent assistance corresponding to Page 2 is recreated for the user by skipping between nodes of the dialog tree 700.

As such, although the automated agent is linear and does not respond with respect to an unlearned reference object, for example, a field in the medical insurance claim form that was not provided assistance in the interaction session that was recorded, the automated agent is helpful with saving resources on recurring questions and is very quick in action. In scenarios where the augmented content record does include agent assistance in relation to certain data fields, pages or parts of the medical insurance claim form, the automated agent may not be able to assist users who may request assistance with such data fields, pages or parts of the medical insurance claim form. For example, if the agent 110 had provided assistance with Page 1 and Page 3 of the medical insurance claim, the augmented content record only includes agent assistance in relation to Pages 1 and 3 and the automated agent may not be able to assist a user requiring assistance with Page 2. In such cases, the automated agent may provide options for the user to connect with a human agent to receive the desired assistance. In general, when a user or the automated agent requests assistance with reference to an unlearned object, then a human agent such as the agent 110 may assist the user with an AR based video support during an interaction session. As such, agent assistance in relation to the interaction session may be stored as a separate augmented content record. Such agent assistance provided during the interaction session may be linked with a pre-existing augmented content record of the medical insurance claim form to append agent assistance provided in relation to Page 2 of the medical insurance claim form.

The automated agents built based on the augmented content record are completely independent of agents (i.e., human agent or any other automated agent) and may independently provide support to the user based on the intent. Such timely assistance by automated agents ensures that the users do not have to wait in long queues to reach agents for the same query that was previously answered by an agent 110 for a different user.

When an automated agent is deployed to assist a user, the automated agent is configured to predict the intent of the user from the conversation. For example, the automated agent may understand the intent for an interaction session based on the image corresponding to real-world object displayed for the agent 110 or object name specified by the user. For example, the user may request assistance with a medical insurance claim form. The automated agent automatically interprets that the interaction session is related to the intent of filling the medical insurance claim form 302. Thereafter, the automated agent is built such that each node will be triggered when a specific reference object is identified. For example, if the user scans a medical insurance claim form and based on the identified intent as explained above from user interaction, nodes corresponding to a page (i.e., Page 2 of the medical insurance claim form) may be triggered. As such, annotations provided by the agent 110 in relation to different fields/sections in Page 2 of the medical insurance claim form may be superimposed on the video signal displaying Page 2 of the medical insurance claim form for assisting the user. After displaying the annotations, the automated agent will wait for next object (i.e., a captured image frame) to be shown to trigger the next annotations and when an object is detected corresponding annotations are retrieved from the augmented content record to recreate the corresponding agent assistance.

The automated agent can flow randomly between the nodes, as and when the node specific objects are shown by the user. For example, the user can skip some steps and the automated agent will provide the annotation corresponding to the step the user requires assistance. For example, an augmented content record may correspond to an interaction session where the agent 110 had assisted the user to remove a damaged filter in an air purifier and replace the same with a new filter. When a new user interacts with the automated agent to replace the filter in the air purifier, he may request for assistance only to fix the new filter and as such, the automated agent seamlessly jumps to a node that includes annotations to assist the user with fixing the new filter in the air purifier. In general, the user may display a real-world object or any components and the automated agent displays corresponding annotations for the user which were provided for a different user to resolve the same issue. It shall be noted that the user 102 may display real-world objects in any orientation or any other order (i.e., the user can skip or jump a few steps in relation to the augmented content record) different from the order in which it was recorded and the automated agent seamlessly assists the user 102. Moreover, if the user 102 is unsure of what to show the automated agent, for example, a long pause of the user may prompt the automated agent to follow the default linear flow and prompts the user to show objects/locations in a sequence corresponding to a sequence in which the interaction session was recorded. An example of a flow diagram depicting conversation flow adapted by an automated agent which was built based on an augmented content record is shown and explained with reference to

shows a flow diagram 800 depicting a conversation flow between an automated agent and a user seeking assistance, in accordance with an embodiment of the invention. The conversation flow starts at 802. In an example scenario, the user may seek assistance with filling up details of a medical insurance claim form. The automated agent on identifying intent of the user, may prompt the user to display the medical insurance claim form for which the user seeks assistance. As such, the user may capture any part of the medical insurance claim form using image capturing modules configured on a user device associated with the user. Accordingly, a video signal corresponding to the medical insurance claim form may be displayed for the automated agent.

At 802, the automated agent performs object detection to identify an object from the video signal (i.e., a plurality of image frames) displayed by the user. If the automated agent identifies at least one object from an image frame of the plurality of image frames captured by the user device, the automated agent refers to a corresponding node in the dialog tree 700 to decide on a subsequent action/response for the user. On the contrary, it shall be noted that the automated agent may follow a default linear dialog flow (i.e., recorded dialog flow) sequentially for the user if the user seeks assistance in filling up all details of the medical insurance claim form. For example, the automated agent may perform operation 806 (i.e., rendering annotations that assist in filling up Page 1 of the medical insurance claim form) followed by operation 808 (i.e., rendering annotations that assist in filling up Page 2 of the medical insurance claim form) and then operation 810 (i.e., rendering annotations that assist in filling up Page 3 of medical insurance claim form). Alternatively, if the automated agent does not identify any object in the video signal, the automated agent prompts the users to display Page 1 of the medical insurance claim form at 804.

If the user displays the Page 1 of the medical insurance claim form, the automated agent performs operation 806. At 806, if the automated agent identifies Page 1 of the medical insurance claim form from at least one image frame displayed to the automated agent, the automated agent automatically renders annotations stored in relation to the page 1 for the user on the video signal. More specifically, the annotations are correlated with the identified objects and superimposed on the video signal. Moreover, the annotations may be dynamically adapted in real-time to align with the object in the video signal. For example, orientations, text size, position, and the like of the annotations are aligned with the image frames displayed by the user to provide prompt assistance for the user. After operation 806, if the user displays a different page of the medical insurance claim form, the automated agent may move to a corresponding node in the dialog tree 700. For example, if the user displays Page 3 after Page 1, the automated agent may perform operation 810. More specifically, the automated agent moves/jumps from leaf node 718 to node 712 of the dialog tree 700 to implement the operation 810. Alternatively, if the automated agent does not identify any object from video signals i.e., user does not display any new page, the automated agent may prompt the user to display Page 2 as shown by operation 812. The automated agent may execute an operation based on page displayed by the user. For example, if the user displays Page 2, then the automated agent may perform operation 808. Alternatively, if the user displays Page 3 of the medical insurance claim form, then the automated agent performs operation 810 as will be explained later.

In some example scenarios, the user displays Page 2 of the medical insurance claim form. In such cases, the automated agent identifies the object (i.e., page 2 of the medical insurance claim form) and triggers a corresponding node in a dialog tree such as, the dialog tree 700. For example, the node 710 of the dialog tree 700 may be triggered. As such, the automated agent may be configured to render annotations corresponding to page 2 (stored in the leaf node 720 of the dialog tree 700) of the medical insurance claim form on the video signal displaying Page 2 of the medical insurance claim form as shown by operation 808. After, the operation 808, if the automated agent does not identify any new object from the video signal i.e., the user 102 does not display any new page, the automated agent may prompt the user to display Page 3 as shown by operation 814. The automated agent may execute an operation based on page displayed by the user.

In another example scenario, if the user displays Page 3 of the medical insurance claim form when prompted to display page 1 of the medical insurance claim form, the automated agent performs operation 810. For example, when the automated agent detects objects corresponding to Page 3 of the medical insurance claim form, the automated agent moves/jumps to a node in the dialog tree 700 that corresponds to Page 3 of the medical insurance claim form. At operation 810, the automated agent super imposes annotations corresponding to Page 3 on image frames currently displayed by the user. More specifically, the node 712 of the dialog tree 700 may be triggered and as such, the automated agent may render annotations stored in the leaf node 722 of the dialog tree 700 for the user.

As seen from , the automated agent may assist the user by moving/jumping from one node to another node of the dialog tree 700 based on objects identified in the video signal. The conversation does not follow a default linear dialog flow based on the dialog tree 700 (shown by solid lines in ) unless the user sequentially displays objects in the manner i.e., displaying Page 1, Page 2, and Page 3 sequentially. In general, the automated agent is trained to jump/move between nodes based on detected objects in image frames displayed by the user (shown by dashed lines in ). The conversation flow stops at 816.

FIGS. 9A-9C, collectively, depict various graphical user interfaces representing steps for retrieving relevant portions of an agent’s assistance from a stored augmented content record to provide an augmented visual session. In an example, the agent’s assistance may correspond to an initial interactive session related to filling a form. In a non-limiting example, the form may be one or more of medical insurance forms, insurance claim forms, legal forms, legal documents, education-related forms, job application forms, admission forms, and the like.

In the present scenario, it is assumed that the user 102 has already completed an interaction session with the agent of the enterprise. Now, the user 102 wishes to access a relevant portion of the previous interactive session. The following description of FIGS. 9A-9C, elucidates the process of accessing relevant portions of the agent’s assistance from a stored augmented content record. It is understood that the relevant portion of the agent’s assistance may be extracted from a second augmented content record stored in the database 116 of the system 150. The second augmented content record is generated based on an interactive session of another agent with another user.

To that end, a user 902 capturing a video signal of the object i.e., legal form 900, using the image capturing module (not shown in ) configured in the electronic device 904 of the user 902. After capturing, the assistance recreation module 168 is configured to compare features extracted from at least one image frame of the video signal (i.e., video signal captured in relation to the legal form 900) with extracted features stored in the augmented content records. Alternatively, the augmented content records corresponding to different users may also be used for the comparison process upon determining that a different user has already received one or more assistances with the same legal form 900 in the past.

When a match is found for the legal form 900, the corresponding augmented content record is retrieved from the storage module 160 and provided to the assistance recreation module 168. To that end, the assistance recreation module 168 is configured to retrieve the annotations corresponding to the extracted features from the retrieved augmented content record and provide annotations for the user 902 on an image frame 906 of the video signal corresponding to the legal form 900 displayed on the electronic device 904. An example representation of the same is depicted by .

In one scenario depicted by , the assistance recreation module 168 facilitates the display of a plurality of bounded boxes (i.e., a UI element) for each of the identified relevant portions of the legal form 900. For example, bounded boxes may be generated and displayed for fields such as user details (see, bounded box 952), branch (see, 954), loan account number (see, 956), PAN (see, 958), product (see, 960), GSTIN (see, 962), loan plan (see, 964), loan amount (see, 966) and the like.

In general, the assistance recreation module 168 is configured to align the plurality of bounded boxes on corresponding points in the legal form 900 for assisting the user 902 using image registration techniques or computer vision techniques.

Further, as depicted by , upon receiving a user input from the user 902, the assistance recreation module 168 is configured to retrieve user-related information from a user profile associated with the user 902 in the storage module 160. In the present example depicted by , upon receiving a user input (here, shown as a touch-based input), for the GSTIN field (see, 962), an annotation (see, 972) indicating the GSTIN details is displayed through the UI. It is understood that this aspect enables the user to quickly fill forms with information that is already stored within their user profile.

It is understood that assessment of defects and/or damage associated with products is generally required for various activities such as Quality Assurance (QA), issuing insurance claims, logistics, and the like. Such assessments can be performed by an agent of the enterprise through an interactive session with the user. During the interactive session, the user can display the product to the agent through a video feed. This video feed can be used by the agent to perform their assessment and provide the necessary assistance. It is noted that it is difficult to interpret the nature of a defect or damage by viewing the 2D video feed due to a lack of 3D information such as the 3D extent of damage, depth of the damage and the like. Such inefficiencies can also be addressed by the various embodiments of the present disclosure. For example, an automated agent, created by using stored augmented content records from the database, may either analyze and compare augmented content records generated using the 2D input, i.e., the 2D video feed received during the interactive session from the user with one or more stored augmented content records corresponding to one or more users from the database or be trained over the stored augmented content records corresponding to one or more users from the database (such as database 116) to determine/detect the defect or damage in a product in 3D. In other words, 2D input is used to generate a 3D model, then one or more damaged regions are determined by the automated agent.

To that end, FIGS. 10A-10C, collectively, depict example representations of the various stages of an interactive session with an automated agent for receiving assistance while processing an insurance claim, in accordance with various embodiments of the invention.

depicts a user 1002 recording a damaged vehicle 1004 using an electronic device1006. In an embodiment, the user 1002 initiates the interactive session initiation request using the electronic device 1006.

The system 150 receives an interactive session initiation request from the user 1002. It is noted that the interactive session initiation request includes the video feed, i.e., an input corresponding to the damaged vehicle 1004, i.e., an object. It is noted that the terms ‘input’ and ‘object’ can interchangeably be used to describe ‘one or more inputs’ and ‘one or more objects’. In various alternative embodiments, the interactive session initiation request may also include, but is not limited to, one or more images, videos, audio-visual data and/or a combination thereof. Upon determining that the object is the damaged vehicle 1004 and the user intent is to process an insurance claim, the system 150 selects an automated agent for providing one or more assistances assistance to the user 1002 in relation to the damaged vehicle 1004. In an example, the one or more assistances may include generating a cost to user table, cost to insurer table, repair costs table and the like. Alternatively, a human agent or a chat bot can also be selected by the system 150 to assist the user 1002. Further, the system 150 facilitates the interaction session between the user 1002 and the automated agent.

In order to provide assistance for assessing the damage associated with the damaged vehicle 1004, the automated agent may access the one or more stored augmented content records in a database associated with the system (such as database 116). In an embodiment, the one or more stored augmented content records may include a three-dimensional model of an undamaged version of the damaged vehicle 1104. In an example, the 3D model of undamaged version of the damaged vehicle 1104 may have been generated through past interaction sessions initiated with the same user i.e., the user 1002 and stored as augmented content record. In an alternative example, the 3D model of undamaged version of the damaged vehicle 1104 may have been generated through past interaction sessions initiated with another user that owns the same vehicle and stored as augmented content record on the database 116 associated with the system 150 with a suitable tag.

It is understood that generating an augmented content record by the system 150 includes extracting one or more frames (such as image frames) from the input, i.e., video feed. In an alternative example, instead of being a video feed, the input includes one or more images 1010 (displayed on a UI 1008, i.e., a display screen of the electronic device 1006) taken from different viewpoints of an object present in the input. Then, the system 150 determines one or more geometrical co-ordinates corresponding to the object, i.e., the vehicle. In an example, the one or more geometrical co-ordinates can be determined using classical computer vision techniques like photogrammetry, stereo vision, or the like. Further, correlation data is generated based on the one or more geometrical co-ordinates. This correlation data is used, at least in part, to construct the augmented content record. Since, the augmented content record is constructed using the 3D co-ordinated of the vehicle, it may represent the 3D model of the vehicle. In a non-limiting example, if the vehicle is undamaged when the augmented content record is generated, then it represents a 3D model of the undamaged vehicle. In an alternative non-limiting example, if the vehicle is damaged when the augmented content record is generated, then it represents a 3D model of the damaged vehicle 1004.

In an embodiment, the automated agent compares the augmented content record of the undamaged vehicle with the augmented content record of the damaged vehicle 1004 to determine one or more damaged regions on the object, i.e., the vehicle. In other words, 3D model of the undamaged vehicle is compared with the 3D model of the damaged vehicle 1004 to determine one or more damaged regions on the object, i.e., the vehicle. Before comparison, both the 3D models are aligned with each other using techniques such as 3D-3D registration. Then, in one example, voxel-wise comparison is performed by the automated agent to determine the one or more damaged regions on the vehicle. During the voxel-wise comparison, a comparison of the regional grey matter ‘density’ between the damaged 3D model from the undamaged 3D model is performed to identify the one or more damaged regions. In an example representation depicted by , the one or more damaged regions on the vehicle through one or more bounding boxes (see, 1108). In an embodiment, deep learning techniques can be used by the automated agent to assess the one or more damaged regions. In a non-limiting, the deep learning techniques may include techniques such as Egocentric Network (EGONET). In one implementation, the deep learning network first detects the object in the input and draws a 2D bounding box around it. A 3D bounding box and alignment of the object are then inferred from the system 150 using an automated agent trained using similar data. A 3D bounding box is also drawn around the undamaged 3D model. A transformation matrix is then calculated to align the 3D bounding box of the 3D model of the undamaged vehicle to the 3D bounding box inferred from the interactive session, which in turn aligns the undamaged 3D model to the pose of the inferred 3D model (i.e., 3D model of the damaged vehicle).

In another scenario, when the constructed 3D model of the undamaged vehicle is not precise or if it cannot be constructed, a trained deep neural network can be used by the automated agent to predict the location and type of defect/damage on the 2D image of the product (i.e., vehicle). For example, a classical computer vision technique called ray tracing can be used, to project 2D damages onto the 3D model of the undamaged vehicle. During the ray tracing process, an estimated 3D pose(s) of the product in the input, i.e., 2D image is received. Then, the generic 3D model, i.e., the undamaged 3D model is aligned with respect to the estimated pose. In other words, 2D image of the damaged vehicle is analysed and a pose of the damaged vehicle in the 3D space is determined. Then, the generic 3D model of the undamaged vehicle is aligned with the pose of the damaged vehicle in the 2D image. Further, a ray is projected from each point within the damaged region (i.e., from the 2D image) onto the 3D model (such as 3D model of undamaged vehicle). Then, the point on the 3D model which is nearest to the projected ray of the 2D damage point is identified and highlighted. This process is known as inverse projection, as it is inverse to the normal way of projection, which is from 3D to 2D. In inverse projecting, the centroid of each region is identified using deep learning networks and represents a damaged region on the vehicle. It should be noted that all the points within the damaged region can be projected on to 3D model without limitation. This allows labeling of the 3D model, such as type of damage such as scratch, dent, etc. In some embodiments, techniques such as 3D image registration may also be used by the automated agent to generate the one or more assistances.

In another embodiment, the automated agent displays the one or more damaged regions on the object to the user 1002 during or after the interactive session (see 1108 of ). In another embodiment, the automated agent analyzes the one or more damaged regions on the object to provide one or more assistances to the user 1002. In a non-limiting example, the one or more assistances includes providing a table 1050 (see, ) depicting a service cost estimation sheet obtained after performing an assessment on the one or more damaged regions on the object. In one example, the table 1050 may include various fields such as Serial Number (S. No), 1052, date 1054, nature of expense 1056, amount (in currency XX) 1058.

In one implementation, damage assessment is performed by the automated agent using the deep neural network. The parameters that are assessed by automated agent may include but are, not limited to, the shape of the defect, the location of the defect, the extent and severity of the defect, the defective part name, and the like based on the one or more damaged regions. Further, labeling information of the ideal 3D model is used to decide each corresponding sub-part that has been damaged in the test 3D model. In an example, the cost for total damage is calculated by automated agent using the product catalogue. In a non-limiting example, the product catalogue may include information related to the repair costs, costs of spare parts, costs of accessories along with other corresponding charges.

shows a flow diagram of a method 1100 for providing post-interaction assistance to users, in accordance with an embodiment of the invention. The various steps and/or operations of the flow diagram, and combinations of steps/operations in the flow diagram, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or by a system such as the system 150 explained with reference to FIGS. 1 to 10 and/or by a different device associated with the execution of software that includes one or more computer program instructions. The method 1100 starts at operation 1102.

At operation 1102 of the method 1100, a system such as the system 150 receives an interactive session initiation request from a user. The interactive session initiation request includes an input corresponding to an object. For example, if a user has a doubt about filling out a form, the user can take photographs (i.e., the input) of the form (i.e., the object) using their electronic device 104. Further, the user can send an interactive session initiation request to the system 150, for availing an assistance with regards to filling the form. In another example, if the user wishes to get a cost estimate for availing services for his/her damaged vehicle, then the user can take photographs or videos of the damaged vehicle during the interactive session. Then, the agent can provide the cost estimate of the damaged vehicle 1004 after assessing the damages in the vehicle.

At operation 1104 of the method 1100, the system 150 initiates an interaction session between the user 102 and an agent 110 of an enterprise based, at least in part on, the input. For example, based on images or videos of the form that needs to be filled by the user 102 received by the system, will initiate the interaction session between the agent 110 and the user 102. During the interactive session, the agent 110 can assist and guide the user 102 to complete the form. In the case of an insurance form to be filled by the user 102, the enterprise will be the insurance company, and the user 102 will be a customer of the enterprise.

At operation 1106 of the method 1100, the system 150 facilitates the interaction session between the user 102 and the agent 110 for assisting the user 102. The agent 110 provides one or more assistances to the user, to fill and complete filling the form. In one embodiment, the agent 110 provides assistance in the mode prescribed by the user. For example, if the user has preferred only text assistance, the agent 110 provides assistance in the text to complete the filling of the form. It should be noted that filling out the form is an example used in various embodiments, other requests such as, performing an assessment of the vehicle, obtaining a cost estimate after assessment, etc, are also possible without limitation.

At operation 1108 of the method 1100, the system 150 records the interaction session in a database such as the database 116 associated with the system 150. The interaction session can be used to assist any user requests that are received subsequent to the current interaction session, for similar or identical requests.

At operation 1110 of method 1100, the system 150 generates and stores an augmented content record corresponding to the interaction session based, at least in part, on the one or more assistances and at least a portion of the interaction session in the database 116.

It should be noted that the augmented content record refers to augmented reality-based record of any assistance provided to the user to complete the particular request initiated by the user. In one embodiment, the augmented content records correspond to portions of an agent’s assistance during the interaction session between a user and an agent 110 that recreates portions of the agent assistance. These augmented content records can be used during future interaction sessions for assisting the same or different users. In one embodiment, the augmented content record may include extracted features of the object, the annotations, geometrical coordinates of each annotation, correlation data between each extracted feature and corresponding annotation, and the intent of the interaction session. In one embodiment, the augmented content record may include additional information inputted by the agent 110 along with the one or more assistances, after completing the request of the user. In an embodiment, the augmented content record includes 3D models of products. In another embodiment, the augmented content record can be updated and more assistance data can be added to an existing augmented content record, based on the assistance provided in each subsequent request.

At operation 1112 of method 1100, the system 150 generates an augmented visual session based, at least in part, on the augmented content record, upon receiving an assistance request from the user subsequent to the completion of the interaction session.

At operation 1114 of method 1100, the system 150 facilitates display of the augmented visual session on an electronic device associated with the user.

shows a flow diagram of a method 1200 for providing assistance to users, in accordance with an embodiment of the invention. The various steps and/or operations of the flow diagram, and combinations of steps/operations in the flow diagram, may be implemented by, for example, hardware, firmware, a processor, circuitry, and/or by a system such as the system 150 explained with reference to FIGS. 1 to 11 and/or by a different device associated with the execution of software that includes one or more computer program instructions. The method 1200 starts at operation 1202.

At operation 1202 of the method 1200, a system such as the system 150 receiving an assistance request. The assistance request includes an input corresponding to an object.

At operation 1204 of the method 1200, the system 150 accesses one or more augmented content records from a database associated with the system based, at least in part, on the object. The one or more augmented content records includes at least one or more assistances corresponding to the object.

At operation 1206 of the method 1200, the system 150 generates an augmented visual session based, at least in part, on the one or more augmented content records. It is noted that by using one or more augmented content records for generating the augmented visual session, different forms or types of assistances can be collated together to generate a master augmented content record that includes most information regarding assistances that may be required by the user.

At operation 1208 of the method 1200, the system 150 facilitates a display of the augmented visual session on an electronic device associated with a user.

shows a flow diagram of a method 1300 of generating augmented content records, in accordance with an embodiment of the invention. The various steps and/or operations of the flow diagram, and combinations of steps/operations in the flow diagram, may be implemented by, for example, hardware, firmware, a processor, circuitry and/or by a system such as the system 150 explained with reference to FIGS. 1 to 12 and/or by a different device associated with the execution of software that includes one or more computer program instructions. The method 1300 starts at step 1302.

At step 1302 of the method 1300, a system such as the system 150 extracts one or more image frames from the input. In an embodiment, the input can be one or more images or videos of the object (for example, the insurance form, damaged vehicle, etc.) for which the user seeks assistance from the agent.

At step 1304 of the method 1300, the system 150 determines one or more features of the object based, at least in part, on the one or more image frames. For example, in the case of the insurance form, the one or more features can be one or more fields in the form for which the user request assistance.

At step 1306 of the method 1300, the system 150 determines one or more geometrical co-ordinates corresponding to the object based, at least in part, on the one or more assistances. At step 1308 of the method 1300, the system 150 generates correlation data based, at least in part, on correlating the one or more geometrical co-ordinates with the extracted one or more features.

At step 1310 of the method 1300, the system 150 receives from the agent 110, one or more intents of the user corresponding to the input from the interaction session.

At step 1313 of the method 1300, the system 150 generates the augmented content record based, at least in part, on the correlation data and the one or more intents of the user. The augmented content record is stored in the database 116. Further, the augmented content may be used by another agent or an automated agent to respond to an assistance request used from the same or another user.

shows a flow diagram of a method 1400 for generating an automated agent, in accordance with an embodiment of the invention. The various steps and/or operations of the flow diagram, and combinations of steps/operations in the flow diagram, may be implemented by, for example, hardware, firmware, a processor, circuitry and/or by a system such as the system 150 explained with reference to FIGS. 1 to 13 and/or by a different device associated with the execution of software that includes one or more computer program instructions. The method 1400 starts at step 1402.

At 1402 of the method 1400, a system such as system 150 accesses one or more augmented content records corresponding to one or more users from a database such as the database 116.

At 1404 of the method 1400, the system 150 generates an automated agent based, at least in part, on the one or more augmented content records. In some embodiments, the automated agent may be generated using deep learning techniques, i.e., the automated agent is trained using one or more augmented content records by utilizing deep learning techniques.

shows a flow diagram of a method 1500 for providing assistance to users through an automated agent, in accordance with an embodiment of the invention. The various steps and/or operations of the flow diagram, and combinations of steps/operations in the flow diagram, may be implemented by, for example, hardware, firmware, a processor, circuitry and/or by a system such as the system 150 explained with reference to FIGS. 1 to 13 and/or by a different device associated with the execution of software that includes one or more computer program instructions. The method 1500 starts at operation 1502.

At 1502 of the method 1500, a system such as the system 150 receives an interactive session initiation request. The interactive session initiation request includes a two-dimensional input corresponding to a damaged object. In a non-limiting example, the damaged object may be a vehicle.

At 1504 of the method 1500, the system generates an augmented content record based, at least in part, on the input. The augmented content record indicates a three-dimensional model of the damaged object. The augmented content record indicates a 3D model of the damaged object. The steps for generating the 3D models have already been explained with reference to FIGS. 10A-10C and the same is not repeated here for the sake of brevity.

At 1506 of the method 1500, the system 150 accesses one or more augmented content records corresponding to one or more users from a database 116 associated with the system 150. The one or more augmented content records includes at least one or more assistances corresponding to the two-dimensional input. In one embodiment of the invention, the one or more augmented content record further includes at least a three-dimensional model of the undamaged vehicle. In an example, the assistance may include identifying one or more damaged regions on the object of the user. It is noted that various embodiments relating to same have already been explained with reference to FIGS. 10A-10C earlier in the present disclosure.

At 1508 of the method 1500, the system 150 generates an automated agent based, at least in part, on the one or more augmented content records.

At 1510 of the method 1500, the system 150 determines via the automated agent, one or more damaged regions on the damaged object based, at least in part, on the augmented content record.

At 1512 of the method 1500, the system 150 facilitates the display of one or more assistances on an electronic device of a user. The one or more assistances being generated by the automated agent based, at least in part, on analysing the one or more damaged regions. In a non-limiting example, the automated agent may generate a damage analysis report or fill an insurance claim for the user based on analysing the one or more damaged regions.

Various embodiments disclosed herein provide numerous advantages. More specifically, the embodiments disclosed herein suggest techniques for providing post-interaction assistance for users in an efficient manner. The annotations provided by the agent 110 to assist the user are stored in relation to extracted features and corresponding geometrical co-ordinates as agent assistance in an augmented content record. Such an intelligent way of storing agent assistance enables recreation of the agent assistance when requested by the user. Moreover, such intelligent storage of agent assistance ensures easy retrieval of relevant portions of the agent assistance for recreation when requested by the user thereby, sparing the user the hassle of going through the entire recorded interaction session. As the annotations are stored in reference to extracted features along with geometrical co-ordinates, the annotations are adapted to align with video signals displayed by the user seeking assistance for providing the desired assistance without the intervention of an agent 110. Moreover, such augmented content records are used to build automated agents for each intent/troubleshooting-scenario. These automated agents may be deployed to assist users with the same or similar query with the annotations thereby significantly improving user experience.

The foregoing descriptions of specific embodiments of the present invention have been presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the present invention to the precise forms disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiment was chosen and described in order to best explain the principles of the present invention and its practical application, to thereby enable others skilled in the art to best utilize the present invention and various embodiments with various modifications as are suited to the particular use contemplated.

Claims

A computer-implemented method, the method comprising:
receiving, by a system, an assistance request, the assistance request comprising an input corresponding to an object;
accessing, by the system, one or more augmented content records from a database associated with the system based, at least in part, on the object, the one or more augmented content records comprising at least one or more assistances corresponding to the object;
generating, by the system, an augmented visual session based, at least in part, on the one or more augmented content records; and
facilitating, by the system, a display of the augmented visual session on an electronic device associated with a user.
The computer-implemented method of claim 1, further comprising:
receiving, by the system, an interactive session initiation request, the interactive session initiation request comprising the input corresponding to the object;
initiating, by the system, an interaction session between the user and an agent of an enterprise based, at least in part, on the input;
facilitating, by the system, the interaction session between the user and the agent for assisting the user, wherein the agent provides the one or more assistances to the user;
recording, by the system, the interaction session in the database; and
generating and recording, by the system, an augmented content record corresponding to the interaction session based, at least in part, on the one or more assistances and at least a portion of the interaction session in the database.
The computer-implemented method of claim 2, wherein recording the interaction session further comprises:
receiving, by the system, one or more additional information corresponding to the object from the agent;
receiving, by the system, a tag indicating the intent of the interaction session from the agent, wherein the tag is set by the agent to the one or more additional information; and
recording, by the system, the interaction session with the tag in the database.
The computer-implemented method of claim 1, wherein generating the augmented content record further comprises:
extracting, by the system, one or more image frames from the input;
determining, by the system, one or more features based, at least in part, on the one or more image frames;
determining, by the system, one or more geometrical co-ordinates corresponding to the object based, at least in part, on the one or more assistances;
generating, by the system, correlation data based, at least in part, on correlating the one or more geometrical co-ordinates with the extracted one or more features;
receiving, by the system, one or more intents of the user corresponding to the input from the interaction session; and
generating, by the system, the augmented content record based, at least in part, on the one or more intent, the one or more assistances in relation to the one or more portions of the interaction session, and the correlation data.
The computer-implemented method of claim 1, further comprising:
accessing, by the system, the one or more augmented content records corresponding to one or more users from the database; and
generating, by the system, an automated agent based, at least in part, on the one or more augmented content records.
The computer-implemented method of claim 1, further comprising:
facilitating, by the system, one or more options to be displayed on the electronic device corresponding to the one or more assistances; and
rendering, by the system, at least one assistance of the one or more assistances based, at least in part, on the user selecting at least one option of the one or more options.
The computer-implemented method of claim 1, further comprising:
receiving, by the system, the assistance request comprising at least one image frame of the object, from the user;
accessing, by the system, the one or more portions of the one or more assistances in relation to the image frame from the augmented content record;
aligning, by the system, the accessed one or more assistances in the at least one image frame of the object; and
facilitating, by the system, a display of the at least one image frame with one or more assistances to the user.
The computer-implemented method of claim 7, further comprising:
extracting, by the system, at least one or more features from the at least one image frame;
comparing, by the system, the extracted one or more features with one or more features in the augmented content record;
determining, by the system, at least one match of the one or more features in the extracted one or more features and the augmented content record;
accessing, by the system, the augmented content record corresponding to the at least one match; and
generating, by the system, the augmented visual session based, at least in part, on one portion of the augmented content record.
The computer-implemented method of claim 1, wherein the input comprises at least one of a visual comprising one of a video and image, an audio and audio-visual input.
The computer-implemented method of claim 1, wherein the agent comprises at least one of human and an automated agent.
A system, comprising:
at least one processor; and
a memory having stored therein machine executable instructions, that when executed by the at least one processor, cause the system, at least in part, to:
receive an assistance request, the assistance request comprising an input corresponding to an object;
access one or more augmented content records from a database associated with the system based, at least in part, on the object, the one or more augmented content records comprising at least one or more assistances corresponding to the object;
generate an augmented visual session based, at least in part, on the one or more augmented content records; and
facilitate a display of the augmented visual session on an electronic device associated with a user.
The system of claim 11, wherein the system is further caused, at least in part, to:
receive an interactive session initiation request, the interactive session initiation request comprising the input corresponding to the object;
initiate an interaction session between the user and an agent of an enterprise based, at least in part, on the input;
facilitate the interaction session between the user and the agent for assisting the user, wherein the agent provides the one or more assistances to the user;
record the interaction session in the database; and
generate and store an augmented content record corresponding to the interaction session based, at least in part, on the one or more assistances and at least a portion of the interaction session in the database.
The system of claim 12, wherein to record the interaction session, the system is further caused, at least in part, to:
receive one or more additional information corresponding to the object from the agent;
receive a tag indicating the intent of the interaction session from the agent, wherein the tag is set by the agent to the one or more additional information; and
record the interaction session with the tag in the database.
The system of claim 11, wherein to generate the augmented content record, the system is further caused, at least in part, to:
extract one or more image frames from the input;
determine one or more features based, at least in part, on the one or more image frames;
determine one or more geometrical co-ordinates corresponding to the object based, at least in part, on the one or more assistances;
generate correlation data based, at least in part, on correlating the one or more geometrical co-ordinates with the extracted one or more features;
receive one or more intents of the user corresponding to the input from the interaction session; and
generate the augmented content record based, at least in part, on the one or more intent, the one or more assistances in relation to the one or more portions of the interaction session, and the correlation data.
The system of claim 11, wherein the system is further caused, at least in part, to:
access the one or more augmented content records corresponding to one or more users from the database; and
generate an automated agent based, at least in part, on the one or more augmented content records.
The system of claim 11, wherein the system is further caused, at least in part, to:
facilitate one or more options to be displayed on the electronic device corresponding to the one or more assistances; and
render at least one assistance of the one or more assistances based, at least in part, on the user selecting at least one option of the one or more options.
The system of claim 11, wherein the system is further caused, at least in part, to:
receive the assistance request comprising at least one image frame of the object, from the user;
access one or more portions of the one or more assistances in relation to the image frame from the augmented content record;
align the accessed one or more assistances in the at least one image frame of the object; and
facilitate a display of the at least one image frame with one or more assistances to the user.
The system of claim 17, wherein the system is further caused, at least in part, to:
extract at least one or more features from the at least one image frame;
compare the extracted one or more features with one or more features in the augmented content record;
determine at least one match of the one or more features in the extracted one or more features and the augmented content record;
access the augmented content record corresponding to the at least one match; and
generate the augmented visual session based, at least in part, on one portion of the augmented content record.
The system of claim 11, wherein the input comprises at least one of a visual, an audio and audio-visual input.
The system of claim 11, wherein the agent comprises at least one of human and an automated agent.
A non-transitory computer-readable storage medium comprising computer-executable instructions that, when executed by at least a processor of a system, cause the system to perform a method comprising:
receiving an assistance request, the assistance request comprising an input corresponding to an object;
accessing one or more augmented content records from a database associated with the system based, at least in part, on the object, the one or more augmented content records comprising at least one or more assistances corresponding to the object;
generating an augmented visual session based, at least in part, on the one or more augmented content records; and
facilitating a display of the augmented visual session on an electronic device associated with a user.
A computer-implemented method, comprising:
receiving, by a system, an interactive session initiation request, the interactive session initiation request comprising a two-dimensional input corresponding to a damaged object;
generating, by the system, an augmented content record based, at least in part, on the input, the augmented content record indicating a three-dimensional model of the damaged object;
accessing, by the system, one or more augmented content records corresponding to one or more users from a database associated with the system, the one or more augmented content records comprising at least one or more assistances corresponding to the two-dimensional input;
generating, by the system, an automated agent based, at least in part, on the one or more augmented content records;
determining, by the system via the automated agent, one or more damaged regions on the damaged object based, at least in part, on the augmented content record; and
facilitating, by the system, a display of one or more assistances on an electronic device of a user, the one or more assistances being generated by the automated agent based, at least in part, on analysing the one or more damaged regions.