WO2024089682A1 - Task-oriented engaged conversations between computers and human subjects - Google Patents
- Publication number
- WO2024089682A1
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G06F40/35—Discourse or dialogue representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/02—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail using automatic reactions or user delegation, e.g. automatic replies or chatbot-generated messages
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/216—Handling conversation history, e.g. grouping of messages in sessions or threads
Definitions
- the present disclosure, in some embodiments, concerns conversations between computers and human subjects, and more specifically, but not exclusively, systems and methods for planning, conducting, and analyzing a conversation between the computer and the human subject.
- Digital humans are computer programs that communicate autonomously with people, through media such as text, audio, and video. Digital humans may be deployed as avatars or "chatbots.” Digital humans may be utilized to communicate with people on a variety of matters, including but not limited to sales, employment interviews, customer support, and IT support.
- the present disclosure provides a system and method for conducting a task- oriented, engaged conversation between a digital human and a human subject.
- "task-oriented" signifies that each sentence spoken by the digital human is selected with a particular objective in mind.
- "engaged conversation" signifies that the digital human adapts the flow of the conversation to the responses of the subject in real time.
- the task-oriented engaged conversation may be adapted to any type of business objective, including, for example, development of sales leads, closing of sales, interviewing of candidates, performing IT consultations, responding to customer service requests, performing financial consultations, or providing personalized training.
- each "act” consists of a new, unscripted statement determined in real time based on the state of the conversation to that point in view of the overall objective of the conversation.
- the digital human considers not only verbal responses by the subject but also nonverbal cues and any information about the subject learned during the conversation.
- a method of conducting a task-oriented, engaged conversation between a computer and a human subject includes: defining a task and one or more key performance indicators for the task; determining an outline of a conversation with the subject in accordance with the task and the one or more key performance indicators; and performing a dialogue with the subject in a series of acts, wherein each act comprises: (i) determining a strategic goal for the act; (ii) formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal; (iii) presenting the statement to the subject; (iv) receiving a response from the subject; (v) analyzing the response based on both the text and visual cues generated by the subject; and (vi) determining a strategic goal for a subsequent act in accordance with the response and a cumulative state of the conversation.
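The per-act cycle of steps (i)-(vi) can be sketched as a loop. This is a minimal illustration; the function names and state structure below are assumptions for the sketch, not the disclosed implementation:

```python
from dataclasses import dataclass, field

@dataclass
class ConversationState:
    # Cumulative record of the dialogue so far (step (vi) consults it).
    history: list = field(default_factory=list)

def run_dialogue(state, choose_goal, formulate, present, receive, analyze):
    """One act per iteration: goal -> statement -> response -> analysis.

    `choose_goal` returns None when the conversation should end.
    """
    goal = choose_goal(state)                        # (i) strategic goal
    while goal is not None:
        statement, manner = formulate(goal, state)   # (ii) wording + delivery
        present(statement, manner)                   # (iii)
        response = receive()                         # (iv)
        analysis = analyze(response)                 # (v) text + visual cues
        state.history.append((goal, statement, response, analysis))
        goal = choose_goal(state)                    # (vi) next act's goal
    return state
```

A caller would supply concrete strategy and I/O callables; the loop itself only encodes the ordering of the six sub-steps.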
- the method further includes performing the step of defining a task and one or more key performance indicators by a human manager of the computer.
- the steps of presenting and receiving the response comprise animating a digital human with at least one of a set of communication modes comprising audio, video, and text.
- the method further includes generating an appearance and speech mannerisms of the digital human in accordance with an expected preference of the human subject.
- the method further includes selecting a personality type for the digital human in accordance with an expected preference of the human subject.
- the method further includes evaluating whether one or more key performance indicators may be better achieved with a person instead of the digital human, and, if the evaluating step results in a determination that the key performance indicators would be better achieved with the person, transitioning the conversation from the digital human to the person.
- the method further includes transitioning a delivery of the response during the conversation between different communication modes.
- the method further includes training the computer with a large-language model and logic regarding conducting of conversations derived from stored data.
- the training step further comprises performing a fine-tuning of the large language model based on the logic.
- the stored data comprises one or more of: (1) previously recorded conversations involving the computer; (2) previously recorded conversations from other media; and (3) industry research.
- the method further includes commencing each act within approximately 200 milliseconds of completion of a previous act.
- the step of analyzing the response comprises analyzing the text and visual cues as they are received in real time.
- the method further includes performing a post-conversation analysis, and using results of the post-conversation analysis to improve subsequent performance of the computer.
- the post-conversation analysis includes analysis regarding optimization of the outline, optimization of each statement that was presented, and achievement of the one or more key performance indicators.
- the task is interviewing the subject for an employment position.
- the one or more key performance indicators comprise determining vectors corresponding to a plurality of personality traits.
- the method further includes comparing the determined vectors to predefined vectors representing ideal personality traits for the employment position.
- the comparing step comprises plotting the vectors on a polygon and comparing a shape of the polygon of the determined vectors to a shape of the polygon of the predefined vectors.
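One way to quantify the polygon-shape comparison above is a similarity metric over the underlying trait vectors. The disclosure describes a visual comparison of polygon shapes, so the choice of cosine similarity below is an assumption for illustration:

```python
import math

def trait_similarity(candidate, ideal):
    """Cosine similarity between two equal-length trait-score vectors.

    A score near 1.0 means the candidate's trait polygon closely
    matches the ideal polygon's shape. The metric is an assumption;
    the disclosure only describes comparing the plotted shapes.
    """
    dot = sum(c * i for c, i in zip(candidate, ideal))
    norm = (math.sqrt(sum(c * c for c in candidate))
            * math.sqrt(sum(i * i for i in ideal)))
    return dot / norm if norm else 0.0
```

Cosine similarity is shape-sensitive but scale-insensitive, which loosely mirrors comparing the shapes of two radar polygons rather than their absolute sizes.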
- the task is selling a service or product to the subject.
- the one or more key performance indicators comprise one or more of: classifying a lead, developing a lead, closing a sale, conducting a follow-up conversation after a sale, or conducting a full sales conversation.
- a computer program product for conducting a task-oriented, engaged conversation between a computer and a human subject includes: a cognitor module for: defining a task and one or more key performance indicators for the task; determining an outline of a conversation with the subject in accordance with the one or more key performance indicators; and, during performance of a dialogue with the subject in a series of acts, analyzing responses by the subject based on both text and visual cues generated by the subject and determining a strategic goal for each act in accordance with responses received from the subject and a cumulative state of the conversation; and a communicator module for: formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal; presenting the statement to the subject; and receiving a response from the subject.
- the communicator is configured to present the statement by animating a digital human with at least one of audio, video, and text.
- the communicator is configured to generate an appearance and speech mannerisms of the digital human in accordance with an expected preference of the human subject.
- FIG. 1 illustrates steps in a method for training a computer system to perform a task-oriented engaged conversation, performing the conversation, and conducting post-conversation analysis, according to embodiments of the present disclosure.
- FIG. 2 illustrates a more detailed view of the method steps of FIG. 1, according to embodiments of the present disclosure.
- FIG. 3 illustrates a graphical user interface depicting a conversation between a digital human and a human subject, according to embodiments of the present disclosure.
- FIG. 4 illustrates output generated by the system when used to analyze suitability of a candidate for an employment position, according to embodiments of the present disclosure.
- FIG. 5 illustrates the planning of different sales conversations, according to embodiments of the present disclosure.
- the present disclosure, in some embodiments, concerns conversations between computers and human subjects, and more specifically, but not exclusively, systems and methods for planning, conducting, and analyzing a conversation between the computer and the subject.
- the systems and methods described herein are implemented by a computer.
- the computer may include a processor and a memory.
- the memory may have a computer program stored thereon containing software instructions that, when executed by the processor, cause the processor to perform various functions, as set forth herein.
- the processor and memory may be stored on a physical computer, on a cloud-based or virtualized computer, or on any combination thereof.
- the computer may include various hardware components that enable performance of the conversational functions described herein. These hardware components may include an image sensor, a display, a microphone, and a speaker. In certain implementations, the computer engages in a conversation with a human subject through a multichannel combination of audio, video, and text outputs. These outputs may be communicated through a device (e.g., a computer or mobile telephone) of the human subject. The conversation proceeds, from the perspective of the human subject, similar to a video chat between two people. In such embodiments, the computer need not utilize the integrated hardware components for conducting of the conversation. Regardless, such hardware may be useful for involving a manager in the conversation. The manager may use the display, microphone, camera, and speaker in order to participate in the conversation between the digital human and the subject, as will be discussed further herein.
- the computer is described as conducting a conversation in the realms of employment interviewing or sales. These uses are merely exemplary, and the computer is equally capable of conducting the conversation in any task- oriented application, including but not limited to education, consulting, IT troubleshooting, or customer service.
- the conversation may be "business to business” (B2B) or "business to client” (B2C).
- FIG. 1 schematically depicts stages in the training and operation of a computer system for a task-oriented engaged conversation, according to embodiments of the present disclosure.
- FIG. 2 depicts a more detailed view of the same stages.
- the operation of the computer system is divided into three stages: a training stage 110, a real-time conversation stage 120, and a post-processing stage 130.
- the output of the computer system is a task-oriented, engaged conversation 140.
- in training stage 110, the computer system is trained by combining two primary types of inputs 111: data 112 and knowledge 114.
- Data 112 refers to examples of spoken text taken from a large language model (LLM) 112a.
- the large language model 112a may include an extremely large number of textual parameters that exemplify the syntactical use of language.
- the data 112 may also derive from previously recorded conversations 112b, whether conversations recorded by the computer system itself or from independently sourced recordings. For example, a database of recorded interviews, consisting of hundreds of thousands of interviews, may be uploaded into a neural network. Each interview may be annotated with one or more tags regarding, for example, the purpose of the interview, the objectives of the interview, how well the candidate performed during the interview, or how statements of the candidate were correlated with skills or personality traits.
- the data obtained from prior conversations may be processed in order to enable more efficient analysis. This processing may include transcription, correction of syntax, and translation.
- Knowledge 114 refers to a logic regarding performance of the types of conversations described herein.
- the logic may relate to the flow of conversations, including best practices or expected patterns in the performance of these conversations.
- a Large Language Model that is trained to generate prose may have difficulty generating a response when the text is a math problem presented in prose format, and the expected response is an answer to the math problem.
- the LLM needs to be specifically trained to recognize the math problem and generate a suitable response.
- a typical large language model is not specifically trained for performance of conversations in general, and business conversations in particular, and requires training regarding the typical flow and structuring of such conversations.
- the knowledge 114 may also relate to recognition of certain conclusions from patterns in speech. For example, a user that repeats the same answer multiple times may be deemed evasive or non-responsive.
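The evasiveness pattern above (a subject repeating the same answer) can be sketched as a simple heuristic. The normalization and the repeat threshold below are illustrative assumptions, not disclosed parameters:

```python
from collections import Counter

def looks_evasive(answers, threshold=3):
    """Flag a subject who gives near-identical answers repeatedly.

    Normalization here is deliberately crude (strip and lowercase);
    a real system would compare answers semantically. The threshold
    of 3 repeats is an assumption for illustration.
    """
    counts = Counter(a.strip().lower() for a in answers)
    return any(n >= threshold for n in counts.values())
```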
- Knowledge 114 also relates to global industry information 114a regarding tasks, including key performance indicators (KPIs) 114b for different types of tasks or conversations.
- knowledge 114 may include desirable traits for certain positions. For example, an executive may require the ability to delegate and manage, whereas an executive assistant may require the ability to multitask.
- This knowledge may be derived from published literature and hundreds of thousands of data points regarding (for example) sales, personality traits, and leadership qualities.
- the KPIs may include quantity or price points of sales.
- the data 112 and knowledge 114 are combined into a conversation model 116.
- the conversation model 116 incorporates artificial intelligence (AI) and natural language processing (NLP) and is used to generate human-like sentences, specifically for dialogue.
- the conversation model 116 is a fine-tuned version of an autoregressive model such as Generative Pre-Trained Transformer 3 (GPT-3).
- Fine-tuning refers to a process of modifying or tweaking particular layers of a neural network in order to adapt a model that has already been trained for one given task to make it perform a second similar task.
- the fine-tuning process is also known as transfer learning.
- the fine-tuning is performed in order to adjust a model initially designed for generation of continuous, coherent text (which is the primary application of large language models such as GPT-3) and to apply this learning toward the generation of intelligent dialogue, as discussed.
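Fine-tuning as described can be pictured in miniature: pretrained parameters are frozen and only task-specific parameters are updated. The sketch below is a conceptual toy (plain dictionaries of scalar weights and a hypothetical gradient step), not the actual procedure for fine-tuning an LLM:

```python
def finetune_step(weights, frozen, grads, lr=0.1):
    """One gradient-descent step with some parameters frozen.

    Transfer learning in miniature: parameters named in `frozen`
    keep their pretrained values, while the rest are updated,
    mimicking fine-tuning of only the top layers of a large model.
    All names and the learning rate are illustrative assumptions.
    """
    return {
        name: w if name in frozen else w - lr * grads[name]
        for name, w in weights.items()
    }
```

In practice this corresponds to marking the lower transformer layers as non-trainable and optimizing only the upper layers (or an added head) on dialogue data.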
- client information 113 may include information about an interview subject (e.g. age, gender, experience).
- Animation generator 115 is a computer program or module that is used to generate and animate a digital human.
- the training 110 may proceed in multiple sub-stages 1-11.
- the sub-stages address processing of the inputs in order to derive therefrom the planning and presentation of a task-oriented engaged conversation, as discussed above and below.
- the computer performs a conversation according to the following sequence of steps.
- the computer system defines a task and one or more key performance indicators (KPIs) for the task.
- the task may be, for example, conducting a sale, and the key performance indicators may include selling a certain number of units or services at a given price.
- the task may alternatively be conducting an interview, and the key performance indicators may be determining scores for the interviewee regarding certain personality traits.
- the computer system itself determines the nature of the task and key performance indicators.
- the system may determine the task and KPIs based on constraints that are preprogrammed into the system. For example, the system may be programmed to sell at a target price point, and may be authorized to give discounts up to 25%.
- the human subject may also state the reason for initiating contact, which, in turn, helps the computer system define the task and KPIs.
- a manager of the computer system may set a task and KPIs prior to each interview.
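A preprogrammed constraint such as the 25% discount ceiling mentioned above might be enforced with a simple clamp. The function name and signature are assumptions for this sketch:

```python
def quote_price(target_price, requested_discount, max_discount=0.25):
    """Return a quote that never exceeds the authorized discount.

    max_discount=0.25 mirrors the 25% example in the text; the
    function itself is an illustrative sketch of a preprogrammed
    constraint, not the disclosed system's pricing logic.
    """
    discount = min(max(requested_discount, 0.0), max_discount)
    return round(target_price * (1.0 - discount), 2)
```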
- the system determines an outline of the conversation with the subject in accordance with the task and key performance indicators.
- the outline dictates, in general terms, the expected structure of the conversation, including milestones which the conversation is expected to pass.
- the conversation may include an "introduction” phase, in which the computer system and the human subject introduce themselves and make small talk; a "body” phase, which is directed to the task at hand, and a “wrap-up” phase, in which the computer summarizes the conversation and identifies followup actions.
- the system may further identify a particular order in which it will attempt to achieve the relevant KPIs.
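The outline, its phases, and the KPI ordering described above might be represented as a simple data structure. Beyond the phase names taken from the text, all field names and milestone values are illustrative assumptions:

```python
# A conversation outline: an ordered list of phases with expected
# milestones, plus the order in which KPIs will be attempted.
outline = {
    "task": "sell_service",
    "phases": [
        {"name": "introduction", "milestones": ["greeting", "small_talk"]},
        {"name": "body", "milestones": ["identify_need", "present_offer"]},
        {"name": "wrap-up", "milestones": ["summarize", "follow_up_actions"]},
    ],
    "kpi_order": ["classify_lead", "develop_lead", "close_sale"],
}
```

During the dialogue, the cognitor could check the conversation's progress against the current phase's milestones before selecting the next strategic goal.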
- the system performs a dialogue with the human subject of the conversation.
- the dialogue proceeds in a series of acts.
- "act" refers to a round of dialogue between the computer and the subject.
- Each act includes the following sub-steps: determining a strategic goal for the act, formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal, presenting the statement to the subject and receiving a response from the subject; and determining a strategic goal for a subsequent act in accordance with the response and a cumulative state of the conversation.
- the strategic goal may be to evaluate the candidate's adaptability.
- the specific question that is formulated and delivered to the subject may be a question to the candidate regarding how he or she would respond to a particular scenario.
- the computer system evaluates what the strategic goal for the next question should be.
- This flow of the conversation may be executed by two separate software modules, which are referred to herein as "cognitor" 121 and "communicator" 122.
- the cognitor 121 performs the strategic planning, both prior to the conversation (setting a task and KPIs, formulating an outline) and during the conversation (determining the objective of each subsequent act).
- the communicator 122 executes the strategic plan by formulating the specific wording of each statement, as well as by determining the appearance of the digital human and the tone of the digital human during the delivery of the statement.
- the communicator 122 also determines the physical gestures of the digital human and adapts the gestures to match the text that is being delivered (for example, delivering an ironic statement with a wink).
- Control of the conversation proceeds cyclically between the cognitor 121 and communicator 122.
- communicator 122 delivers output 123, e.g., a statement, to the subject.
- this output is delivered via a digital human having the form of an avatar, referred to herein as an "Ivatar.”
- the subject responds to the statement, providing client input 124.
- the process then proceeds with intent recognition.
- Intent recognition is performed through recognition of the text spoken by the human subject in combination with cues such as facial appearance, speaking tone, and speed of speech.
- the text spoken by the human subject is analyzed using standard natural language processing techniques that are known to those of skill in the art. All textual information is considered, even if not directly responsive to the previous statement by the computer. Such information may be of relevance for an overall understanding of the subject, which may in turn affect the strategy for how to proceed with the conversation. For example, the subject may indicate parenthetically that he has a daughter, or drives a pickup truck, or runs marathons. Each of these points may cause the cognitor 121 to reevaluate the conversation and redirect the conversation to a different angle than that which was previously planned.
- the video and audio of the human subject are analyzed using machine learning techniques for detecting voice inflection and volume, and for detecting hand and facial motions.
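The multichannel intent recognition above (spoken text combined with voice and facial cues) can be fused into a single confidence value. The following is a minimal sketch using a weighted average; the channel weights are assumptions, as the disclosure does not specify a fusion rule:

```python
def intent_score(text_score, voice_score, visual_score,
                 weights=(0.6, 0.2, 0.2)):
    """Fuse per-channel confidence scores (each in 0..1) into one value.

    A plain weighted average over the textual, audio, and visual
    channels; the weights are illustrative assumptions.
    """
    w_text, w_voice, w_visual = weights
    return (w_text * text_score
            + w_voice * voice_score
            + w_visual * visual_score)
```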
- the intent recognition module evaluates statements of subjects through analysis of the subject's "Regular Expressions" (Regex).
- Regular Expressions refers to repeating patterns of syntax of a speaker.
- an example of a Regular Expression would be using a certain word W with a frequency of once every X sentences or once every time period T.
- Regular Expressions may be used to evaluate speaking patterns on the following planes: textual (e.g., using language such as "we" as opposed to "I"); audio (speed and tone of speech); visual (use of facial expressions that are appropriate for the conversation); and video (appropriate body language and hand gestures).
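As a minimal illustration of the textual plane above ("we" versus "I"), one might track a speaker's pronoun usage over a transcript. The function and its crude whitespace tokenization are illustrative assumptions:

```python
def pronoun_ratio(transcript):
    """Fraction of first-person pronouns that are "we" rather than "I".

    A single speaking-pattern feature on the textual plane; a real
    system would track many such repeating patterns per plane and
    use proper tokenization rather than a whitespace split.
    """
    words = transcript.lower().split()
    we, i = words.count("we"), words.count("i")
    total = we + i
    return we / total if total else 0.0
```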
- the cognitor 121 evaluates the state 142 of the conversation as a whole, in view of the most recently completed act 141, and in view of the overall goals of the conversation.
- the "state" 142 may be analogized to a state of a chessboard following a given move, and the last act 141 may be analogized to the last move.
- the strategic analysis of the next goal to attain is influenced by both the overall state and the last act, or the most recent acts. For example, if a subject has evaded answering a question, the system may formulate a different strategy for proceeding compared to a subject that has consistently answered forthrightly.
- This analysis of the next goal to achieve considers both the expected achievement of the task and the engagement of the subject (the flow of the dialogue). For example, if the cognitor 121 initially has a goal to complete a sale, but the human subject states during the conversation that he requires his partner's approval before completing the sale, the cognitor will no longer attempt to complete the sale, but instead invite the human subject to bring his partner, or to schedule a new time to continue the conversation.
- the communicator delivers the next statement within a short interval after completion of the last sentence by the subject (e.g., up to around 200 milliseconds).
- the human subject does not perceive any lag in the pace of the conversation.
- This loop of acts proceeds until the cognitor 121 determines that there is no need to continue the conversation. This may be because all the KPIs have been achieved or because, in view of the responses, it is not possible to advance the KPIs any further.
- the cognitor 121 then instructs the communicator 122 to deliver a statement ending the conversation.
- the communicator 122 may determine a desired persona for the virtual human.
- the persona includes an appearance for the virtual human and a language and accent in which the digital human is to communicate.
- the communicator 122 may generate a visual appearance and mannerisms that are adapted to the style of each customer or interviewee.
- the communicator 122 may also select a personality type for the digital human in accordance with an expected preference of the human subject.
- the system may engage the human subject in a series of questions designed to identify the demographic characteristics and personality type of the human subject.
- the personality type may be classified based on the PAEI model developed by Adizes, the Big 5 personality taxonomy, or the OPQ (Occupational Personality Questionnaire) personality test.
- the system may generate a database of stock characters with certain appearances and personas with different personality types as defined above. The appearances and personas may be chosen based on research and data regarding which personality types are best suited to conduct conversations with particular goals, as well as which personality types are best-suited to conduct conversations with other personality types.
- FIG. 3 illustrates an example of a screenshot 100 of a conversation between a digital human 101 and a human subject 102.
- the digital human 101 appears on one side of the split screen and the human subject 102 appears on the other side of the split screen.
- the digital human 101 is introducing itself to the human subject.
- the digital human may communicate with the subject using a combination of audio, video, and text.
- the video may include physical gestures performed by the digital human, such as opening of mouth, opening of eyes, and movement of eyebrows, as illustrated.
- the video presentation may include a transcript of the statement being spoken by the digital human, as illustrated.
- the system is built in an "omnichannel" manner that enables seamless transition between different modes of communication.
- a conversation may start with a video dialogue between a human subject and a digital human but may continue as a text-based or email-based discussion between the human subject and the computer system.
- the computer system may also evaluate whether one or more key performance indicators may be better achieved with a manager instead of the digital human. This evaluation may proceed, for example, on a basis of the pace of progress of the conversation, or in response to a specific request by the human subject. If the computer system determines that the key performance indicators would be better achieved with the manager, the computer may transition the conversation from the digital human to the manager. In the example of a sale, the human subject may display dissatisfaction with the progress of the conversation with the digital human and request a human representative. In the context of interviews, the system may reach inconclusive or contradictory results regarding the suitability of a candidate for a position, and thus may choose to refer the candidate to a manager for further evaluation.
- a conversation that includes a transfer between the system and a human agent is referred to herein as a "hybrid" conversation.
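The handoff evaluation above can be sketched as a simple rule combining an explicit request by the subject with a pace-of-progress heuristic. The state fields, threshold, and function names below are illustrative assumptions, not part of the disclosed system.

```python
# Sketch of a hybrid-conversation handoff rule. The KPI-progress heuristic
# and its threshold are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class ConversationState:
    acts_completed: int
    kpi_progress: float                    # fraction of KPIs achieved, 0..1
    subject_requested_human: bool = False

def should_transfer_to_manager(state: ConversationState,
                               min_progress_per_act: float = 0.05) -> bool:
    """Transfer when the subject asks for a person, or when KPI progress
    per act falls below a threshold (slow-pace heuristic)."""
    if state.subject_requested_human:
        return True
    if state.acts_completed == 0:
        return False
    return (state.kpi_progress / state.acts_completed) < min_progress_per_act
```

In this sketch, a conversation that has advanced its KPIs quickly stays with the digital human, while a stalled conversation, or an explicit request, triggers the transfer described above.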
- the computer system engages in up to two types of analysis: task analysis 131 and conversation analysis 132. These analyses may be automated and/or may be performed under supervision of a manager.
- in task analysis 131, the computer evaluates whether it achieved its desired tasks and KPIs. The computer further evaluates whether it structured the conversation in the most effective manner in order to achieve the desired task and KPIs. For example, if a particular statement by the computer generated a negative reaction by the human subject, the computer may take note of this and incorporate this negative outcome into the data sets. In this way, the logic model employed by the cognitor 121 continually improves.
- in conversation analysis 132, the computer evaluates the flow of the conversation itself. This evaluation may include statistical analysis of the length of the conversation, the types of questions asked, and the order of questions asked. This analysis may also include comparison of a given conversation to other conversations generated at similar times or having similar KPIs.
- the system described herein may be implemented for various types of task-oriented conversations.
- the system is used for two specific types of conversations: employment interviews and sales conversations.
- the system may be used to screen candidates and to determine suitability for a particular role.
- the specific KPIs for the conversation may include determining vectors corresponding to a plurality of personality traits. These personality traits may be selected based on proprietary, research-based data regarding which traits are best-suited to particular positions.
- FIG. 4 illustrates a form of post-processing output that may be generated by the system when used specifically for employment interviews.
- the human subject is given scores for the character traits of awareness 201a, motivation 201b, cooperation 201c, effectiveness under time pressure 201d, argumentative discourse and critical thinking 201e, flexibility and versatility 201f, positive leadership 201g, solving problems and making decisions 201h, creative thinking originality and imagination 201i, and interpersonal communication 201j.
- Each of these traits is graded on a scale of 0 to 4.
- the traits are then mapped on a circular plot having a radius of 4, with a ten-sided polygon inscribed at ten points along the circumference of the circle, in which each vertex represents one of the measured personality traits.
- the post-processing module may then compare the polygon generated for each interviewee with an ideal polygon representing the preferred personality traits of the desired employee, in order to determine which candidate is the best fit for the job.
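The polygon comparison described above can be sketched numerically. The fit metric below (mean absolute vertex difference, normalized by the 0-to-4 scale) is an illustrative assumption; the disclosure does not specify how the polygon shapes are compared.

```python
# Sketch of the trait-polygon comparison: each candidate's ten trait scores
# (graded 0-4) form the vertices of a polygon on a radar plot, which is
# compared against an ideal polygon for the position.
TRAITS = ["awareness", "motivation", "cooperation", "time_pressure",
          "critical_thinking", "flexibility", "leadership",
          "problem_solving", "creativity", "communication"]

def trait_fit(candidate: dict, ideal: dict) -> float:
    """Return a fit score in [0, 1]; 1.0 means the polygons coincide."""
    diffs = [abs(candidate[t] - ideal[t]) for t in TRAITS]
    return 1.0 - sum(diffs) / (4 * len(TRAITS))   # 4 = maximum per-trait gap

ideal = {t: 4 for t in TRAITS}
candidate = {t: 3 for t in TRAITS}
# trait_fit(candidate, ideal) -> 0.75
```

Ranking candidates by such a score is one way the post-processing module could determine which polygon most closely matches the ideal.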
- FIG. 5 illustrates different steps for deploying the system for use in sales, according to embodiments of the present disclosure.
- a "lead classifier" conversation 301a identifies leads, and determines which leads are most relevant and most promising. This information may be sent to sales agents for additional work.
- a "lead heater” conversation 301b takes place with a client that has already expressed initial interest. This conversation supplies information to a customer about a product and may further include referring the customer to a sales agent.
- a "complete sale” conversation 301c may encompass the entire sales process from initial intake to completing the sale, and a “lead closing" conversation 301d may focus just on completing a sale.
- FIG. 5 further illustrates a pipeline for developing the sales conversation, which may be the same regardless of the specific task.
- a key performance indicator (KPI) 302 is identified.
- the KPI may be to sell a certain number of units to the buyer at a certain price.
- an algorithm is applied, based on the knowledge and data resources, as discussed, to generate an outline of the "acts" 304. The outline includes different milestones for the conversation, as discussed above. The conversation then proceeds by development of a script 305 for each act 304, as discussed above.
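The KPI-to-acts pipeline of FIG. 5 can be sketched structurally as follows. The milestone names and the stub generator are illustrative assumptions; in the disclosure, the outline is produced by the trained conversation model.

```python
# Illustrative sketch of the KPI -> outline-of-acts step of the pipeline.
def plan_conversation(kpi: str) -> list:
    """Return an ordered outline of acts toward the given KPI."""
    milestones = ["introduction", "needs_discovery", "offer",
                  "close", "wrap_up"]
    return [{"act": i + 1, "milestone": m, "kpi": kpi}
            for i, m in enumerate(milestones)]

outline = plan_conversation("sell 10 units at list price")
# outline[0] -> {"act": 1, "milestone": "introduction",
#                "kpi": "sell 10 units at list price"}
```

A script 305 would then be developed for each entry of the outline, as described above.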
Abstract
A method of conducting a task-oriented, engaged conversation between a computer and a human subject includes: defining a task and one or more key performance indicators for the task; determining an outline of a conversation with the subject in accordance with the task and one or more key performance indicators; performing a dialogue with the subject in a series of acts, wherein each act comprises: (i) determining a strategic goal for the act; (ii) formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal; (iii) presenting the statement to the subject; (iv) receiving a response from the subject; (v) analyzing the response based on both the text and visual cues generated by the subject; and (vi) determining a strategic goal for a subsequent act in accordance with the response and a cumulative state of the conversation.
Description
Task-Oriented Engaged Conversations Between Computers and Human Subjects
Related Applications
[0001] This Application claims the benefit of priority to U.S. Provisional Patent Application No. 63/419,354, filed October 26, 2022, entitled "Task-Oriented Engaged Conversations Between Computers and Human Subjects," the contents of which are hereby incorporated by reference as if fully set forth herein.
Field of the Invention
[0002] The present disclosure, in some embodiments, concerns conversations between computers and human subjects, and more specifically, but not exclusively, to systems and methods for planning, conducting, and analyzing a conversation between the computer and the human subject.
Background of the Invention
[0003] Companies currently conduct certain business conversations using digital humans. Digital humans are computer programs that communicate autonomously with people, through media such as text, audio, and video. Digital humans may be deployed as avatars or "chatbots." Digital humans may be utilized to communicate with people on a variety of matters, including but not limited to sales, employment interviews, customer support, and IT support.
[0004] In recent years, the percentage of such conversations that are entrusted to digital humans has steadily increased. One advantage of digital humans over human agents is that digital humans are always available and have unlimited time. Another advantage is that digital humans are capable of learning and inferring better than human agents. In addition, a digital human, once trained, may be less expensive to maintain than an employee.
[0005] However, various challenges limit the utility of digital humans. People perceive digital humans as lacking understanding of human emotion and lacking human problem-solving intelligence. One particular challenge for digital humans is adjusting to the flow of a conversation. Many digital humans are programmed to ask a closed set of questions. For example, the digital human may run through an entire checklist of questions, regardless of how the subject responded to the prior questions. In other examples, the subject has a limited number of acceptable responses (e.g., yes or no, or selection from a menu), and the digital human provides the next question based on the selected response.
However, if the digital human poses non-sequitur questions, or asks questions that do not naturally flow from the prior responses, the person engaging the digital human may quickly lose interest.
Summary of the Invention
[0006] The present disclosure provides a system and method for conducting a task- oriented, engaged conversation between a digital human and a human subject. The term "task-oriented" signifies that each sentence spoken by the digital human is selected with a particular objective in mind. The term "engaged conversation" signifies that the digital human adapts the flow of the conversation to the responses of the subject in real time.
[0007] The task-oriented engaged conversation may be adapted to any type of business objective, including, for example, development of sales leads, closing of sales, interviewing of candidates, performing IT consultations, responding to customer service requests, performing financial consultations, or providing personalized training.
[0008] In order to enable tailoring of the conversation to the circumstances of the conversation, the digital human conducts the conversation in a series of "acts." Each "act" consists of a new, unscripted statement determined in real time based on the state of the conversation to that point in view of the overall objective of the conversation. When determining the next statement to deliver, the digital human considers not only verbal responses by the subject but also nonverbal cues and any information about the subject learned during the conversation.
[0009] According to a first aspect, a method of conducting a task-oriented, engaged conversation between a computer and a human subject is disclosed. The method includes: defining a task and one or more key performance indicators for the task; determining an outline of a conversation with the subject in accordance with the task and one or more key performance indicators; performing a dialogue with the subject in a series of acts, wherein each act comprises: (i) determining a strategic goal for the act; (ii) formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal; (iii) presenting the statement to the subject; (iv) receiving a response from the subject; (v) analyzing the response based on both the text and visual cues generated by the subject; and (vi) determining a strategic goal for a subsequent act in accordance with the response and a cumulative state of the conversation.
[0010] In another implementation according to the first aspect, the method further includes performing the step of defining a task and one or more key performance indicators autonomously with the computer based on predetermined constraints.
[0011] In another implementation according to the first aspect, the method further includes performing the step of defining a task and one or more key performance indicators by a human manager of the computer.
[0012] In another implementation according to the first aspect, for at least one act in the series of acts, the steps of presenting and receiving the response comprise animating a digital human with at least one of a set of communication modes comprising audio, video, and text.
[0013] Optionally, the method further includes generating an appearance and speech mannerisms of the digital human in accordance with an expected preference of the human subject.
[0014] Optionally, the method further includes selecting a personality type for the digital human in accordance with an expected preference of the human subject.
[0015] Optionally, the method further includes evaluating whether one or more key performance indicators may be better achieved with a person instead of the digital human, and, if the evaluating step results in a determination that the key performance indicators would be better achieved with the person, transitioning the conversation from the digital human to the person.
[0016] Optionally, the method further includes transitioning a delivery of the response during the conversation between different communication modes.
[0017] In another implementation according to the first aspect, the method further includes training the computer with a large-language model and logic regarding conducting of conversations derived from stored data.
[0018] Optionally, the training step further comprises performing a fine-tuning of the large language model based on the logic.
[0019] Optionally, the stored data comprises one or more of: (1) previously recorded conversations involving the computer; (2) previously recorded conversations from other media; and (3) industry research.
[0020] In another implementation according to the first aspect, the method further includes commencing each act within approximately 200 milliseconds of completion of a previous act.
[0021] In another implementation according to the first aspect, the step of analyzing the response comprises analyzing the text and visual cues as they are received in real time.
[0022] In another implementation according to the first aspect, the method further includes performing a post-conversation analysis, and using results of the post-conversation analysis to improve subsequent performance of the computer.
[0023] Optionally, the post-conversation analysis includes analysis regarding optimization of the outline, optimization of each statement that was presented, and achievement of the one or more key performance indicators.
[0024] In another implementation according to the first aspect, the task is interviewing the subject for an employment position.
[0025] Optionally, the one or more key performance indicators comprise determining vectors corresponding to a plurality of personality traits.
[0026] Optionally, the method further includes comparing the determined vectors to predefined vectors representing ideal personality traits for the employment position.
[0027] Optionally, the comparing step comprises plotting the vectors on a polygon and comparing a shape of the polygon of the determined vectors to a shape of the polygon of the predefined vectors.
[0028] In another implementation according to the first aspect, the task is selling a service or product to the subject.
[0029] Optionally, the one or more key performance indicators comprise one or more of: classifying a lead, developing a lead, closing a sale, conducting a follow-up conversation after a sale, or conducting a full sales conversation.
[0030] According to a second aspect, a computer program product for conducting a task-oriented, engaged conversation between a computer and a human subject includes: a cognitor module for: defining a task and one or more key performance indicators for the task; determining an outline of a conversation with the subject in accordance with the key performance indicator; and, during performance of a dialogue with the subject in a series of acts, analyzing responses by the subject based on both text and visual cues generated by the subject, and determining a strategic goal for each act in accordance with responses received from the subject and a cumulative state of the conversation; and a communicator module for: formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal; presenting the statement to the subject; and receiving a response from the subject.
[0031] In another implementation according to the second aspect, the communicator is configured to present the statement by animating a digital human with at least one of audio, video, and text.
[0032] Optionally, the communicator is configured to generate an appearance and speech mannerisms of the digital human in accordance with an expected preference of the human subject.
Brief Description of the Drawings
[0033] FIG. 1 illustrates steps in a method for training a computer system to perform a task-oriented engaged conversation, performing the conversation, and conducting post-conversation analysis, according to embodiments of the present disclosure;
[0034] FIG. 2 illustrates a more detailed view of the method steps of FIG. 1, according to embodiments of the present disclosure;
[0035] FIG. 3 illustrates a graphical user interface depicting a conversation between a digital human and a human subject, according to embodiments of the present disclosure;
[0036] FIG. 4 illustrates output generated by the system when used to analyze suitability of a candidate for an employment position, according to embodiments of the present disclosure; and
[0037] FIG. 5 illustrates the planning of different sales conversations, according to embodiments of the present disclosure.
Detailed Description of the Invention
[0038] The present disclosure, in some embodiments, concerns conversations between computers and human subjects, and more specifically, but not exclusively, to systems and methods for planning, conducting, and analyzing a conversation between the computer and the subject.
[0039] Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not necessarily limited in its application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the examples. The invention is capable of other embodiments or of being practiced or carried out in various ways.
[0040] The systems and methods described herein are implemented by a computer. The computer may include a processor and a memory. The memory may have a computer
program stored thereon containing software instructions that, when executed by the processor, cause the processor to perform various functions, as set forth herein. The processor and memory may be stored on a physical computer, on a cloud-based or virtualized computer, or on any combination thereof.
[0041] The computer may include various hardware components that enable performance of the conversational functions described herein. These hardware components may include an image sensor, a display, a microphone, and a speaker. In certain implementations, the computer engages in a conversation with a human subject through a multichannel combination of audio, video, and text outputs. These outputs may be communicated through a device (e.g., a computer or mobile telephone) of the human subject. The conversation proceeds, from the perspective of the human subject, similar to a video chat between two people. In such embodiments, the computer need not utilize the integrated hardware components for conducting of the conversation. Regardless, such hardware may be useful for involving a manager in the conversation. The manager may use the display, microphone, camera, and speaker in order to participate in the conversation between the digital human and the subject, as will be discussed further herein.
[0042] In the below detailed description, the computer is described as conducting a conversation in the realms of employment interviewing or sales. These uses are merely exemplary, and the computer is equally capable of conducting the conversation in any task- oriented application, including but not limited to education, consulting, IT troubleshooting, or customer service. The conversation may be "business to business" (B2B) or "business to client" (B2C).
[0043] FIG. 1 schematically depicts stages in the training and operation of a computer system for a task-oriented engaged conversation, according to embodiments of the present disclosure. FIG. 2 depicts a more detailed view of the same stages. The operation of the computer system is divided into three stages: a training stage 110, a real-time conversation stage 120, and a post-processing stage 130. The output of the computer system is a task-oriented, engaged conversation 140.
[0044] Referring first to training stage 110, the computer system is trained by combining two primary types of inputs 111: data 112 and knowledge 114.
[0045] Data 112 refers to examples of spoken text taken from a large language model (LLM) 112a. The large language model 112a may include an extremely large number of parameters that exemplify the syntactical use of language. The data 112 may also derive from previously recorded conversations 112b, whether conversations recorded by the computer system itself or from independently sourced recordings. For example, a database of recorded interviews, consisting of hundreds of thousands of interviews, may be uploaded into a neural network. Each interview may be annotated with one or more tags regarding, for example, the purpose of the interview, the objectives of the interview, how well the candidate performed during the interview, or how statements of the candidate were correlated with skills or personality traits. Optionally, the data obtained from prior conversations may be processed in order to enable more efficient analysis. This processing may include transcription, correction of syntax, and translation.
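The annotation tags described above might be represented, before ingestion, by a record such as the following. The schema and field names are assumptions for illustration; the disclosure does not specify a storage format.

```python
# Illustrative record for one annotated interview recording.
from dataclasses import dataclass, field

@dataclass
class AnnotatedInterview:
    transcript: str                       # transcribed, syntax-corrected text
    purpose: str                          # e.g., purpose of the interview
    objectives: list                      # objectives of the interview
    candidate_performance: float          # e.g., 0.0 (poor) to 1.0 (excellent)
    trait_annotations: dict = field(default_factory=dict)  # statement -> trait links

record = AnnotatedInterview(
    transcript="Q: Tell me about a project you led...",
    purpose="screening",
    objectives=["assess leadership"],
    candidate_performance=0.8,
    trait_annotations={"leadership": 3.5},
)
```

A corpus of such records, tagged consistently, is the kind of input a training pipeline could consume.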
[0046] Knowledge 114 refers to a logic regarding performance of the types of conversations described herein. The logic may relate to the flow of conversations, including best practices or expected patterns in the performance of these conversations. By way of analogy, a large language model that is trained to generate prose may have difficulty generating a response when the text is a math problem presented in prose format, and the expected response is an answer to the math problem. The LLM needs to be specifically trained to recognize the math problem and generate a suitable response. Similarly, a typical large language model is not specifically trained for performance of conversations in general, and business conversations in particular, and requires training regarding the typical flow and structuring of such conversations. The knowledge 114 may also relate to recognition of certain conclusions from patterns in speech. For example, a user that repeats the same answer multiple times may be deemed evasive or non-responsive.
[0047] Knowledge 114 also relates to global industry information 114a regarding tasks, including key performance indicators (KPIs) 114b for different types of tasks or conversations. In the context of interviews, knowledge 114 may include desirable traits for certain positions. For example, an executive may require the ability to delegate and manage, whereas an executive assistant may require the ability to multitask. This knowledge may be derived from published literature and hundreds of thousands of data points regarding (for example) sales, personality traits, and leadership qualities. In the context of sales, the KPIs may include quantity or price points of sales.
[0048] The data 112 and knowledge 114 are combined into a conversation model 116. The conversation model 116 incorporates artificial intelligence (Al) and natural language processing (NLP) and is used to generate human-like sentences, specifically for dialogue.
[0049] Preferably, the conversation model 116 is a fine-tuned version of an autoregressive model such as Generative Pre-Trained Transformer 3 (GPT-3). Fine-tuning refers to a process of modifying or tweaking particular layers of a neural network in order to adapt a model that has already been trained for one given task to make it perform a second similar task. The fine-tuning process is also known as transfer learning. In the present context, the fine-tuning is performed in order to adjust a model initially designed for generation of continuous coherent text (which is the primary application of large language models such as GPT-3) and applying this learning toward the generation of intelligent dialogue, as discussed.
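The fine-tuning described above can be illustrated by the shape of the training data: prompt/completion pairs that recast the continuous-text objective as dialogue turns. The JSONL layout and the example turns below are assumptions; the exact file format depends on the training stack used.

```python
# Illustrative dialogue-formatted fine-tuning examples, serialized one
# JSON object per line (a common layout for fine-tuning corpora).
import json

examples = [
    {"prompt": "Subject: I'm worried about the price.\nAgent:",
     "completion": " I understand. May I ask what budget you had in mind?"},
    {"prompt": "Subject: I already use a competitor.\nAgent:",
     "completion": " What would make you consider switching?"},
]

jsonl = "\n".join(json.dumps(e) for e in examples)  # one example per line
```

Training on many such pairs nudges a model tuned for continuous coherent text toward producing single conversational turns instead.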
[0050] As shown in FIG. 2, other inputs 111 may be provided prior to the commencement of the conversation. Specifically, client information 113 may include information about an interview subject (e.g. age, gender, experience). Animation generator 115 is a computer program or module that is used to generate and animate a digital human.
[0051] As seen in FIG. 2, the training 110 may proceed in multiple sub-stages 1-11. The sub-stages address processing of the inputs in order to derive therefrom the planning and presentation of a task-oriented engaged conversation, as discussed above and below.
[0052] Referring now to the real-time conversation stage 120, the computer performs a conversation according to the following sequence of steps.
[0053] First, prior to commencing the conversation, the computer system defines a task and one or more key performance indicators (KPIs) for the task. The task may be, for example, conducting a sale, and the key performance indicators may include selling a certain number of units or services at a given price. The task may alternatively be conducting an interview, and the key performance indicators may be determining scores for the interviewee regarding certain personality traits.
[0054] Optionally, the computer system itself determines the nature of the task and key performance indicators. The system may determine the task and KPIs based on constraints that are preprogrammed into the system. For example, the system may be programmed to sell at a target price point, and may be authorized to give discounts up to 25%. The human subject may also state the reason for initiating contact, which, in turn, helps the computer system define the task and KPIs. Alternatively, a manager of the computer system may set a task and KPIs prior to each interview.
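The 25% discount example above can be sketched as a preprogrammed constraint object that bounds what the system is authorized to offer. The class and field names are illustrative assumptions.

```python
# Sketch of preprogrammed sales constraints, using the 25% discount
# authorization described above.
from dataclasses import dataclass

@dataclass(frozen=True)
class SalesConstraints:
    target_price: float
    max_discount: float = 0.25    # authorized to discount up to 25%

    def lowest_offer(self) -> float:
        """Lowest price the system may offer without manager approval."""
        return self.target_price * (1.0 - self.max_discount)

constraints = SalesConstraints(target_price=100.0)
# constraints.lowest_offer() -> 75.0
```

During the conversation, any offer the system formulates would be checked against such bounds before delivery.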
[0055] The system then determines an outline of the conversation with the subject in accordance with the task and key performance indicators. The outline dictates, in general terms, the expected structure of the conversation, including milestones which the
conversation is expected to pass. For example the conversation may include an "introduction" phase, in which the computer system and the human subject introduce themselves and make small talk; a "body" phase, which is directed to the task at hand, and a "wrap-up" phase, in which the computer summarizes the conversation and identifies followup actions. When the conversation includes multiple distinct KPIs (for example, testing of multiple personality traits), the system may further identify a particular order in which it will attempt to achieve the relevant KPIs.
[0056] Next, the system performs a dialogue with the human subject of the conversation. The dialogue proceeds in a series of acts. As used in the present disclosure, the term "act" refers to a round of dialogue between the computer and the subject.
[0057] Each act includes the following sub-steps: determining a strategic goal for the act, formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal, presenting the statement to the subject and receiving a response from the subject; and determining a strategic goal for a subsequent act in accordance with the response and a cumulative state of the conversation.
[0058] For example, the strategic goal may be to evaluate the candidate's adaptability. The specific question that is formulated and delivered to the subject may be a question to the candidate regarding how he or she would respond to a particular scenario. Following receipt of the response, the computer system evaluates what the strategic goal for the next question should be.
[0059] This flow of the conversation may be executed by two separate software modules, which are referred to herein as "cognitor" 121 and "communicator" 122. The cognitor 121 performs the strategic planning, both prior to the conversation (setting a task and KPIs, formulating an outline) and during the conversation (determining the objective of each subsequent act). The communicator 122 executes the strategic plan by formulating the specific wording of each statement, as well as by determining the appearance of the digital human and the tone of the digital human during the delivery of the statement. The communicator 122 also determines the physical gestures of the digital human and adapts the gestures to match the text that is being delivered (for example, delivering an ironic statement with a wink).
[0060] Control of the conversation proceeds cyclically between the cognitor 121 and communicator 122. Following identification of a strategic goal, communicator 122 delivers output 123, e.g., a statement, to the subject. Optionally, this output is delivered via
a digital human having the form of an avatar, referred to herein as an "Ivatar." The subject then responds to the statement, providing client input 124. The process then proceeds with intent recognition.
[0061] Intent recognition is performed through recognition of the text spoken by the human subject in combination with cues such as facial appearance, speaking tone, and speed of speech. The text spoken by the human subject is analyzed using standard natural language processing techniques that are known to those of skill in the art. All textual information is considered, even if not directly responsive to the previous statement by the computer. Such information may be of relevance for an overall understanding of the subject, which may in turn affect the strategy for how to proceed with the conversation. For example, the subject may indicate parenthetically that he has a daughter, or drives a pickup truck, or runs marathons. Each of these points may cause the cognitor 121 to reevaluate the conversation and redirect the conversation to a different angle than that which was previously planned.
[0062] With regard to nonverbal cues, the video and audio of the human subject are analyzed using machine learning techniques for detecting voice inflection and volume, and for detecting hand and facial motions. Optionally, the intent recognition module evaluates statements of subjects through analysis of the subject's "Regular Expressions" (Regex). As used in the present disclosure, the term "regular expressions" refers to repeating patterns of syntax of a speaker. One example of a Regular Expression would be to use a certain word W with a frequency of every X sentences or every time period T. Regular Expressions may be used to evaluate speaking patterns on the following planes: textual (e.g., using language such as "we" as opposed to "I"); audio (speed and tone of speech); visual (use of facial expressions that are appropriate for the conversation) and video (appropriate body language and hand gestures).
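The textual plane of these "Regular Expressions" (for example, how often a speaker says "we" rather than "I") can be sketched as a per-sentence word frequency. This simple heuristic is an illustration, not the disclosed analyzer.

```python
# Sketch of a per-sentence word-frequency measure on the textual plane.
import re

def word_rate(transcript: str, word: str) -> float:
    """Occurrences of `word` per sentence in the transcript."""
    sentences = [s for s in re.split(r"[.!?]+", transcript) if s.strip()]
    if not sentences:
        return 0.0
    count = sum(len(re.findall(rf"\b{re.escape(word)}\b", s, re.IGNORECASE))
                for s in sentences)
    return count / len(sentences)

text = "We shipped it. We tested it. I wrote the docs."
# word_rate(text, "we") -> 2/3 ; word_rate(text, "I") -> 1/3
```

Tracking such rates over the conversation gives the intent-recognition module one signal about a speaker's habitual patterns, alongside the audio and visual planes.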
[0063] Following the intent recognition, the cognitor 121 evaluates the state 142 of the conversation as a whole, in view of the most recently completed act 141, and in view of the overall goals of the conversation. The "state" 142 may be analogized to a state of a chessboard following a given move, and the last act 141 may be analogized to the last move. The strategic analysis of the next goal to attain is influenced by both the overall state and the last act, or the most recent acts. For example, if a subject has evaded answering a question, the system may formulate a different strategy for proceeding compared to a subject that has consistently answered forthrightly.
[0064] This analysis of the next goal to achieve considers both the expected achievement of the task and the engagement of the subject (the flow of the dialogue). For example, if the cognitor 121 initially has a goal to complete a sale, but the human subject states during the conversation that he requires his partner's approval before completing the sale, the cognitor will no longer attempt to complete the sale, but instead invite the human subject to bring his partner, or to schedule a new time to continue the conversation.
[0065] The analysis of intent recognition and the subsequent determination of the next act occur in real time. As a result, the communicator delivers the next statement within a short interval (e.g., up to around 200 milliseconds) after the subject completes the last sentence. The human subject therefore does not perceive any lag in the pace of the conversation.
[0066] This loop of acts proceeds until the cognitor 121 determines that there is no need to continue the conversation. This may be because all the KPIs have been achieved or because, in view of the responses, it is not possible to advance the KPIs any further. The cognitor 121 then instructs the communicator 122 to deliver a statement ending the conversation.
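The loop of acts described in paragraphs [0063] through [0066] can be summarized in compact Python. All names here (Conversation, the cognitor and communicator methods) are hypothetical stand-ins for the cognitor 121 and communicator 122, a sketch of the control flow rather than an actual implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Conversation:
    kpis: dict                                  # KPI name -> achieved?
    state: list = field(default_factory=list)   # cumulative conversation state

    def kpis_met(self) -> bool:
        return all(self.kpis.values())

def run_dialogue(conv, cognitor, communicator, max_acts=50):
    """Loop of acts: ends when all KPIs are achieved or when, in view of
    the responses, no further progress is possible; then delivers a
    closing statement."""
    for _ in range(max_acts):
        if conv.kpis_met() or not cognitor.can_advance(conv):
            break
        goal = cognitor.next_goal(conv)              # strategic goal for this act
        statement = communicator.formulate(goal)     # statement + manner of delivery
        response = communicator.present(statement)   # deliver and await the reply
        cognitor.analyze(conv, response)             # text + nonverbal cues
    communicator.present(cognitor.closing_statement(conv))
```

The `max_acts` bound is an added safeguard for the sketch; the patent's termination conditions are the two tested at the top of the loop.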
[0067] As discussed above, prior to commencement of the conversation, the communicator 122 may determine a desired persona for the digital human. The persona includes an appearance for the digital human and a language and accent in which the digital human is to communicate. The communicator 122 may generate a visual appearance and mannerisms that are adapted to the style of each customer or interviewee. The communicator 122 may also select a personality type for the digital human in accordance with an expected preference of the human subject. In order to enable this assessment, prior to commencement of the conversation (or during the initial stages of the conversation), the system may engage the human subject in a series of questions designed to identify the demographic characteristics and personality type of the human subject. For example, the personality type may be classified based on the PAEI model developed by Adizes, the Big 5 personality taxonomy, or the OPQ (Occupational Personality Questionnaire) personality test. In addition, the system may generate a database of stock characters with certain appearances and personas with different personality types as defined above. The appearances and personas may be chosen based on research and data regarding which personality types are best suited to conduct conversations with particular goals, as well as which personality types are best suited to conduct conversations with other personality types.
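As one illustration of persona selection, the sketch below pairs a subject's personality type with a stock persona from a small database. The pairing table and the persona attributes are invented examples, not the research-based data referred to above.

```python
# Invented stock-character database; attribute values are placeholders.
STOCK_PERSONAS = {
    "analytical": {"appearance": "formal", "accent": "neutral", "style": "data-driven"},
    "expressive": {"appearance": "casual", "accent": "neutral", "style": "warm"},
}

# Hypothetical mapping: which persona tends to work best opposite each
# PAEI-style subject personality type (P/A/E/I).
BEST_MATCH = {"P": "expressive", "A": "analytical", "E": "expressive", "I": "expressive"}

def select_persona(subject_type: str) -> dict:
    """Pick a stock persona suited to the subject's personality type,
    falling back to a default when the type is unknown."""
    return STOCK_PERSONAS[BEST_MATCH.get(subject_type, "expressive")]
```

A production system would populate both tables from the research and data described in paragraph [0067] rather than hard-coding them.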
[0068] FIG. 3 illustrates an example of a screenshot 100 of a conversation between a digital human 101 and a human subject 102. The digital human 101 appears on one side of the split screen and the human subject 102 appears on the other side of the split screen. In the captured screenshot 100, the digital human 101 is introducing itself to the human subject. The digital human may communicate with the subject using a combination of audio, video, and text. The video may include physical gestures performed by the digital human, such as opening of the mouth, opening of the eyes, and movement of the eyebrows, as illustrated. The video presentation may include a transcript of the statement being spoken by the digital human, as illustrated.
[0069] Optionally, the system is built in an "omnichannel" manner that enables seamless transition between different modes of communication. Thus, a conversation may start with a video dialogue between a human subject and a digital human but may continue as a text-based or email-based discussion between the human subject and the computer system.
[0070] Optionally, the computer system may also evaluate whether one or more key performance indicators may be better achieved with a manager instead of the digital human. This evaluation may proceed, for example, on the basis of the pace of progress of the conversation, or in response to a specific request by the human subject. If the computer system determines that the key performance indicators would be better achieved with the manager, the computer may transition the conversation from the digital human to the manager. In the example of a sale, the human subject may display dissatisfaction with the progress of the conversation with the digital human and request a human representative. In the context of interviews, the system may reach inconclusive or contradictory results regarding the suitability of a candidate for a position, and thus may choose to refer the candidate to a manager for further evaluation. A conversation that includes a transfer between the system and a human agent is referred to herein as a "hybrid" conversation.
[0071] At post-processing stage 130, the computer system engages in up to two types of analysis: task analysis 131 and conversation analysis 132. These analyses may be automated and/or may be performed under supervision of a manager. In task analysis 131, the computer evaluates whether it achieved its desired tasks and KPIs. The computer further evaluates whether it structured the conversation in the most effective manner in order to achieve the desired task and KPIs. For example, if a particular statement by the computer generated a negative reaction by the human subject, the computer may take note of this
and incorporate this negative outcome into the data sets. In this way, the logic model employed by the cognitor 121 continually improves. In conversation analysis 132, the computer evaluates the flow of the conversation itself. This evaluation may include statistical analysis of the length of the conversation, the types of questions asked, and the order of questions asked. This analysis may also include comparison of a given conversation to other conversations generated at similar times or having similar KPIs.
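As a rough illustration of conversation analysis 132, the following sketch computes summary statistics over a set of logged conversations. The record fields (`length_s`, `questions`) are assumptions made for the example, not a format defined in the disclosure.

```python
import statistics

def conversation_stats(conversations):
    """Summary statistics for conversation analysis 132.

    Each conversation is assumed to be a dict with 'length_s' (duration in
    seconds) and 'questions' (question-type labels, in the order asked).
    """
    lengths = [c["length_s"] for c in conversations]
    question_counts = [len(c["questions"]) for c in conversations]
    return {
        "mean_length_s": statistics.mean(lengths),
        "mean_questions": statistics.mean(question_counts),
        # most common opening question type across the conversations
        "typical_opener": statistics.mode(c["questions"][0] for c in conversations),
    }
```

Comparable statistics for conversations with similar KPIs would then support the cross-conversation comparison described above.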
[0072] As discussed above, the system described herein may be implemented for various types of task-oriented conversations. In the examples described herein, the system is used for two specific types of conversations: employment interviews and sales conversations.
[0073] With respect to employment interviews, the system may be used to screen candidates and to determine suitability for a particular role. During employment interviews, the specific KPIs for the conversation may include determining vectors corresponding to a plurality of personality traits. These personality traits may be selected based on proprietary, research-based data regarding which traits are best-suited to particular positions.
[0074] FIG. 4 illustrates a form of post-processing output that may be generated by the system when used specifically for employment interviews. In the illustrated example, the human subject is given scores for the character traits of awareness 201a, motivation 201b, cooperation 201c, effectiveness under time pressure 201d, argumentative discourse and critical thinking 201e, flexibility and versatility 201f, positive leadership 201g, solving problems and making decisions 201h, creative thinking, originality, and imagination 201i, and interpersonal communication 201j. Each of these traits is graded on a scale of 0 to 4. The traits are then mapped on a circular plot having a radius of 4, with a ten-sided polygon inscribed at ten points along the circumference of the circle, in which each vertex represents one of the measured personality traits. The post-processing module may then compare the polygon generated for each interviewee with an ideal polygon representing the preferred personality traits of the desired employee, in order to determine which candidate is the best fit for the job.
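The polygon comparison of FIG. 4 can be sketched as follows: each trait score (0 to 4) is a radius on one of ten equally spaced spokes of the circular plot, the resulting ten-sided polygon has a closed-form area, and a candidate can be ranked by mean per-trait deviation from the ideal polygon. The trait identifiers and the fit metric are illustrative assumptions; the disclosure does not fix a particular comparison formula.

```python
import math

# Ten traits of FIG. 4, in spoke order (identifiers are illustrative).
TRAITS = ["awareness", "motivation", "cooperation", "time_pressure",
          "critical_thinking", "flexibility", "leadership",
          "problem_solving", "creativity", "communication"]

def polygon_area(scores):
    """Area of the polygon whose i-th vertex lies at radius scores[i]
    (0-4) on spoke i, with spokes spaced 2*pi/n apart (shoelace formula)."""
    n = len(scores)
    theta = 2 * math.pi / n
    return 0.5 * math.sin(theta) * sum(
        scores[i] * scores[(i + 1) % n] for i in range(n))

def fit_score(candidate, ideal):
    """Mean per-trait deviation from the ideal polygon; 0 is a perfect fit."""
    return sum(abs(c, ) if False else abs(c - i) for c, i in zip(candidate, ideal)) / len(ideal)
```

Under this metric, the candidate whose `fit_score` against the ideal vector is smallest is the best fit for the job.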
[0075] FIG. 5 illustrates different steps for deploying the system for use in sales, according to embodiments of the present disclosure. Within sales, there can be different types of conversations. A "lead classifier" conversation 301a identifies leads, and determines which leads are most relevant and most promising. This information may be sent to sales agents for additional work. A "lead heater" conversation 301b takes place with a client that has already expressed initial interest. This conversation supplies information to a customer about a product and may further include referring the customer to a sales agent. A "complete sale" conversation 301c may encompass the entire sales process from initial intake to completing the sale, and a "lead closing" conversation 301d may focus just on completing a sale. A specific example of a "lead closing" conversation may be a "full cart" conversation, in which the system responds to abandonment of a shopping cart in an e-commerce website and opens a conversation in order to encourage the customer to complete the sale. Another type of sales conversation may be a follow-up conversation after a sale.
[0076] FIG. 5 further illustrates a pipeline for developing the sales conversation, which may be the same regardless of the specific task. For each type of sales conversation, a key performance indicator (KPI) 302 is identified. For example, the KPI may be to sell a certain number of units to the buyer at a certain price. At step 303, an algorithm is applied, based on the knowledge and data resources, as discussed, to generate an outline of the "acts" 304. The outline includes different milestones for the conversation, as discussed above. The conversation then proceeds by development of a script 305 for each act 304, as discussed above.
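The pipeline of FIG. 5 (KPI 302, outline of acts 304, script 305 per act) might be organized as in the sketch below. The act names and the rule for when to include an act are invented for illustration; only the KPI-to-outline-to-scripts flow comes from the figure.

```python
def build_outline(kpi: dict) -> list:
    """Step 303: derive milestone acts from a KPI for a 'lead closing'
    style conversation (act names are hypothetical)."""
    acts = ["greet", "recall_cart"]
    if kpi.get("units", 0) > 1:
        acts.append("propose_bundle")   # only when selling multiple units
    acts += ["handle_objections", "close_sale"]
    return acts

def build_scripts(acts):
    """Step 305: attach a script to each act (placeholder text here)."""
    return {act: f"<script for {act}>" for act in acts}

# KPI 302 -> outline of acts 304 -> scripts 305
pipeline = build_scripts(build_outline({"units": 3, "price": 20}))
```

The same three-step flow would apply to the other conversation types 301a-301d, with different act outlines per KPI.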
Claims
1. A method of conducting a task-oriented, engaged conversation between a computer and a human subject, comprising: defining a task and one or more key performance indicators for the task; determining an outline of a conversation with the subject in accordance with the task and one or more key performance indicators; performing a dialogue with the subject in a series of acts, wherein each act comprises: (i) determining a strategic goal for the act; (ii) formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal; (iii) presenting the statement to the subject; (iv) receiving a response from the subject; (v) analyzing the response based on both the text and visual cues generated by the subject; and (vi) determining a strategic goal for a subsequent act in accordance with the response and a cumulative state of the conversation.
2. The method of claim 1, further comprising performing the step of defining a task and one or more key performance indicators autonomously with the computer based on predetermined constraints.
3. The method of claim 1, further comprising performing the step of defining a task and one or more key performance indicators by a human manager of the computer.
4. The method of claim 1, wherein, for at least one act in the series of acts, the steps of presenting and receiving the response comprise animating a digital human with at least one of a set of communication modes comprising audio, video, and text.
5. The method of claim 4, further comprising generating an appearance and speech mannerisms of the digital human in accordance with an expected preference of the human subject.
6. The method of claim 4, further comprising selecting a personality type for the digital human in accordance with an expected preference of the human subject.
7. The method of claim 4, further comprising evaluating whether one or more key performance indicators may be better achieved with a person instead of the digital human, and, if the evaluating step results in a determination that said key performance indicators would be better achieved with the person, transitioning the conversation from the digital human to the person.
8. The method of claim 4, further comprising transitioning a delivery of the response during the conversation between different communication modes.
9. The method of claim 1, further comprising training the computer with a large language model and logic regarding conducting of conversations derived from stored data.
10. The method of claim 9, wherein the training step further comprises performing a fine-tuning of the large language model based on the logic.
11. The method of claim 9, wherein the stored data comprises one or more of: (1) previously recorded conversations involving the computer; (2) previously recorded conversations from other media; and (3) industry research.
12. The method of claim 1, further comprising commencing each act within approximately 200 milliseconds of completion of a previous act.
13. The method of claim 1, wherein the step of analyzing the response comprises analyzing the text and visual cues as they are received in real time.
14. The method of claim 1, further comprising performing a post-conversation analysis, and using results of the post-conversation analysis to improve subsequent performance of the computer.
15. The method of claim 14, wherein the post-conversation analysis includes analysis regarding optimization of the outline, optimization of each statement that was presented, and achievement of the one or more key performance indicators.
16. The method of claim 1, wherein the task is interviewing the subject for an employment position.
17. The method of claim 16, wherein the one or more key performance indicators comprise determining vectors corresponding to a plurality of personality traits.
18. The method of claim 17, further comprising comparing the determined vectors to predefined vectors representing ideal personality traits for the employment position.
19. The method of claim 18, wherein the comparing step comprises plotting the vectors on a polygon and comparing a shape of the polygon of the determined vectors to a shape of the polygon of the predefined vectors.
20. The method of claim 1, wherein the task is selling a service or product to the subject.
21. The method of claim 20, wherein the one or more key performance indicators comprise one or more of: classifying a lead, developing a lead, closing a sale, conducting a follow-up conversation after a sale, or conducting a full sales conversation.
22. A computer program product for conducting a task-oriented, engaged conversation between a computer and a human subject, comprising: a cognitor module for: defining a task and one or more key performance indicators for the task; determining an outline of a conversation with the subject in accordance with the one or more key performance indicators; and, during performance of a dialogue with the subject in a series of acts, analyzing responses by the subject based on both text and visual cues generated by the subject, and determining a strategic goal for each act in accordance with responses received from the subject and a cumulative state of the conversation; and a communicator module for: formulating a statement to present to the subject and a manner of presenting the statement to the subject, in accordance with the goal; presenting the statement to the subject; and receiving a response from the subject.
23. The computer program product of claim 22, wherein the communicator is configured to present the statement by animating a digital human with at least one of audio, video, and text.
24. The computer program product of claim 23, wherein the communicator is configured to generate an appearance and speech mannerisms of the digital human in accordance with an expected preference of the human subject.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US202263419354P | 2022-10-26 | 2022-10-26 | |
US63/419,354 | 2022-10-26 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024089682A1 (en) | 2024-05-02 |
Family
ID=90830241
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/IL2023/051082 WO2024089682A1 (en) | 2022-10-26 | 2023-10-17 | Task-oriented engaged conversations between computers and human subjects |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2024089682A1 (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210097140A1 (en) * | 2019-09-30 | 2021-04-01 | Accenture Global Solutions Limited | System and method for generation of conversation graphs |
WO2021218029A1 (en) * | 2020-04-26 | 2021-11-04 | 平安科技(深圳)有限公司 | Artificial intelligence-based interview method and apparatus, computer device, and storage medium |
US20220020360A1 (en) * | 2017-12-29 | 2022-01-20 | DMAI, Inc. | System and method for dialogue management |