GB2574035A

GB2574035A - Interaction with an intelligent artificial agent

Info

Publication number: GB2574035A
Application number: GB1808495.4A
Authority: GB
Inventors: James Ben
Original assignee: To Play For Ltd
Current assignee: To Play For Ltd
Priority date: 2018-05-23
Filing date: 2018-05-23
Publication date: 2019-11-27
Also published as: GB201808495D0

Abstract

A natural-language user interface (e.g. a chatbot using Automatic Speech Recognition) receives initial event data for a user communication and generates a response (or a plurality of responses). The response is stored, and is transmitted to the user only once a delay period has elapsed with no additional communication detected. This allows the user or users to edit or update the action, in which case the stored response is deleted. The delay period may depend upon the initial event, e.g. a longer delay if an error is detected or a shorter delay if the user speaks quickly or urgently.

Description

Interaction with an intelligent artificial agent

Field of invention

The present invention relates to natural-language user interfaces, and in particular to a system and method for generating responses to user actions (referred to here as “events”).

Background of the invention

Natural-language user interfaces have become increasingly commonplace in everyday life. When interacting with natural-language user interfaces, a user typically inputs data in the form of typed text or speech. One example of a natural-language user interface is an automated telephony system which a user can interrogate by spoken language, or less commonly by typing. The telephony system can, for example, allow a user to obtain information from a database, such as information relating to the user. Another example of a natural-language user interface is a “chatbot” which conducts a conversation with a user. Chatbots may be designed to simulate how a human would behave as a conversational partner. Further examples are virtual assistants, such as Apple’s Siri® or Amazon’s Alexa®.

Commonly, natural-language user interfaces operate on a “question-and-answer” principle, in which the user(s) ask questions and the natural-language user interface responds; thus the users and the user interface take turns to communicate. Typically, a natural-language user interface responds as soon as the response is formulated.

Some natural-language user interfaces process inputs received from the user to generate a response utilizing additional context information which may be available (that is, data indicating the context in which the user asked the question, for example data characterizing the user). For example, if a user submits the question ‘what is the weather like today?’ to a chatbot, the chatbot may use location information indicating the user’s present location, and display to the user an up-to-date weather forecast for that location.

Nevertheless, despite advances in the Artificial Intelligence (AI) used to generate responses, interactions with chatbots and virtual assistants are far from perfect. One significant risk is that the natural-language user interface will incorrectly register the user’s question. For example, a user may submit a typed sentence containing a typographical error, and the natural-language user interface may process it as if it were correct, thereby producing an unhelpful or misleading response. Suppose that a user mistypes the word “create” as “crate”. If the user corrects this in a second line of chat, the interface will typically treat the second line as a separate input, out of the context of the first. In another example, a natural-language interface may incorrectly interpret phonemes in received speech, e.g. if a user has an unusual speech pattern (such as a regional accent from a region on which the user interface was not trained). In either case, the chatbot will respond to the incorrectly registered question before the user can correct it.

Summary of the invention

It is an object of the present invention to provide a more accurate method for user interfaces to handle interactions with users.

In general terms, the invention proposes that the natural-language user interface receives (e.g. from a device associated with the user) “initial event data” characterizing a user action, or possibly an occurrence (e.g. in the real world) which does not involve the user. The natural-language user interface processes the event data and generates a response, which is stored rather than being immediately transmitted to the user. The response is sent to the user only if no additional event data is received within a given delay period. The user is thus able to replace, supplement or correct the event data by transmitting further event data, which causes the response to be updated.

Specifically, if the user performs an additional action before the delay period expires, the natural-language user interface will receive further event data indicative of the action and will delete the obsolete response accordingly. The natural-language user interface will then generate a new response, for example based on both the initial event data and the further event data. The process of waiting and listening then begins again, with the new response being sent if no further action is detected.

If the natural-language user interface receives further data indicative of a user action whilst the user is receiving a transmission of a previous response, it may be configured not to interrupt that transmission.

The delay period may be in the range 200 ms to one second, or even longer. It may depend upon the communication method used to transmit the event data, e.g. it may be longer for event data representing user typing than for event data representing user speech. The delay period may be defined as the minimum time from the receipt of the most recently received event data to the beginning of the transmission of the response to the user.
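As a rough illustration, the channel-dependent delay and the “earliest send time” definition can be expressed in a few lines of Python. The per-channel values, `DEFAULT_DELAY` and the function names are hypothetical; only the ordering (typing longer than speech) and the approximate 200 ms lower bound come from the text.

```python
# Hypothetical per-channel delay periods in seconds. The text only states that
# typing may warrant a longer delay than speech and suggests about 0.2 s to 1 s.
CHANNEL_DELAYS = {"speech": 0.4, "typing": 1.0}
MIN_DELAY = 0.2      # lower end of the suggested range
DEFAULT_DELAY = 0.6  # used for channels without a specific setting


def delay_for_channel(channel: str) -> float:
    """Return the delay period for event data arriving over `channel`."""
    return max(MIN_DELAY, CHANNEL_DELAYS.get(channel, DEFAULT_DELAY))


def earliest_send_time(last_event_time: float, channel: str) -> float:
    """Earliest time at which transmission of the stored response may begin."""
    return last_event_time + delay_for_channel(channel)
```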

The natural-language user interface may not be restricted to generating just a single response in response to a single item of event data; multiple responses may be generated based on event data and stored. The receipt of further event data may cause the server to delete some or all of the responses and provide a new set of responses, e.g. based on both the initial event data and the further event data. One example of when this may be useful is if the natural-language user interface is simultaneously interacting with a plurality of users, and the initial event data causes the generation of responses in the form of multiple messages directed to corresponding multiple ones of the users. The natural-language user interface may identify a sub-set of the users to receive corresponding ones of the multiple responses.
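The following sketch shows, under stated assumptions, how a set of stored responses addressed to several users might be discarded and regenerated when further event data arrives. `generate_responses` and `PendingResponses` are hypothetical names, and selecting recipients by searching for user names in the request text is only a placeholder for whatever sub-set selection a real system would use.

```python
from typing import Dict, List


def generate_responses(events: List[str], users: List[str]) -> Dict[str, str]:
    """Hypothetical generator: one message per targeted user, from all events so far."""
    request = " ".join(events)
    # Placeholder sub-set selection: users named in the request, else everyone.
    targets = [u for u in users if u in request] or users
    return {u: f"{u}: noted '{request}'" for u in targets}


class PendingResponses:
    """Holds the current set of responses awaiting the delay period."""

    def __init__(self, users: List[str]) -> None:
        self.users = users
        self.events: List[str] = []
        self.stored: Dict[str, str] = {}  # user -> stored response

    def on_event(self, event_data: str) -> None:
        self.events.append(event_data)
        self.stored.clear()  # delete the now-obsolete responses
        # Regenerate from both the initial and the further event data.
        self.stored.update(generate_responses(self.events, self.users))
```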

The delay period for which the natural-language user interface waits following the generation and storage of the response may be predetermined (i.e. set before the initial event data is received by the natural-language user interface). For example, it may be a parameter set by an author of the natural-language user interface. Alternatively, it may be selected by the user, e.g. before commencing a conversation with the natural-language user interface. In another example, the delay period may be based (that is, at least in part) on prior interactions between the user and the natural-language user interface. For example, it may depend upon an error rate in previous messages received by the server from the same user.
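One possible way to base the delay on prior interactions is to scale it with the user’s historical error rate. This is a hypothetical formula (`base`, `max_extra` and the linear scaling are assumptions); it falls back to the predetermined delay when there is no history.

```python
def delay_from_history(messages_seen: int, messages_with_errors: int,
                       base: float = 0.5, max_extra: float = 1.0) -> float:
    """Lengthen the delay period in proportion to the user's past error rate."""
    if messages_seen == 0:
        return base  # no history yet: use the predetermined (author-set) delay
    error_rate = messages_with_errors / messages_seen
    return base + max_extra * error_rate
```

With the defaults, a user who made an error in 1 of 4 previous messages would get a delay of 0.5 + 1.0 * 0.25 = 0.75 s.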

In a further example, one application of the invention is to implement a simulated environment containing one or more simulated characters with whom the user may interact (e.g. one character at a time). The simulated environment may be one which is used to present a narrative to the user, for example for the purpose of story-telling. Interaction with the characters can then cause the narrative to evolve along a different path (e.g. in the case of story-telling, the later part of the story is changed). Each character may be described by corresponding character data. The character data may describe a personality for the character, and the delay time may be different for different characters. For example, the user may choose to interact with a character having a less assertive or a more assertive personality, who correspondingly may have a longer or shorter delay period. By providing different characters with different corresponding delay times, the author of the environment can, for example, differentiate the characters from each other (e.g. to make the environment more interesting), or provide a choice of characters, some of whom may be more suitable than others for a given user.

Alternatively, the delay time may depend on the initial event data. For example, the natural-language user interface may include a module for estimating, based on the initial event data, the likelihood that further event data will be received within a certain time; if this likelihood is high, the delay time may be increased, e.g. to be longer than that time. For example, the module may be implemented as a module for detecting whether there is an error in the event data, which may indicate that the user will perform a further communication to correct it. In another example, the natural-language user interface may contain a module for detecting a parameter in the event data indicative of urgency (e.g. that the user is speaking quickly), and for reducing the delay period accordingly.
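A minimal sketch of such an error-likelihood and urgency module, assuming a simple vocabulary lookup as the error detector and a words-per-second figure supplied by the speech recogniser. The vocabulary, the 3 words-per-second threshold and the 2-second correction window are all hypothetical values chosen for illustration.

```python
from typing import Optional

# Hypothetical vocabulary used as a crude error detector.
VOCABULARY = {"book", "a", "flight", "to", "paris", "create", "booking"}


def adjusted_delay(text: str, base: float = 0.6,
                   words_per_second: Optional[float] = None,
                   correction_window: float = 2.0) -> float:
    """Pick a delay period from the content and delivery of the initial event data."""
    has_error = any(word not in VOCABULARY for word in text.lower().split())
    if has_error:
        # A correction is likely to follow, so wait longer than the window
        # in which it is expected to arrive.
        return correction_window + 0.5
    if words_per_second is not None and words_per_second > 3.0:
        # Fast speech is treated as a sign of urgency: respond sooner.
        return base / 2
    return base
```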

In another example, in the context of the characters mentioned above, the character data may comprise emotion data characterizing a present state of the character which can be influenced by the initial event data, and which may in turn influence the delay time. For example, the event data may “excite” the character, which may cause the delay period to be extended or reduced.

The action taken by the user, and which is registered by the natural-language user interface (e.g. since it necessitates a response from the natural-language user interface), may be referred to in artificial intelligence terminology as an “event”. Furthermore, the term “event” will also be used here to include an external event, which is an occurrence which does not involve the user, but for which the natural-language user interface receives information characterizing the occurrence. Again, in the case of an external event the natural-language user interface may generate a response to the user.

The user’s action may be a communication action. In this document, we use the term “communication action” to include any action which the user may take with the intention of communicating with the natural-language user interface. One example is the user transmitting a message to the natural-language user interface (e.g. in a natural language format, by speech or typing) but another may be clicking on one of a series of options presented on a screen.

However, the user’s action may take another form, such as an action performed by the user within a game environment (e.g. shooting a target), e.g. without a specific intention to communicate with the natural-language user interface. Thus, more generally, the user event may be in the form of any one of: a message, a typing event, an audio event, and a video game event.

A message event is the transmission of a communication, which is a section of natural language, from the user device to the natural-language user interface; for example a typed message sent to an airline messaging system chatbot or an audio message sent using a phone microphone to a virtual assistant such as Siri®. Furthermore, sign-language or gestures detected by a video camera and recognised by gesture recognition technology may constitute a message event. The event data encodes the communication.

A typing event is that a user or character begins or ends typing; this event may result in a typing indication either appearing or disappearing in the client’s user interface to indicate that a user or character is or is not typing.

Similarly, an audio event is that a user or character begins or ends speaking; a response in this instance could be, for example, muting a microphone while a user or character is talking. Thus typing and audio events can be seen as forms of initiation and conclusion events, in which it is not the content of the communication that forms the event, but rather the fact that a communication action has been initiated or concluded.

A video game event occurs when the user performs an action within a video game which necessitates a response. For example, shooting a character in a video game may cause the server to generate a response for that character, and may even affect the delay period settings of the character or server.
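The event types described above could be represented with a small data model such as the following. The class and field names are hypothetical; the `EXTERNAL` kind covers the external events mentioned earlier.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class EventKind(Enum):
    MESSAGE = auto()     # a section of natural language (typed, spoken, signed)
    TYPING = auto()      # a user or character begins or ends typing
    AUDIO = auto()       # a user or character begins or ends speaking
    VIDEO_GAME = auto()  # an in-game action that needs a response
    EXTERNAL = auto()    # an occurrence not involving the user


@dataclass
class Event:
    kind: EventKind
    user_id: Optional[str] = None
    text: Optional[str] = None      # message content, for MESSAGE events
    started: Optional[bool] = None  # True = began, False = ended (TYPING/AUDIO)


def is_initiation_or_conclusion(event: Event) -> bool:
    """Typing and audio events report that communication began or ended, not its content."""
    return event.kind in (EventKind.TYPING, EventKind.AUDIO)
```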

The natural-language user interface may comprise a module for distinguishing between event data and other data it may receive which does not characterize user actions (e.g. background music playing in a user’s location which is accidentally transmitted to the natural language user interface). The other data may be rejected, or used as context data to condition generation of the response.

The natural-language user interface may be provided in the form of a chatbot.

In principle, the natural-language user interface may be provided as a software module of a user device operated by the user. Alternatively, it may be provided as a computer system remote from the user, with which a user device communicates using a communications network. That is, the event data and the response may be exchanged over the communications network. Optionally, the computer system may be implemented by one or more servers, such as multiple servers in different respective locations. In principle, the natural-language user interface may also include a first portion on a user device and a second portion on a remote computer, the two portions communicating via the communications network.

One specific expression of the invention is a method performed by a natural-language user interface for generating successive responses to successive user events which are user actions, the method comprising:

receiving initial event data characterizing an initial event;

upon receiving the initial event data, generating a response to the initial event;

upon generating the response, storing the response;

at one or more successive times following the generation of the response, determining whether further event data characterizing an additional event has been received; and if the one or more determinations are negative and a delay period has passed since the initial event, transmitting the response to a user.
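A minimal Python sketch of the method above, assuming event data arrives on a standard `queue.Queue` and that `generate` and `transmit` are caller-supplied callables (both hypothetical stand-ins). It measures the delay from the most recently received event, consistent with the earlier description of the delay period as the minimum time from the latest event data to the start of transmission.

```python
import queue
from typing import Callable, List


def respond_with_delay(events: "queue.Queue[str]",
                       generate: Callable[[List[str]], str],
                       transmit: Callable[[str], None],
                       delay: float) -> None:
    """Hold each generated response until `delay` seconds pass with no new event."""
    received = [events.get()]      # initial event data (blocks until it arrives)
    stored = generate(received)    # generate the response and store it
    while True:
        try:
            # Determination: does further event data arrive within the delay period?
            further = events.get(timeout=delay)
        except queue.Empty:
            transmit(stored)       # negative determination: send the stored response
            return
        received.append(further)
        stored = generate(received)  # replace the obsolete stored response
```

A caller might run this in a worker thread per conversation, with the network layer putting incoming event data onto `events`.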

Another expression of the invention is a computer system (made up of one or more computers) comprising:

a data interface operative to receive the event data characterizing events;

a memory operative to store generated responses to events;

a processor operative to generate responses to events, to assign a delay period to each generated response, to store each generated response in the memory, and, at one or more successive times following the generation of the response, to determine whether further event data characterizing an additional event has been received;

wherein the processor is further operative, if the one or more determinations are negative and a delay period has passed since the initial event, to transmit the response to a user.

Brief description of the figures

An embodiment of the invention will now be described for the sake of example only with reference to the following drawings in which:

Fig. 1 is a diagram showing multiple user devices in two-way communication with a central server in an embodiment of the invention.

Fig. 2 is a diagram showing a possible configuration of the server of the embodiment of Fig. 1.

Fig. 3 is a flow diagram showing a method employed by the server of Fig. 1 to process and respond to user events.

Fig. 4 is a flow diagram showing a further method employed by the server of Fig. 1 to process and respond to user events.

Detailed description of the embodiments

Referring first to Figure 1, a server 1 is illustrated which is an embodiment of the invention, and which is arranged to perform methods which are embodiments of the invention described in more detail below with reference to Figs. 3 and 4.

The server 1 is arranged to communicate with one or more user devices 2 over a communications network 10. There may be any number of user devices 2. Where multiple user devices 2 communicate with the same server 1, these user devices 2 may interact with the server 1 independently or collaboratively. For example, some airlines have recently begun utilising natural-language user interfaces in the form of chatbots to handle queries as well as bookings. An individual travelling alone may connect to the server 1 as an individual. Alternatively, multiple users travelling together and utilising respective user devices 2 may each connect to the same chatbot message server, and interact with the server 1 collaboratively, e.g. to buy seats on the same aeroplane. In another example, the multiple user devices 2 may interact collaboratively with the server 1 so that their corresponding users act as respective participants within a common virtual simulated environment, which may be a game environment, a learning environment, or even an environment for controlling machinery.

Figure 2 shows the principal features of the server 1 of Fig. 1. The server 1 comprises a memory 3, a processor 4 and a data interface 5. The data interface 5 is operative to receive event data describing a user action and pass this event data to the processor 4. User events are actions performed by the user of one of the user devices 2, which generates corresponding event data and transmits it to the server 1. The event data is recognised by the server 1, for example as keyboard presses or spoken messages. The processor 4 is operative to process the received event data and generate a response to the event. This response is then passed by the processor 4 to the memory 3, where the response is stored. As described below, the processor 4 is operative to recall the response from storage in the memory 3 and transmit it to the user device 2 for presentation to the corresponding user (e.g. on a screen and/or using speakers of the user device 2). Note that although the memory 3 is illustrated in Fig. 2 as being external to the processor 4, it may alternatively be internal to it (e.g. an internal buffer). The memory 3 may also be used to store program instructions operative, when performed by the processor 4, to cause the processor to implement a method according to the invention. The memory may be a tangible memory device for storing data in non-transitory form.

The method which the server 1 of Fig. 2 performs is shown in the flow diagram of Figure 3. In a first step, event data (“initial event data”) is received characterising a user event which is a user action (S1). The event data is processed to generate (S2) a response to the event. As in known natural language user interfaces, this process may employ a state engine.

This response is then stored (S3) in the memory 3. Although the response is ready to be transmitted to the user device 2, it is kept stored in the memory 3 instead.

During this time, the server 1 remains alert for further event data characterising a further user action. The server 1 determines whether new event data has been received (S4). If no further event data has been received (No at step S4), a determination is made of whether the delay period has passed (S5). If the delay period has passed (S5 returns Yes), the response stored in the memory 3 is transmitted (S6) to the user device 2 using the data interface 5. If the delay period has not passed (S5 returns No), the method returns to step S4. Thus, during the delay period (while S5 returns No), a plurality of successive determinations will be performed at successive times.

If new event data (“further event data”) is received during the delay period (Yes to step S4), then the response is not sent out and instead the server 1 returns to step S2. A new response is now generated by the processor 4, based on the further event data, and optionally also the initial event data.

For example, the processor 4 may be operative during this second performance of step S2 to recognise the further event data as data which corrects the initial event data. For example, the processor 4 may identify that the further event data contains letters or phoneme data which correspond to a word in a database, and which satisfy a similarity condition with respect to letters/phonemes in the initial event data. In this case, the processor 4 may modify the initial event data by replacing the letters/phonemes in the initial event data with the similar letters/phonemes in the further event data, thereby generating corrected event data.

In this case, the processor 4 generates a replacement response based on the corrected event data (e.g. the replacement response is the response which would have been generated in the previous performance of step S2 if the corrected event data had been received at step S1).
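The correction step might be approximated as follows, using the similarity ratio from Python’s standard `difflib` module as a stand-in for the similarity condition. `VOCABULARY`, the 0.8 cutoff and the assumption that the further event data is a single corrected word are all illustrative; the text does not specify the similarity measure.

```python
import difflib
from typing import Optional

# Hypothetical word database; a real system would use a proper lexicon.
VOCABULARY = {"create", "a", "booking", "cancel", "flight"}


def apply_correction(initial: str, further: str, cutoff: float = 0.8) -> Optional[str]:
    """Return corrected event data, or None if `further` does not look like a correction."""
    fix = further.strip().lower()
    if fix not in VOCABULARY:
        return None  # further event data does not correspond to a known word
    words = initial.split()
    candidates = [w for w in words if w.lower() != fix]
    # Similarity condition: the closest word must score at least `cutoff`.
    match = difflib.get_close_matches(fix, candidates, n=1, cutoff=cutoff)
    if not match:
        return None
    # Replace the similar word(s) in the initial event data with the correction.
    return " ".join(fix if w == match[0] else w for w in words)
```

For the earlier example, `apply_correction("crate a booking", "create")` returns `"create a booking"`, which would then be used to generate the replacement response.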

The delay period may depend on several factors, e.g. on the initial event data. For example, if a given message has speech enabled, the length of the speech may be determined, and the delay period may be set as the length of the speech added to a default delay. The length of speech may be determined by evaluating the number of bytes in the audio file, and the bitrate of the delivery of the file.
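The speech-length calculation is simple arithmetic: the duration in seconds is the number of bytes multiplied by 8 and divided by the bitrate in bits per second. The sketch below assumes a constant bitrate and ignores container headers; the 1.5 s default delay is borrowed from the default given for the job-queue embodiment, and the function names are hypothetical.

```python
def speech_duration_seconds(num_bytes: int, bitrate_bps: int) -> float:
    """Playback length of an audio file, assuming a constant bitrate."""
    return num_bytes * 8 / bitrate_bps


def delay_for_speech(num_bytes: int, bitrate_bps: int,
                     default_delay: float = 1.5) -> float:
    """Delay period = length of the speech + a default delay."""
    return speech_duration_seconds(num_bytes, bitrate_bps) + default_delay
```

For example, a 48,000-byte clip delivered at 128,000 bit/s lasts 384,000 / 128,000 = 3.0 s, giving a delay of 3.0 + 1.5 = 4.5 s.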

Figure 4 shows a second embodiment of the invention which is another method performed by the server 1. The method is identical to that of Fig. 3, with the exception that following a positive determination at step S4, the response generated based on the initial event data is deleted (S7) from the memory 3.

The effect of the methods of Figs. 3 and 4 is to enable the natural-language user interface to respond more effectively to any corrections to the initial event data. To return to the airline messaging system example, if a user submits a request for flight information on a specific date, a conventional server may process this event data and provide a near-immediate response to the user. However, the user may have initially entered either an incorrect date, or one date of a range of dates (for example to compare prices across different days). Thus, in order to obtain information about the correct days, the user may have to begin the process again. Because it treats each user event as something to respond to in isolation (responding to each event as quickly as possible), a conventional server may end up performing more information look-ups, and therefore take longer to provide the user with all the information he or she requires. Alternatively, the conventional server may provide the user with unwanted information. By contrast, embodiments of the present method give the user time to correct or add to their initial message, so that the responses output to the user are more relevant and accurate. Fewer unnecessary look-ups are therefore needed, so the user can obtain the required information from the database more rapidly. The same effect holds true for any natural-language user interface; e.g. where the natural-language user interface is for controlling mechanical equipment, more accurate control is possible because orders issued by a user can be corrected.

As a further illustration of the uses of a message event, a communication from the user may be recognised by a natural-language user interface operative to control machinery (e.g. one or more spoken words may be detected by a speech recognition module of the natural-language user interface, or a gesture performed by the user may be detected by a gesture recognition module of the natural-language user interface). The communication may then generate a response causing the machinery to operate in different ways.

The natural-language user interface may also be able to react to external events. For example, in the case of the airline messaging system there may be automatically generated external events indicating that new flight information has been uploaded to the server’s database. The system may then respond to the user device 2 by informing the user that new flights are available, or by generating an updated response to a question the user has already asked. This is related to another possible functionality of the system: if a plurality of user devices are all interacting with the server, a specific event may cause the server to identify a sub-set of the users to whom a response is to be sent.

In some applications of the embodiment, the server 1 may (occasionally or constantly) receive data which is not indicative of an event having occurred. For example, it may receive data generated by the communication network indicating the current status of the network, but not indicating anything about the user. Or, it may receive data indicating background noise in the user’s location. The server 1 therefore has to determine whether or not the received data is relevant according to some criteria (i.e. constitutes an ‘event’, and is thus “event data”). For example, in the case of auditory data, a microphone of the user device 2 may be constantly switched on and sending data to the server 1. The server 1 determines whether or not the audio data transmitted by the user device constitutes an audio event or a message before generating the response in accordance with the above methods.
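A crude illustration of the relevance test is given below. It assumes that upstream components (hypothetical here) attach a `transcript` from speech recognition and `voice_onset`/`voice_offset` flags from a voice-activity detector; loud audio without either is kept only as context, and network-status data is ignored. The packet format, field names and `NOISE_FLOOR` value are assumptions.

```python
import math
from typing import Any, Dict, List

NOISE_FLOOR = 0.1  # hypothetical RMS level below which audio is treated as silence


def rms(samples: List[float]) -> float:
    """Root-mean-square level of normalised audio samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples)) if samples else 0.0


def classify(packet: Dict[str, Any]) -> str:
    """Return 'message', 'audio_event', 'context' or 'ignore' for incoming data."""
    kind = packet.get("type")
    if kind == "text":
        return "message" if packet.get("text") else "ignore"
    if kind == "audio":
        if packet.get("transcript"):
            return "message"       # recognised words: a message event
        if packet.get("voice_onset") or packet.get("voice_offset"):
            return "audio_event"   # the user began or stopped speaking
        if rms(packet.get("samples", [])) >= NOISE_FLOOR:
            return "context"       # e.g. background music: not an event
        return "ignore"            # near-silence
    # Anything else (e.g. network status reports) says nothing about the user.
    return "ignore"
```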

A further specific embodiment will now be described. In this embodiment it will be assumed that the event is a message event. The server 1 receives event data from multiple user devices 2 such as mobile phone(s), laptop(s), or other internet-connected device(s).

On receipt of the event data, the server 1 creates a job (“respond job”) assigned to that event in order to process it. The respond job is then put into a job queue.

The respond job is then picked up by a separate ‘worker’ environment that can be a separate worker server, or an area of the server 1 allocated to run the task of processing respond jobs.

The worker receives the respond job, and uses the data associated with the respond job, combined with data from the server’s database (for example flight information in the case of the airline system interface), and data stored from any messages previously received from the same user, to generate a reply to that message.

The worker server then puts this reply into storage in the memory 3, creates an “emit” job, and adds that to the job queue, with a delay period attached. The reply and the emit job will have a matching unique ID within the database. The delay period may, for example, have a default of 1.5 seconds, or another delay as set by the natural-language user interface.

Once the specified time has arrived (i.e. the delay period has passed), the emit job is run on the worker server. In the case that there are no new events, the reply that has been created is retrieved from the memory 3 and transmitted using the data interface 5 to the user device 2. Once the server has transmitted the response to the user, the server will clear from the memory 3 the list of user events and responses.

Throughout this process, if there has been further relevant event data sent from the same user device 2, then the previously generated and stored response is removed from the database, which causes the emit job to emit nothing. Instead a new respond job is generated and put into the queue. Alternatively, in a variation of the embodiment, the emit job itself may be removed from the job queue. An updated reply is then generated, and again put into the memory 3 with another emit job scheduled.

This process carries on repeatedly until the server 1 stops receiving relevant event data from the user device 2. When this happens, then the reply in the memory 3, which will be the most recent one generated, is transmitted to a client using the data interface 5.
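The respond/emit job flow can be sketched with a simulated clock and a heap-based job queue. This is a single-threaded illustration under assumptions: `ChatServer`, its method names and the reply format are hypothetical, the 1.5-second delay is the default mentioned above, and the reply is a placeholder echo rather than a database lookup. Deleting the stored reply makes the pending emit job emit nothing, as in the main variant described.

```python
import heapq
import itertools
from typing import Dict, List, Tuple


class ChatServer:
    """Simulated-time sketch of the respond-job / emit-job embodiment."""

    def __init__(self, delay: float = 1.5) -> None:
        self.delay = delay
        self.jobs: List[Tuple[float, int, str, str, int]] = []  # (run_at, seq, kind, user, reply_id)
        self._seq = itertools.count()       # tie-breaker so equal times stay FIFO
        self._ids = itertools.count(1)      # unique ID shared by a reply and its emit job
        self.events: Dict[str, List[str]] = {}  # stored user events per user
        self.replies: Dict[int, str] = {}       # stored replies keyed by ID
        self.pending: Dict[str, int] = {}       # user -> ID of that user's stored reply
        self.sent: List[Tuple[float, str, str]] = []

    def _push(self, run_at: float, kind: str, user: str, reply_id: int = 0) -> None:
        heapq.heappush(self.jobs, (run_at, next(self._seq), kind, user, reply_id))

    def _discard_pending(self, user: str) -> None:
        old = self.pending.pop(user, None)
        if old is not None:
            self.replies.pop(old, None)  # its emit job will now emit nothing

    def receive(self, now: float, user: str, text: str) -> None:
        self.events.setdefault(user, []).append(text)
        self._discard_pending(user)
        self._push(now, "respond", user)

    def run_until(self, t: float) -> None:
        while self.jobs and self.jobs[0][0] <= t:
            run_at, _, kind, user, reply_id = heapq.heappop(self.jobs)
            if kind == "respond":
                self._discard_pending(user)  # in case two respond jobs were queued
                reply_id = next(self._ids)
                # Placeholder for the real generation step (database + history).
                self.replies[reply_id] = "Re: " + " ".join(self.events[user])
                self.pending[user] = reply_id
                self._push(run_at + self.delay, "emit", user, reply_id)
            else:
                reply = self.replies.pop(reply_id, None)
                if reply is None:
                    continue  # reply was superseded: emit nothing
                self.sent.append((run_at, user, reply))
                self.pending.pop(user, None)
                self.events[user].clear()  # clear stored events once answered
```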

In order to preserve atomicity, the server 1 has a mechanism in place to ensure that simultaneously receiving a user action and sending a response will not cause the new user action to be deleted from the list of stored user events when the server clears the list of user events upon responding.

Additionally, the server 1 may ensure that responses are not removed while they are being emitted, i.e. if a user performs an action whilst simultaneously receiving a response generated based on an earlier user action, then the server will not remove the emit job from the worker or the reply from the database until transmission to the user is complete.
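The two safeguards can be sketched with a per-user lock: the stored reply records how many events it was built from, so clearing after transmission removes only those events, and an event that arrives while a reply is being sent does not delete that reply. `UserSession` and its methods are hypothetical, and the sketch uses Python’s standard `threading.Lock` rather than whatever locking or transaction mechanism a production database would provide.

```python
import threading
from typing import Callable, List, Optional


class UserSession:
    """Per-user state guarded by a lock, sketching the two safeguards above."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.events: List[str] = []
        self.reply: Optional[str] = None
        self._covers = 0          # number of events the stored reply was built from
        self._emitting = False

    def add_event(self, text: str) -> None:
        with self._lock:
            self.events.append(text)
            if not self._emitting:
                self.reply = None  # obsolete reply deleted, unless it is being sent

    def store_reply(self, reply: str, covers: int) -> bool:
        with self._lock:
            if covers != len(self.events):
                return False       # new events arrived while generating: reply is stale
            self.reply, self._covers = reply, covers
            return True

    def emit(self, send: Callable[[str], None]) -> bool:
        with self._lock:
            if self.reply is None:
                return False
            reply, covers = self.reply, self._covers
            self._emitting = True
        try:
            send(reply)            # transmit without holding the lock
        finally:
            with self._lock:
                # Clear only the events this reply answered; events that
                # arrived during transmission are kept for the next reply.
                del self.events[:covers]
                self.reply = None
                self._emitting = False
        return True
```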

The delay period may be variable. For example, it may vary based on the initial event data. In one example, the server 1 may operate a virtual environment comprising one or more characters, each defined by respective character data, such that each response is associated with one of the characters. The delay period may depend upon which character is associated with the response. In particular, it may depend on character data comprising “permanent” (unchanging) data characterizing the character and “emotion” (varying) data characterizing a present state of the character, where the latter may depend on the initial event data.
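As an illustration of character-dependent delays, the sketch below combines a fixed assertiveness value (“permanent” data) with a variable excitement level (“emotion” data). The scaling factors, the rule that an exclamation mark excites the character, and the choice that excitement shortens rather than lengthens the delay are all hypothetical; the text allows either direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Personality:
    """'Permanent' character data (never changes)."""
    name: str
    assertiveness: float  # 0.0 = hesitant, 1.0 = assertive


@dataclass
class Emotion:
    """'Emotion' character data (present state, changed by events)."""
    excitement: float = 0.0  # 0.0 = calm, 1.0 = highly excited


def character_delay(p: Personality, e: Emotion, base: float = 1.5) -> float:
    """Delay period for a response spoken by this character."""
    permanent_factor = 2.0 - p.assertiveness   # 1.0 (assertive) .. 2.0 (hesitant)
    emotion_factor = 1.0 - 0.5 * e.excitement  # 1.0 (calm) .. 0.5 (excited)
    return base * permanent_factor * emotion_factor


def update_emotion(e: Emotion, event_text: str) -> None:
    """Hypothetical rule: an exclamation mark in the event excites the character."""
    if "!" in event_text:
        e.excitement = min(1.0, e.excitement + 0.5)
```

With a 1.5 s base, a fully hesitant character waits 3.0 s and a fully assertive one 1.5 s; after one exciting event the assertive character’s delay falls to 1.5 * 0.75 = 1.125 s.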

The characters may be characters in a narrative, such as a story which the server 1 “tells” to the user (that is, the server 1 provides the user with information about the evolution of a situation (e.g. an environment) at successive times). The evolution may be pre-planned, and the user’s interaction with the characters can change the evolution of the story, e.g. cause the situation to evolve along a story path which it would not otherwise have entered. The possible story paths may be written by the author of the story.

Note that the application of the embodiment to story-telling is not limited to situations in which characters are defined. Rather the server 1 may tell a story to the user by informing him or her of successive evolution of the situation, and the user can influence that evolution (e.g. without interacting with a simulated character in the situation).

Although specific embodiments of the invention have been described, the invention is not limited in these respects. For example, although in the explanation of the embodiments above the server 1 is described as remote from the user devices 2, in a variation the methods of Figs. 3 and 4 may be performed by software running on a user device 2.

Claims

1. A method performed by a natural-language user interface for generating successive responses to successive events, the method comprising:

receiving initial event data characterizing an initial event;

upon receiving the initial event data, generating a response to the initial event;

upon generating the response, storing the response;

at one or more times following the generation of the response, determining whether further event data characterizing an additional event has been received; and if the one or more determinations are negative and a delay period has passed since the initial event, transmitting the response to a user.

2. The method of claim 1, wherein the method further comprises the step of:

if a said determination is positive, deleting from storage the response to the initial event and generating an updated response based on the further event data.

3. The method of claim 2, further comprising generating the updated response based on both the initial event data and the further event data.

4. The method of any one of the preceding claims, wherein, if further event data is received while the response is being transmitted to the user, the natural-language user interface does not interrupt the transmission of the response to the user.

5. The method of any one of the preceding claims, wherein a plurality of responses are generated and stored using the initial event data, and the method further comprises the steps of:

if a said determination is positive, deleting from storage at least one of the plurality of responses and generating an updated set of responses based on the received event data.

6. The method of claim 5, wherein there are a plurality of users and the responses are generated for transmission to respective ones of the plurality of users.

7. The method of claim 6 wherein, for each response, there is a further step of determining which users each response will be transmitted to.

8. The method of any one of the preceding claims, wherein the natural-language user interface comprises a control system for database access, and the response is generated by accessing a database based on the initial event data to extract data, and generating the response based on the extracted data.

9. The method of any one of the preceding claims, wherein the delay period is a predetermined length of time, set before the receipt of the event data.

10. The method of any one of the preceding claims, wherein the delay period is determined by natural-language user interface settings which are updated based on the event data.

11. The method of any one of the preceding claims, in which the natural-language user interface defines one or more characters characterized by corresponding character data, and in which the response is generated as a response from one of the characters, the delay period being based upon the corresponding character data.

12. The method of claim 11, in which the character is a character in a narrative.

13. The method of any one of the preceding claims, wherein the natural-language user interface is in the form of a chatbot.

14. The method of any preceding claim in which at least one of the initial event and the additional event is a user action.

15. The method of claim 14, wherein the user action is a message, the message being a portion of natural language sent by the user to the natural-language user interface.

16. The method of claim 14 wherein the event data is indicative of an initiation or conclusion of a communication by the user to the natural-language user interface.

17. The method of claim 14 wherein the user action is a video game event, indicative of a user action within a video game necessitating a response from the natural-language user interface.

18. The method of any one of claims 1 to 13 wherein the event is an external event, indicative of an occurrence which does not involve the user.

19. A computer comprising:

a data interface for receiving event data describing user events which are user actions;

a memory operative to store generated responses; and a processor;

the memory storing program instructions operative when performed by the processor to cause the processor to perform a method according to any one of claims 1 to 18.

20. A computer system comprising:

a data interface operative to receive the event data characterizing events;

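a memory operative to store generated responses to events; and

a processor operative to generate responses to events, to assign a delay period to each generated response, to store each generated response in the memory, and, at one or more successive times following the generation of the response, to determine whether further event data characterizing an additional event has been received;

wherein the processor is further operative, if the one or more determinations are negative and a delay period has passed since the initial event, to transmit the response to a user.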

21. A computer system according to claim 19 or claim 20 in which the data interface is a communications interface for interacting with a communications network, whereby the event data is transmitted to the computer via the communications network from at least one user device, and the response is transmitted to the at least one user device via the communications network.