JPWO2021183681A5

Info

Publication number: JPWO2021183681A5
Application number: JP2022555120A
Authority: JP
Publication date: 2024-03-11

Claims

1. A method for providing virtual assistance, the method comprising:
receiving user input including a user request for action or information;
generating two or more primary interpretations of the user input by processing the user input, the two or more primary interpretations including unique possible transcriptions of the user input;
generating one or more secondary interpretations for one or more of the two or more primary interpretations by processing one or more of the primary interpretations to form alternative interpretations;
determining one or more primary actions in response to the two or more primary interpretations and the one or more secondary interpretations;
preparing one or more results from performing the one or more primary actions;
determining whether one or more secondary actions are present in response to at least one of the one or more primary actions;
if the one or more secondary actions are present, continuing to process two or more of the primary interpretations, the one or more secondary interpretations, the one or more primary actions, and the one or more secondary actions, and designating the outcome as one or more final results;
scoring the one or more final results;
designating the final result with the highest score as the top result; and
outputting at least the top result to a user or taking an action defined by the top result.
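
The steps of this claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the lookup tables, the `Result` type, the example strings, and the fixed scores are all hypothetical stand-ins for real recognizers, action planners, and scorers. The sketch also assumes that a primary action with no secondary action is itself treated as a final result, which the claim text does not state.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Result:
    interpretation: str
    actions: tuple  # chain of action names that produced this result


# Hypothetical toy tables standing in for real recognizers and planners.
TRANSCRIPTIONS = {"audio-1": ["play harry potter", "play hairy potter"]}
ALTERNATIVES = {"play harry potter": ["play the harry potter soundtrack"]}
PRIMARY_ACTIONS = {
    "play harry potter": ["search_movies"],
    "play the harry potter soundtrack": ["search_music"],
}
SECONDARY_ACTIONS = {
    "search_movies": ["open_video_player"],
    "search_music": ["open_music_player"],
}
SCORES = {"open_video_player": 0.7, "open_music_player": 0.4}


def top_result(user_input: str) -> Optional[Result]:
    # Two or more primary interpretations (candidate transcriptions).
    primaries = TRANSCRIPTIONS.get(user_input, [])
    # Secondary interpretations are alternatives derived from the primaries.
    secondaries = [alt for p in primaries for alt in ALTERNATIVES.get(p, [])]

    finals = []
    for interp in primaries + secondaries:
        for action in PRIMARY_ACTIONS.get(interp, []):
            follow_ups = SECONDARY_ACTIONS.get(action, [])
            if follow_ups:
                # Secondary actions are present: keep processing.
                finals += [Result(interp, (action, f)) for f in follow_ups]
            else:
                # Assumption: no secondary action means the result is final.
                finals.append(Result(interp, (action,)))

    if not finals:
        return None
    # Score each final result and designate the highest as the top result.
    return max(finals, key=lambda r: SCORES.get(r.actions[-1], 0.0))
```

Calling `top_result("audio-1")` carries every hypothesis through to a final result before any of them is discarded, which is the point of keeping multiple interpretations alive until the scoring step.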

2. The method of claim 1, wherein the scoring is based on one or more of the following scoring factors:
a first scoring factor based on a conversation state, wherein the conversation state includes the two or more primary interpretations, the one or more secondary interpretations, the one or more actions, and the one or more results;
a second scoring factor based on a user profile, the user profile including user preferences and user history stored on one or more servers; and
a third scoring factor based on auxiliary metadata, the auxiliary metadata comprising data stored on the one or more servers that is not related to user preferences and not related to user history.
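
One way to combine the three scoring factors listed above is a weighted average over whichever factors are available. The claim does not specify how the factors are combined, so the weights, the function name `combined_score`, and the averaging rule in this sketch are assumptions made purely for illustration.

```python
from typing import Optional

# Hypothetical weights; the claim does not say how the factors are combined.
DEFAULT_WEIGHTS = {"conversation": 0.5, "profile": 0.3, "metadata": 0.2}


def combined_score(
    conversation: Optional[float] = None,
    profile: Optional[float] = None,
    metadata: Optional[float] = None,
    weights: Optional[dict] = None,
) -> float:
    """Weighted average of whichever scoring factors were supplied."""
    weights = weights or DEFAULT_WEIGHTS
    supplied = {
        "conversation": conversation,
        "profile": profile,
        "metadata": metadata,
    }
    # "One or more" factors: skip any factor that was not provided.
    present = {name: value for name, value in supplied.items() if value is not None}
    if not present:
        raise ValueError("at least one scoring factor is required")
    total_weight = sum(weights[name] for name in present)
    return sum(weights[name] * value for name, value in present.items()) / total_weight
```

Normalizing by the weights of the supplied factors keeps scores comparable between results that have, say, only a conversation-state factor and results that have all three.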

3. The method of claim 1 or 2, wherein the user input is an utterance spoken by a user.

4. The method according to any one of claims 1 to 3, wherein generating the two or more primary interpretations of the user input is performed simultaneously and in parallel.
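
As a rough illustration of generating interpretations in parallel, the sketch below submits the same input to several recognizers through a thread pool. The `interpret_in_parallel` function and the idea of passing recognizers as plain callables are assumptions, not part of the claim. A thread pool gives concurrent execution; whether the work runs truly simultaneously depends on the recognizers and the runtime.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def interpret_in_parallel(
    user_input: str, recognizers: Sequence[Callable[[str], str]]
) -> List[str]:
    """Run every recognizer on the same input concurrently."""
    if not recognizers:
        return []
    with ThreadPoolExecutor(max_workers=len(recognizers)) as pool:
        # Submit all recognizers before waiting on any of them.
        futures = [pool.submit(recognize, user_input) for recognize in recognizers]
        candidates = [future.result() for future in futures]
    # Keep only unique transcriptions, preserving submission order.
    return list(dict.fromkeys(candidates))
```

For example, `interpret_in_parallel("Hi", [str.lower, str.upper])` treats two trivial string functions as stand-in recognizers and returns one candidate per distinct output.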

5. The method according to any one of claims 1 to 4, further comprising requesting the user to clarify which of the two or more primary interpretations or the one or more secondary interpretations is correct.

6. The method according to any one of claims 1 to 5, wherein the method is performed by an artificial intelligence layer running on the operating system of the user device.

7. The method according to any one of claims 1 to 6, wherein outputting at least the top result to the user or taking an action defined by the top result comprises one or more of: playing a song, initiating a telephone call, providing information to the user, playing a video, sending a text message, recording a video, sending information from a user device, and controlling lighting.

8. A virtual assistant system comprising:
a user interface configured to receive input from a user and provide a response to the user;
a processor configured to execute machine-executable code; and
a memory storing non-transitory machine-executable code, the machine-executable code being configured to:
process user input to generate two or more primary interpretations, the two or more primary interpretations including unique possible transcriptions of the user input;
generate one or more secondary interpretations based on one or more of the two or more primary interpretations by processing one or more of the primary interpretations to form alternative interpretations;
process the primary interpretations and the alternative interpretations to generate results that lead to two or more final states;
score the two or more final states to rank the two or more final states such that the highest-ranked final state is the top result; and
present the top result to the user or execute the top result for the user.

9. The system of claim 8, wherein the user interface includes a microphone and a speaker.

10. The system of claim 8 or 9, further comprising a transceiver configured to communicate via a network with a device, the device being configured to execute second virtual assistant machine-executable code to assist the virtual assistant system in generating the top result for the user.

11. The system according to any one of claims 8 to 10, wherein the virtual assistant system is a smartphone.

12. The system according to any one of claims 8 to 11, wherein a plurality of final states are presented to the user for consideration and selection by the user.

13. The system according to any one of claims 8 to 12, wherein executing the top result includes one of the following actions: displaying text, displaying an image, playing music, playing a video, performing a transaction, and turning a device on or off.

14. The system according to any one of claims 8 to 13, wherein the machine-executable code is further configured to:
provide feedback to the user requesting additional information regarding one or more of a primary interpretation, an alternative interpretation, a result, and a final state; and
in response to receiving the additional information from the user, process the additional information to generate additional alternative interpretations or rescore the two or more final states.
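
The rescoring branch of this claim can be sketched as follows. The substring match and the fixed `boost` value are hypothetical simplifications; a real system would presumably reinterpret the user's clarification rather than compare strings.

```python
from typing import Dict, List, Tuple


def rescore(
    final_states: Dict[str, float], clarification: str, boost: float = 0.5
) -> List[Tuple[str, float]]:
    """Re-rank final states after the user supplies additional information."""
    needle = clarification.lower()
    updated = {
        # Assumption: a state that mentions the clarification gets a fixed boost.
        state: score + (boost if needle in state.lower() else 0.0)
        for state, score in final_states.items()
    }
    # Highest score first; the first entry is the new top result.
    return sorted(updated.items(), key=lambda item: item[1], reverse=True)
```

With `{"play movie": 0.6, "play soundtrack": 0.5}` and the clarification `"soundtrack"`, the soundtrack state moves ahead of the movie state.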

15. A method for providing virtual assistance, the method comprising:
receiving user input including a request for action or information;
generating two or more interpretations of the user input by processing the user input, the two or more interpretations including unique possible transcriptions of the user input;
matching at least one of the two or more interpretations to one or more primary agents based on the one or more primary agents being configured to process the at least one interpretation;
selecting, by the one or more primary agents, one or more skills configured to process the at least one of the two or more interpretations;
generating one or more results by processing the at least one of the two or more interpretations using the one or more skills;
determining whether one or more secondary agents match the one or more results for further processing of the results by the one or more secondary agents;
if one or more secondary agents match, continuing to process the one or more results to generate additional results;
designating at least one of the one or more results and at least one of the additional results as two or more final results;
scoring the two or more final results;
designating the final result with the highest score as the top result; and
outputting at least the top result to a user or taking an action defined by the top result.
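
The agent and skill matching described in this claim can be sketched in Python. The `Skill` and `Agent` classes, their fields, and the `run_agents` function are hypothetical names chosen for illustration; the claim does not prescribe any particular data structure or scoring function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Skill:
    name: str
    can_handle: Callable[[str], bool]  # does this skill apply to the text?
    run: Callable[[str], str]          # produce a result from the text


@dataclass
class Agent:
    name: str
    skills: List[Skill]

    def matches(self, text: str) -> bool:
        # An agent matches when at least one of its skills can process the text.
        return any(skill.can_handle(text) for skill in self.skills)

    def process(self, text: str) -> List[str]:
        # Select the applicable skills and run each of them.
        return [skill.run(text) for skill in self.skills if skill.can_handle(text)]


def run_agents(
    interpretations: List[str],
    primary_agents: List[Agent],
    secondary_agents: List[Agent],
    score: Callable[[str], float],
) -> Optional[str]:
    # Primary agents process every interpretation they match.
    results = [
        r for i in interpretations for a in primary_agents
        if a.matches(i) for r in a.process(i)
    ]
    # Secondary agents process any result they match, giving additional results.
    additional = [
        r for res in results for a in secondary_agents
        if a.matches(res) for r in a.process(res)
    ]
    finals = results + additional
    # The highest-scoring final result is the top result.
    return max(finals, key=score) if finals else None
```

Because secondary agents take results rather than raw interpretations as input, one skill's output can feed another agent, which is how a single request can chain into further processing.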

16. The method of claim 15, wherein an agent is a software module or routine executable to perform parallel hypothesis reasoning.

17. The method of claim 15 or 16, wherein a skill is a software module or routine executable to perform a task or generate a result in response to a single user query.

18. A method according to any one of claims 15 to 17, further comprising generating one or more secondary interpretations for at least one of the primary interpretations.

19. The method according to any one of claims 15 to 18, wherein receiving user input comprises receiving utterances from the user and converting the utterances into digital signals.