US11798545B2 - Speech interaction method and apparatus, device and storage medium - Google Patents

Speech interaction method and apparatus, device and storage medium

Info

Publication number
US11798545B2
US11798545B2 (application US16/932,148)
Authority
US
United States
Prior art keywords
task
information
user terminal
speech
next task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US16/932,148
Other versions
US20210210088A1 (en)
Inventor
Luyu GAO
Tianwei Sun
Baiming Ma
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xiaomi Pinecone Electronic Co Ltd
Original Assignee
Beijing Xiaomi Pinecone Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Xiaomi Pinecone Electronic Co Ltd filed Critical Beijing Xiaomi Pinecone Electronic Co Ltd
Assigned to Beijing Xiaomi Pinecone Electronics Co., Ltd. ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: GAO, LUYU; MA, BAIMING; SUN, TIANWEI
Publication of US20210210088A1 publication Critical patent/US20210210088A1/en
Application granted granted Critical
Publication of US11798545B2 publication Critical patent/US11798545B2/en


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/451Execution arrangements for user interfaces
    • G06F9/453Help systems
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2452Query translation
    • G06F16/24522Translation of natural language queries to structured queries
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F9/4887Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues involving deadlines, e.g. rate based, periodic
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • the present application generally relates to the technical field of speech interaction (voice interaction), and more particularly, to a speech interaction method and apparatus, a device and a storage medium.
  • Speech interaction refers to interaction with a machine using speech as the information carrier.
  • a series of inputs and outputs are generated through interaction, communication and information exchange between a person and a computer to complete a task or achieve a purpose.
  • compared with a conventional man-machine interaction manner, speech interaction is faster and simpler.
  • the machine may look for a result matching the speech in a corpus and then feed the result back to the user. If the speech corresponds to a control task, the machine may execute the corresponding control task.
  • the intelligent speaker may be connected to a network and, after acquiring an input speech of a user, execute a task corresponding to the input speech.
  • speech interaction is usually in a question-answer form. For example, a user asks: Xiaoai (wakeup word), how is the weather today? The intelligent speaker answers: it is clear to cloudy, with north wind of grade 3, the temperature is 19 to 26 degrees centigrade, and the air quality is good.
  • a speech assistant may only give a single reply and may not meet a requirement of a complex scenario requiring multiple replies.
  • a speech interaction method includes: acquiring speech information of a user; determining a task list corresponding to the speech information, the task list including at least two ordered tasks; and for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, querying and sending response information of the next task to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
  • a speech interaction method includes: sending acquired speech information to a speech interaction system; and receiving response information of a present task sent by the speech interaction system before execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives.
  • a speech interaction apparatus includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to: acquire speech information of a user; determine a task list corresponding to the speech information, the task list including at least two ordered tasks; and for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, query and send response information of the next task to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
  • a non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor of a device, cause the device to perform the speech interaction method in the first or second aspect.
  • FIG. 1 is an application scenario diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • FIG. 2 is a flow chart of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • FIG. 3 is a flow chart of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • FIG. 4 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • FIG. 5 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • FIG. 6 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
  • FIG. 7 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
  • FIG. 8 is a schematic diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
  • intelligent speech assistants have been applied extensively to daily life, in a broad scope of applications from smart phones and intelligent home appliances to vehicle scenarios and smart home care. For different application scenarios and various complex requirements, the interaction manners between users and speech assistants have also been enriched.
  • a series of inputs and outputs are generated by interaction, communication and information exchange of a person and a computer to complete a task or achieve a purpose.
  • Speech interaction refers to interaction with a machine using speech as the information carrier. Compared with a conventional man-machine interaction manner, speech interaction is faster and simpler. For example, when a song is to be played, a conventional graphical user interface (GUI) may require a few minutes for input, query and playback, while speech interaction needs much less time.
  • the user's hands are freed, complex operations in an application (APP) are avoided, and a speech task may be assigned to a terminal even while driving.
  • the speech task may be a question-answer task, namely a user asks and the terminal answers.
  • the speech task may also be a control task, namely the user controls another device, particularly a smart home device and the like, through a speech.
  • the multi-round dialog manner is an interaction manner provided for the condition that the user's intention is ambiguous due to vague questioning.
  • the intelligent speech assistant is required to actively continue the dialog to further acquire a complete requirement of the user and then give a single reply to this requirement.
  • Each round of the dialog is also in a question-answer form.
  • the speech assistant may only give a single reply every time and may not meet a requirement of a complex scenario requiring multiple replies.
  • a mapping relationship between speech information and a task list may be pre-configured, and the task list may include multiple tasks. After speech information of a user is obtained, the task list corresponding to the speech information may be determined, so that one speech may correspond to multiple tasks. The user is not required to input multiple speeches, speech interaction responses are enriched, man-machine interaction efficiency is improved, and the problem of reduced speech recognition accuracy caused by repeated speech interactions is mitigated.
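As a rough illustration of such a pre-configured mapping from one utterance to an ordered task list (the utterance, task types and task names below are hypothetical, not taken from the patent):

```python
# Minimal sketch: a pre-configured mapping from a recognized
# utterance to an ordered task list. All names are illustrative.

TASK_LISTS = {
    "good morning": [
        {"task_id": 1, "type": "control", "action": "open_curtain"},
        {"task_id": 2, "type": "question_answer", "question": "soothing_music"},
        {"task_id": 3, "type": "question_answer", "question": "morning_news"},
        {"task_id": 4, "type": "question_answer", "question": "weather"},
        {"task_id": 5, "type": "question_answer", "question": "traffic"},
    ],
}

def lookup_task_list(utterance):
    """Return the ordered task list for an utterance, or None if unmapped."""
    return TASK_LISTS.get(utterance.strip().lower())
```

One utterance thus fans out into several ordered tasks without further user input, which is the core of the disclosed method.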
  • response information of the next task is queried and sent to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives, to ensure high timeliness and accuracy of the response information of the task.
  • Embodiments of the present disclosure may be applied to an application scenario including a user terminal and a speech interaction system.
  • the user terminal and a server of the speech interaction system may be configured in the same electronic device or in different electronic devices.
  • audio input, audio processing, task execution and the like may be completed by the same device.
  • processing pressure of the user terminal may be alleviated.
  • the user terminal may be a terminal with a speech acquisition function and, for example, may be a smart phone, a tablet computer, a personal digital assistant (PDA), a wearable device, an intelligent speaker, and the like.
  • the speech interaction system may be a server with a speech processing capability.
  • the user terminal may be an intelligent speaker and the speech interaction system may be the server.
  • FIG. 1 is an application scenario diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • a user may speak to a user terminal 101 , the user terminal 101 acquires speech information and sends the acquired speech information to a speech interaction system 102 , and the speech interaction system 102 may perform speech processing.
  • the speech interaction system 102 may include an automatic speech recognition (ASR) module, a natural language processing (NLP) module, a task scheduling module, and a text to speech (TTS) module.
  • the ASR module converts a speech into a text.
  • the NLP module interprets the text and gives a feedback.
  • the task scheduling module performs task scheduling.
  • the TTS module converts output information into a speech. It is to be understood that the speech interaction system 102 may be other architectures in the related art.
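The four modules of the speech interaction system 102 could be wired as a simple pipeline. The stubs below only illustrate the ASR → NLP → task scheduling → TTS data flow; they do not implement real speech recognition or synthesis:

```python
# Illustrative sketch of the four-stage pipeline in FIG. 1.
# Each stage is a stub standing in for the real module.

def asr(audio: bytes) -> str:
    # ASR: convert speech to text (stub: treat audio as UTF-8 text).
    return audio.decode("utf-8")

def nlp(text: str) -> dict:
    # NLP: interpret the text and produce an intent (stub).
    return {"intent": text.lower()}

def schedule(intent: dict) -> list:
    # Task scheduling: map the intent to an ordered list of tasks (stub).
    return [f"task for: {intent['intent']}"]

def tts(message: str) -> bytes:
    # TTS: convert output text back to speech (stub).
    return message.encode("utf-8")

def handle(audio: bytes) -> list:
    """Run one utterance through the full pipeline."""
    tasks = schedule(nlp(asr(audio)))
    return [tts(task) for task in tasks]
```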
  • FIG. 2 is a flow chart showing a speech interaction method, according to an exemplary embodiment of the present disclosure. The method may include the following steps.
  • In step 202, speech information of a user is acquired.
  • In step 204, a task list corresponding to the speech information is determined, the task list including at least two ordered tasks.
  • In step 206, for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, response information of the next task is queried and sent to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
  • the speech interaction method provided in the embodiment may be implemented by software, hardware, or a combination of software and hardware, and the involved hardware may be formed by one physical entity or by two or more physical entities.
  • the method in the embodiment may be applied to a speech interaction system, and the speech interaction system may be configured in an electronic device with a speech processing capability or formed by an electronic device with the speech processing capability.
  • the electronic device may be a terminal device, may also be a server device and may be configured as required.
  • the speech interaction system executes the speech interaction method in the embodiments below.
  • the speech information of the user may be speech information acquired by the user terminal, and is sent to the speech interaction system by the user terminal.
  • a mapping relationship between speech information and a task list may be pre-configured.
  • the task list corresponding to the speech information may be configured by the user. For example, a configuration interface is provided, and a response is made to a configuration instruction input by the user to obtain the task list corresponding to the speech information.
  • the task list corresponding to the speech information may also be recommended by the system and may be configured as required.
  • the task list corresponding to the speech information is configured so that one speech corresponds to multiple tasks.
  • the task list includes the at least two ordered tasks.
  • the task list may include various types of tasks, and there may be an execution sequence requirement for each task.
  • the tasks in the task list include, but are not limited to, a question-answer task, a control task, and the like.
  • the question-answer task may be a task that requires query and response of the speech interaction system.
  • the control task may be a task that requires the speech interaction system to control a device, for example, controlling a smart home device, such as controlling a smart lamp to be turned on/off and controlling a smart rice cooker to be turned on/off, etc.
  • Recognition and semantic comprehension of an input speech may be considered as recognition of a scenario. For example, if the speech information is “good morning,” it may be considered as a getting-up scenario, and the tasks in the corresponding task list may include: play soothing music (for 20 minutes) and simultaneously open a bedroom curtain; then play morning news (for 20 minutes); next play the weather forecast; and finally play a traffic condition of a road to work.
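The getting-up scenario above implies a simple schedule: each task's execution time is the sum of the durations of the tasks before it. A minimal sketch (the durations of the weather and traffic tasks are assumptions, since the source only gives durations for the first two tasks):

```python
# Sketch: derive each task's execution offset (in minutes) from the
# task durations in the "good morning" scenario. The 2-minute
# durations for weather and traffic are invented for illustration.

SCENARIO = [
    ("play soothing music and open curtain", 20),
    ("play morning news", 20),
    ("play weather forecast", 2),          # assumed duration
    ("play traffic on road to work", 2),   # assumed duration
]

def execution_offsets(scenario):
    """Return (task_name, start_offset_minutes) pairs in order."""
    offsets, t = [], 0
    for name, duration in scenario:
        offsets.append((name, t))
        t += duration
    return offsets
```

Under these assumptions the traffic task only starts 40 minutes after the utterance, which is why querying all answers up front would yield stale traffic data.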
  • a task in the task list may have a real-time result query requirement, for example, the traffic condition of the road to work. If, when the speech "good morning" is received, the response information of every question-answer task were immediately queried (including the traffic condition of the road to work) and collectively transmitted to the user terminal for caching, then by the time the traffic task is executed, the cached traffic condition may be at least 40 minutes old, making the traffic condition inaccurate.
  • the user terminal is enabled to make various complex replies and simultaneously ensure high timeliness and accuracy of contents of the replies.
  • the response information of the next task is queried and sent to the user terminal before the execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
  • the response information of the next task may be queried when the next task is about to be executed, so that timeliness and accuracy of the response information obtained by the user terminal are ensured.
  • in an embodiment, the execution time of each task is marked in the task list; in this case, it may be determined that the present time state is before the execution time of the next task arrives, and, for example, the query is performed at a preset time before the execution time arrives.
  • in another embodiment, when a task request for executing the next task is received from the user terminal, it is determined that the present time state is before the execution time of the next task arrives.
  • the method further includes receiving a task request, sent by the user terminal, that contains the task information of the next task. The user terminal may send the task request before the execution time of the next task arrives.
  • the user terminal may determine completion time of the present task, so that, if the next task is executed immediately after the present task, the user terminal may send the task request of the next task to the speech interaction system when the present task is completed or at a preset time before completion, to enable the speech interaction system to, when receiving the task request, judge that the present time state is before the execution time of the next task arrives and further start querying the response information corresponding to the next task. For another example, if the user terminal knows the execution time of the next task, the user terminal may send the task request, etc. to the speech interaction system at a preset time before the execution time of the next task arrives.
  • the execution time of the next task may be determined by the task list transmitted by the speech interaction system, the task list recording execution time of the tasks; and it may also be determined by the task information, transmitted by the speech interaction system, of the next task, the task information including the execution time of the next task.
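One way to realize the "preset time before execution" rule on the terminal side might look like the following sketch (the lead-time value is an arbitrary assumption; the patent leaves it configurable):

```python
# Sketch of the terminal-side timing rule: send the task request a
# preset lead time before the next task's execution time, so the
# system can query fresh response information just in time.
# Times are in seconds; PRESET_LEAD is an assumed value.

PRESET_LEAD = 5  # assumed lead time before execution

def request_send_time(execution_time, lead=PRESET_LEAD):
    """Moment at which the terminal should send the task request."""
    # A lead of 0 means the query happens exactly at execution time.
    return max(execution_time - lead, 0)

def should_send_request(now, execution_time, lead=PRESET_LEAD):
    """True once the current time reaches the send moment."""
    return now >= request_send_time(execution_time, lead)
```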
  • the user terminal may store the task list to determine the next task.
  • the task list may be historically stored, and may also be fed back by the speech interaction system after the speech information is sent to the speech interaction system.
  • the speech interaction system when sending the present task to the user terminal, may also send the task information of the next task.
  • the user terminal determines the next task according to a task identifier in the task information.
  • the task request may further contain the task information of the next task.
  • the task information may at least be configured to uniquely identify the next task.
  • the task information of the next task at least includes identification information of the next task, so that the next task may be determined by the task information.
  • the task information in the task request may be determined by the user terminal according to the task list, and may also be the task information, contained when the speech interaction system sends response information of the present task, of the next task.
  • an increase in requests may add to the request processing burden of the speech interaction system and strain its performance (for example, under high concurrency or tight running-time constraints).
  • the performance of the speech interaction system may be improved to solve the problem.
  • query time may be reduced due to the provided task information.
  • the task information of the next task may further include, but not limited to, one or more of index information of a question in the next task, a type of the question in the next task and the execution time of the next task.
  • the index information is information configured to index an answer to the question in the next task.
  • the answer corresponding to the question in the next task may be rapidly queried through the index information, so that the query time is reduced.
  • Different questions are classified, and answers corresponding to different types of questions are stored in different databases, so that, according to the type of a question, data query may be directly performed in the database corresponding to this type, and query efficiency may further be improved.
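The per-type storage described here could be sketched as separate keyed stores, with the question type selecting the store before the index key selects the answer. The type names, keys and answers below are invented for illustration:

```python
# Sketch: answers for different question types live in different
# stores, so the type field narrows the search before the index
# key is used. All contents are illustrative.

ANSWER_STORES = {
    "weather": {"weather:today": "clear to cloudy, 19 to 26 degrees"},
    "traffic": {"traffic:to_work": "congested near the third ring road"},
}

def query_answer(question_type, index_key):
    """Look up an answer in the store for its question type."""
    store = ANSWER_STORES.get(question_type)
    if store is None:
        return None  # unknown type: no store to search
    return store.get(index_key)
```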
  • the task information may further include another field capable of improving the query efficiency.
  • the present time state may also be determined by other means.
  • the preset time may be a pre-configured time short enough to have little influence on the accuracy of the response information, such that the query, feedback and the like of the response information may be completed between the preset time and the execution time of the next task. In some scenarios, because of a relatively high query speed, the preset time may even be 0, namely the response information of the next task is queried and sent to the user terminal when the execution time of the next task arrives.
  • the user terminal may send the task request containing the task information of the next task to the speech interaction system.
  • the next task includes any task in a question-answer task with a real-time performance requirement or a question-answer task without the real-time performance requirement.
  • query is performed before the execution time of the task arrives, and the response information is fed back to the user terminal.
  • the task information of the next task may also be sent to the user terminal when the response information of the present task is sent; correspondingly, when the response information of the next task is sent to the user terminal, the task information of the unprocessed task (a third-round task) adjacent to the next task is sent, so that the task information of the adjacent task is transmitted each time the response information of a task is transmitted. Therefore, when the response information of the next task is completely executed, the user terminal knows which task is to be executed next and may, before the execution time of the unprocessed task adjacent to the next task arrives, send a task request containing that task information to the speech interaction system to request execution of the third-round task.
  • the response information of the present task and the task information of the next task are transmitted every time, the response information of the present task is played, and before the execution time of the next task arrives, a request for the response information of the next task may be made by using the task information, so that the timeliness of the response information may be ensured.
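The handoff pattern above, in which each reply bundles the present task's response with the adjacent task's information, can be sketched as follows. The field names are assumptions, not the patent's actual message format:

```python
# Sketch: each reply pairs a task's response with the task
# information of the next (adjacent unprocessed) task, so the
# terminal always knows what to request next. Field names assumed.

def build_reply(task_list, index, response):
    """Build the reply for task_list[index], chaining next-task info."""
    reply = {
        "task_id": task_list[index]["task_id"],
        "response": response,
    }
    if index + 1 < len(task_list):
        nxt = task_list[index + 1]
        reply["next_task_info"] = {
            "task_id": nxt["task_id"],
            "execution_time": nxt.get("execution_time"),
        }
    # The last task carries no next_task_info: the chain ends.
    return reply
```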
  • real-time query may bring relatively high processing pressure to the speech interaction system, so the number of queries performed by the speech interaction system may be reduced by reducing the number of task requests.
  • question-answer tasks are divided into question-answer tasks with the real-time performance requirement and question-answer tasks without the real-time performance requirement.
  • the question-answer task with the real-time performance requirement refers to a question-answer task with a requirement on the timeliness of response information, for example, a task of playing a real-time traffic condition, and the like.
  • the question-answer task without the real-time performance requirement may be a question-answer task with no or low requirement on the timeliness of response information, for example, a task of playing soothing music for 20 minutes, and the like.
  • a type of the question-answer task may be configured by the user, and may also be obtained by big data analysis of the system.
  • in an embodiment, only responsive to the next task being a question-answer task with the real-time performance requirement is the response information of the next task queried and sent to the user terminal before the execution time of the next task arrives, so that the processing pressure of the speech interaction system is reduced.
  • since the task information is required to be sent, in an embodiment, when the response information of the next task is sent to the user terminal, the following operation is further included: the task information of the unprocessed task adjacent to the next task is sent to the user terminal.
  • the user terminal may judge whether the next task is a question-answer task with the real-time performance requirement or a question-answer task without the real-time performance requirement according to the task information and, when determining according to the task information that the next task is a question-answer task with the real-time performance requirement, may send the task request containing the task information to the speech interaction system before the execution time of the next task arrives.
  • the user terminal may judge whether to send the task request.
  • the following operation is further included: responsive to that the unprocessed task adjacent to the next task is a question-answer task with the real-time performance requirement, the task information of the unprocessed task adjacent to the next task is sent to the user terminal, and the user terminal stores the task list.
  • in this manner, only the task information of tasks with the real-time performance requirement is sent, which avoids the transmission resources wasted when task information of tasks without the real-time performance requirement is also sent, and reduces the judgment processes of the user terminal.
  • the user terminal, under the condition that the task information of the unprocessed task adjacent to the next task is not received, may directly determine that task according to the task list and locally acquire and output its response information.
  • the response information of such a question-answer task may be fed back to the user terminal at a specified time.
  • one or more question-answer tasks without the real-time performance requirement in the task list may be determined, and response information of all of the one or more question-answer tasks without the real-time performance requirement is transmitted to the user terminal, such that the user terminal locally acquires and outputs the response information according to a sequence of the one or more question-answer tasks in the task list.
  • the response information of the question-answer task without the real-time performance requirement may be collectively transmitted at one time, so that the number of the task requests may be reduced, and pressure of a server may be alleviated.
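The batching idea can be sketched as a simple partition of the task list: non-real-time question-answer tasks are answered once, up front, and pushed in one batch, while the rest are handled near their execution time. The `real_time` flag and `type` values are illustrative, not the patent's schema:

```python
# Sketch: split a task list into tasks whose responses can be
# batch-transmitted up front (question-answer, no real-time
# requirement) and tasks handled later, near execution time.

def partition_tasks(task_list):
    """Return (batch_now, handle_later) lists, preserving order."""
    batch_now, handle_later = [], []
    for task in task_list:
        if task["type"] == "question_answer" and not task.get("real_time"):
            batch_now.append(task)      # one collective transmission
        else:
            handle_later.append(task)   # per-task query near execution
    return batch_now, handle_later
```

Only the `batch_now` group is transmitted collectively, which reduces the number of task requests and alleviates server pressure, as the passage states.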
  • the response information corresponding to the next task may be queried, and the obtained response information is sent to the user terminal.
  • the response information may be audio information, text information, picture information, and the like.
  • outputting the response information may be playing the audio information and displaying the text information and the picture information, etc.
  • interaction is usually performed through an audio in the speech interaction process, so that the response information may be the audio information, and outputting the response information may be playing the audio information.
  • the tasks in the task list may also be control tasks configured to control devices.
  • the method further includes that: if the next task is a control task of controlling a smart home device, a control instruction corresponding to the control task is sent to an Internet of things system to enable the Internet of things system to control the corresponding smart home device.
  • in this way, not only may question-answering be implemented, but the smart home device may also be controlled, so that the application scenarios of speech interaction are extended.
  • a new user speech may be received when a task in the task list is not completely executed.
  • execution of the task that is not completely executed in the task list may be delayed; and in another embodiment, the user terminal may directly be stopped from executing the task that is not completed in the task list, and the task that is not completely executed is cleared.
  • for example, when a task in the task list is not completely executed, if a new user speech is received, execution of the tasks not yet completed by the user terminal is interrupted.
  • prompting information may also be output, and whether to continue executing the task that is not completely executed in the task list is determined according to a user instruction, so task controllability is achieved.
  • FIG. 3 is a flow chart showing another speech interaction method, according to an exemplary embodiment of the present disclosure. The method may include the following steps.
  • In step 302, acquired speech information is sent to a speech interaction system.
  • In step 304, response information of a present task sent by the speech interaction system is received before execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives.
  • the method in the embodiment may be applied to a user terminal.
  • the user terminal may be a device with a speech acquisition function such as a smart phone, a tablet computer, a PDA, a wearable device, an intelligent speaker, and the like.
  • the response information of the present task sent by the speech interaction system may be received before the execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives.
  • the present task may be a question-answer task with a real-time performance requirement, a question-answer task without the real-time performance requirement, or a control task, etc.
  • the response information of the present task may be queried and obtained by the speech interaction system at a short time before the execution time of the present task arrives (for example, a preset time before the execution time of the present task arrives), so that high timeliness and accuracy of the response information of the task are ensured.
  • the preset time may be a relatively short time configured as required.
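As a sketch of the timing described above, the query moment for each task can be derived by subtracting the preset time from the task's execution time (times are plain seconds; field names are invented for illustration):

```python
def schedule_queries(task_list, preset_time):
    """Compute, for each task, the moment at which the speech
    interaction system should query its response information so the
    result is still fresh when the task's execution time arrives."""
    return [
        {"task": task["id"], "query_at": task["execution_time"] - preset_time}
        for task in task_list
    ]


tasks = [
    {"id": "weather", "execution_time": 100.0},  # real-time question-answer
    {"id": "traffic", "execution_time": 160.0},
]
plan = schedule_queries(tasks, preset_time=5.0)
print(plan)  # → [{'task': 'weather', 'query_at': 95.0}, {'task': 'traffic', 'query_at': 155.0}]
```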
  • a task request containing task information of the next task is sent to implement timely query for any type of question-answer task.
  • a task request is sent only for a question-answer task with the real-time performance requirement or a control task (in some examples, the task request is also not required to be sent even for the control task).
  • the user terminal may pre-store response information of question-answer tasks without the real-time performance requirement, determine that the next task is a question-answer task without the real-time performance requirement and, when execution time of the next task arrives, locally acquire and output the response information of the next task.
  • when the response information of the present task is sent, regardless of the type of the next task, task information of the next task may be transmitted, such that task information of each task, except for a first task, in the task list is sent; or, when the response information of the present task is sent, the task information of the next task is transmitted only if the next task is a question-answer task with the real-time performance requirement or a control task.
  • Whether to send the response information of the present task may be determined by, for example, whether a task request is received.
  • the speech interaction system may send the task information of each task, except for the first task, in the task list.
  • the response information of the present task further contains task information of a next task, and the method further includes that: a task request containing the task information of the next task is sent to the speech interaction system before execution time of the next task arrives, such that the speech interaction system feeds back corresponding response information before the execution time of the next task arrives. Therefore, response information of each question-answer task may be queried and obtained in real time.
  • response information of all question-answer tasks without the real-time performance requirement is obtained in advance.
  • the method further includes that: when the present task is the first task in the task list corresponding to the speech information, the response information of all of the question-answer tasks without the real-time performance requirement in the task list is further received from the speech interaction system.
  • whether the next task is a question-answer task with the real-time performance requirement or a control task may be determined according to the task information, transmitted by the speech interaction system, of the next task; if it is determined according to the task information that the next task is a question-answer task with the real-time performance requirement or a control task of controlling a smart home device, the task request containing the task information is sent to the speech interaction system before the execution time of the next task arrives; and if it is determined according to the task information that the next task is a question-answer task without the real-time performance requirement, the response information of the next task is locally acquired and output when the execution time of the next task arrives.
  • the user terminal judges the type of the next task according to the task information to further determine whether to send the task request, so that the number of times for sending task requests may be reduced, and furthermore, processing pressure of the speech interaction system is alleviated.
  • the speech interaction system may send task information of a task with the real-time performance requirement or task information of a control task in the task list. If the user terminal receives the task information of the next task, it is indicated that the next task is a question-answer task with the real-time performance requirement or a control task, and the task request containing the task information is sent to the speech interaction system at the preset time before the execution time of the next task arrives.
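The terminal-side decision above can be sketched as a simple type check; the type tags are invented for illustration, since the patent does not fix an encoding:

```python
# Illustrative type tags for the three task kinds discussed above.
QA_REAL_TIME = "qa_real_time"  # question-answer task with the real-time requirement
QA_STATIC = "qa_static"        # question-answer task without the requirement
CONTROL = "control"            # control task (e.g. for a smart home device)


def needs_task_request(task_info):
    """User-terminal-side check: a task request is sent to the speech
    interaction system only for real-time question-answer tasks and
    control tasks; other responses are served from local storage."""
    return task_info.get("type") in (QA_REAL_TIME, CONTROL)


print(needs_task_request({"id": "weather", "type": QA_REAL_TIME}))  # → True
print(needs_task_request({"id": "joke", "type": QA_STATIC}))        # → False
```

Filtering requests this way is what reduces the number of task requests and thereby alleviates the processing pressure of the speech interaction system.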
  • the user terminal pre-stores the task list corresponding to the speech information, and the method further includes that: the task information of the next task in the task list is determined; and the task request containing the task information of the next task is sent to the speech interaction system before the execution time of the next task arrives such that the speech interaction system feeds back the corresponding response information before the execution time of the next task arrives.
  • the next task is a task of which the response information has been stored. Therefore, if only the response information of the present task sent by the speech interaction system is received, the response information is output, the next task is determined according to the task list, and the response information of the next task is locally acquired and output when the execution time of the next task arrives.
  • a request is sent to the speech interaction system only when a task is a question-answer task with the real-time performance requirement or a control task, so that the number of sending times is reduced.
  • the user terminal may store the task list and may also not store the task list. For example, if the speech interaction system transmits task information of each task, the user terminal is not required to store the task list.
  • the speech interaction system may perform query for each task, and may also query and send the response information of the next task to the user terminal for a task with the real-time performance requirement before the execution time of the task arrives. Whether the task is a question-answer task with the real-time performance requirement may be judged by the user terminal and may also be judged by the speech interaction system.
  • FIG. 4 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • a task request is sent for each task except for a first task and a user terminal is not required to store a task list.
  • the method includes the following steps.
  • in step 401, acquired speech information is sent by a user terminal to a speech interaction system.
  • in step 402, a task list corresponding to the speech information is determined by the speech interaction system, and the task list includes at least two ordered tasks.
  • in step 403, response information of a present task and task information of a next task are sent by the speech interaction system.
  • in step 404, responsive to the response information of the present task sent by the speech interaction system, the response information is output by the user terminal, and a task request containing the task information is sent to the speech interaction system before execution time of the next task arrives.
  • in step 405, when receiving the task request and determining according to the task information that the next task is a question-answer task, response information of the next task is queried by the speech interaction system.
  • in step 406, the response information of the next task and task information of a task arranged after the next task are sent by the speech interaction system to the user terminal.
  • the process may be repeated until all the tasks in the task list are completely executed.
  • the speech interaction system, when receiving the task request and determining according to the task information that the next task is a control task, may send a control instruction corresponding to the control task to an Internet of things system to enable the Internet of things system to control a corresponding smart home device.
  • the response information of the present task and the task information of the next task are transmitted every time, and after the response information of the present task is completely played, a request for the response information of the next task may be made by using the task information, so that timeliness of the response information may be ensured.
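The FIG. 4 flow above can be simulated end to end in a few lines; `SpeechSystem` and its methods are illustrative stand-ins for the speech interaction system, and the responses are canned strings rather than real query results:

```python
class SpeechSystem:
    """Server side of the FIG. 4 flow (a sketch): answer the present
    task and hand back the task information of the next one."""

    def __init__(self, task_list, corpus):
        self.tasks = task_list   # ordered task identifiers
        self.corpus = corpus     # task id -> response information

    def handle_speech(self):
        """First exchange: response of the first task plus task
        information (here, just the id) of the second task."""
        first = self.tasks[0]
        nxt = self.tasks[1] if len(self.tasks) > 1 else None
        return self.corpus[first], nxt

    def handle_task_request(self, task_id):
        """Later exchanges: answer the requested task and pass on the
        task arranged after it, or None when the list is exhausted."""
        i = self.tasks.index(task_id)
        nxt = self.tasks[i + 1] if i + 1 < len(self.tasks) else None
        return self.corpus[task_id], nxt


system = SpeechSystem(["t1", "t2", "t3"],
                      {"t1": "r1", "t2": "r2", "t3": "r3"})
outputs = []
response, next_task = system.handle_speech()
outputs.append(response)
while next_task is not None:   # terminal requests each next task in turn
    response, next_task = system.handle_task_request(next_task)
    outputs.append(response)
print(outputs)  # → ['r1', 'r2', 'r3']
```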
  • FIG. 5 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
  • task requests are sent only for a control task and a question-answer task with a real-time performance requirement and a user terminal stores a task list.
  • the method includes the following steps.
  • in step 501, a user terminal sends acquired speech information to a speech interaction system.
  • in step 502, the speech interaction system determines a task list corresponding to the speech information, and the task list includes at least two ordered tasks.
  • in step 503, the speech interaction system sends response information of a present task to the user terminal, and when a next task is a question-answer task with a real-time performance requirement or a control task, sends task information of the next task to the user terminal, otherwise does not send the task information.
  • in step 504, the user terminal, responsive to the response information of the present task and the task information of the next task sent by the speech interaction system, outputs the response information, and sends a task request containing the task information of the next task to the speech interaction system at a preset time before execution time of the next task arrives; and responsive to the received response information of the present task sent by the speech interaction system (the task information of the next task is not received, which indicates that the next task is not a question-answer task with the real-time performance requirement or a control task), outputs the response information, determines the next task according to the task list and locally acquires and outputs response information of the next task when the execution time of the next task arrives.
  • in step 505, the speech interaction system, when receiving the task request and determining according to the task information that the next task is a question-answer task, queries response information of the next task.
  • in step 506, the response information of the next task is sent to the user terminal; when an unprocessed task adjacent to the next task is a question-answer task with the real-time performance requirement or a control task, task information of that task is also sent to the user terminal, otherwise the task information is not sent.
  • the process may be repeated until the tasks in the task list are completely executed.
  • the speech interaction system, in a process of executing a first task in the task list, determines one or more question-answer tasks without the real-time performance requirement in the task list and transmits response information of all of the one or more question-answer tasks without the real-time performance requirement to the user terminal.
  • only task information of a control task or a task with the real-time performance requirement is sent, and task information of a task without the real-time performance requirement is not sent, so that the number of requests may be reduced, and processing pressure of the speech interaction system is further alleviated.
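A minimal sketch of this FIG. 5 division of labor, with invented type tags: responses for question-answer tasks without the real-time requirement are grouped for bulk delivery during the first task, while task information is reserved for the tasks that genuinely need a later request.

```python
def split_task_list(task_list):
    """Server-side split (a sketch): return the tasks whose responses
    can be pushed in bulk up front, and the tasks for which task
    information will be sent so the terminal issues a task request."""
    bulk = [t for t in task_list if t["type"] == "qa_static"]
    on_demand = [t for t in task_list
                 if t["type"] in ("qa_real_time", "control")]
    return bulk, on_demand


task_list = [
    {"id": "greeting", "type": "qa_static"},
    {"id": "weather", "type": "qa_real_time"},
    {"id": "curtains", "type": "control"},
]
bulk, on_demand = split_task_list(task_list)
print([t["id"] for t in bulk])       # → ['greeting']
print([t["id"] for t in on_demand])  # → ['weather', 'curtains']
```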
  • FIG. 6 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
  • the apparatus includes: an information acquisition module 62 configured to acquire speech information of a user; a list determination module 64 configured to determine a task list corresponding to the speech information, the task list including at least two ordered tasks; and an information feedback module 66 configured to, for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, query and send response information of the next task to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
  • the next task is a question-answer task with a real-time performance requirement.
  • the information feedback module 66 is configured to, before the response information of the next task is queried and sent to the user terminal, receive a task request containing task information of the next task sent by the user terminal.
  • the information feedback module 66 is configured to, when the response information of the next task is sent to the user terminal, send task information of an unprocessed task adjacent to the next task to the user terminal; or, responsive to that the unprocessed task adjacent to the next task is a question-answer task with the real-time performance requirement, send the task information of the unprocessed task adjacent to the next task to the user terminal, the user terminal storing the task list.
  • the task information of the next task at least includes identification information of the next task, and the task information of the next task further includes at least one of index information of a question in the next task, a type of the question in the next task or the execution time of the next task.
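The task information fields enumerated above can be modeled as a small record; only the identifier is mandatory, and the field names here are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskInfo:
    """Task information as enumerated above: the task identifier is
    required, while question index, question type and execution time
    are optional additions."""
    task_id: str
    question_index: Optional[int] = None
    question_type: Optional[str] = None
    execution_time: Optional[float] = None


info = TaskInfo(task_id="task-2", question_type="weather", execution_time=30.0)
print(info.task_id)         # → task-2
print(info.question_index)  # → None
```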
  • the information feedback module 66 is further configured to, before the response information of the next task is queried and sent to the user terminal, in a process of executing a first task in the task list, determine one or more question-answer tasks without the real-time performance requirement in the task list and transmit response information of all of the one or more question-answer tasks without the real-time performance requirement to the user terminal, such that the user terminal locally acquires and outputs the response information according to a sequence of the one or more question-answer tasks in the task list.
  • the apparatus further includes a task interruption module (not shown in FIG. 6 ), configured to: when a task in the task list is not completely executed, if a new user speech is received, interrupt execution of the task that is not completed by the user terminal in the task list.
  • FIG. 7 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
  • the apparatus includes: a speech sending module 72 configured to send acquired speech information to a speech interaction system; and an information receiving module 74 configured to receive response information of a present task sent by the speech interaction system before execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives.
  • the response information of the present task further contains task information of a next task, and the apparatus further includes a first request sending module (not illustrated in FIG. 7), configured to: send a task request containing the task information of the next task to the speech interaction system before execution time of the next task arrives, such that the speech interaction system feeds back corresponding response information before the execution time of the next task arrives.
  • a task list corresponding to the speech information is pre-stored, and the apparatus further includes: a task information determination module (not shown in FIG. 7 ) configured to determine task information of a next task in the task list; and a second request sending module (not shown in FIG. 7 ) configured to send a task request containing the task information of the next task to the speech interaction system before execution time of the next task arrives, such that the speech interaction system feeds back corresponding response information before the execution time of the next task arrives.
  • the next task is a question-answer task with a real-time performance requirement.
  • the apparatus further includes a task execution module (not shown in FIG. 7 ), configured to: determine according to the task information that the next task is a question-answer task without the real-time performance requirement and locally acquire and output response information of the next task when the execution time of the next task arrives.
  • the apparatus embodiments substantially correspond to the method embodiments.
  • the apparatus embodiments described above are only exemplary, modules described as separate parts therein may or may not be physically separated, and parts displayed as modules may be located in the same place or may also be distributed to multiple networks. Part or all of the modules therein may be selected according to a practical requirement.
  • An embodiment of the present disclosure also provides a computer-readable storage medium, in which a computer program is stored, the program being executed by a processor to implement the steps of any one of the above described methods.
  • the storage medium includes, but is not limited to, a disk memory, a compact disc read-only memory (CD-ROM), an optical memory, and the like.
  • the computer-readable storage medium includes nonvolatile, volatile, removable and unremovable media and may store information by any method or technology.
  • the information may be a computer-readable instruction, a data structure, a program module or other data.
  • Examples of the computer storage medium include, but are not limited to, a phase change random access memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), a random access memory (RAM) of another type, a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or another memory technology, a CD-ROM, a digital video disk (DVD) or another optical memory, a cassette memory, a magnetic tape, a disk memory or another magnetic storage device or any other non-transmission medium, and may be configured to store information accessible for a computer device.
  • An embodiment of the present disclosure provides an electronic device, which includes: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to execute the instructions to implement any one of the above speech interaction methods.
  • FIG. 8 is a schematic diagram of a speech interaction apparatus 800 , according to an exemplary embodiment.
  • the apparatus 800 may be provided as a user terminal or a speech interaction system.
  • the apparatus 800 includes a processing component 822, which may further include one or more processors, and a memory resource represented by a memory 832, configured to store instructions executable by the processing component 822, for example, an application (APP).
  • the APP stored in the memory 832 may include one or more modules, each of which corresponds to a set of instructions.
  • the processing component 822 is configured to execute the instructions to execute the above speech interaction methods.
  • the apparatus 800 may further include a power component 826 configured to execute power management of the apparatus 800, a wired or wireless network interface 850 configured to connect the apparatus 800 to a network, and an input/output (I/O) interface 858.
  • the apparatus 800 may be operated based on an operating system stored in the memory 832, for example, Android, iOS, Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
  • also provided is a non-transitory computer-readable storage medium including instructions, such as the memory 832 including instructions, and the instructions may be executed by the processing component 822 of the apparatus 800 to implement the above described methods.
  • the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrophonic Musical Instruments (AREA)
  • Display Devices Of Pinball Game Machines (AREA)

Abstract

A speech interaction method includes: acquiring speech information of a user; determining a task list corresponding to the speech information, the task list comprising at least two ordered tasks; and for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, querying and sending response information of the next task to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based upon and claims priority to Chinese Patent Application No. 202010017436.7, filed on Jan. 8, 2020, the entire content of which is incorporated herein by reference.
TECHNICAL FIELD
The present application generally relates to the technical field of speech interaction (voice interaction), and more particularly, to a speech interaction method and apparatus, a device and a storage medium.
BACKGROUND
With the continuous improvement of artificial intelligence technologies, man-machine speech interaction has also been developed, and various speech assistants (voice assistants) and man-machine interaction devices have been favored by more and more users. Speech interaction refers to interaction with a machine using a speech as an information carrier. A series of inputs and outputs are generated by interaction, communication and information exchange of a person and a computer to complete a task or achieve a purpose. Compared with a conventional man-machine interaction manner, speech interaction is faster and simpler.
In an existing speech interaction process, after a user inputs a speech into a machine, if the speech corresponds to a question-answer task, the machine may look for a result matching the speech in a corpus and then feed back the result to the user. If the speech corresponds to a control task, the machine may execute the corresponding control task. Taking an intelligent speaker as an example, the intelligent speaker may be connected to a network and, after acquiring an input speech of a user, execute a task corresponding to the input speech.
In related art, speech interaction is usually in a question-answer form. For example, a user asks: Xiaoai (wakeup word), how is the weather today? The intelligent speaker answers: it is clear to cloudy, with north wind of grade 3, the temperature is 19 to 26 degrees centigrade, and the air quality is good. However, in such an interaction manner, a speech assistant may only give a single reply and may not meet a requirement of a complex scenario requiring multiple replies.
SUMMARY
According to a first aspect of embodiments of the present disclosure, a speech interaction method includes: acquiring speech information of a user; determining a task list corresponding to the speech information, the task list including at least two ordered tasks; and for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, querying and sending response information of the next task to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
According to a second aspect of embodiments of the present disclosure, a speech interaction method includes: sending acquired speech information to a speech interaction system; and receiving response information of a present task sent by the speech interaction system before execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives.
According to a third aspect of embodiments of the present disclosure, a speech interaction apparatus includes: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to: acquire speech information of a user; determine a task list corresponding to the speech information, the task list including at least two ordered tasks; and for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, query and send response information of the next task to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
According to a fourth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor of a device, cause the device to perform the speech interaction method in the first or second aspect.
It is to be understood that the above general description and detailed description below are only exemplary and explanatory and not intended to limit the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
FIG. 1 is an application scenario diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
FIG. 2 is a flow chart of a speech interaction method, according to an exemplary embodiment of the present disclosure.
FIG. 3 is a flow chart of a speech interaction method, according to an exemplary embodiment of the present disclosure.
FIG. 4 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
FIG. 5 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure.
FIG. 6 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
FIG. 7 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
FIG. 8 is a schematic diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure.
DETAILED DESCRIPTION
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise represented. The embodiments set forth in the following description of exemplary embodiments do not represent all embodiments consistent with the present disclosure. Instead, they are merely examples of apparatuses and methods consistent with aspects related to the present disclosure as recited in the appended claims.
Terms used in the present disclosure are only adopted for the purpose of describing exemplary embodiments and not intended to limit the present disclosure. For example, the term “if” used here may be explained as “while” or “when” or “responsive to determining,” which depends on the context.
With the arrival of the artificial intelligence age, intelligent speech assistants have been applied to daily life extensively, and involved in a broad scope of applications, from smart phones and intelligent home appliances to vehicle scenarios and smart home care. According to different application scenarios and various complex requirements, interaction manners for users and speech assistants are also enriched. A series of inputs and outputs are generated by interaction, communication and information exchange of a person and a computer to complete a task or achieve a purpose. Speech interaction refers to interaction with a machine using a speech as an information carrier. Compared with a conventional man-machine interaction manner, speech interaction is faster and simpler. For example, when a song is played, a few minutes may be needed for input, query and playback with a conventional graphical user interface (GUI), while a shorter time is needed for speech interaction. The user's hands are freed, complex operations over an application (APP) are avoided, and a speech task may be assigned to a terminal at the same time of driving. The speech task may be a question-answer task, namely a user asks and the terminal answers. The speech task may also be a control task, namely the user controls another device, particularly a smart home device and the like, through a speech.
A multi-round dialog manner has emerged for meeting increasingly complex requirements of a user on an intelligent speech assistant. For example, the multi-round dialog may be:
    • user: Xiaoai, set the alarm clock.
    • Xiaoai: what time do you want to set the alarm clock at?
    • user: set the alarm clock at 7 a.m.
    • Xiaoai: OK, the alarm clock has been set for you at 7 a.m.
The multi-round dialog manner is an interaction manner provided for the condition that an intention of the user is ambiguous due to loose questioning. The intelligent speech assistant is required to actively continue the dialog to further acquire a complete requirement of the user and then give a single reply to this requirement. Each round of the dialog is also in a question-answer form.
For the question-answer manner or the multi-round dialog manner, the speech assistant may only give a single reply every time and may not meet a requirement of a complex scenario requiring multiple replies.
In view of this, embodiments of the present disclosure provide speech interaction solutions. A mapping relationship between speech information and a task list may be pre-configured, and the task list may include multiple tasks. After speech information of a user is obtained, a task list corresponding to the speech information may be determined, so that one speech may correspond to multiple tasks, the user is not required to input multiple speeches, speech interaction responses are enriched, man-machine interaction efficiency is improved, and the problem of reduced speech recognition accuracy caused by multiple speech interactions is solved. Moreover, for each task in at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, response information of the next task is queried and sent to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives, to ensure high timeliness and accuracy of the response information of the task.
Embodiments of the present disclosure may be applied to an application scenario including a user terminal and a speech interaction system. The user terminal and a server of the speech interaction system may be configured in the same electronic device and may also be configured in different electronic devices. When the user terminal and the server are configured in the same electronic device, audio input, audio processing, task execution and the like may be completed by the same device. When the user terminal and the server are configured in different electronic devices, processing pressure of the user terminal may be alleviated. The user terminal may be a terminal with a speech acquisition function and, for example, may be a smart phone, a tablet computer, a personal digital assistant (PDA), a wearable device, an intelligent speaker, and the like. The speech interaction system may be a server with a speech processing capability. In an embodiment, the user terminal may be an intelligent speaker and the speech interaction system may be the server.
FIG. 1 is an application scenario diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure. In the application scenario, a user may speak to a user terminal 101, the user terminal 101 acquires speech information and sends the acquired speech information to a speech interaction system 102, and the speech interaction system 102 may perform speech processing. In an embodiment, the speech interaction system 102 may include an automatic speech recognition (ASR) module, a natural language processing (NLP) module, a task scheduling module, and a text to speech (TTS) module. The ASR module converts a speech into a text. The NLP module interprets the text and gives a feedback. The task scheduling module performs task scheduling. The TTS module converts output information into a speech. It is to be understood that the speech interaction system 102 may be other architectures in the related art.
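The four-module pipeline of the speech interaction system 102 can be sketched as a chain of callables. The following is a minimal illustrative sketch; the class name, method names and the toy stand-in modules are hypothetical and do not represent the actual implementation of the disclosure.

```python
class SpeechInteractionSystem:
    """Toy pipeline chaining the four modules described above."""

    def __init__(self, asr, nlp, scheduler, tts):
        self.asr = asr              # converts a speech into a text
        self.nlp = nlp              # interprets the text and gives a feedback
        self.scheduler = scheduler  # performs task scheduling
        self.tts = tts              # converts output information into a speech

    def handle(self, audio):
        text = self.asr(audio)
        intent = self.nlp(text)
        reply_text = self.scheduler(intent)
        return self.tts(reply_text)

# Stand-in callables for each module, for illustration only:
system = SpeechInteractionSystem(
    asr=lambda audio: audio.decode(),
    nlp=lambda text: {"intent": text},
    scheduler=lambda intent: "handling " + intent["intent"],
    tts=lambda text: text.encode(),
)
reply = system.handle(b"good morning")
```

In practice each stage would be a trained model or service; the sketch only shows the data flow from acquired audio to synthesized output.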
FIG. 2 is a flow chart showing a speech interaction method, according to an exemplary embodiment of the present disclosure. The method may include the following steps.
In step 202, speech information of a user is acquired.
In step 204, a task list corresponding to the speech information is determined, and the task list includes at least two ordered tasks.
In step 206, for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, response information of the next task is queried and sent to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
The speech interaction method provided in the embodiment may be implemented by software, hardware, or a combination of software and hardware, and the involved hardware may be formed by two or more physical entities or by one physical entity. For example, the method in the embodiment may be applied to a speech interaction system, and the speech interaction system may be configured in an electronic device with a speech processing capability or formed by an electronic device with the speech processing capability. The electronic device may be a terminal device or a server device, and may be configured as required. For illustrative purposes, it is assumed that the speech interaction system executes the speech interaction method in the embodiments below.
The speech information of the user may be speech information acquired by the user terminal, and is sent to the speech interaction system by the user terminal. In the speech interaction system, a mapping relationship between speech information and a task list may be pre-configured. The task list corresponding to the speech information may be configured by the user. For example, a configuration interface is provided, and a response is made to a configuration instruction input by the user to obtain the task list corresponding to the speech information. The task list corresponding to the speech information may also be recommended by the system and may be configured as required.
In the embodiment, the task list corresponding to the speech information is configured to achieve correspondence between one speech and multiple tasks. The task list includes the at least two ordered tasks. For example, the task list may include various types of tasks, and there may be an execution sequence requirement for each task. For example, the tasks in the task list include, but are not limited to, a question-answer task, a control task, and the like. The question-answer task may be a task that requires query and response of the speech interaction system. The control task may be a task that requires the speech interaction system to control a device, for example, controlling a smart home device, such as turning a smart lamp on/off and turning a smart rice cooker on/off, etc. Recognition and semantic comprehension of an input speech may be considered as recognition of a scenario. For example, if the speech information is "good morning," it may be considered as a getting-up scenario, and the tasks in the corresponding task list may include: play soothing music (for 20 minutes) and simultaneously open a bedroom curtain; then play morning news (for 20 minutes); next play the weather forecast; and finally play a traffic condition of the road to work.
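The pre-configured mapping from one speech to an ordered task list, using the "good morning" scenario above, can be sketched as a lookup table. All field names (`id`, `type`, `realtime`, `desc`) are illustrative assumptions, not a format defined by the disclosure.

```python
# Hypothetical pre-configured mapping; field names are illustrative.
TASK_LISTS = {
    "good morning": [
        {"id": 1, "type": "qa",      "realtime": False, "desc": "play soothing music (20 min)"},
        {"id": 2, "type": "control", "realtime": False, "desc": "open the bedroom curtain"},
        {"id": 3, "type": "qa",      "realtime": False, "desc": "play morning news (20 min)"},
        {"id": 4, "type": "qa",      "realtime": True,  "desc": "play the weather forecast"},
        {"id": 5, "type": "qa",      "realtime": True,  "desc": "play traffic on the road to work"},
    ],
}

def task_list_for(speech_text):
    """Return the ordered task list mapped to the recognized speech, if any."""
    return TASK_LISTS.get(speech_text, [])

tasks = task_list_for("good morning")
```

A single recognized utterance thus resolves to an ordered list mixing question-answer and control tasks, which is the "one speech, multiple tasks" correspondence described above.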
In some embodiments, a task in the task list may have a real-time result query requirement, for example, the traffic condition of the road to work. If, when the speech "good morning" is received, the response information of each question-answer task is immediately queried, for example, the traffic condition of the road to work, and collectively transmitted to the user terminal for caching, then when it is time to play the traffic condition information, the cached traffic condition may describe the traffic at least 40 minutes earlier, making the information inaccurate.
Therefore, in some embodiments, the user terminal is enabled to make various complex replies and simultaneously ensure high timeliness and accuracy of contents of the replies. In an embodiment, for each task in the at least two ordered tasks, responsive to that the next task of the present task is a question-answer task, the response information of the next task is queried and sent to the user terminal before the execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
In the embodiment, the response information of the next task may be queried when the next task is about to be executed, so that timeliness and accuracy of the response information obtained by the user terminal are ensured.
In an embodiment, execution time of each task is marked in the task list, and in such case, it may be determined that the present time state is before the execution time of the next task arrives; for example, the query is performed at a preset time before the execution time arrives. In another embodiment, when a task request of executing the next task is received from the user terminal, it is determined that the present time state is before the execution time of the next task arrives. For example, before the operation that the response information of the next task is queried and sent to the user terminal, the method further includes that: a task request containing task information of the next task sent by the user terminal is received. The time when the user terminal sends the task request may be before the execution time of the next task arrives. For example, the user terminal may determine completion time of the present task, so that, if the next task is executed immediately after the present task, the user terminal may send the task request of the next task to the speech interaction system when the present task is completed or at a preset time before completion, to enable the speech interaction system to, when receiving the task request, judge that the present time state is before the execution time of the next task arrives and further start querying the response information corresponding to the next task. For another example, if the user terminal knows the execution time of the next task, the user terminal may send the task request, etc. to the speech interaction system at a preset time before the execution time of the next task arrives.
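The two timing strategies above can be sketched as two small helpers: one computes the query moment from a marked execution time and a preset lead, and one treats receipt of the terminal's task request as the signal that the present time state is before the execution time. Times in seconds and all names are illustrative assumptions.

```python
def query_moment(execution_time, preset_lead):
    """Strategy (a): start the query `preset_lead` seconds before the
    task's marked execution time."""
    return execution_time - preset_lead

def is_before_execution(request_time, execution_time):
    """Strategy (b): a task request arriving no later than the execution
    time indicates the present time state is before the execution time,
    so the query should start now."""
    return request_time <= execution_time

# Suppose the traffic task is due 2400 s after "good morning"; query 5 s early.
start = query_moment(execution_time=2400, preset_lead=5)
```

Either way, the query result reaches the user terminal just before the task is played rather than tens of minutes earlier.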
The execution time of the next task may be determined by the task list transmitted by the speech interaction system, the task list recording execution time of the tasks; and it may also be determined by the task information, transmitted by the speech interaction system, of the next task, the task information including the execution time of the next task.
In an embodiment, the user terminal may store the task list to determine the next task. The task list may be historically stored, and may also be fed back by the speech interaction system after the speech information is sent to the speech interaction system. In another embodiment, the speech interaction system, when sending the present task to the user terminal, may also send the task information of the next task. The user terminal determines the next task according to a task identifier in the task information. In an embodiment, the task request may further contain the task information of the next task. The task information may at least be configured to uniquely identify the next task. For example, the task information of the next task at least includes identification information of the next task, so that the next task may be determined by the task information. The task information in the task request may be determined by the user terminal according to the task list, and may also be the task information, contained when the speech interaction system sends response information of the present task, of the next task.
In some scenarios, an increase in requests may increase the request processing burden of the speech interaction system and stress its performance (for example, high concurrency, running time complexity, and the like). On one hand, the performance of the speech interaction system may be improved to solve the problem. On the other hand, query time may be reduced due to the provided task information. For example, besides the identification information capable of uniquely identifying the next task, the task information of the next task may further include, but is not limited to, one or more of index information of a question in the next task, a type of the question in the next task and the execution time of the next task. The index information is information configured to index an answer to the question in the next task. The answer corresponding to the question in the next task may be rapidly queried through the index information, so that the query time is reduced. Different questions are classified, and answers corresponding to different types of questions are stored in different databases, so that, according to the type of a question, data query may be directly performed in the database corresponding to this type, further improving query efficiency.
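Routing by question type and then fetching by index information can be sketched as a two-level lookup. The database names, index keys and stored answers below are invented purely for illustration.

```python
# Toy databases keyed by question type; keys and contents are invented.
DATABASES = {
    "traffic": {"route:home->office": "congested near the second ring road"},
    "weather": {"city:beijing": "sunny, 22 degrees"},
}

def query_answer(task_info):
    """Pick the database by question type, then fetch by index key."""
    database = DATABASES[task_info["question_type"]]
    return database[task_info["index"]]

answer = query_answer({
    "task_id": 5,
    "question_type": "traffic",
    "index": "route:home->office",
})
```

Because the type narrows the search to one database and the index addresses the answer directly, no scan over unrelated question types is needed.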
It is to be understood that the task information may further include another field capable of improving the query efficiency. In addition, the present time state may also be determined by other means. The preset time may be a pre-configured time with little influence on accuracy of the response information, and from the preset time to the execution time of the next task, query, feedback and the like of the response information may be completed. In some scenarios, because of a relatively high query speed, the preset time may even be 0, namely the response information of the next task is queried and sent to the user terminal when the execution time of the next task arrives.
In an embodiment, for a next task that requires execution of a real-time query operation, regardless of a type of the next task, the user terminal may send the task request containing the task information of the next task to the speech interaction system. The next task includes any task in a question-answer task with a real-time performance requirement or a question-answer task without the real-time performance requirement. In other words, for the question-answer task with the real-time performance requirement or the question-answer task without the real-time performance requirement, query is performed before the execution time of the task arrives, and the response information is fed back to the user terminal.
Furthermore, the task information of the next task may also be sent to the user terminal when the response information of the present task is sent to the user terminal, and correspondingly, when the response information of the next task is sent to the user terminal, task information of an unprocessed task (a third-round task) adjacent to the next task is sent to the user terminal, such that the task information of the adjacent task is transmitted each time the response information of a task is transmitted. Therefore, the user terminal, when the response information of the next task is completely executed, may know the task required to be executed next and, even before execution time of the unprocessed task adjacent to the next task arrives, send a task request containing the task information to the speech interaction system to request for execution of the third-round task.
In the embodiment, the response information of the present task and the task information of the next task are transmitted every time, the response information of the present task is played, and before the execution time of the next task arrives, a request for the response information of the next task may be made by using the task information, so that the timeliness of the response information may be ensured.
In some scenarios, real-time query may bring relatively high processing pressure to the speech interaction system, so that the number of query times of the speech interaction system may be reduced in a manner of reducing the number of task requests. For example, question-answer tasks are divided into question-answer tasks with the real-time performance requirement and question-answer tasks without the real-time performance requirement. The question-answer task with the real-time performance requirement refers to a question-answer task with a requirement on the timeliness of response information, for example, a task of playing a real-time traffic condition, and the like. The question-answer task without the real-time performance requirement may be a question-answer task with no or low requirement on the timeliness of response information, for example, a task of playing soothing music for 20 minutes, and the like. A type of the question-answer task may be configured by the user, and may also be obtained by big data analysis of the system.
In another embodiment, the next task is a question-answer task with the real-time performance requirement. In other words, under the condition that the next task is a question-answer task with the real-time performance requirement, the response information of the next task may be queried and sent to the user terminal before the execution time of the next task arrives, so that the processing pressure of the speech interaction system is reduced. For the condition that the task information is required to be sent, in an embodiment, when the response information of the next task is sent to the user terminal, the following operation is further included: the task information of the unprocessed task adjacent to the next task is sent to the user terminal. The user terminal may judge whether the next task is a question-answer task with the real-time performance requirement or a question-answer task without the real-time performance requirement according to the task information and, when determining according to the task information that the next task is a question-answer task with the real-time performance requirement, may send the task request containing the task information to the speech interaction system before the execution time of the next task arrives.
According to the embodiment, the task information is transmitted regardless of whether the task has the real-time performance requirement, and the user terminal may then judge whether to send the task request.
In another embodiment, when the response information of the next task is sent to the user terminal, the following operation is further included: responsive to that the unprocessed task adjacent to the next task is a question-answer task with the real-time performance requirement, the task information of the unprocessed task adjacent to the next task is sent to the user terminal, and the user terminal stores the task list.
In the embodiment, only task information of a task with the real-time performance requirement is sent, so that resource waste caused by information transmission when task information of a task without the real-time performance requirement is also sent is avoided, and judgment processes of the user terminal are reduced. The user terminal, under the condition that the task information of the unprocessed task adjacent to the next task is not received, may directly determine the next task according to the task list and locally acquire and output the response information of the unprocessed task adjacent to the next task.
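In this variant, the mere presence or absence of attached task information tells the user terminal what to do next, which can be sketched as follows. The function name, tuple-based actions and list layout are illustrative assumptions.

```python
def next_action(received_task_info, task_list, present_index):
    """Decide the terminal's next step when the present task completes."""
    if received_task_info is not None:
        # Task information was attached: the next task has the real-time
        # performance requirement, so request the speech interaction system.
        return ("send_task_request", received_task_info["id"])
    # No task information: fall back to the stored task list and play the
    # locally cached response of the next task.
    next_task = task_list[present_index + 1]
    return ("play_local", next_task["id"])

stored_list = [{"id": 1}, {"id": 2}, {"id": 3}]
action_realtime = next_action({"id": 2}, stored_list, 0)
action_local = next_action(None, stored_list, 0)
```

The terminal thus never has to classify the task itself in this variant; the server's selective transmission already encodes the decision.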
For the question-answer task without the real-time performance requirement, the response information of such a question-answer task may be fed back to the user terminal at a specified time. For example, in a process of executing a first task in the task list, one or more question-answer tasks without the real-time performance requirement in the task list may be determined, and response information of all of the one or more question-answer tasks without the real-time performance requirement is transmitted to the user terminal, such that the user terminal locally acquires and outputs the response information according to a sequence of the one or more question-answer tasks in the task list.
In the embodiment, the response information of the question-answer task without the real-time performance requirement may be collectively transmitted at one time, so that the number of the task requests may be reduced, and pressure of a server may be alleviated.
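The one-time batch transmission above can be sketched as a filter over the task list that gathers every non-real-time question-answer response for a single send. The `answer_store` mapping and task fields are illustrative assumptions.

```python
def prefetch_static_responses(task_list, answer_store):
    """Collect answers of every non-real-time question-answer task so they
    can be sent to the user terminal in a single transmission."""
    batch = {}
    for task in task_list:
        if task["type"] == "qa" and not task["realtime"]:
            batch[task["id"]] = answer_store[task["id"]]
    return batch

tasks = [
    {"id": 1, "type": "qa",      "realtime": False},
    {"id": 2, "type": "control", "realtime": False},
    {"id": 3, "type": "qa",      "realtime": True},
    {"id": 4, "type": "qa",      "realtime": False},
]
answers = {1: "soothing music stream", 4: "morning news stream"}
batch = prefetch_static_responses(tasks, answers)
```

Control tasks and real-time question-answer tasks are deliberately excluded from the batch, since the former go to the IoT system and the latter must be queried near their execution time.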
With respect to the operation that the response information of the next task is queried and sent to the user terminal, the response information corresponding to the next task may be queried, and the obtained response information is sent to the user terminal. The response information may be audio information, text information, picture information, and the like. Correspondingly, outputting the response information may be playing the audio information and displaying the text information and the picture information, etc. Exemplarily, interaction is usually performed through an audio in the speech interaction process, so that the response information may be the audio information, and outputting the response information may be playing the audio information.
Besides question-answer tasks, the tasks in the task list may also be control tasks configured to control devices. In an embodiment, the method further includes that: if the next task is a control task of controlling a smart home device, a control instruction corresponding to the control task is sent to an Internet of things system to enable the Internet of things system to control the corresponding smart home device.
In the embodiment, not only may question-answer be implemented, but also the smart home device may be controlled, so that application scenarios of speech interaction are extended.
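Dispatching a control task to the Internet of things system can be sketched as below. The `IotSystem` class, device names and state strings are hypothetical stand-ins, not an actual IoT interface.

```python
class IotSystem:
    """Stand-in for an Internet of things system tracking device states."""

    def __init__(self):
        self.device_states = {"smart_lamp": "off", "smart_rice_cooker": "off"}

    def control(self, device, state):
        self.device_states[device] = state
        return self.device_states[device]

def dispatch_control_task(task, iot_system):
    """Forward a control task's instruction to the IoT system."""
    if task["type"] != "control":
        raise ValueError("not a control task")
    return iot_system.control(task["device"], task["state"])

iot = IotSystem()
lamp_state = dispatch_control_task(
    {"type": "control", "device": "smart_lamp", "state": "on"}, iot
)
```

A real deployment would issue the instruction over a home-automation protocol rather than mutate an in-memory dictionary; the sketch only shows the control path branching away from the question-answer path.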
In some scenarios, a new user speech may be received when a task in the task list is not completely executed. In an embodiment, execution of the task that is not completely executed in the task list may be delayed; and in another embodiment, the user terminal may directly be stopped from executing the task that is not completed in the task list, and the task that is not completely executed is cleared. For example, when a task in the task list is not completely executed, if a new user speech is received, execution of the task that is not completed by the user terminal in the task list is interrupted. In addition, prompting information may also be output, and whether to continue executing the task that is not completely executed in the task list is determined according to a user instruction, so task controllability is achieved.
FIG. 3 is a flow chart showing another speech interaction method, according to an exemplary embodiment of the present disclosure. The method may include the following steps.
In step 302, acquired speech information is sent to a speech interaction system.
In step 304, response information of a present task sent by the speech interaction system is received before execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives.
The method in the embodiment may be applied to a user terminal. For example, the user terminal may be a device with a speech acquisition function such as a smart phone, a tablet computer, a PDA, a wearable device, an intelligent speaker, and the like.
The response information of the present task sent by the speech interaction system may be received before the execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives. The present task may be a question-answer task with a real-time performance requirement, may also be a question-answer task without the real-time performance requirement and may also be a control task, etc. In an embodiment, when the present task is a question-answer task, such as a question-answer task with the real-time performance requirement, the response information of the present task may be queried and obtained by the speech interaction system shortly before the execution time of the present task arrives (for example, at a preset time before the execution time of the present task arrives), so that high timeliness and accuracy of the response information of the task are ensured. The preset time may be a relatively short time configured as required.
In an embodiment, there is no pre-stored task list corresponding to the speech information, and the user terminal determines a task through task information sent by the speech interaction system. In another embodiment, the task list corresponding to the speech information is pre-stored, and the user terminal may determine the task from the task list. In an embodiment, regardless of a type of a next task, a task request containing task information of the next task is sent to implement timely query for any type of question-answer task. In another embodiment, a task request is sent only for a question-answer task with the real-time performance requirement or a control task (in some examples, the task request is not required to be sent even for the control task). The user terminal may pre-store response information of question-answer tasks without the real-time performance requirement, determine that the next task is a question-answer task without the real-time performance requirement and, when execution time of the next task arrives, locally acquire and output the response information of the next task. In the speech interaction system, when the response information of the present task is sent, regardless of the type of the next task, task information of the next task is transmitted such that task information of each task, except for a first task, in the task list is sent; or, when the response information of the present task is sent, if the next task is a question-answer task with the real-time performance requirement or a control task, the task information of the next task is transmitted. Whether to send the response information of the present task may be determined by, for example, whether a task request is received.
In an embodiment, the speech interaction system may send the task information of each task, except for the first task, in the task list. In the user terminal, the response information of the present task further contains task information of a next task, and the method further includes that: a task request containing the task information of the next task is sent to the speech interaction system before execution time of the next task arrives, such that the speech interaction system feeds back corresponding response information before the execution time of the next task arrives. Therefore, response information of each question-answer task may be queried and obtained in real time.
In another embodiment, response information of all question-answer tasks without the real-time performance requirement is obtained in advance. In the user terminal, the method further includes that: when the present task is the first task in the task list corresponding to the speech information, the response information of all of the question-answer tasks without the real-time performance requirement in the task list is further received from the speech interaction system. In such case, whether the next task is a question-answer task with the real-time performance requirement or a control task may be determined according to the task information, transmitted by the speech interaction system, of the next task; if it is determined according to the task information that the next task is a question-answer task with the real-time performance requirement or a control task of controlling a smart home device, the task request containing the task information is sent to the speech interaction system before the execution time of the next task arrives; and if it is determined according to the task information that the next task is a question-answer task without the real-time performance requirement, the response information of the next task is locally acquired and output when the execution time of the next task arrives.
According to the embodiment, the user terminal judges the type of the next task according to the task information to further determine whether to send the task request, so that the number of times for sending task requests may be reduced, and furthermore, processing pressure of the speech interaction system is alleviated.
In an embodiment, the speech interaction system may send task information of a task with the real-time performance requirement or task information of a control task in the task list. If the user terminal receives the task information of the next task, it is indicated that the next task is a question-answer task with the real-time performance requirement or a control task, and the task request containing the task information is sent to the speech interaction system at the preset time before the execution time of the next task arrives. For example, the user terminal pre-stores the task list corresponding to the speech information, and the method further includes that: the task information of the next task in the task list is determined; and the task request containing the task information of the next task is sent to the speech interaction system before the execution time of the next task arrives such that the speech interaction system feeds back the corresponding response information before the execution time of the next task arrives.
Furthermore, if only the response information of the present task sent by the speech interaction system is received, it is indicated that the next task is a task of which the response information has been stored. Therefore, if only the response information of the present task sent by the speech interaction system is received, the response information is output, the next task is determined according to the task list, and the response information of the next task is locally acquired and output when the execution time of the next task arrives.
In the embodiment, a request is sent to the speech interaction system only when a task is a question-answer task with the real-time performance requirement or a control task, so that the number of sending times is reduced.
In an embodiment, the user terminal may store the task list and may also not store the task list. For example, if the speech interaction system transmits task information of each task, the user terminal is not required to store the task list. The speech interaction system may perform query for each task, and may also query and send the response information of the next task to the user terminal for a task with the real-time performance requirement before the execution time of the task arrives. Whether the task is a question-answer task with the real-time performance requirement may be judged by the user terminal and may also be judged by the speech interaction system.
FIG. 4 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure. In the embodiment, a task request is sent for each task except for a first task and a user terminal is not required to store a task list. The method includes the following steps.
In step 401, acquired speech information is sent by a user terminal to a speech interaction system.
In step 402, a task list corresponding to the speech information is determined by the speech interaction system, and the task list includes at least two ordered tasks.
In step 403, response information of a present task and task information of a next task are sent by the speech interaction system.
In step 404, responsive to the response information of the present task sent by the speech interaction system, the response information is output by the user terminal, and a task request containing the task information is sent to the speech interaction system before execution time of the next task arrives.
In step 405, when receiving the task request and determining according to the task information that the next task is a question-answer task, response information of the next task is queried by the speech interaction system.
In step 406, the response information of the next task and task information of a task arranged after the next task are sent by the speech interaction system to the user terminal.
The process may be repeated until all the tasks in the task list are completely executed.
Furthermore, the speech interaction system, when receiving the task request and determining according to the task information that the next task is a control task, may send a control instruction corresponding to the control task to an Internet of things system to enable the Internet of things system to control a corresponding smart home device.
In this embodiment, the response information of the present task and the task information of the next task are transmitted in each exchange, and after the response information of the present task has been completely played, the user terminal may use the task information to request the response information of the next task, so that the timeliness of the response information is ensured.
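The exchange loop of FIG. 4 can be sketched as below. The function names and the per-task answers are assumptions for illustration only; the sketch shows only the pairing of each response with the next task's information and the terminal-driven request for the following task.

```python
# Sketch of the FIG. 4 loop: each system reply carries (response of present
# task, task info of next task); the terminal outputs the response, then uses
# the task info to request the next task's answer before its execution time.

def make_system(task_list):
    """Speech interaction system: given a task id, return its answer plus the
    task information of the task that follows it in the ordered list."""
    answers = {t: f"answer for {t}" for t in task_list}
    def handle_request(task_id):
        idx = task_list.index(task_id)
        next_info = task_list[idx + 1] if idx + 1 < len(task_list) else None
        return answers[task_id], next_info
    return handle_request

def run_terminal(first_response, first_next_info, request):
    """User terminal: output each response, then request the next one using
    the received task information (in practice, before its exec time arrives)."""
    played = [first_response]
    next_info = first_next_info
    while next_info is not None:
        response, next_info = request(next_info)
        played.append(response)
    return played

task_list = ["alarm", "weather", "news"]
system = make_system(task_list)
first_response, first_next = system(task_list[0])
played = run_terminal(first_response, first_next, system)
```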
FIG. 5 is a flow diagram of a speech interaction method, according to an exemplary embodiment of the present disclosure. In this embodiment, task requests are sent only for a control task or a question-answer task with a real-time performance requirement, and the user terminal stores the task list. The method includes the following steps.
In step 501, a user terminal sends acquired speech information to a speech interaction system.
In step 502, the speech interaction system determines a task list corresponding to the speech information, and the task list includes at least two ordered tasks.
In step 503, the speech interaction system sends response information of a present task to the user terminal; when a next task is a question-answer task with a real-time performance requirement or a control task, the speech interaction system also sends task information of the next task to the user terminal; otherwise, the task information is not sent.
In step 504, responsive to the response information of the present task and the task information of the next task sent by the speech interaction system, the user terminal outputs the response information and sends a task request containing the task information of the next task to the speech interaction system at a preset time before the execution time of the next task arrives. Responsive to receiving the response information of the present task without the task information of the next task (which indicates that the next task is neither a question-answer task with the real-time performance requirement nor a control task), the user terminal outputs the response information, determines the next task according to the stored task list, and locally acquires and outputs the response information of the next task when the execution time of the next task arrives.
In step 505, the speech interaction system, when receiving the task request and determining according to the task information that the next task is a question-answer task, queries response information of the next task.
In step 506, the response information of the next task is sent to the user terminal; when an unprocessed task adjacent to the next task is a question-answer task with the real-time performance requirement or a control task, task information of that task is also sent to the user terminal; otherwise, the task information is not sent.
The process may be repeated until the tasks in the task list are completely executed.
Furthermore, the speech interaction system, in a process of executing a first task in the task list, determines one or more question-answer tasks without the real-time performance requirement in the task list and transmits response information of all of the one or more question-answer tasks without the real-time performance requirement to the user terminal.
In this embodiment, task information is sent only for a control task or a task with the real-time performance requirement, and is not sent for a task without the real-time performance requirement, so that the number of requests may be reduced and the processing pressure on the speech interaction system is further alleviated.
FIG. 6 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure. The apparatus includes: an information acquisition module 62 configured to acquire speech information of a user; a list determination module 64 configured to determine a task list corresponding to the speech information, the task list including at least two ordered tasks; and an information feedback module 66 configured to, for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task, query and send response information of the next task to a user terminal before execution time of the next task arrives, such that the user terminal outputs the response information when the execution time of the next task arrives.
In an embodiment, the next task is a question-answer task with a real-time performance requirement.
In an embodiment, the information feedback module 66 is configured to, before the response information of the next task is queried and sent to the user terminal, receive a task request containing task information of the next task sent by the user terminal.
In an embodiment, the information feedback module 66 is configured to, when the response information of the next task is sent to the user terminal, send task information of an unprocessed task adjacent to the next task to the user terminal; or, responsive to that the unprocessed task adjacent to the next task is a question-answer task with the real-time performance requirement, send the task information of the unprocessed task adjacent to the next task to the user terminal, the user terminal storing the task list.
In an embodiment, the task information of the next task at least includes identification information of the next task, and the task information of the next task further includes at least one of index information of a question in the next task, a type of the question in the next task or the execution time of the next task.
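One possible shape for this task information is a record with a mandatory identifier plus the optional fields listed above. The field names here are illustrative assumptions only.

```python
# Sketch of the task information structure: identification information is
# required; question index, question type and execution time are optional.
from dataclasses import dataclass
from typing import Optional

@dataclass
class TaskInfo:
    task_id: str                          # identification information (required)
    question_index: Optional[int] = None  # index of the question in the task
    question_type: Optional[str] = None   # type of the question, e.g. "weather"
    exec_time: Optional[float] = None     # execution time of the task

info = TaskInfo(task_id="t42", question_type="weather", exec_time=1_700_000_000.0)
```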
In an embodiment, the information feedback module 66 is further configured to, before the response information of the next task is queried and sent to the user terminal, in a process of executing a first task in the task list, determine one or more question-answer tasks without the real-time performance requirement in the task list and transmit response information of all of the one or more question-answer tasks without the real-time performance requirement to the user terminal, such that the user terminal locally acquires and outputs the response information according to a sequence of the one or more question-answer tasks in the task list.
In an embodiment, the apparatus further includes a task interruption module (not shown in FIG. 6 ), configured to: when a task in the task list is not completely executed, if a new user speech is received, interrupt execution of the task that is not completed by the user terminal in the task list.
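The interruption behaviour can be sketched as follows: a new user speech discards the tasks of the current list that have not yet been executed. The class and method names are assumptions for illustration.

```python
# Sketch of the task interruption module: when a new user speech arrives
# before the task list is completely executed, the remaining tasks are
# dropped and replaced by the tasks derived from the new speech.

class TaskRunner:
    def __init__(self, task_list):
        self.pending = list(task_list)  # tasks not yet executed
        self.done = []                  # tasks already executed

    def execute_next(self):
        if self.pending:
            self.done.append(self.pending.pop(0))

    def on_new_speech(self, new_task_list):
        # Interrupt: unexecuted tasks of the old list are abandoned.
        self.pending = list(new_task_list)

runner = TaskRunner(["alarm", "weather", "news"])
runner.execute_next()            # "alarm" is executed
runner.on_new_speech(["joke"])   # new speech interrupts the remaining tasks
```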
FIG. 7 is a block diagram of a speech interaction apparatus, according to an exemplary embodiment of the present disclosure. The apparatus includes: a speech sending module 72 configured to send acquired speech information to a speech interaction system; and an information receiving module 74 configured to receive response information of a present task sent by the speech interaction system before execution time of the present task arrives, such that the response information of the present task is output when the execution time of the present task arrives.
In an embodiment, the response information of the present task further contains task information of a next task, and the apparatus further includes a first request sending module (not illustrated in FIG. 7 ), configured to: send a task request containing the task information of the next task to the speech interaction system before execution time of the next task arrives, such that the speech interaction system feeds back corresponding response information before the execution time of the next task arrives.
In an embodiment, a task list corresponding to the speech information is pre-stored, and the apparatus further includes: a task information determination module (not shown in FIG. 7 ) configured to determine task information of a next task in the task list; and a second request sending module (not shown in FIG. 7 ) configured to send a task request containing the task information of the next task to the speech interaction system before execution time of the next task arrives, such that the speech interaction system feeds back corresponding response information before the execution time of the next task arrives.
In an embodiment, the next task is a question-answer task with a real-time performance requirement.
In an embodiment, the apparatus further includes a task execution module (not shown in FIG. 7 ), configured to: determine according to the task information that the next task is a question-answer task without the real-time performance requirement and locally acquire and output response information of the next task when the execution time of the next task arrives.
The apparatus embodiments substantially correspond to the method embodiments; for detailed operations of each module in the apparatus, reference may be made to the above description of the method embodiments. The apparatus embodiments described above are only exemplary: modules described as separate parts may or may not be physically separated, and parts displayed as modules may be located in the same place or may be distributed over multiple network units. Part or all of the modules may be selected according to a practical requirement.
An embodiment of the present disclosure also provides a computer-readable storage medium, in which a computer program is stored, the program being executed by a processor to implement the steps of any one of the above described methods.
The storage medium includes, but is not limited to, a disk memory, a compact disc read-only memory (CD-ROM), an optical memory, and the like. The computer-readable storage medium includes nonvolatile, volatile, removable and unremovable media, and may store information by any method or technology. The information may be a computer-readable instruction, a data structure, a program module or other data. Examples of the computer storage medium include, but are not limited to, a phase-change random access memory (PRAM), a static random access memory (SRAM), a dynamic random access memory (DRAM), a random access memory (RAM) of another type, a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a flash memory or another memory technology, a CD-ROM, a digital video disk (DVD) or another optical memory, a cassette memory, a magnetic tape, a disk memory or another magnetic storage device, or any other non-transmission medium, and may be configured to store information accessible by a computer device.
An embodiment of the present disclosure provides an electronic device, which includes: a processor; and a memory configured to store instructions executable by the processor; wherein the processor is configured to execute the instructions to implement any one of the above speech interaction methods.
FIG. 8 is a schematic diagram of a speech interaction apparatus 800, according to an exemplary embodiment. For example, the apparatus 800 may be provided as a user terminal or a speech interaction system. Referring to FIG. 8 , the apparatus 800 includes a processing component 822, which may further include one or more processors, and a memory resource represented by a memory 832, configured to store instructions executable by the processing component 822, for example, an application (APP). The APP stored in the memory 832 may include one or more modules, each of which corresponds to a set of instructions. In addition, the processing component 822 is configured to execute the instructions to execute the above speech interaction methods.
The apparatus 800 may further include a power component 826 configured to execute power management of the apparatus 800, a wired or wireless network interface 850 configured to connect the apparatus 800 to a network and an input/output (I/O) interface 858. The apparatus 800 may be operated based on an operating system stored in the memory 832, for example, Android, iOS, Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions such as the memory 832 including instructions, and the instructions may be executed by the processing component 822 of the apparatus 800 to implement the above described methods. For example, the non-transitory computer-readable storage medium may be a ROM, a RAM, a CD-ROM, a magnetic tape, a floppy disc, an optical data storage device, and the like.
Other implementations of the present disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the present disclosure. The present disclosure is intended to cover any variations, uses, or adaptations of the present disclosure following the general principles thereof and including such departures from the present disclosure as come within known or customary practice in the art. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the present disclosure being indicated by the following claims.
It will be appreciated that the present disclosure is not limited to the exact construction that has been described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from the scope thereof. It is intended that the scope of the present disclosure only be limited by the appended claims.

Claims (14)

What is claimed is:
1. A speech interaction method, comprising:
acquiring, by a speech interaction system from a user terminal, speech information of a user;
determining, by the speech interaction system, a task list corresponding to the speech information, the task list comprising at least two ordered tasks;
sending, by the speech interaction system, response information of a present task and task information of a next task to the user terminal; and
for each task in the at least two ordered tasks, responsive to that the next task of the present task is a question-answer task with a real-time performance requirement,
outputting, by the user terminal, the response information of the present task to the user;
sending, by the user terminal, a task request containing the task information of the next task to the speech interaction system;
judging, by the speech interaction system, that a present time state is before execution time of the next task arrives;
querying and sending, by the speech interaction system, response information of the next task to the user terminal before the execution time of the next task arrives, and
outputting, by the user terminal, the response information of the next task when the execution time of the next task arrives,
wherein before querying and sending the response information of the next task to the user terminal, the method further comprises:
in a process of executing a first task in the task list, determining, by the speech interaction system, one or more question-answer tasks without a real-time performance requirement in the task list;
transmitting, by the speech interaction system, response information of all of the one or more question-answer tasks without the real-time performance requirement to the user terminal; and
locally acquiring and outputting, by the user terminal, the response information of all of the one or more question-answer tasks without the real-time performance requirement according to a sequence of the one or more question-answer tasks in the task list.
2. The method of claim 1, when sending the response information of the next task to the user terminal, further comprising:
sending, by the speech interaction system, task information of an unprocessed task adjacent to the next task to the user terminal; or
responsive to that the unprocessed task adjacent to the next task is a question-answer task with a real-time performance requirement, sending, by the speech interaction system, the task information of the unprocessed task adjacent to the next task to the user terminal, the user terminal storing the task list.
3. The method of claim 1, wherein the task information of the next task comprises identification information of the next task; and the task information of the next task further comprises at least one of index information of a question in the next task, a type of the question in the next task, or the execution time of the next task.
4. The method of claim 1, further comprising:
when a task in the task list is not completely executed, responsive to that a new user speech is received, interrupting execution of the task that is not completed by the user terminal in the task list.
5. The method of claim 1, further comprising: if the next task is a control task of controlling a smart home device, a control instruction corresponding to the control task is sent to an Internet of things system to enable the Internet of things system to control the smart home device.
6. A speech interaction method, comprising:
sending, by a user terminal, acquired speech information to a speech interaction system;
receiving, by the user terminal, response information of a present task sent by the speech interaction system before execution time of the present task arrives, and
outputting the response information of the present task, by the user terminal, when the execution time of the present task arrives,
wherein the response information of the present task further contains task information of a next task, the method further comprising:
responsive to that the next task is a question-answer task with a real-time performance requirement,
sending, by the user terminal, a task request containing the task information of the next task to the speech interaction system before execution time of the next task arrives,
judging, by the speech interaction system, that a present time state is before the execution time of the next task arrives, and
feeding back, by the speech interaction system, corresponding response information of the next task before the execution time of the next task arrives,
wherein a task list corresponding to the speech information is pre-stored, the method further comprising:
responsive to that the next task is a question-answer task with a real-time performance requirement,
determining, by the user terminal, task information of a next task in the task list;
sending, by the user terminal, a task request containing the task information of the next task to the speech interaction system before the execution time of the next task arrives,
judging, by the speech interaction system, that a present time state is before the execution time of the next task arrives, and
feeding back, by the speech interaction system, corresponding response information of the next task before the execution time of the next task arrives,
wherein the method further comprises:
when the present task is a first task in the task list, receiving, by the user terminal, response information of all question-answer tasks without a real-time performance requirement in the task list from the speech interaction system;
determining, by the user terminal, according to the task information that the next task is a question-answer task without a real-time performance requirement, and
locally acquiring and outputting, by the user terminal, response information of the next task when the execution time of the next task arrives.
7. The method of claim 6, further comprising: if the next task is a control task of controlling a smart home device, a control instruction corresponding to the control task is sent to an Internet of things system to enable the Internet of things system to control the smart home device.
8. A speech interaction apparatus, comprising:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
acquire, by a speech interaction system from a user terminal, speech information of a user;
determine, by the speech interaction system, a task list corresponding to the speech information, the task list comprising at least two ordered tasks; and
send, by the speech interaction system, response information of a present task and task information of a next task to the user terminal;
for each task in the at least two ordered tasks, responsive to that a next task of a present task is a question-answer task with a real-time performance requirement,
output, by the user terminal, the response information of the present task to the user,
send, by the user terminal, a task request containing the task information of the next task to the speech interaction system,
judge, by the speech interaction system, that a present time state is before execution time of the next task arrives,
query and send, by the speech interaction system, response information of the next task to the user terminal before the execution time of the next task arrives, and
output, by the user terminal, the response information of the next task when the execution time of the next task arrives,
wherein the processor is further configured to, before querying and sending the response information of the next task to the user terminal,
in a process of executing a first task in the task list, determine, by the speech interaction system, one or more question-answer tasks without a real-time performance requirement in the task list,
transmit, by the speech interaction system, response information of all of the one or more question-answer tasks without the real-time performance requirement to the user terminal, and
locally acquire and output, by the user terminal, the response information of all of the one or more question-answer tasks without the real-time performance requirement according to a sequence of the one or more question-answer tasks in the task list.
9. The apparatus of claim 8, wherein when sending the response information of the next task to the user terminal, the processor is further configured to perform one of:
sending task information of an unprocessed task adjacent to the next task to the user terminal; or
responsive to that the unprocessed task adjacent to the next task is a question-answer task with a real-time performance requirement, sending the task information of the unprocessed task adjacent to the next task to the user terminal, the user terminal storing the task list.
10. The apparatus of claim 8, wherein the task information of the next task comprises identification information of the next task; and the task information of the next task further comprises at least one of index information of a question in the next task, a type of the question in the next task, or the execution time of the next task.
11. The apparatus of claim 8, wherein the processor is further configured to:
when a task in the task list is not completely executed, responsive to that a new user speech is received, interrupt execution of the task that is not completed by the user terminal in the task list.
12. The apparatus of claim 8, wherein the processor is further configured to, if the next task is a control task of controlling a smart home device, send a control instruction corresponding to the control task to an Internet of things system to enable the Internet of things system to control the smart home device.
13. A non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor of a device, cause the device to perform the speech interaction method of claim 1.
14. A non-transitory computer-readable storage medium having stored therein instructions that, when executed by a processor of a device, cause the device to perform the speech interaction method of claim 6.
US16/932,148 2020-01-08 2020-07-17 Speech interaction method and apparatus, device and storage medium Active 2041-11-11 US11798545B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010017436.7A CN111243587A (en) 2020-01-08 2020-01-08 Voice interaction method, device, equipment and storage medium
CN202010017436.7 2020-01-08

Publications (2)

Publication Number Publication Date
US20210210088A1 US20210210088A1 (en) 2021-07-08
US11798545B2 true US11798545B2 (en) 2023-10-24

Family

ID=70872405

Family Applications (1)

Application Number Title Priority Date Filing Date
US16/932,148 Active 2041-11-11 US11798545B2 (en) 2020-01-08 2020-07-17 Speech interaction method and apparatus, device and storage medium

Country Status (6)

Country Link
US (1) US11798545B2 (en)
EP (1) EP3848801B1 (en)
JP (1) JP7288885B2 (en)
KR (1) KR102389034B1 (en)
CN (1) CN111243587A (en)
ES (1) ES2952381T3 (en)


US20210210100A1 (en) * 2018-05-29 2021-07-08 Amazon Technologies, Inc. Voice command processing for locked devices
US20210272585A1 (en) * 2018-08-17 2021-09-02 Samsung Electronics Co., Ltd. Server for providing response message on basis of user's voice input and operating method thereof
US20210272563A1 (en) * 2018-06-15 2021-09-02 Sony Corporation Information processing device and information processing method
US11138894B1 (en) * 2016-09-21 2021-10-05 Workday, Inc. Educational learning importation
US20230169956A1 (en) * 2019-05-03 2023-06-01 Sonos, Inc. Locally distributed keyword detection

Patent Citations (78)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002315069A (en) 2001-04-17 2002-10-25 Misawa Homes Co Ltd Remote controller
US20040054998A1 (en) 2002-07-26 2004-03-18 Matsushita Electric Industrial Co., Ltd. Program execution apparatus
US20040073538A1 (en) * 2002-10-09 2004-04-15 Lasoo, Inc. Information retrieval system and method employing spatially selective features
US20200053157A1 (en) * 2007-06-04 2020-02-13 Voice Tech Corporation Using Voice Commands From A Mobile Device To Remotely Access And Control A Computer
US20110004881A1 (en) 2008-03-12 2011-01-06 Nxp B.V. Look-ahead task management
CN101656800A (en) 2008-08-20 2010-02-24 阿鲁策株式会社 Automatic answering device and method thereof, conversation scenario editing device, conversation server
KR20110023570A (en) 2009-08-31 2011-03-08 엘지전자 주식회사 Mobile terminal and its control method
US9792336B1 (en) * 2009-12-07 2017-10-17 Google Inc. Generating real-time search results
US20180232203A1 (en) * 2012-11-28 2018-08-16 Google Llc Method for user training of information dialogue system
US20140172953A1 (en) * 2012-12-14 2014-06-19 Rawles Llc Response Endpoint Selection
US20160042735A1 (en) 2014-08-11 2016-02-11 Nuance Communications, Inc. Dialog Flow Management In Hierarchical Task Dialogs
US20160308811A1 (en) * 2015-04-17 2016-10-20 Microsoft Technology Licensing, Llc Communication System Invite Mechanism
US10521189B1 (en) * 2015-05-11 2019-12-31 Alan AI, Inc. Voice assistant with user data context
US20200160852A1 (en) * 2015-07-21 2020-05-21 Amazon Technologies, Inc. Using Audio Input and Output to Interact with Text-Based Interactive Content
US20200168211A1 (en) * 2016-05-31 2020-05-28 Huawei Technologies Co., Ltd. Information Processing Method, Server, Terminal, and Information Processing System
US20180062691A1 (en) * 2016-08-24 2018-03-01 Centurylink Intellectual Property Llc Wearable Gesture Control Device & Method
US20200327893A1 (en) * 2016-08-26 2020-10-15 Sony Corporation Information processing device and information processing method
US11138894B1 (en) * 2016-09-21 2021-10-05 Workday, Inc. Educational learning importation
US20180182392A1 (en) * 2016-12-23 2018-06-28 Beijing Xiaoniao Tingting Technology Co., LTD. Method for performing voice control on device with microphone array, and device thereof
US20200275250A1 (en) * 2017-03-17 2020-08-27 Lg Electronics Inc. Method and apparatus for processing audio signal by using bluetooth technology
US20180314552A1 (en) * 2017-04-28 2018-11-01 Samsung Electronics Co., Ltd. Voice data processing method and electronic device supporting the same
US20210150772A1 (en) * 2017-06-16 2021-05-20 Honda Motor Co., Ltd. Experience providing system, experience providing method, and experience providing program
US20180374482A1 (en) * 2017-06-21 2018-12-27 Samsung Electronics Co., Ltd. Electronic apparatus for processing user utterance and server
US20200184977A1 (en) * 2017-08-09 2020-06-11 Lg Electronics Inc. Method and apparatus for calling voice recognition service by using bluetooth low energy technology
US20200372910A1 (en) * 2017-08-17 2020-11-26 Sony Corporation Information processing device, information processing method, and program
KR20190021143A (en) 2017-08-22 2019-03-05 삼성전자주식회사 Voice data processing method and electronic device supporting the same
US20190066677A1 (en) 2017-08-22 2019-02-28 Samsung Electronics Co., Ltd. Voice data processing method and electronic device supporting the same
JP2020530580A (en) 2017-10-03 2020-10-22 Google LLC Voice user interface shortcuts for assistant applications
US20200286484A1 (en) * 2017-10-18 2020-09-10 Soapbox Labs Ltd. Methods and systems for speech detection
US11699442B2 (en) * 2017-10-18 2023-07-11 Soapbox Labs Ltd. Methods and systems for speech detection
US20190272590A1 (en) * 2018-02-09 2019-09-05 Deutsche Ag Stress testing and entity planning model execution apparatus, method, and computer readable media
US20190279620A1 (en) * 2018-03-06 2019-09-12 GM Global Technology Operations LLC Speech recognition arbitration logic
US20200302923A1 (en) * 2018-03-08 2020-09-24 Google Llc Mitigation of client device latency in rendering of remotely generated automated assistant content
WO2019177377A1 (en) 2018-03-13 2019-09-19 Samsung Electronics Co., Ltd. Apparatus for processing user voice input
US20200211546A1 (en) * 2018-03-14 2020-07-02 Google Llc Generating iot-based notification(s) and provisioning of command(s) to cause automatic rendering of the iot-based notification(s) by automated assistant client(s) of client device(s)
US20200411004A1 (en) * 2018-03-15 2020-12-31 Beijing Bytedance Network Technology Co., Ltd. Content input method and apparatus
US20210200597A1 (en) * 2018-03-16 2021-07-01 Sony Corporation Information processing device, information processing method, and program
US20190294630A1 (en) * 2018-03-23 2019-09-26 nedl.com, Inc. Real-time audio stream search and presentation system
US20190295552A1 (en) * 2018-03-23 2019-09-26 Amazon Technologies, Inc. Speech interface device
WO2019188393A1 (en) 2018-03-29 2019-10-03 Sony Corporation Information processing device, information processing method, transmission device and transmission method
KR20200136382A (en) 2018-03-29 2020-12-07 Sony Corporation Information processing device, information processing method, transmission device, and transmission method
CN111903138A (en) 2018-03-29 2020-11-06 索尼公司 Information processing apparatus, information processing method, transmission apparatus, and transmission method
US20210006862A1 (en) * 2018-03-29 2021-01-07 Sony Corporation Information processing apparatus, information processing method, transmission apparatus, and transmission method
US20210174809A1 (en) * 2018-04-12 2021-06-10 Sony Corporation Information processing device, information processing system, and information processing method, and program
US20190325084A1 (en) * 2018-04-20 2019-10-24 Facebook, Inc. Generating Personalized Content Summaries for Users
US20190334764A1 (en) * 2018-04-30 2019-10-31 Splunk Inc. Actionable alert messaging network for automated incident resolution
US20210174791A1 (en) * 2018-05-02 2021-06-10 Melo Inc. Systems and methods for processing meeting information obtained from multiple sources
US20200342223A1 (en) * 2018-05-04 2020-10-29 Google Llc Adapting automated assistant based on detected mouth movement and/or gaze
US20210048930A1 (en) * 2018-05-07 2021-02-18 Alibaba Group Holding Limited Human-machine conversation method, client, electronic device, and storage medium
US20210210100A1 (en) * 2018-05-29 2021-07-08 Amazon Technologies, Inc. Voice command processing for locked devices
US20190370413A1 (en) * 2018-06-03 2019-12-05 Apple Inc. Accessing multiple domains across multiple devices for candidate responses
US20190384855A1 (en) * 2018-06-14 2019-12-19 Google Llc Generation of domain-specific models in networked system
US20210272563A1 (en) * 2018-06-15 2021-09-02 Sony Corporation Information processing device and information processing method
US20200019419A1 (en) * 2018-07-13 2020-01-16 Microsoft Technology Licensing, Llc Image-based skill triggering
US20210272585A1 (en) * 2018-08-17 2021-09-02 Samsung Electronics Co., Ltd. Server for providing response message on basis of user's voice input and operating method thereof
US20200097472A1 (en) * 2018-09-21 2020-03-26 Servicenow, Inc. Parsing of user queries in a remote network management platform using extended context-free grammar rules
US20200111487A1 (en) * 2018-10-04 2020-04-09 Ca, Inc. Voice capable api gateway
US20200160856A1 (en) * 2018-11-15 2020-05-21 International Business Machines Corporation Collaborative artificial intelligence (ai) voice response system control
US20200175019A1 (en) * 2018-11-30 2020-06-04 Rovi Guides, Inc. Voice query refinement to embed context in a voice query
US20200219493A1 (en) * 2019-01-07 2020-07-09 2236008 Ontario Inc. Voice control in a multi-talker and multimedia environment
US20200242184A1 (en) * 2019-01-25 2020-07-30 Ford Global Technologies, Llc Pre-fetch and lazy load results of in-vehicle digital assistant voice searches
WO2020175293A1 (en) 2019-02-27 2020-09-03 Panasonic Intellectual Property Management Co., Ltd. Apparatus control system, apparatus control method, and program
US20210082420A1 (en) * 2019-03-01 2021-03-18 Google Llc Dynamically adapting assistant responses
US20200302122A1 (en) * 2019-03-20 2020-09-24 Promethium, Inc. Ranking data assets for processing natural language questions based on data stored across heterogeneous data sources
US20200312324A1 (en) * 2019-03-28 2020-10-01 Cerence Operating Company Hybrid arbitration system
US20230169956A1 (en) * 2019-05-03 2023-06-01 Sonos, Inc. Locally distributed keyword detection
US20200379727A1 (en) * 2019-05-31 2020-12-03 Apple Inc. User activity shortcut suggestions
CN110209791A (en) 2019-06-12 2019-09-06 Bairong Yunchuang Technology Co., Ltd. Multi-turn dialogue intelligent speech interaction system and device
US20200404219A1 (en) * 2019-06-18 2020-12-24 Tmrw Foundation Ip & Holding Sarl Immersive interactive remote participation in live entertainment
US20200401555A1 (en) * 2019-06-19 2020-12-24 Citrix Systems, Inc. Identification and recommendation of file content segments
US20210020174A1 (en) * 2019-07-15 2021-01-21 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for voice interaction
US20210035572A1 (en) * 2019-07-31 2021-02-04 Sonos, Inc. Locally distributed keyword detection
US20210035561A1 (en) * 2019-07-31 2021-02-04 Sonos, Inc. Locally distributed keyword detection
US20210043200A1 (en) * 2019-08-07 2021-02-11 International Business Machines Corporation Phonetic comparison for virtual assistants
US20210118435A1 (en) * 2019-10-21 2021-04-22 Soundhound, Inc. Automatic Synchronization for an Offline Virtual Assistant
US20210126985A1 (en) * 2019-10-23 2021-04-29 Microsoft Technology Licensing, Llc Personalized updates upon invocation of a service
US20210124736A1 (en) * 2019-10-25 2021-04-29 Servicenow, Inc. Enhanced natural language processing with semantic shortcuts
US20210136224A1 (en) * 2019-10-30 2021-05-06 American Tel-A-Systems, Inc. Methods for auditing communication sessions

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Examination Report of Indian Application No. 202044030604, dated Aug. 27, 2021.
First Office Action of Chinese Application No. 202010017436.7, dated Jun. 14, 2022.
Hao Yang, Microsoft Yiwen Voice Assistant Design, a thesis submitted in partial satisfaction of the requirements for the degree of Master of Arts in Design, Graduate School of Hunan University, dated Apr. 6, 2016, 66 pages.
Notice of Reasons for Refusal of Japanese Application No. 2020-125373, dated Sep. 7, 2021.
Notification of Reason for Refusal of Korean Application No. 10-2020-0090663, dated Aug. 31, 2021.
Supplementary European Search Report in European Application No. 201835030, dated Dec. 11, 2020.

Also Published As

Publication number Publication date
ES2952381T3 (en) 2023-10-31
JP7288885B2 (en) 2023-06-08
KR102389034B1 (en) 2022-04-21
CN111243587A (en) 2020-06-05
JP2021110921A (en) 2021-08-02
EP3848801B1 (en) 2023-05-10
EP3848801A1 (en) 2021-07-14
KR20210090081A (en) 2021-07-19
US20210210088A1 (en) 2021-07-08

Similar Documents

Publication Publication Date Title
US11798545B2 (en) Speech interaction method and apparatus, device and storage medium
KR102505597B1 (en) Voice user interface shortcuts for an assistant application
US11874904B2 (en) Electronic device including mode for using an artificial intelligence assistant function of another electronic device
US10503470B2 (en) Method for user training of information dialogue system
US20230410816A1 (en) Dialog management with multiple modalities
CN112967716B (en) Feedback controller for data transmission
KR20200007882A (en) Offer command bundle suggestions for automated assistants
CN113132214B (en) Dialogue method, dialogue device, dialogue server and dialogue storage medium
WO2020253064A1 (en) Speech recognition method and apparatus, and computer device and storage medium
CN115424624B (en) Man-machine interaction service processing method and device and related equipment
JP2023506087A (en) Voice Wakeup Method and Apparatus for Skills
CN119336867A (en) User question answering method and its device, equipment, and medium
CN114970559B (en) Intelligent response method and device
KR102837902B1 (en) Server, method and computer program for predicting intention of user
US11817087B2 (en) Systems and methods for reducing latency in cloud services
CN111770236B (en) Conversation processing method, device, system, server and storage medium
US12443390B2 (en) Electronic device and control method therefor
US20240111848A1 (en) Electronic device and control method therefor
WO2020246969A1 (en) Missed utterance resolutions
CN119441429B (en) Question answering method, system, device, equipment and storage medium
CN120067285A (en) Information query method, device and system
US20210049215A1 (en) Shared Context Manager for Cohabitating Agents
CN119132303A (en) Voice interaction method, device, equipment, storage medium and program product
CN119026605A (en) A data processing method, device, storage medium and computer program product
CN120343508A (en) Short message sending channel selection method, device, equipment and storage medium

Legal Events

Date Code Title Description
AS Assignment

Owner name: BEIJING XIAOMI PINECONE ELECTRONICS CO., LTD., CHINA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:GAO, LUYU;SUN, TIANWEI;MA, BAIMING;REEL/FRAME:053243/0033

Effective date: 20200715

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT RECEIVED

STPP Information on status: patent application and granting procedure in general

Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED

STCF Information on status: patent grant

Free format text: PATENTED CASE