WO2022206704A1 - Voice interaction method and electronic device - Google Patents

Voice interaction method and electronic device

Info

Publication number
WO2022206704A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
application
server
interface
voice
Prior art date
Application number
PCT/CN2022/083479
Other languages
English (en)
French (fr)
Inventor
黄益贵
乔登龙
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2022206704A1

Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06F - ELECTRIC DIGITAL DATA PROCESSING
          • G06F 9/00 - Arrangements for program control, e.g. control units
            • G06F 9/06 - using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
              • G06F 9/46 - Multiprogramming arrangements
                • G06F 9/48 - Program initiating; Program switching, e.g. by interrupt
      • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
          • G10L 15/00 - Speech recognition
            • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
              • G10L 2015/223 - Execution procedure of a spoken command
            • G10L 15/26 - Speech to text systems
    • H - ELECTRICITY
      • H04 - ELECTRIC COMMUNICATION TECHNIQUE
        • H04M - TELEPHONIC COMMUNICATION
          • H04M 1/00 - Substation equipment, e.g. for use by subscribers
            • H04M 1/72 - Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
              • H04M 1/724 - User interfaces specially adapted for cordless or mobile telephones
                • H04M 1/72403 - with means for local support of applications that increase the functionality
                  • H04M 1/72406 - by software upgrading or downloading
                  • H04M 1/7243 - with interactive means for internal management of messages
                    • H04M 1/72433 - for voice messaging, e.g. dictaphones
                    • H04M 1/72436 - for text messaging, e.g. SMS or e-mail
                  • H04M 1/72442 - for playing music files

Description

  • the present application relates to the field of terminal technologies, and in particular, to a voice interaction method and electronic device.
  • Electronic devices may be installed with voice assistant APPs such as Siri, Xiao Ai, and Xiao E.
  • When performing voice interaction with the user, the voice assistant APP has a voice connection function, thereby supporting multiple rounds of dialogue between the user and the voice assistant APP. For example, after the user wakes up the voice assistant APP in the mobile phone and the voice assistant APP receives a first voice command of "how is the weather today", it can display the queried first information in the form of a card on the display interface of the mobile phone. Subsequently, if the voice assistant APP receives a second voice command of "what about tomorrow", the voice assistant APP can continue to query tomorrow's weather information (that is, the second information) and then display the queried second information on the display interface of the mobile phone, thereby realizing the voice connection function.
  • The voice assistant APP can also provide corresponding services to the user by jumping to the application interface of another application when responding to the user's voice command. For example, if the voice assistant APP detects a third voice command of "I want to order takeout", the voice assistant APP can jump to the search interface of a takeaway APP, so that the user can complete the operations related to "I want to order takeout" in the takeaway APP. Generally, when the voice assistant APP jumps to another application, the voice assistant APP ends the current session with the user. At this time, the voice assistant APP may be closed or switched to run in the background of the mobile phone, so that the voice assistant APP cannot continue to interact with the user and the voice connection function cannot be realized, which reduces the user experience during voice interaction.
  • The present application provides a voice interaction method and an electronic device, which can continue to implement the voice connection function after the voice assistant APP jumps to another application, thereby improving the user experience during voice interaction.
  • The present application provides a voice interaction method, comprising: an electronic device displays a conversation interface of a voice application, where the conversation interface is used to display the content of a conversation between a user and the voice application; after detecting a first voice input by the user, the electronic device can convert the first voice into first dialogue content in text form; further, the electronic device can obtain a corresponding first link according to the first dialogue content, and jump from the conversation interface of the voice application to an interface of a first application according to the first link; when the electronic device exits the interface of the first application, the electronic device can jump back to the conversation interface according to a session identifier corresponding to the first dialogue content. In this way, the user can continue the dialogue with the voice application in the previous conversation interface, so as to realize the function of continuing the session content during voice interaction, thereby improving the user experience.
  • The method further includes: the electronic device sends a first request message to a first server, where the first request message may include the first dialogue content, so that the first server determines the session identifier of the first dialogue content and the above-mentioned first link in response to the first request message. In this case, obtaining, by the electronic device, the first link according to the first dialogue content includes: the electronic device receives a first response message sent by the first server, where the first response message includes the above-mentioned first link and the session identifier.
  • That is, the first server may combine the first link (that is, a link to an interface in the first application) with the session identifier of the first dialogue content and send them to the electronic device as the first response message. In this way, the voice application in the electronic device can obtain the session identifier of the first dialogue content and the first link.
  • The electronic device jumping back to the conversation interface according to the session identifier corresponding to the first dialogue content includes: the first application running in the electronic device first pulls up the voice application; further, after the voice application is pulled up, the voice application in the electronic device can display the dialogue content corresponding to the session identifier, so as to restore the conversation interface displayed by the voice application before it jumped to the first application.
  • Pulling up the voice application by the first application in the electronic device includes: when the electronic device exits the interface of the first application, the first application can splice the link of the voice application and the above-mentioned session identifier into a second link; then, the first application can start the voice application according to the second link, and at the same time the above-mentioned session identifier is passed to the voice application through the second link, so that the voice application can, according to the session identifier, restore the conversation interface it displayed before the jump to the first application.
  • The voice application of the electronic device displaying the dialogue content corresponding to the session identifier includes: after the voice application obtains the session identifier through the above-mentioned second link, it can query whether the electronic device stores the dialogue content corresponding to the session identifier. If the electronic device stores the dialogue content corresponding to the session identifier, the voice application can display that dialogue content in the conversation interface; if the electronic device does not store it, the voice application can acquire the dialogue content corresponding to the session identifier from the first server and display it in the conversation interface.
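  • As an illustration of the above "second link" splicing and session restoration, the following is a minimal Kotlin sketch. It assumes a hypothetical voiceassistant:// URI scheme and hypothetical helper names; the application itself does not prescribe a concrete link format.

```kotlin
import android.content.Context
import android.content.Intent
import android.net.Uri

// Hypothetical helper in the first (third-party) application: splice the voice
// application's link and the session identifier into a "second link", then use
// that link to pull the voice application back up when this app is exited.
fun relaunchVoiceAssistant(context: Context, sessionId: String) {
    val secondLink = Uri.parse("voiceassistant://conversation")   // assumed scheme
        .buildUpon()
        .appendQueryParameter("sessionId", sessionId)             // assumed parameter name
        .build()
    context.startActivity(Intent(Intent.ACTION_VIEW, secondLink))
}

// Hypothetical restore logic inside the voice application: prefer the locally
// stored dialogue content for this session; otherwise fetch it from the first server.
suspend fun restoreConversation(
    sessionId: String,
    localDialogues: Map<String, List<String>>,              // sessionId -> dialogue lines
    fetchFromFirstServer: suspend (String) -> List<String>  // queries the first server
): List<String> =
    localDialogues[sessionId] ?: fetchFromFirstServer(sessionId)
```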
  • The above-mentioned first request message may further include a device identifier (such as a UUID), where the device identifier is used to confirm whether the electronic device has logged in to the first application.
  • the above-mentioned first link may include a link to a login page in the first application.
  • That is, the login page of the first application serves as the service content (or service resource) corresponding to the first dialogue content.
  • When the electronic device has not logged in to the first application, the first response message returned by the first server may include, in addition to the above-mentioned first link (that is, the link to the login page), a search result for the first dialogue content obtained based on the first application.
  • If the electronic device has logged in to the first application, the first server may query the search result corresponding to the first dialogue content based on the logged-in account information, and then carry the search result in the first response message returned to the electronic device.
  • If the electronic device has not logged in to the first application, the electronic device jumping from the conversation interface of the voice application to the interface of the first application according to the first link includes: the voice application of the electronic device pulls up the first application according to the first link and displays the login page of the first application; at this time, the electronic device jumps from the conversation interface of the voice application to the interface of the first application.
  • The method further includes: the electronic device receives a login operation input by the user on the login page, where the login operation is used to authorize the electronic device to log in to a user account of the first application. In this case, the electronic device jumping back to the conversation interface according to the session identifier corresponding to the first dialogue content includes: if a login success message corresponding to the login operation is received, the electronic device is triggered to exit the interface of the first application; at this time, the electronic device can jump back from the interface of the first application to the above-mentioned conversation interface according to the session identifier.
  • The method further includes: the electronic device may request the first server to obtain first service content corresponding to the first dialogue content. Subsequently, the electronic device can display the first service content in the conversation interface, so that more targeted and accurate service resources can be recommended to the user, thereby improving the user experience during voice interaction.
  • The above method further includes: the electronic device acquires second service content corresponding to the first dialogue content, where the second service content is the service content corresponding to the first dialogue content when the first application is not logged in. In this case, before the electronic device jumps from the conversation interface of the voice application to the interface of the first application according to the first link, the method further includes: the electronic device displays, in the conversation interface, the second service content and a link to the login page in the first application. In this way, the user can obtain the corresponding second service content in the conversation interface without logging in.
  • The electronic device jumping from the conversation interface of the voice application to the interface of the first application according to the first link includes: if it is detected that the user selects the link of the login page in the above conversation interface, the electronic device can jump from the conversation interface to the login page of the first application according to the link of the login page; or, if a second voice input by the user instructs the electronic device to log in to the first application, the electronic device can jump from the conversation interface to the login page of the first application according to the link of the login page.
  • The method further includes: the electronic device can start a preset timer; when the timer expires, the electronic device jumps back to the conversation interface from the interface of the first application. That is to say, after the electronic device jumps from the voice application to the first application, it can stay in the first application for a preset time; when the timer expires, it automatically jumps back to the conversation interface of the voice application, and voice interaction continues in that interface.
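  • As a rough sketch of such a preset timer on Android, the following Kotlin fragment uses a coroutine delay; the timeout value and the jump-back callback are assumptions, and the timeout could equally come from the first response message (see the "timeout time of the first link" embodiment below).

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.delay
import kotlinx.coroutines.launch

// Hypothetical sketch: after jumping to the first application, start a timer;
// when it expires, jump back to the voice application's conversation interface
// (for example by relaunching the voice application with the session identifier).
fun scheduleJumpBack(timeoutMillis: Long, jumpBackToConversation: () -> Unit) {
    CoroutineScope(Dispatchers.Main).launch {
        delay(timeoutMillis)
        jumpBackToConversation()
    }
}
```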
  • the method further includes: the electronic device can switch the voice application to run in the background.
  • The present application provides a voice interaction method, including: a first server receives a first request message sent by an electronic device, where the first request message includes first dialogue content; in response to the first request message, the first server obtains the session identifier and the first link of the first dialogue content; furthermore, the first server sends a first response message to the electronic device, where the first response message includes the session identifier and the first link, so that the electronic device can, according to the session identifier, jump back from the interface of the first application corresponding to the first link to the conversation interface of the voice application that displays the first dialogue content, thereby realizing the voice connection function.
  • Acquiring the session identifier and the first link of the first dialogue content by the first server includes: the first server assigns the session identifier to the first dialogue content; and the first server obtains the first link corresponding to the first dialogue content from a second server of the first application.
  • Assigning the session identifier to the first dialogue content by the first server includes: the first server identifies the semantics of the first dialogue content, for example, by extracting the slot information and intention of the first dialogue content; then, the first server may assign the session identifier to the first dialogue content according to the semantics of the first dialogue content.
  • the electronic device may also assign a session identifier to the first dialogue content, and in this case, the electronic device does not need to acquire the session identifier of the current dialogue content from the first server.
  • The first server acquiring the first link corresponding to the first dialogue content from the second server of the first application includes: the first server sends a first message to the second server, where the first message can include the above-mentioned session identifier and the semantics of the first dialogue content, so that the second server determines the first link according to the semantics of the first dialogue content, thereby establishing a correspondence between the first link and the session identifier; further, the first server can receive a second message sent by the second server, where the second message includes the session identifier and the first link. That is, the first server obtains the first link by interacting with the second server (the server of the first application).
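  • The application does not fix a wire format for the first and second messages exchanged between the two servers; a hedged sketch of the fields they carry, expressed as Kotlin data classes with assumed field names, could look like this:

```kotlin
// Assumed field names: the first message carries the session identifier and the
// semantics (intent and slots) of the first dialogue content, optionally with a
// device identifier so the second server can check the login state.
data class FirstMessage(
    val sessionId: String,
    val intent: String,                  // e.g. "order-takeout"
    val slots: Map<String, String>,      // e.g. mapOf("singer" to "Jay Chou")
    val deviceId: String? = null
)

// The second message returns the session identifier together with the first link
// (a deep link into the first application) determined by the second server.
data class SecondMessage(
    val sessionId: String,
    val firstLink: String
)
```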
  • The above-mentioned first request message may include the device identifier of the electronic device; in this case, the first message sent by the first server to the second server may also carry the above-mentioned device identifier, so that the second server can determine, according to the device identifier, whether the electronic device has logged in to the first application. If the electronic device has not logged in to the first application, the second server may determine the link of the login page in the first application as the first link.
  • The method further includes: the first server receives a session restoration message sent by the electronic device, where the session restoration message includes the session identifier of the first dialogue content and the device identifier; in response to the session restoration message, the first server can query the second server, according to the above device identifier, whether the electronic device has logged in to the first application; if the electronic device has logged in to the first application, the first server can obtain from the second server the first service content corresponding to the first dialogue content, where the first service content is associated with the account information of the user after logging in to the first application; the first server then sends the first service content to the electronic device for display.
  • The first response message may further include second service content, where the second service content is the service content corresponding to the first dialogue content when the electronic device is not logged in to the first application.
  • the user can also obtain the corresponding second service content in the session interface without logging in.
  • the above-mentioned first response message may further include a timeout time of the first link.
  • In this way, when the electronic device displays the interface of the first application, a corresponding timer can be set according to the timeout period, and when the timer expires, the electronic device is triggered to automatically jump back to the conversation interface of the voice application.
  • The present application provides a voice interaction method, including: an electronic device displays a conversation interface of a voice application, where the conversation interface is used to display the content of a conversation between a user and the voice application; after the electronic device detects a first voice input by the user, it converts the first voice into first dialogue content; further, the electronic device sends a first request message to a first server, where the first request message includes the above-mentioned first dialogue content; in response to the first request message, the first server obtains the session identifier and the first link of the first dialogue content, and carries the session identifier and the first link in a first response message returned to the electronic device; further, the electronic device can jump from the conversation interface to the interface of the first application according to the first link; after displaying the interface of the first application, the electronic device can jump back to the conversation interface according to the above session identifier.
  • In other words, when the user interacts with the voice application, the voice application, the first application, and each server can pass the session identifier of the current dialogue content as a carried parameter in each interaction. When a server provides the voice application with the service resource corresponding to the dialogue content, a correspondence between the service resource and the corresponding session identifier can be established, so that when the voice application jumps to the third-party application that provides the above-mentioned service resource, the third-party application can also obtain the service resource and the corresponding session identifier. In this way, when the third-party application exits, the voice application can be restarted according to the session identifier and the corresponding conversation interface can be restored, so that the electronic device can still switch back to the conversation interface after jumping to the third-party application, thereby implementing the voice connection function and improving the user experience.
  • acquiring the first link by the first server includes: the first server acquiring the first link from the second server according to the content of the first dialog, and the second server is a server corresponding to the first application.
  • The first request message also includes the device identifier of the electronic device; in this case, the first server may also send the device identifier to the second server, and the second server may determine, based on the device identifier, whether the electronic device has logged in to the first application; if the electronic device has not logged in to the first application, the above-mentioned first link includes a link to the login page in the first application.
  • The method further includes: in response to a login operation input by the user on the interface of the first application, the electronic device sends a login request to the first server, where the login request includes the device identifier; in response to the login request, the first server requests the second server to mark the device identifier as a logged-in state; further, the first server sends a login success message to the electronic device. In this case, the electronic device jumping back to the conversation interface according to the session identifier includes: in response to the login success message, the electronic device jumps back to the conversation interface from the interface of the first application according to the session identifier.
  • The method further includes: the electronic device sends a session restoration message to the first server, where the session restoration message includes the session identifier and the device identifier; if the electronic device has logged in to the first application, the first server can, in response to the session restoration message, obtain the first service content corresponding to the first dialogue content from the second server and send the first service content to the electronic device, where the first service content is associated with the account information of the user after logging in to the first application; subsequently, the electronic device may display the first service content in the conversation interface.
  • The above method further includes: the first server obtains second service content from the second server, where the second service content is the service content corresponding to the first dialogue content when the first application is not logged in. In this case, before the electronic device jumps from the conversation interface to the interface of the first application according to the first link, the method further includes: the electronic device can display the second service content in the conversation interface.
  • The electronic device jumping back to the conversation interface according to the session identifier includes: the first application in the electronic device pulls up the voice application according to a second link, where the second link includes the link of the voice application and the session identifier; after the voice application is pulled up, the voice application in the electronic device may display the dialogue content corresponding to the session identifier.
  • the present application provides an electronic device, comprising: a memory, a display screen and one or more processors; the memory, the display screen and the processor are coupled.
  • The memory is used to store computer program code, and the computer program code includes computer instructions; when the electronic device runs, the processor is used to execute the one or more computer instructions stored in the memory, so that the electronic device executes the voice interaction method described in any one of the above.
  • The present application provides a server, comprising: a processor, a memory, a communication module, and one or more computer programs; the processor is coupled to the communication module and the memory, and the one or more computer programs are stored in the memory; when the server runs, the processor executes the one or more computer programs stored in the memory, so that the server executes the voice interaction method described in any one of the above.
  • The present application provides a voice interaction system, including the above electronic device and the above server; when interacting with each other, the electronic device and the server can perform the voice interaction method described above.
  • The present application provides a computer storage medium, comprising computer instructions, which, when run on an electronic device (or server), cause the electronic device (or server) to execute the voice interaction method described in any one of the above.
  • the present application provides a computer program product that, when the computer program product runs on an electronic device (or server), enables the electronic device (or server) to execute any of the voice interaction methods described above.
  • FIG. 1 is a schematic structural diagram of a voice interaction system provided by an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of voice interaction between an electronic device and a server in the prior art;
  • FIG. 3 is a schematic flowchart of voice interaction between an electronic device and a server according to an embodiment of the present application;
  • FIG. 4 is a first schematic structural diagram of an electronic device according to an embodiment of the present application;
  • FIG. 5 is a schematic diagram of an interaction flow of a voice interaction method provided by an embodiment of the present application;
  • FIG. 6 is a first schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application;
  • FIG. 7 is a second schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application;
  • FIG. 8 is a third schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application;
  • FIG. 9 is a fourth schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application;
  • FIG. 10 is a fifth schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application;
  • FIG. 11 is a sixth schematic diagram of an application scenario of a voice interaction method provided by an embodiment of the present application;
  • FIG. 12 is a second schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Each voice input of the user generally corresponds to an intention of the user.
  • An intent is a collection of one or more expressions; for example, "I want to watch a movie" and "I want to watch an action movie shot by Andy Lau in 2001" can both belong to the same video playback intent.
  • Generally, a skill can cover one or more intents.
  • a developer of a food delivery APP can create a skill 1 named "ordering food delivery" in a voice platform (or called a voice open platform) provided by an electronic device manufacturer.
  • Skill 1 can cover multiple intents such as search, payment, and navigation. Subsequently, after the electronic device receives the voice input 1, if it recognizes that the voice input 1 contains a keyword related to "order takeout", it can be determined that the voice input 1 is associated with the skill 1, which can also be referred to as the voice input 1 hitting the skill 1.
  • the slot information refers to the key information used to express the intention in the user's voice input, and the slot information directly determines whether the electronic device (or server) can match the correct intention.
  • a slot corresponds to a keyword of a type of attribute, and the information in the slot (ie, slot information) can be filled with keywords of the same type.
  • For example, the query pattern corresponding to the intent to play a song could be "I want to hear {song} of {singer}", where {singer} is the singer slot and {song} is the song slot.
  • For a voice input matching this pattern, such as "I want to hear Red Bean of Faye Wong", the electronic device can extract the slot information of the {singer} slot as: Faye Wong, and the slot information of the {song} slot as: Red Bean. In this way, the electronic device (or server) can identify, according to the two pieces of slot information, that the user's intention for this voice input is to play a song.
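  • Purely as an illustration of matching an utterance against such a query pattern, the following is a simplistic regex-based Kotlin sketch; it is not the NLU algorithm referred to by the embodiments, which would typically use a trained model.

```kotlin
// Simplified illustration only: a real NLU module would not rely on a fixed regex.
data class PlaySongIntent(val singer: String, val song: String)

fun matchPlaySong(utterance: String): PlaySongIntent? {
    // Pattern corresponding to "I want to hear {song} of {singer}".
    val pattern = Regex("""I want to hear (.+) of (.+)""")
    val match = pattern.matchEntire(utterance.trim()) ?: return null
    val (song, singer) = match.destructured
    return PlaySongIntent(singer = singer, song = song)
}

// matchPlaySong("I want to hear Red Bean of Faye Wong")
//   returns PlaySongIntent(singer = "Faye Wong", song = "Red Bean")
```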
  • a session process may include one or more rounds of dialogue between the user and the electronic device. Each session can be identified by a corresponding session identifier.
  • the session identifier may be sessionId or dialogId or the like.
  • Generally, one session may correspond to one skill. For example, when the user's voice input hits the takeaway skill, the electronic device can assign sessionId 1 to the session process; subsequent dialogues between the user and the electronic device under the takeaway skill all belong to session 1, and their session identifiers are all sessionId 1.
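  • One possible sketch of this "one session per skill" bookkeeping, with hypothetical class and field names, is the following dialog-manager fragment:

```kotlin
import java.util.UUID

// Hypothetical fragment: dialogue contents that hit the same skill share one
// session identifier; a newly hit skill starts a new session.
class SessionManager {
    private val skillToSession = mutableMapOf<String, String>()

    fun sessionIdFor(skill: String): String =
        skillToSession.getOrPut(skill) { "sessionId-" + UUID.randomUUID() }
}

// Calling sessionIdFor("takeaway") for consecutive takeout utterances returns the
// same identifier, so those rounds of dialogue all belong to the same session.
```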
  • When the user performs voice interaction with the electronic device, each time the electronic device receives the user's voice input, it can interact with the server to determine the specific skill associated with the current voice input. Further, based on the determined skill, the server can extract the intent and slot information corresponding to this voice input, and instruct the electronic device to execute the operation instruction corresponding to this intent and slot information.
  • the operation instruction may be an instruction such as displaying a card, playing a voice, jumping to other applications, or controlling a smart home device, so as to complete the response to the user's voice input this time.
  • the voice interaction system 100 may include an electronic device 101 and at least one server 102 .
  • a voice assistant APP (also referred to as a voice APP, a voice assistant or a smart voice, etc.) may be installed in the electronic device 101 for performing voice interaction with the user.
  • the user can wake up the voice assistant APP in the electronic device 101 by inputting a preset wake-up language (eg, "Hello Little E”, "Xiaoyi Xiaoyi", “hi Siri”, etc.).
  • the voice assistant APP can have a conversation with the user.
  • the user can have one or more rounds of conversations with the voice assistant APP.
  • The user can provide a corresponding voice input (also called a voice command) to the voice assistant APP, triggering the voice assistant APP to recognize the voice input and provide the corresponding service resource, thus completing one round of dialogue.
  • the voice assistant APP may receive the user's voice input.
  • The voice assistant APP can convert the above-mentioned voice input into corresponding text (that is, dialogue content) through automatic speech recognition (ASR) technology.
  • the voice assistant APP may send the conversation content to the server 102 .
  • the electronic device 101 may also directly send the received voice input to the server 102, and the server 102 converts the voice input into corresponding dialogue content.
  • The server 102 can use a preset natural language understanding (NLU) algorithm to extract the intent and slot information in the current dialogue content. Furthermore, in step S205, the server 102 may send the service resource (also referred to as service content) corresponding to the extracted intent and slot information, or a link to the service resource, to the voice assistant APP. For example, when the intention is to query the weather, the server 102 may send the corresponding weather information to the voice assistant APP of the electronic device 101 as the service resource.
  • the server 102 may send URLs corresponding to singer A and song B to the electronic device 101 .
  • the server 102 may send a deeplink (deep link) of the search page in the takeout APP to the electronic device 101 .
  • the first server may be used to extract intent and slot information in the conversation content.
  • the first server may acquire corresponding service resources from the second server according to the extracted intent and slot information.
  • the voice assistant APP can jump to the food delivery APP according to the deeplink.
  • the electronic device 101 closes or switches the voice assistant APP to run in the background, resulting in the end of this voice interaction process (ie, step S207 ).
  • the electronic device 101 cannot jump back to the voice assistant APP to continue the previous session, thereby reducing the continuity of multiple rounds of sessions during voice interaction.
  • the voice assistant APP of the electronic device 101 can receive the user's voice input. Furthermore, in step S302, the voice assistant APP may convert the received voice input into corresponding dialogue content. In step S303, the voice assistant APP may send to the server 102 the voice input (or the dialogue content in the form of text) received each time and the session identifier of the current session. Subsequently, in step S304, the server 102 may extract the corresponding intention and/or slot information from the dialogue content, so as to query the service resource corresponding to the voice input this time.
  • If the server 102 finds that the service resource corresponding to this voice input is a deeplink of another application (such as a food delivery APP), then in step S305 the server 102 can send the above session identifier together with the deeplink (that is, the service resource) to the voice assistant APP in the electronic device 101. Furthermore, in step S306, the voice assistant APP may jump to the interface of the food delivery APP according to the received deeplink, and also send the corresponding session identifier to the food delivery APP.
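  • On Android, a jump of this kind is typically an implicit intent on the deeplink; the sketch below is an assumption about how the voice assistant APP might perform it and hand over the session identifier (the extra key "sessionId" is hypothetical, and the actual deeplink is whatever the food delivery APP registers).

```kotlin
import android.content.Context
import android.content.Intent
import android.net.Uri

// Hedged sketch: open the third-party interface addressed by the deeplink and
// pass the session identifier along so it can be handed back to the voice
// assistant APP when that app is later relaunched.
fun jumpViaDeeplink(context: Context, deeplink: String, sessionId: String) {
    val intent = Intent(Intent.ACTION_VIEW, Uri.parse(deeplink))
    intent.putExtra("sessionId", sessionId)   // assumed extra key
    context.startActivity(intent)
}
```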
  • Subsequently, the electronic device 101 can jump back from the displayed interface of the takeaway APP to the voice assistant APP, and transmit the corresponding session identifier to the voice assistant APP, so that the voice assistant APP can display the corresponding conversation interface according to the session identifier.
  • the user can continue to talk with the voice assistant APP in the previous session interface, so as to realize the function of continuing the session content during voice interaction, thereby improving the user experience.
  • The above-mentioned electronic device 101 may specifically be a mobile phone, a speaker, an in-vehicle device (also referred to as a car device), a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), a handheld computer, a netbook, a personal digital assistant (PDA), a wearable electronic device, a virtual reality device, or another electronic device with a voice interaction function, which is not limited in this embodiment of the present application.
  • FIG. 4 shows a schematic structural diagram of the electronic device 101 .
  • the electronic device 101 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a microphone 170B, sensor module 180, etc.
  • the processor 110 may include one or more processing units, for example, the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), controller, video codec, digital signal processor (digital signal processor, DSP), baseband processor, and/or neural-network processing unit (neural-network processing unit, NPU), etc. Wherein, different processing units may be independent devices, or may be integrated in one or more processors.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • The processor 110 may perform one or more of the following operations: converting speech into text, recognizing the speech input by the user, responding to the user's speech, sending the content obtained from the server to the display screen for display, jumping from one application to another based on a link, and so on.
  • the processor 110 may include one or more interfaces.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G etc. applied on the electronic device 101.
  • the mobile communication module 150 may include one or more filters, switches, power amplifiers, low noise amplifiers (LNAs), and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the same device as at least part of the modules of the processor 110 .
  • The wireless communication module 160 can provide wireless communication solutions applied on the electronic device 101, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating one or more communication processing modules.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110 , perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2 .
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 101 .
  • The external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store one or more computer programs including instructions.
  • The processor 110 may execute the above-mentioned instructions stored in the internal memory 121, thereby causing the electronic device 101 to execute the voice interaction method provided in some embodiments of the present application, as well as various functional applications and data processing.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the stored program area may store the operating system; the stored program area may also store one or more application programs (such as gallery, contacts, etc.) and the like.
  • the storage data area may store data (such as photos, contacts, etc.) created during the use of the electronic device 101 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, universal flash storage (UFS), and the like.
  • The processor 110, by executing the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor, causes the electronic device 101 to execute the voice interaction method provided in the embodiments of the present application, as well as various functional applications and data processing.
  • the electronic device 101 may implement audio functions through an audio module 170, a speaker 170A, a microphone 170B, an application processor, and the like. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A also referred to as a "speaker" is used to convert audio electrical signals into sound signals.
  • the electronic device 101 can listen to music through the speaker 170A, or listen to a hands-free call.
  • The microphone 170B, also referred to as a "mic", is used to convert sound signals into electrical signals.
  • the user can make a sound through the human mouth close to the microphone 170B, and input the sound signal into the microphone 170B.
  • the electronic device 101 may be provided with one or more microphones 170B. In other embodiments, the electronic device 101 may be provided with two microphones 170B, which may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 101 may further be provided with three, four or more microphones 170B to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the speaker 170A, the microphone 170B and the audio module 170 may be used to implement voice interaction with the user, such as receiving the user's voice, or responding to the user's operation through voice, and the like.
  • The sensor 180 may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like, which is not limited in this embodiment of the present application.
  • the structures illustrated in the embodiments of the present invention do not constitute a specific limitation on the electronic device 101 .
  • the electronic device 101 may include more or less components than shown, or some components may be combined, or some components may be separated, or different component arrangements.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the electronic device 101 may further include one or more devices such as a GPU, a display screen, a camera, and a button, which is not limited in this embodiment of the present application.
  • the electronic device 101 when the above-mentioned electronic device 101 is a mobile phone, the electronic device 101 may also include one or more devices such as GPU, display screen, camera, headphone jack, buttons, battery, motor, indicator, and SIM card interface.
  • This is not limited in this embodiment of the present application.
  • a mobile phone is used as the above-mentioned electronic device 101 for illustration.
  • FIG. 5 is a schematic flowchart of a voice interaction method provided by an embodiment of the present application. As shown in Figure 5, the voice interaction method may include:
  • The voice assistant APP of the mobile phone receives a first voice input entered by the user.
  • the voice assistant APP can be woken up by inputting a wake-up word or the like.
  • When the phone detects that the user has entered the correct wake-up word, it can start running the voice assistant APP.
  • the mobile phone can display a conversation interface 601 between the voice assistant APP and the user.
  • When the mobile phone detects that the user inputs a preset operation to wake up the voice assistant APP (for example, long pressing the power button or speaking a voice wake-up word), the mobile phone can run the voice assistant APP and display the conversation interface 601 shown in FIG. 6.
  • The voice assistant APP can call the microphone (or microphone array) of the mobile phone to detect the user's voice input (also called voice, voice command, or corpus), so as to conduct one or more rounds of dialogue with the user.
  • the first voice input input by the user may be the voice of "I want takeout”.
  • the voice assistant APP of the mobile phone converts the first voice input into the first dialogue content.
  • the ASR service can be set in the mobile phone.
  • the voice assistant APP can call the ASR service to convert the first voice input into corresponding text content (ie, the first dialogue content).
  • the voice assistant APP can obtain the corresponding first dialogue content "I want takeout” in the form of text through the ASR service in the mobile phone.
  • the voice assistant APP can display the first conversation content 602 in the form of conversation in the conversation interface 601 .
  • The ASR service can also be provided on another device, such as a server or another terminal device, and the mobile phone can obtain the speech-to-text result, such as the first dialogue content, by interacting with the device that has the ASR service.
  • This embodiment of the present invention does not specifically limit this.
  • the voice assistant APP can also obtain data such as current location information, camera data, or health data of the mobile phone. For example, if the first conversation content hits skills such as navigation or ordering takeout, the mobile phone can also obtain current location information. For another example, if the first conversation content hits skills such as video shooting, the mobile phone can also acquire current camera data.
  • the mobile phone sends a first request message to the first server, where the first request message includes the above-mentioned first conversation content.
  • The session to which the first dialogue content belongs can also be determined by interacting with a dialog manager (DM), so as to obtain the session identifier of the first dialogue content.
  • The session identifier can be used to jump back to the voice assistant APP and restore the above-mentioned first dialogue content after the mobile phone subsequently jumps from the voice assistant APP to another application.
  • the above-mentioned dialog manager may be provided in the first server.
  • the first server may be one or more servers that provide network services to the voice assistant APP.
  • the voice assistant APP of the mobile phone can send a first request message to the first server, where the first request message includes the above-mentioned first conversation content.
  • the first server may use a preset NLU algorithm to identify the semantics of the first dialogue content, and then assign a corresponding session identifier to the first dialogue content according to the semantics of the first dialogue content.
  • The DM in the first server may determine the skill hit by the first dialogue content according to the semantics of the first dialogue content, and then assign the session identifier according to the hit skill.
  • the DM can determine the specific skills hit by each dialogue content by extracting keywords in the dialogue content. If the dialogue content 1 and the dialogue content 2 hit the same skill, the dialogue content 1 and the dialogue content 2 belong to the same session, and at this time, the session identifiers of the dialogue content 1 and the dialogue content 2 are the same.
  • the above-mentioned conversation manager may be provided in the mobile phone.
  • The voice assistant APP of the mobile phone can use the preset NLU algorithm to identify the semantics of the first dialogue content, and then request the DM in the mobile phone to assign a corresponding session identifier to the first dialogue content according to the semantics of the first dialogue content.
  • the mobile phone can send the first conversation content and the session identifier of the first conversation content to the server, for example, the mobile phone sends both the first conversation content and the session identifier of the first conversation content in the first request message to the first server.
  • the voice assistant APP of the mobile phone may also carry the device identification of the mobile phone in the first request message and send it to the first server.
  • The device identifier can be a UUID (universally unique identifier), UDID (unique device identifier), IMEI (international mobile equipment identity), SN (serial number), UID (user identification), OpenID, or the like.
  • the device identifier can be used to determine whether the user has logged into the account of the relevant application when subsequently acquiring the service resource (also referred to as the service content) corresponding to the first dialogue content.
  • The voice assistant APP of the mobile phone may also carry data such as the current location information, camera data, or health data of the mobile phone in the first request message and send it to the first server, which is not limited in this embodiment of the present application.
  • the first server determines the first service resource corresponding to the above-mentioned first dialog content.
  • After obtaining the first conversation content in the first request message, the first server can extract the intent and/or slot information in the first conversation content through a preset NLU algorithm, that is, identify the semantics of the first conversation content. Taking the first dialogue content "I want takeout" as an example, the first server can extract, through the preset NLU algorithm, that the intent of the first dialogue content is order-takeout; in this case, there is no explicit slot information in the first dialogue content. For another example, when the first conversation content is "I want to listen to Jay Chou's song", the first server can extract, through the preset NLU algorithm, that the intent of the first conversation content is: play the song, and that the slot information corresponding to the singer is: Jay Chou. Subsequently, the first server may query the first service resource (that is, the first service content) corresponding to the first dialogue content according to the extracted intent and/or slot information.
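As a hedged illustration of the NLU output just described, the small Kotlin sketch below models an intent plus optional slot information for the two examples above; the field names are assumptions introduced only for this sketch.

    // Illustrative shape of an NLU result; field names are assumptions.
    data class NluResult(
        val intent: String,             // e.g. "order-takeout" or "play-song"
        val slots: Map<String, String>  // slot name -> extracted value
    )

    fun main() {
        // "I want takeout": an intent with no explicit slot information.
        val takeout = NluResult(intent = "order-takeout", slots = emptyMap())
        // "I want to listen to Jay Chou's song": play-song intent with a singer slot.
        val music = NluResult(intent = "play-song", slots = mapOf("singer" to "Jay Chou"))
        println(takeout)
        println(music)
    }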
  • the first service resource (or the first service content) may be a specific video file or audio file, or may be a search result for a certain content, a page (such as a login page), or a function in an application, etc.
  • The embodiments of this application do not impose any limitation on this.
  • Some device manufacturers may set up their own service open platforms, which are used to make services provided by third-party application developers accessible to the device manufacturer's own devices.
  • For example, Huawei has set up the HAG (Huawei Ability Gallery), a smart service open platform, and through the HAG it manages, in a unified manner, the process of making services provided by third-party application developers accessible to Huawei devices.
  • the device manufacturer may also inform the developer of the third-party application of the link to the voice assistant APP in the device manufacturer's own device through its own service development platform, or other settings or agreements. In this way, the third-party application developer can preset the link in the third-party application. After the device manufacturer's own device, such as a mobile phone, jumps from the voice assistant application to the interface of the third-party application, the third-party application can use the link to let the device launch the voice assistant APP.
  • After the first server extracts the intent and/or slot information of the first conversation content, it can send a first message to the service open platform of the mobile phone to trigger the service open platform to determine the third-party application associated with this first conversation content.
  • For example, the first server may send a first message to the HAG, where the first message includes the intent extracted from the first conversation content: order-takeout.
  • the HAG can determine that the third-party application associated with the content of the first conversation this time is a food delivery APP (for example, a Meituan APP, etc.).
  • the first service resource corresponding to the first conversation content is provided by the server (eg, the second server) of the food delivery APP. Further, the HAG may notify the first server to obtain the first service resource corresponding to the first conversation content from the second server of the food delivery APP. Subsequently, the first server may obtain the corresponding first service resource from the second server by performing the following step S505 according to the intent and/or slot information of the first dialogue content and the session identifier of the first dialogue content.
  • Alternatively, after the first server sends the above first message to the HAG, the HAG can determine, according to the intent carried in the first message, that the third-party application associated with this first conversation content is a food delivery APP; further, the HAG can directly interact with the second server of the food delivery APP and obtain the first service resource corresponding to the first conversation content from the second server.
  • It should be noted that the above example describes the first server determining the third-party application associated with this first conversation content by interacting with the HAG. It can be understood that the first server may also determine the third-party application associated with this first conversation content in other ways, which is not limited in this embodiment of the present application.
  • the first server sends a second request message to the second server, where the second request message includes the session identifier to which the first conversation content belongs.
  • For example, after the first server learns from the HAG that the third-party application associated with the above first conversation content is a food delivery APP, the first server may send a second request message to the second server of the food delivery APP to request the first service resource (or a link to the first service resource) corresponding to the first conversation content.
  • the second request message may include intent and/or slot information extracted from the first dialogue content, and a session identifier of the first dialogue content.
  • Alternatively, after the HAG determines that the third-party application associated with the above first conversation content is the take-out APP, the HAG may directly send the above second request message to the second server of the take-out APP to request the first service resource corresponding to the first conversation content.
  • Similarly, the second request message may include the intent and/or slot information extracted from the first dialogue content, and the session identifier of the first dialogue content.
  • Regardless of how the second request message is sent, the second server can obtain the session identifier of the current conversation content (that is, the first conversation content) through the second request message, so that the second server of the first application can subsequently establish a correspondence between the session identifier of the first conversation content and the service resource of the first conversation content.
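To make the flow concrete, the sketch below serializes a hypothetical second request message that carries the intent, slot information, session identifier and device identifier described above; the field names and the query-string encoding are assumptions, not the actual protocol between the servers.

    // Hypothetical second request message; the real message format is not specified here.
    data class SecondRequest(
        val sessionId: String,
        val deviceId: String,
        val intent: String,
        val slots: Map<String, String>
    )

    fun toQueryString(req: SecondRequest): String =
        (listOf(
            "sessionId=${req.sessionId}",
            "deviceId=${req.deviceId}",
            "intent=${req.intent}"
        ) + req.slots.map { (name, value) -> "slot.$name=$value" })
            .joinToString("&")

    fun main() {
        val req = SecondRequest("xxxyyy", "uuid-1234", "order-takeout", emptyMap())
        println(toQueryString(req))  // sessionId=xxxyyy&deviceId=uuid-1234&intent=order-takeout
    }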
  • In response to the second request message, the second server sends the first service resource (or the link of the first service resource) to the voice assistant APP of the mobile phone through the first server.
  • After receiving the above second request message, the second server may determine the corresponding service resource, that is, the first service resource, according to the intent and/or slot information carried in the second request message. For example, when the intent is to listen to a song and the slot information is "Nunchucks", the second server may determine that the corresponding service resource is the audio resource of the song "Nunchucks". For another example, when the intent is to order takeout, the second server may determine that the corresponding service resource is the real-time takeout search result in the takeout APP.
  • The second server may directly send the first service resource to the voice assistant APP of the mobile phone through the first server, or may send the link of the first service resource to the voice assistant APP of the mobile phone through the first server, so that the voice assistant APP acquires the corresponding first service resource according to the link.
  • the second request message received by the second server may further include a device identifier of the mobile phone, such as a UUID.
  • the second server may first determine whether the user has logged in to the food delivery APP according to the UUID carried in the second request message.
  • Generally, after the user logs in to the food delivery APP with an account and a password, the food delivery APP can send the corresponding UUID to the second server, and the second server can mark the UUID as a logged-in state.
  • the second server may assign a token corresponding to the UUID so that the UUID is marked as a login state.
  • the second server can query whether the token corresponding to the UUID carried in the second request message is stored. If the corresponding token is stored, it means that the user has logged in to the food delivery APP; if the corresponding token is not stored, it means that the user has not logged in to the food delivery APP.
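A minimal sketch of this token-based check is shown below, assuming a simple in-memory store on the second server; the class and method names, and the token format, are assumptions for illustration only.

    // Sketch: the second server stores a token per device UUID to mark the logged-in state.
    class LoginStateStore {
        private val tokenByUuid = mutableMapOf<String, String>()

        fun markLoggedIn(uuid: String, token: String) {
            tokenByUuid[uuid] = token
        }

        // A stored token means the user of this device is treated as logged in.
        fun isLoggedIn(uuid: String): Boolean = tokenByUuid.containsKey(uuid)
    }

    fun main() {
        val store = LoginStateStore()
        println(store.isLoggedIn("uuid-1234"))        // false: no token yet, user not logged in
        store.markLoggedIn("uuid-1234", "token-abc")
        println(store.isLoggedIn("uuid-1234"))        // true: token stored, user logged in
    }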
  • This embodiment of the present invention does not specifically limit how the second server determines whether the user logs in to the food delivery APP according to the device identifier.
  • the second server may obtain account information such as coupons, historical orders, taste preferences, etc. in the user's account. At this time, the second server may determine, based on the information, the first service resource corresponding to the intention of ordering takeout in the first dialogue content. For example, if the user has a KFC coupon, the first service resource may preferentially include the KFC search result. For another example, if the user has a KFC coupon, the second server may also carry information such as the discounted price in the first service resource. In this way, the second server can combine the user's logged-in account information to determine more targeted and accurate service resources for the user, thereby improving the user's use experience during voice interaction.
  • If the user has not yet logged in to the food delivery APP, the second server may use the login page in the food delivery APP as the first service resource corresponding to this first conversation content.
  • In this case, the second server can send the link of the login page in the food delivery APP (that is, the link of the first service resource) to the voice assistant APP of the mobile phone, so that after logging in to the food delivery APP, the user can obtain more accurate and richer service resources through the voice assistant APP.
  • the second server may first send the first service resource or the link of the first service resource to the first server, and then the first server may send the first service resource or the link of the first service resource to the voice assistant of the mobile phone.
  • The link to the first service resource may be a link to an H5 (HTML5) page, a link to a relevant page in a quick app, or a link to a relevant page in a third-party application; these links may be collectively referred to as deeplinks.
  • the link of the first service resource may further include the session identifier of the first dialogue content, thereby establishing a correspondence between the first service resource of the first dialogue content and the session identifier of the first dialogue content.
  • For example, if the deeplink of the first service resource determined by the second server is https://www.huawei.com/hag/accountLogin and the session identifier of the first dialogue content is sessionId=xxxyyy, the second server can append the session identifier to the deeplink of the first service resource to obtain link 1 of the first service resource: https://www.huawei.com/hag/accountLogin&sessionId=xxxyyy, where link 1 includes both the deeplink of the first service resource and the session identifier of the first dialogue content.
  • the second server can send the link 1 to the mobile phone through the first server, and the mobile phone can send the link 1 to the voice assistant APP.
  • the link 1 of the above-mentioned first service resource may further include a timeout period of the link 1, for example, 2s.
  • the timeout period can be used to trigger the mobile phone to jump back to the voice assistant APP from the interface of the application corresponding to link 1 after the timer reaches or exceeds the timeout period.
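Following the example above, link 1 could be assembled by appending the session identifier (and, optionally, the timeout) to the deeplink of the first service resource. The sketch mirrors the "&sessionId=..." concatenation shown in the description; the "timeout" parameter name and the helper function itself are assumptions.

    // Sketch: concatenate the session identifier (and an optional timeout) onto the service-resource deeplink.
    fun buildLink1(deeplink: String, sessionId: String, timeoutSeconds: Int? = null): String {
        val withSession = "$deeplink&sessionId=$sessionId"
        return if (timeoutSeconds != null) "$withSession&timeout=$timeoutSeconds" else withSession
    }

    fun main() {
        val link1 = buildLink1(
            deeplink = "https://www.huawei.com/hag/accountLogin",
            sessionId = "xxxyyy",
            timeoutSeconds = 2
        )
        println(link1)  // https://www.huawei.com/hag/accountLogin&sessionId=xxxyyy&timeout=2
    }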
  • the voice assistant APP of the mobile phone jumps to the first application according to the above link.
  • Taking the first service resource being the login page in the takeaway APP as an example, in step S507 the link obtained by the voice assistant APP of the mobile phone may include the deeplink of the above login page; further, the voice assistant APP can pull up the takeaway APP according to the deeplink of the login page and run the takeaway APP in the foreground of the mobile phone.
  • the mobile phone can switch the voice assistant APP to run in the background of the mobile phone.
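On an Android-style implementation, the jump from the voice assistant APP to the interface addressed by the received link could look roughly like the sketch below; this is only one possible realization and is not prescribed by the disclosure.

    import android.content.Context
    import android.content.Intent
    import android.net.Uri

    // Sketch: resolve the received link into a VIEW intent and launch the target application.
    // The launched activity (e.g. the takeaway APP) runs in the foreground, so the voice
    // assistant activity drops to the background.
    fun jumpToFirstApp(context: Context, link1: String) {
        val intent = Intent(Intent.ACTION_VIEW, Uri.parse(link1))
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        context.startActivity(intent)
    }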
  • For example, when the voice assistant APP is switched to the background, the content of the current conversation between the user and the voice assistant APP can be stored in a preset database, so as to prevent the current conversation content from being lost after the voice assistant APP is switched to the background or killed.
  • the above-mentioned preset database can store the correspondence between the conversation content, conversation time, and conversation identifiers between the user and the voice assistant APP in a recent period of time (for example, 10 minutes).
  • Table 1 may also include context information such as a device identifier, a skill ID corresponding to the conversation content, and the like, which is not limited in this embodiment of the present application.
  • Subsequently, based on Table 1, the voice assistant APP can restore the conversation content between the user and the voice assistant APP according to a specific session identifier.
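The Table 1-style database could be modeled, for example, as a small local cache keyed by session identifier; the sketch below (class name, retention-window handling and in-memory storage are all assumptions) only illustrates storing and restoring conversation content.

    import java.time.Instant

    // Sketch of a Table 1-style record: conversation content, conversation time and session identifier.
    data class DialogRecord(val sessionId: String, val content: String, val time: Instant)

    class ConversationCache(private val retentionMinutes: Long = 10) {
        private val records = mutableListOf<DialogRecord>()

        fun save(sessionId: String, content: String) {
            records.add(DialogRecord(sessionId, content, Instant.now()))
            // Keep only the recent period (e.g. 10 minutes) mentioned above.
            val cutoff = Instant.now().minusSeconds(retentionMinutes * 60)
            records.removeAll { it.time.isBefore(cutoff) }
        }

        // Restore every dialogue line that belongs to the given session, oldest first.
        fun restore(sessionId: String): List<String> =
            records.filter { it.sessionId == sessionId }.sortedBy { it.time }.map { it.content }
    }

    fun main() {
        val cache = ConversationCache()
        cache.save("xxxyyy", "I want takeout")
        println(cache.restore("xxxyyy"))  // [I want takeout]
    }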
  • the DM of the first server can also maintain the context information between different users and their voice assistant APP.
  • the context information can include the content of each round of dialogue and information such as session identifiers.
  • The voice assistant APP can also restore the conversation content between the user and the voice assistant APP based on the session identifiers and context information recorded in the DM.
  • the mobile phone displays the interface of the first application corresponding to the above link.
  • In step S508, after the voice assistant APP jumps to the takeaway APP according to the deeplink in the link, as shown in FIG. 7 the takeaway APP can display the login page 701 of the takeaway APP.
  • In the login page 701, the user can enter an authorization operation that authorizes the voice assistant APP, or the account information of another already logged-in APP, to log in to the food delivery APP.
  • the authorization operation may be an operation of clicking the authorization button 702 in the login page 701 .
  • the login page 701 of the food delivery APP may also prompt the user to log in to the food delivery APP by entering an account and password, etc., which is not limited in this embodiment of the present application.
  • After the takeaway APP detects that the user has input an authorization operation, it can send an authorization message to the second server of the takeaway APP, and the authorization message can include the device identifier of the mobile phone, such as a UUID. Further, the second server may mark the UUID as a logged-in state in response to the authorization message. For example, the second server may assign a token corresponding to the UUID so that the UUID is marked as a logged-in state. Further, the second server may send a login-success response message to the food delivery APP of the mobile phone. At this time, as shown in FIG. 8, the food delivery APP may display a login success message 703 in the login page 701 in response to the response message. Of course, the takeaway APP may also choose not to display the login success message 703.
  • the mobile phone can also trigger the voice assistant to pull up and display the display interface of other applications according to the above method, which is not limited in this embodiment of the present application.
  • the mobile phone may also jump back to the conversation interface of the voice assistant APP from the interface displaying the first application.
  • the mobile phone may exit the interface displaying the first application.
  • For example, when the user switches the food delivery APP (that is, the first application) to run in the background, the mobile phone can exit the interface displaying the first application.
  • For another example, if the mobile phone detects that the user clicks the back button while the interface of the takeaway APP is displayed, the mobile phone may exit the interface displaying the first application.
  • For another example, after receiving the message that the user has successfully logged in, the food delivery APP can also automatically exit the interface displaying the first application.
  • a timeout period may be set in the link 1 of the first service resource obtained by the food delivery APP.
  • For example, the food delivery APP can start a timer whose duration is the above-mentioned timeout period. In this case, after the mobile phone jumps from the voice assistant APP to the food delivery APP, if no operation input by the user to the food delivery APP is received within the above timeout period, the interface of the food delivery APP can automatically exit the display.
  • the takeout APP can automatically exit after running the above timeout period. This embodiment of the present application does not limit the situation that the mobile phone exits the interface displaying the first application.
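A minimal sketch of this timeout behaviour, assuming the timeout value has already been parsed out of link 1; the scheduling approach and callback shape are assumptions for illustration.

    import java.util.Timer
    import kotlin.concurrent.schedule

    // Sketch: when the timer fires, exit the takeaway interface if the user has not interacted with it.
    fun scheduleAutoExit(timeoutSeconds: Long, userInteracted: () -> Boolean, exitInterface: () -> Unit) {
        Timer().schedule(timeoutSeconds * 1000) {
            if (!userInteracted()) exitInterface()
        }
    }

    fun main() {
        scheduleAutoExit(2, userInteracted = { false }, exitInterface = { println("exit takeaway interface") })
        Thread.sleep(2500)  // keep this demo alive long enough for the timer to fire
    }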
  • When the mobile phone exits the interface displaying the takeaway APP, since link 1 of the first service resource obtained by the takeaway APP includes the session identifier of the first dialogue content, the takeaway APP can trigger the voice assistant APP to display the corresponding conversation interface according to that session identifier.
  • For example, deeplink 1 of the conversation interface of the voice assistant APP is: hivoice://com.huawei.vassistant/diaglog.
  • the first application such as a food delivery APP, may be preset with the above-mentioned deeplink1 of the voice assistant APP to realize interface jumping.
  • For example, the takeaway APP can append the session identifier of the first dialogue content, sessionId=xxxyyy, to the above deeplink 1 to generate deeplink 2 of the voice assistant APP carrying the session identifier: hivoice://com.huawei.vassistant/diaglog&sessionId=xxxyyy. Then the takeaway APP can launch the voice assistant APP by calling the above deeplink 2 and transmit the session identifier of the first dialogue content to the voice assistant APP through deeplink 2.
  • For another example, when the take-out APP displays a message that the login is successful, or when the interface displayed by the take-out APP times out, the mobile phone may be triggered to exit the interface displaying the take-out APP.
  • the takeaway APP can pull up the voice assistant APP through the above deeplink 2, thereby passing the session ID carried in the deeplink 2 to the voice assistant APP.
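The jump back can reuse the same concatenation pattern: the takeaway APP appends the session identifier it received to the preset voice-assistant deeplink and pulls the voice assistant APP up with the result. The Android-flavoured sketch below is one possible realization, not the mandated implementation.

    import android.content.Context
    import android.content.Intent
    import android.net.Uri

    // Sketch: build deeplink 2 from the preset deeplink 1 and the received session identifier,
    // then pull the voice assistant APP back up with it.
    fun jumpBackToVoiceAssistant(context: Context, sessionId: String) {
        val deeplink1 = "hivoice://com.huawei.vassistant/diaglog"  // preset in the takeaway APP
        val deeplink2 = "$deeplink1&sessionId=$sessionId"          // e.g. ...&sessionId=xxxyyy
        val intent = Intent(Intent.ACTION_VIEW, Uri.parse(deeplink2))
        intent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        context.startActivity(intent)
    }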
  • It can be understood that when the first application is preset with the link of the application to be jumped to (for example, deeplink 1 of the above voice assistant APP), the link of the voice application to be jumped to may be preset in the H5 page corresponding to the first application, in the quick app, or in the locally installed application; this is not specifically limited in this embodiment of the present invention.
  • the first application may also not preset the link of the application that needs to be jumped (for example, deeplink1 of the above-mentioned voice assistant APP).
  • In this case, the first application can obtain the link from the server corresponding to the first application before the jump, or obtain it from the application to be jumped to (for example, when or after jumping from the voice assistant APP to the first application, obtain deeplink 1 of the voice assistant APP from the voice assistant APP). This embodiment of the present invention does not specifically limit this.
  • After the voice assistant APP is pulled up, it can run in the foreground of the mobile phone. At this time, the voice assistant APP can obtain the corresponding conversation content according to the session identifier transmitted by the takeaway APP. For example, the voice assistant APP can query the database in the mobile phone shown in Table 1 for the conversation content corresponding to the above session identifier. If the corresponding conversation content is not found in the database in the mobile phone, the voice assistant APP may send the above session identifier to the first server and query the first server for the conversation content corresponding to it. Of course, the voice assistant APP can also directly query the first server for the conversation content corresponding to the above session identifier, which is not limited in this embodiment of the present application.
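A sketch of that restore path: look the session identifier up locally first and fall back to the first server only when nothing is cached. The two lookup callbacks stand in for the local database query and the server request, which are not specified here.

    // Sketch: restore conversation content for a session identifier, preferring the local cache.
    fun restoreConversation(
        sessionId: String,
        localLookup: (String) -> List<String>,   // placeholder for the Table 1-style database query
        serverLookup: (String) -> List<String>   // placeholder for the query sent to the first server
    ): List<String> {
        val local = localLookup(sessionId)
        return if (local.isNotEmpty()) local else serverLookup(sessionId)
    }

    fun main() {
        val restored = restoreConversation(
            sessionId = "xxxyyy",
            localLookup = { emptyList() },               // nothing cached locally
            serverLookup = { listOf("I want takeout") }  // the first server returns the stored dialogue
        )
        println(restored)
    }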
  • the voice assistant APP of the mobile phone displays a conversation interface corresponding to the above conversation identifier.
  • After the voice assistant APP acquires the dialogue content corresponding to the above session identifier, the voice assistant APP can display the acquired dialogue content (which includes the above first dialogue content) in the conversation interface, thereby restoring the conversation interface 601 shown in FIG. 6, that is, restoring the conversation interface 601 displayed before the voice assistant APP jumped to the takeaway APP.
  • In this way, when the user converses with the voice assistant APP, the voice assistant APP can jump to another application to provide the corresponding service resource according to the dialogue content input by the user, and the mobile phone can also jump back to the voice assistant APP from that application and restore the previous conversation interface, so that the user can continue the conversation with the voice assistant APP, which improves the continuity of multi-round conversations during voice interaction and improves the user experience.
  • the mobile phone may also send a session restoration message to the first server, so that the first server knows that the voice assistant APP has resumed the session interface 601 before jumping to the takeaway APP.
  • the voice assistant APP may carry the session identifier of the first conversation content and the device identifier of the mobile phone in the above-mentioned session resumption message.
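The session resumption message could carry just those two identifiers; the data class below is an assumed shape used only to illustrate the payload.

    // Hypothetical shape of the session resumption message sent to the first server.
    data class SessionResumeMessage(val sessionId: String, val deviceId: String)

    fun main() {
        val msg = SessionResumeMessage(sessionId = "xxxyyy", deviceId = "uuid-1234")
        println("resume session ${msg.sessionId} for device ${msg.deviceId}")
    }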
  • the first server may first send the session identifier and device identifier in the above-mentioned conversation recovery message to the HAG, and the HAG determines that the third-party application corresponding to this conversation is a food delivery APP.
  • the HAG may send the above session identifier and device identifier to the second server of the food delivery APP.
  • the second server may determine whether the user logs in to the food delivery APP according to the received device identification (eg, UUID). Since the second server has marked the UUID of the mobile phone as the logged-in state in step 508, the second server can determine that the user has logged into the takeaway APP according to the device identifier in the session recovery message.
  • the second server has obtained the intent and/or slot information of the first dialogue content and the session identifier of the first dialogue content through the second request message. Therefore, after it is determined that the user has logged into the food delivery APP, the second server can determine the corresponding intention and/or slot information according to the currently received session identifier of the first dialogue content. Furthermore, the second server may determine the first service resource corresponding to the above-mentioned intention and/or slot information based on account information such as coupons, historical orders, and taste preferences in the user account.
  • the second server may recommend search results such as price, location, and evaluation of one or more nearby restaurants (or delicacies) to the user according to the user's account information. Furthermore, the second server may send the obtained search result to the voice assistant APP of the mobile phone through the first server.
  • the voice assistant APP of the mobile phone can display the received search results in the conversation interface 601 in the form of a card 901 .
  • the search results in the card 901 are associated with the account information after the user logs in to the food delivery APP, so that more targeted and accurate service resources can be recommended for the user, and the user experience during voice interaction can be improved.
  • the second server may also send the link of the search result obtained based on the user's account information to the voice assistant APP of the mobile phone through the first server.
  • the voice assistant APP can obtain the above search result according to the link, and display the search result in the conversation interface 601 in the form of a card 901 .
  • Subsequently, the user can select the corresponding option in the card 901 displayed by the voice assistant APP by voice, touch, or the like, so that after the voice assistant APP jumps to the take-out APP, the mobile phone can not only jump from the take-out APP back to the voice assistant APP but also continue to provide the user with the voice continuation function for this session.
  • In some other embodiments, after the first server receives the first conversation content "I want takeout", if the user has not logged in to the food delivery APP account, the first server can also obtain from the second server the first service resource in the user's logged-out state, such as the search results for nearby restaurants (or delicacies) when the user is not logged in.
  • the first server may also obtain a link to the login page in the food delivery APP from the second server. That is, in step S506, the second server may send the search result in the unlogged state and the link to the login page to the voice assistant APP.
  • the second server may also send the link of the search result in the unlogged state (ie, the link of the first service resource) and the link of the above login page to the voice assistant APP, which is not limited in this embodiment of the present application.
  • the link to the login page sent by the second server may still carry parameters such as the session identifier and UUID.
  • If the voice assistant APP receives the search results in the logged-out state and the link to the login page, as shown in FIG. 10, the search results in the logged-out state can be displayed in the conversation interface 601 in the form of a card 1001.
  • If the voice assistant APP receives a link to the search results in the logged-out state, the voice assistant APP can first obtain the search results in the logged-out state according to the link, and then display them in the form of a card 1001 in the conversation interface 601. Since the search results obtained by the voice assistant APP at this time are obtained by the second server when the user is not logged in, the search results in the card 1001 may be different from those shown in FIG. 9 for the logged-in user.
  • the voice assistant APP can display the link 1002 of the login page in the conversation interface 601 to remind the user to obtain more accurate search results after logging in to the food delivery APP.
  • the user may jump to the login page of the food delivery APP by clicking the above link 1002 to log in, or may jump to the login page of the food delivery APP by voice input to log in.
  • the voice assistant APP can jump to the login page of the food delivery APP, and transmit the corresponding session identifier to the food delivery APP.
  • In this way, the mobile phone can subsequently jump directly back to the voice assistant APP from the food delivery APP and redisplay the conversation interface of the voice assistant APP shown in FIG. 10 according to the above session identifier.
  • For another example, if a voice input of "log in to the food delivery APP" (that is, the second voice input) is detected, then, similar to steps S501-S510 above, the voice assistant APP can send the second dialogue content "log in to the food delivery APP" to the first server.
  • When the first server extracts that the intent of the second dialogue content is login, it may request the second server to log in to the user's account in the takeaway APP.
  • the second server may send the login page and session identifier of the food delivery APP to the voice assistant APP as the link, and the voice assistant APP will jump to the food delivery APP according to the link.
  • the mobile phone can jump from the displayed interface of the takeaway APP back to the interface of the voice assistant APP, and redisplay the conversation interface of the voice assistant APP as shown in FIG. 10 according to the above session identifier.
  • After the mobile phone jumps back to the voice assistant APP, similar to the above embodiment, a session recovery message may also be sent to the first server, triggering the first server to obtain from the second server the user's logged-in search results for nearby restaurants (or delicacies) and to send the search results to the voice assistant APP.
  • the voice assistant APP can display the received search results in the conversation interface 601 in the form of a card 1101 . Since the search result received by the voice assistant APP at this time is obtained by the second server when the user is logged in, the search result in the card 1101 is associated with the account information after the user logs in to the takeaway APP.
  • Compared with the search results in the card 1001, these search results can recommend more targeted and accurate service resources for the user and improve the user experience during voice interaction.
  • It can be seen that, in the voice interaction method provided in this embodiment of the present application, when the voice assistant APP, the server of the voice assistant APP (that is, the first server), and the server of the third-party application (that is, the second server) interact, the session identifier of the current dialogue content can be passed as a carried parameter in each interaction.
  • In this way, when providing the voice assistant APP with the service resource corresponding to the conversation content, the second server can establish a correspondence between the service resource and the corresponding session identifier, so that when the voice assistant APP jumps to the third-party application that provides the above service resource, the third-party application can also obtain the service resource and the corresponding session identifier. As a result, when the third-party application exits, it can re-launch the voice assistant APP according to the session identifier and restore the corresponding conversation interface, so that after the voice assistant APP jumps to the third-party application, the mobile phone can still jump back to the voice assistant APP to realize the voice continuation function and improve the user experience.
  • the voice assistant APP is installed in the mobile phone as an example. It can be understood that in the voice interaction scenario, the electronic device on which the voice assistant APP is installed may also be a vehicle-mounted device, a tablet computer, a watch etc., all of these devices can be used to implement the voice interaction method in the foregoing embodiment, which is not limited in this embodiment of the present application.
  • an embodiment of the present application discloses an electronic device, and the electronic device may be the above-mentioned mobile phone.
  • The electronic device may specifically include: a touchscreen 1201 including a touch sensor 1206 and a display screen 1207; one or more processors 1202; a memory 1203; one or more application programs (not shown); and one or more computer programs 1204, where the above components may be connected through one or more communication buses 1205.
  • The above one or more computer programs 1204 are stored in the above memory 1203 and are configured to be executed by the one or more processors 1202, and the one or more computer programs 1204 include instructions that can be used to execute the relevant steps performed by the mobile phone in the foregoing embodiments.
  • Each functional unit in each of the embodiments of the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of software functional units.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium.
  • a computer-readable storage medium includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: flash memory, removable hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program codes.

Abstract

This application provides a voice interaction method and an electronic device, relating to the field of terminal technologies, which can continue to implement the voice continuation function after a voice application jumps to another application, thereby improving the user experience during voice interaction. The method includes: an electronic device displays a conversation interface of a voice application, where the conversation interface is used to display the dialogue content between a user and the voice application; subsequently, after the electronic device detects a first voice input by the user, it can convert the first voice into first dialogue content in text form; further, the electronic device can obtain a corresponding first link according to the first dialogue content, and jump from the conversation interface of the voice application to an interface of a first application according to the first link; when the mobile phone exits the interface displaying the first application, the electronic device can jump back to the conversation interface according to a session identifier corresponding to the first dialogue content.

Description

一种语音交互方法及电子设备
本申请要求于2021年03月30日提交国家知识产权局、申请号为202110343786.7、申请名称为“一种语音交互方法及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及终端技术领域,尤其涉及一种语音交互方法及电子设备。
背景技术
随着语音识别技术的发展,许多电子设备中添加了语音助手APP(例如Siri、小爱同学、小E等)来帮助用户完成与电子设备的人机交互过程。一般,用户唤醒电子设备中的语音助手APP后,语音助手APP可回答或执行用户发出的每一条语音指令。
在一些场景中,语音助手APP与用户进行语音交互时具有语音接续的功能,从而支持用户与语音助手APP之间实现多轮对话。例如,用户唤醒手机中的语音助手APP后,语音助手APP接收到“今天天气怎么样”的第一语音指令后,可在手机的显示界面中通过卡片等形式展示查询到的第一信息。后续,如果语音助手APP接收到“明天呢”的第二语音指令,则语音助手APP可继续查询明天的天气信息(即第二信息),进而将查询到的第二信息继续显示在手机的显示界面中,实现语音接续功能。
在另一些场景中,语音助手APP在回答用户的语音指令时,还可以通过跳转至其他应用的应用界面,向用户提供相应的服务。例如,如果语音助手APP检测到“我想点外卖”的第三语音指令,则语音助手APP可跳转至外卖APP的搜索界面,使得用户可以在外卖APP中完成与“我想点外卖”相关的操作。一般,当语音助手APP跳转至其他应用后,语音助手APP会结束与用户的本次会话。此时,语音助手APP可关闭或切换至手机后台运行,使得后续语音助手APP无法继续与用户交互,无法实现语音接续功能,降低了用户在语音交互时的使用体验。
发明内容
本申请提供一种语音交互方法及电子设备,可在语音助手APP跳转至其他应用后继续实现语音接续功能,提高用户在语音交互时的使用体验。
为达到上述目的,本申请采用如下技术方案:
第一方面,本申请提供一种语音交互方法,包括:电子设备显示语音应用的会话界面,该会话界面用于显示用户与语音应用之间的对话内容;后续,当电子设备检测到用户输入的第一语音后,可将第一语音转换为文本形式的第一对话内容;进而,电子设备可根据第一对话内容获取对应的第一链接;并根据第一链接从语音应用的会话界面跳转至第一应用的界面;当手机退出显示第一应用的界面时,电子设备可根据与第一对话内容对应的会话标识跳转回会话界面。这样,用户可以继续在之前的会话界面中与语音应用对话,实现语音交互时会话内容的接续功能,从而提高用户的使用体验。
在一种可能的实现方式中,在电子设备将第一语音转换为第一对话内容之后,还包括:电子设备向第一服务器发送第一请求消息,第一请求消息中可包括第一对话内容, 以使得第一服务器响应于第一请求消息确定第一对话内容的会话标识和上述第一链接;此时,电子设备根据第一对话内容获取第一链接,包括:电子设备接收第一服务器发送的第一响应消息,第一响应消息中包括上述第一链接和会话标识。例如,第一服务器可将第一链接(即第一应用中某一界面)的链接与第一对话内容的会话标识拼接后作为第一响应消息发送给电子设备。这样,电子设备中的语音应用可以获取到第一对话内容的会话标识以及第一链接。
在一种可能的实现方式中,电子设备根据与第一对话内容对应的会话标识跳转回会话界面,包括:在电子设备中运行的第一应用可先拉起语音应用;进而,在语音应用被拉起后,电子设备中的语音应用可根据会话标识显示与会话标识对应的对话内容,从而恢复跳转至第一应用前语音应用显示的会话界面。
示例性的,电子设备中的第一应用拉起语音应用,包括:当电子设备退出显示第一应用的界面时,第一应用可将语音应用的链接和上述会话标识拼接为第二链接,继而,第一应用可按照第二链接拉起语音应用,同时,通过第二链接将上述会话标识传递给语音应用,使得语音应用可根据该会话标识恢复跳转至第一应用前语音应用显示的会话界面。
示例性的,电子设备的语音应用显示与会话标识对应的对话内容,包括:语音应用通过上述第二链接获取到会话标识后,可查询电子设备中是否存储有与会话标识对应的对话内容;若电子设备存储有与会话标识对应的对话内容,则语音应用可将与会话标识对应的对话内容显示在会话界面中;若电子设备没有存储与会话标识对应的对话内容,则语音应用可从第一服务器获取与会话标识对应的对话内容,并将与会话标识对应的对话内容显示在会话界面中。
在一种可能的实现方式中,上述第一请求消息中还可以包括设备标识(例如UUID等),该设备标识用于确认电子设备是否登录第一应用。
示例性的,若电子设备没有登录第一应用,则上述第一链接可包括第一应用中登录页面的链接。此时,第一应用的登录页面为与第一对话内容对应的服务内容(或服务资源)。又例如,若电子设备没有登录第一应用,则第一服务器返回的第一响应消息中除了包括上述第一链接(即登录页面的链接)外,还可以包括电子设备没有登录第一应用时基于第一对话内容的搜索结果。又例如,若电子设备登录了第一应用,则第一服务器可基于登陆后的账户信息查询与第一对话内容对应的搜索结果,进而将该搜索结果携带在第一响应消息中返回给电子设备。
在一种可能的实现方式中,若电子设备没有登录第一应用,则电子设备根据第一链接从语音应用的会话界面跳转至第一应用的界面,包括:电子设备的语音应用根据第一链接拉起第一应用,并显示第一应用的登录页面。此时,电子设备从语音应用的会话界面跳转至第一应用的界面。
在一种可能的实现方式中,在电子设备显示第一应用的登录页面之后,还包括:电子设备接收用户在登录页面中输入的登录操作,该登录操作用于授权电子设备登录第一应用的用户账号;此时,电子设备根据与第一对话内容对应的会话标识跳转回会话界面,包括:若接收到与登录操作对应的登录成功消息,则可触发电子设备退出显示第一应用的界面,此时,电子设备可根据会话标识从第一应用的界面跳转回上述会话界面。
在一种可能的实现方式中,在电子设备根据会话标识跳转回会话界面之后,还包括:电子设备可请求第一服务器获取与第一对话内容对应的第一服务内容,此时,由于用户已经登录第一应用,因此,第一服务器可获取到的第一服务内容与登录第一应用后用户的账户信息关联,进而,电子设备可在会话界面中显示该第一服务内容,从而为用户推荐更有针对性、更精准的服务资源,提高语音交互时用户的使用体验。
在一种可能的实现方式中,若电子设备没有登录第一应用,则上述方法还包括:电子设备获取与第一对话内容对应的第二服务内容,第二服务内容为没有登录第一应用时与第一对话内容对应的服务内容;此时,在电子设备根据第一链接从语音应用的会话界面跳转至第一应用的界面之前,还包括:电子设备在会话界面中显示第二服务内容和第一应用中登录页面的链接。这样,在没有登录的状态下用户也可以在会话界面中获取到对应的第二服务内容。
在这种场景下,电子设备根据第一链接从语音应用的会话界面跳转至第一应用的界面,包括:若检测到用户在上述会话界面中选择登录页面的链接,则电子设备可根据登录页面的链接从会话界面跳转至第一应用的登录页面;或者,若用户输入的第二语音用于指示登录第一应用,则电子设备可根据登录页面的链接从会话界面跳转至第一应用的登录页面。
在一种可能的实现方式中,电子设备根据第一链接从语音应用的会话界面跳转至第一应用的界面之后,还包括:电子设备可启动预设的定时器;当定时器超时时,电子设备从第一应用的界面跳转回会话界面。也就是说,当电子设备从语音应用跳转至第一应用后,可在第一应用中停留预设时间,当定时器超时时,可自动跳转回语音应用的会话界面,用户可在会话界面中继续进行语音交互。
在一种可能的实现方式中,在电子设备根据第一链接从语音应用的会话界面跳转至显示第一应用的界面之后,还包括:电子设备可将语音应用切换至后台运行。
第二方面,本申请提供一种语音交互方法,包括:第一服务器可接收电子设备发送的第一请求消息,第一请求消息中包括第一对话内容;响应于第一请求消息,第一服务器可获取第一对话内容的会话标识和第一链接;进而,第一服务器向电子设备发送第一响应消息,该第一响应消息包括会话标识和第一链接,以使得电子设备根据该会话标识可从与第一链接对应的第一应用的界面跳转回显示第一对话内容的语音应用的界面,实现语音接续功能。
在一种可能的实现方式中,第一服务器获取第一对话内容的会话标识和第一链接,包括:第一服务器为第一对话内容分配会话标识;第一服务器从第一应用的第二服务器中获取与第一对话内容对应的第一链接。
在一种可能的实现方式中,第一服务器为第一对话内容分配会话标识,包括:第一服务器识别第一对话内容的语义,例如,提取第一对话内容的槽位信息和意图等;进而,第一服务器可根据第一对话内容的语义为第一对话内容分配会话标识。在一些实施例中,电子设备也可为第一对话内容分配会话标识,此时,电子设备不需要再从第一服务器中获取本次对话内容的会话标识。
在一种可能的实现方式中,第一服务器从第一应用的第二服务器中获取与第一服务内容对应的第一链接,包括:第一服务器向第二服务器发送第一消息,第一消息中可包 括上述会话标识和第一对话内容的语义,以使得第二服务器根据第一对话内容的语义确定第一链接,从而建立第一链接与会话标识之间的对应关系;进而,第一服务器可接收第二服务器发送的第二消息,第二消息包括上述会话标识以及第一链接,即第一服务器通过与第二服务器(第一应用的服务器)交互获取上述第一链接。
在一种可能的实现方式中,上述第一请求消息中可包括电子设备的设备标识;此时,第一服务器向第二服务器的第一消息中也可携带上述设备标识,以使得第二服务器根据设备标识确定电子设备是否登录第一应用;若电子设备没有登录第一应用,则第二服务器可将第一应用中登录页面的链接确定为第一链接。
在一种可能的实现方式中,在第一服务器向电子设备发送第一响应消息之后,还包括:第一服务器接收电子设备发送的会话恢复消息,该会话恢复消息中包括第一对话内容的会话标识和设备标识;响应于会话恢复消息,第一服务器可根据上述设备标识在第二服务器中查询电子设备是否登录第一应用;若电子设备登录第一应用,则第一服务器可从第二服务器中获取与第一对话内容对应的第一服务内容,第一服务内容与登录第一应用后用户的账户信息关联;第一服务器将第一服务内容发送至电子设备进行显示。
在一种可能的实现方式中,若电子设备没有登录第一应用,则上述第一响应消息还可以包括第二服务内容,第二服务内容为没有登录第一应用时与第一对话内容对应的服务内容。这样,在没有登录的状态下用户也可以在会话界面中获取到对应的第二服务内容。
在一种可能的实现方式中,上述第一响应消息中还可以包括第一链接的超时时间。这样,电子设备在显示第一应用的界面时,可按照该超时时间设置对应的定时器,当定时器超时后触发电子设备自动跳转回语音应用的会话界面。
第三方面,本申请提供一种语音交互方法,包括:电子设备可显示语音应用的会话界面,该会话界面用于显示用户与语音应用之间的对话内容;电子设备检测用户输入的第一语音后,可将第一语音转换为第一对话内容;进而,电子设备向第一服务器发送第一请求消息,第一请求消息中包括上述第一对话内容;响应于第一请求消息,第一服务器可获取第一对话内容的会话标识和第一链接;并将该会话标识和第一链接携带在第一响应消息中返回给电子设备;进而,电子设备可根据第一链接从会话界面跳转至第一应用的界面;在显示第一应用的界面之后,电子设备可根据上述会话标识跳转回会话界面。
也就是说,用户与语音应用进行语音交互时,语音应用、第一应用以及各个服务器在交互时,可将本次对话内容的会话标识作为携带参数在各个交互过程中传递。这样,服务器在向语音应用提供与对话内容对应的服务资源时可建立该服务资源与对应会话标识之间的对应关系,使得语音应用跳转至提供上述服务资源的第三方应用时,第三方应用也可获得与本次服务资源与对应会话标识,从而在第三方应用退出时能够按照该会话标识重新拉起语音应用并恢复相应的会话界面,使得语音应用跳转至第三方应用之后仍然能够跳转回会话界面实现语音接续功能,提高用户的使用体验。
在一种可能的实现方式中,第一服务器获取第一链接,包括:第一服务器根据第一对话内容从第二服务器获取第一链接,第二服务器为第一应用对应的服务器。
在一种可能的实现方式中,第一请求消息中还包括电子设备的设备标识;此时,第一服务器还可以向第二服务器发送设备标识;第二服务器可根据该设备标识确定电子设 备是否登录第一应用;若电子设备没有登录第一应用,则上述第一链接包括第一应用中登录页面的链接。
在一种可能的实现方式中,在电子设备根据第一链接从会话界面跳转至第一应用的界面之后,还包括:响应于用户在第一应用的界面输入的登录操作,电子设备向第一服务器发送登录请求,该登录请求中包括设备标识;响应于该登录请求,第一服务器进而请求第二服务器将该设备标识标记为登录状态;进而,第一服务器向电子设备发送登录成功消息;此时,电子设备根据会话标识跳转回会话界面,包括:响应于登录成功消息,电子设备根据上述会话标识从第一应用的界面跳转回会话界面。
在一种可能的实现方式中,在电子设备根据会话标识从第一应用的界面跳转回会话界面之后还包括:电子设备向第一服务器发送会话恢复消息,该会话恢复消息中包括会话标识和设备标识;若电子设备登录第一应用,则第一服务器可响应该会话恢复消息,从第二服务器中获取与第一对话内容对应的第一服务内容,并将第一服务内容发送给电子设备,此时,第一服务内容与登录第一应用后用户的账户信息关联;后续,电子设备可在会话界面中显示第一服务内容。
在一种可能的实现方式中,若电子设备没有登录第一应用,则上述方法还包括:第一服务器从第二服务器获取第二服务内容,第二服务内容为没有登录第一应用时与第一对话内容对应的服务内容;那么,在电子设备根据第一链接从会话界面跳转至第一应用的界面之前,还包括:电子设备可在会话界面中显示第二服务内容。
在一种可能的实现方式中,电子设备根据会话标识跳转回会话界面,包括:电子设备中的第一应用根据第二链接拉起语音应用,第二链接包括语音应用的链接和会话标识;在语音应用被拉起后,电子设备中的语音应用可显示与会话标识对应的对话内容。
第四方面,本申请提供一种电子设备,包括:存储器、显示屏和一个或多个处理器;存储器、显示屏与处理器耦合。其中,存储器用于存储计算机程序代码,计算机程序代码包括计算机指令;当电子设备运行时,该处理器用于执行该存储器存储的一个或多个计算机指令,以使电子设备执行如上述任一项所述的语音交互方法。
第五方面,本申请提供一种服务器,包括:处理器、存储器、通信模块以及一个或多个计算机程序;其中,处理器与通信模块和存储器均耦合,上述一个或多个计算机程序被存储在存储器中,当服务器运行时,该处理器执行该存储器存储的一个或多个计算机程序,以使服务器执行上述任一项所述的语音交互方法。
第六方面,本申请提供一种语音交互系统,包括上述电子设备和服务器,电子设备和服务器交互时可执行上述第三方面中任一项所述的语音交互方法。
第七方面,本申请提供一种计算机存储介质,包括计算机指令,当计算机指令在电子设备(或服务器)上运行时,使得在电子设备(或服务器)执行上述任一项所述的语音交互方法。
第八方面,本申请提供一种计算机程序产品,当计算机程序产品在在电子设备(或服务器)上运行时,使得在电子设备(或服务器)执行上述任一项所述的语音交互方法。
可以理解地,上述提供的电子设备、服务器、语音交互系统、计算机存储介质以及计算机程序产品均用于执行上文所提供的对应的方法,因此,其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果,此处不再赘述。
附图说明
图1为本申请实施例提供的一种语音交互系统的架构示意图;
图2为现有技术中电子设备与服务器进行语音交互的流程示意图;
图3为本申请实施例提供的一种电子设备与服务器进行语音交互的流程示意图;
图4为本申请实施例提供的一种电子设备的结构示意图一;
图5为本申请实施例提供的一种语音交互方法的交互流程示意图;
图6为本申请实施例提供的一种语音交互方法的应用场景示意图一;
图7为本申请实施例提供的一种语音交互方法的应用场景示意图二;
图8为本申请实施例提供的一种语音交互方法的应用场景示意图三;
图9为本申请实施例提供的一种语音交互方法的应用场景示意图四;
图10为本申请实施例提供的一种语音交互方法的应用场景示意图五;
图11为本申请实施例提供的一种语音交互方法的应用场景示意图六;
图12为本申请实施例提供的一种电子设备的结构示意图二。
具体实施方式
为了方便清楚地理解下述各实施例,首先给出语音识别技术中相关术语的简要介绍:
意图(intent):用户每次的语音输入一般对应着用户的一个意图。意图是一句或多句表达形式的集合,例如“我要看电影”和“我想看2001年刘德华拍摄的动作电影”都可以属于同一个视频播放的意图。
技能(skill):一个技能可以覆盖一个或多个意图。例如,外卖APP的开发人员可以在电子设备厂商提供的语音平台(或称为语音开放平台)中创建名称为“点外卖”的技能1。技能1可以覆盖搜索、支付以及导航等多个意图。后续,电子设备接收到语音输入1后,如果识别出语音输入1包含与“点外卖”相关的关键词,则可确定语音输入1与技能1关联,也可称为语音输入1命中技能1。
槽位(slot)信息:槽位信息是指用户的语音输入中用来表达意图的关键信息,槽位信息直接决定电子设备(或服务器)能否匹配到正确的意图。一个槽位对应着一类属性的关键词,该槽位中的信息(即槽位信息)可以由同一类型的关键词进行填充。例如,与歌曲播放这一意图对应的查询句式可以为“我想听{singer}的{song}”。其中,{singer}为歌手的槽位,{song}为歌曲的槽位。那么,如果接收到用户输入“我想听王菲的红豆”这一语音输入,则电子设备(或服务器)可从该语音输入中提取到{singer}这一槽位中的槽位信息为:王菲,{song}这一槽位中的槽位信息为:红豆。这样,电子设备(或服务器)可根据这两个槽位信息识别出本次语音输入的用户意图为:播放歌曲。
会话标识:一次会话过程可以包括用户与电子设备之间的一轮或多轮对话。每次会话可通过对应的会话标识标识。例如,该会话标识可以为sessionId或dialogId等。一般,一次会话过程可与一个技能对应。例如,当用户的语音输入命中点外卖的技能后,电子设备可为本次会话过程分配sessionId 1,后续,用户与电子设备在点外卖这一技能下的对话均属于会话1,其会话标识均为sessionId 1。
在用户与电子设备进行语音交互时,电子设备每次接收到用户的语音输入后,均可通过与服务器交互确定与本次语音输入关联的具体技能。进而,基于确定出的具体技能,服务器可提取与本次语音输入对应的意图和槽位信息,并指示电子设备执行与本次意图 和槽位信息对应的操作指令。例如,该操作指令可以是显示卡片、播放语音、跳转至其他应用或控制智能家居设备等指令,从而完成对本次用户的语音输入的响应。
下面将结合附图对本实施例的实施方式进行详细描述。
示例性的,本申请实施例提供的一种语音交互方法可应用图1所示的语音交互系统100。语音交互系统100中可包括电子设备101和至少一个服务器102。
电子设备101中可以安装语音助手APP(也可称为语音APP、语音助手或智慧语音等),用于与用户进行语音交互。例如,用户可通过输入预设的唤醒语(例如“你好小E”、“小艺小艺”、“hi Siri”等)唤醒电子设备101中的语音助手APP。语音助手APP被唤醒后,可与用户进行会话。在一次会话过程中,用户可与语音助手APP进行一轮或多轮对话。在每轮对话中,用户可向语音助手APP输入相应的语音输入(也可称为语音指令、语音输入等),触发语音助手APP识别该语音输入,并提供对应的服务资源,从而完成一轮对话。
例如,如图2所示,在步骤S201中,语音助手APP可接收用户的语音输入。在步骤S202中,语音助手APP可通过ASR(Automatic Speech Recognition,自动语音识别简称)技术将上述语音输入转换为对应的文本(即对话内容)。进而,在步骤S203中,语音助手APP可将该对话内容发送给服务器102。或者,电子设备101也可直接将接收到的语音输入发送给服务器102,由服务器102将上述语音输入转换为对应的对话内容。
仍如图2所示,服务器102接收到电子设备101发来的对话内容后,在步骤S204中,服务器102可使用预设的自然语言理解(natural language understanding,NLU)算法提取本次对话内容中的意图和槽位信息。进而,在步骤S205中,服务器102可与本次提取到的意图和槽位信息对应的服务资源(也可称为服务内容)或服务资源的链接发送给语音助手APP。例如,当意图为查询天气时,服务器102可将对应的天气信息作为服务资源发送给电子设备101的语音助手APP。又例如,当意图为播放歌曲,槽位信息包括歌手A和歌曲B时,服务器102可将与歌手A和歌曲B对应的URL发送给电子设备101。又例如,当意图为点外卖时,服务器102可将外卖APP中搜索页面的deeplink(深度链接)发送给电子设备101。
需要说明的是,语音交互系统100中的服务器102可以有多个。例如,第一服务器可用于提取对话内容中的意图和槽位信息。第一服务器可根据提取到的意图和槽位信息从第二服务器中获取对应的服务资源。
在一些场景中,当服务器102将其他应用(例如外卖APP)的deeplink发送给电子设备101的语音助手APP后,如步骤S206所述,语音助手APP可按照该deeplink跳转至外卖APP。此时,电子设备101将语音助手APP关闭或切换至后台运行,导致本次语音交互过程结束(即步骤S207)。后续,当用户结束在外卖APP中的操作后,电子设备101也无法跳转回语音助手APP继续之前的会话,从而降低了语音交互时多轮会话的连续性。
在本申请实施例中,用户与电子设备101进行会话时,如图3所示,在步骤S301中,电子设备101的语音助手APP可接收用户的语音输入。进而,在步骤S302中,语音助手APP可将接收到的语音输入转换为对应的对话内容。在步骤S303中,语音助手APP可将每次接收到的语音输入(或文本形式的对话内容)以及本次会话的会话标识发送给 服务器102。后续,在步骤S304中,服务器102可从对话内容中提取对应的意图和/或槽位信息,从而查询与本次语音输入对应的服务资源。如果服务器102查询到与本次语音输入对应的服务资源为其他应用(例如外卖APP)的deeplink,则在步骤S305中,服务器102可将上述会话标识和该deeplink(即服务资源)一同发给电子设备101中的语音助手APP。进而,在步骤S306中,语音助手APP可按照接收到的deeplink跳转至外卖APP的界面,并将对应的会话标识也发送给外卖APP。这样,用户在电子设备101结束外卖APP的界面操作后,如步骤S307所述,电子设备101可由显示的外卖APP的界面可跳转回语音助手APP,并将对应的会话标识传递给语音助手APP,使得语音助手APP可根据该会话标识显示相应的会话界面。这样,用户可以继续在之前的会话界面中与语音助手APP对话,实现语音交互时会话内容的接续功能,从而提高用户的使用体验。
示例性的,上述电子设备101具体可以为手机、音箱、车载设备(也可称为车机)、平板电脑、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、手持计算机、上网本、个人数字助理(personal digital assistant,PDA)、可穿戴电子设备、虚拟现实设备等具有语音交互功能的电子设备,本申请实施例对此不做任何限制。
示例性的,图4示出了电子设备101的结构示意图。
电子设备101可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,麦克风170B,传感器模块180等。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
在本发明实施例中,处理器110可以处理如下一个或多个操作:将语音转化成文本、识别用户输入的语音、对用户的语音进行响应、将从服务器获取的内容发送给显示屏进行显示、根据链接从一个应用跳转到另一个应用等。
在一些实施例中,处理器110可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
移动通信模块150可以提供应用在电子设备101上的包括2G/3G/4G/5G等无线通信 的解决方案。移动通信模块150可以包括一个或多个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
无线通信模块160可以提供应用在电子设备101上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(Bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成一个或多个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备101的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器121可以用于存储一个或多个计算机程序,该一个或多个计算机程序包括指令。处理器110可以通过运行存储在内部存储器121的上述指令,从而使得电子设备101执行本申请一些实施例中所提供的联系人智能推荐的方法,以及各种功能应用和数据处理等。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统;该存储程序区还可以存储一个或多个应用程序(比如图库、联系人等)等。存储数据区可存储电子设备101使用过程中所创建的数据(比如照片,联系人等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如一个或多个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。在另一些实施例中,处理器110通过运行存储在内部存储器121的指令,和/或存储在设置于处理器中的存储器的指令,来使得电子设备101执行本申请实施例中所提供的语音交互方法,以及各种功能应用和数据处理。
电子设备101可以通过音频模块170,扬声器170A,麦克风170B,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块170用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块170还可以用于对音频信号编码和解码。在一些实施例中,音频模块170可以设置于处理器110中,或将音频模块170的部分功能模块设置于处理器110中。
扬声器170A,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备101可以通过扬声器170A收听音乐,或收听免提通话。
麦克风170B,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风170B发声,将声音信号输入到麦 克风170B。电子设备101可以设置一个或多个麦克风170B。在另一些实施例中,电子设备101可以设置两个麦克风170B,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,电子设备101还可以设置三个,四个或更多麦克风170B,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
在本发明实施中,扬声器170A、麦克风170B和音频模块170可用于实现与用户的语音交互,例如接收用户的语音,或者通过语音响应用户的操作,等。
传感器180可以包括压力传感器,陀螺仪传感器,气压传感器,磁传感器,加速度传感器,距离传感器,接近光传感器,指纹传感器,温度传感器,触摸传感器,环境光传感器,骨传导传感器等,本申请实施例对此不做任何限制。
可以理解的是,本发明实施例示意的结构并不构成对电子设备101的具体限定。在本申请另一些实施例中,电子设备101可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
例如,当上述电子设备101为音箱时,电子设备101中还可以包括GPU、显示屏、摄像头以及按键等一项或多项器件,本申请实施例对此不做任何限制。
又例如,当上述电子设备101为手机时,电子设备101中还可以包括GPU、显示屏、摄像头、耳机接口、按键、电池、马达、指示器以及SIM卡接口等一项或多项器件,本申请实施例对此不做任何限制。
以下,将结合附图对本申请实施例提供的一种语音交互方法进行具体介绍。以下实施例中均以手机作为上述电子设备101举例说明。
图5为本申请实施例提供的一种语音交互方法的流程示意图。如图5所示,该语音交互方法可以包括:
S501、手机的语音助手APP接收用户输入的第一语音输入。
示例性的,用户需要使用手机中的语音助手APP时,可通过输入唤醒词等方式唤醒语音助手APP。例如,当手机检测到用户输入正确的唤醒词后,可开始运行语音助手APP。并且,如图6所示,手机可显示语音助手APP与用户的会话界面601。或者,当手机检测到用户输入唤醒语音助手APP的预设操作(例如长按电源键、语音唤醒词唤醒)后,可运行语音助手APP,并显示图6所示的会话界面601。
手机中的语音助手APP被唤醒后,语音助手APP可调用手机的麦克风(或麦克风阵列)检测用户的语音输入(也可称为语音、语音指令或语料等),以便与用户进行一轮或多轮对话。示例性的,用户输入的第一语音输入可以为“我要点外卖”的语音。
S502、手机的语音助手APP将第一语音输入转化为第一对话内容。
示例性的,手机中可设置ASR服务。语音助手APP检测到用户输入的第一语音输入后,可调用ASR服务将第一语音输入转换为对应的文本内容(即第一对话内容)。例如,当第一语音输入为“我要点外卖”的语音时,语音助手APP可通过手机中的ASR服务,得到对应的文本形式的第一对话内容“我要点外卖”。此时,仍如图6所示,语音助手APP可在会话界面601中以对话的形式显示第一对话内容602。可以理解的是,ASR服务也可以设置在其他设备上,例如服务器或其他终端设备,手机可以通过与具有ASR服务的设备交互,获取语音转化文本后的内容,例如第一对话内容。本发明实施例对此不 做具体限定。
在一些实施例中,语音助手APP将第一语音输入转化为文本形式的第一对话内容后,还可获取手机当前的位置信息、相机数据或健康数据等数据。例如,如果第一对话内容命中导航或点外卖等技能,则手机还可以获取当前的位置信息。又例如,如果第一对话内容命中拍视频等技能,则手机还可以获取当前的相机数据。
S503、手机向第一服务器发送第一请求消息,第一请求消息中包括上述第一对话内容。
示例性的,手机的语音助手APP将上述第一语音输入转化为文本形式的第一对话内容后,还可以通过与对话管理器(Dialog Manager,DM)交互确定第一对话内容所属的会话,从而得到第一对话内容的会话标识,例如,第一对话内容的sessionId。该会话标识可用于后续手机从语音助手APP跳转至其他应用后,能够跳转回语音助手APP并恢复上述第一对话内容。
例如,上述对话管理器可以设置在第一服务器中。第一服务器可以是向语音助手APP提供网络服务的一个或多个服务器。手机的语音助手APP获取到上述第一对话内容后,可向第一服务器发送第一请求消息,第一请求消息中包括上述第一对话内容。进而,第一服务器可使用预设的NLU算法识别第一对话内容的语义,再根据第一对话内容的语义为第一对话内容分配对应的会话标识。
示例性的,第一服务器中的DM可根据第一对话内容的语义确定第一对话内容命中的技能(skill),进而按照命中的技能分配会话标识。例如,DM可通过提取对话内容中的关键词确定各个对话内容命中的具体技能。如果对话内容1和对话内容2命中同一个技能,则对话内容1和对话内容2属于同一会话,此时,对话内容1和对话内容2的会话标识相同。
或者,上述对话管理器可以设置在手机中。此时,手机的语音助手APP获取到上述第一对话内容后,可使用预设的NLU算法识别第一对话内容的语义,进而请求手机中的DM根据第一对话内容的语义为第一对话内容分配对应的会话标识。进而,手机可将第一对话内容和第一对话内容的会话标识发送给服务器,例如手机将第一对话内容和第一对话内容的会话标识均携带在第一请求消息中发送给第一服务器。
在一些实施例中,手机的语音助手APP还可以将手机的设备标识携带在上述第一请求消息中发送至第一服务器。例如,该设备标识可以为UUID(Universally Unique Identifier,通用唯一识别码)、UDID(Unique Device Identifier,设备唯一识别码)、IMEI(International Mobile Equipment Identity,国际移动设备识别码)、SN(Serial Number,产品序列号)、UID(User Identification,用户身份证明)或OpenID等。该设备标识可用于后续获取与第一对话内容对应的服务资源(也可称为服务内容)时判断用户是否已登录相关应用的账号。
在一些实施例中,手机的语音助手APP还可以将手机当前的位置信息、相机数据或健康数据等数据携带在上述第一请求消息中发送至第一服务器,本申请实施例对此不做任何限制。
S504、响应于第一请求消息,第一服务器确定与上述第一对话内容对应的第一服务资源。
示例性的,第一服务器获取到第一请求消息中的第一对话内容后,可通过预设的NLU算法提取第一对话内容中的意图和/或槽位信息,即识别第一对话内容的语义。以第一对话内容为“我要点外卖”举例,第一服务器通过预设的NLU算法可以提取到第一对话内容的意图为点外卖(order-takeout),此时,第一对话内容中没有明确的槽位信息。又例如,当第一对话内容为“我要听周杰伦的歌曲”时,第一服务器通过预设的NLU算法可以提取到第一对话内容的意图为:播放歌曲,且与歌手对应的槽位信息为:周杰伦。后续,第一服务器可根据提取到的意图和/或槽位信息,查询与第一对话内容对应的第一服务资源(即第一服务内容)。
例如,第一服务资源(或第一服务内容)可以为具体的视频文件或音频文件,也可以为针对某一内容的搜索结果、页面(例如登录页面)或应用中的一项功能等,本申请实施例对此不做任何限制。
在一些实施例中,一些设备厂商可设置自己的服务开放平台,用于将第三方应用的开发者提供的服务接入到设备厂商自身的各种设备中使用。例如,华为设置有HAG(Huawei Ability Gallery,华为智慧服务开放平台),并通过HAG统一管理第三方应用的开发者提供的服务向华为设备的接入过程。在一种可选方式中,设备厂商也可以通过自己的服务开发平台,或,其他设置或约定等方式,告知第三方应用的开发者该设备厂商自身的设备中的语音助手APP的链接。这样,第三方应用开发者可以将该链接预设于该第三方应用中。在该设备厂商自身的设备,例如手机,从语音助手应用跳转到该第三方应用的界面后,第三方应用可以通过该链接再让设备拉起语音助手APP。
仍以第一对话内容为“我要点外卖”举例,第一服务器提取到第一对话内容的意图和/或槽位信息后,可向手机的服务开放平台发送第一消息,触发该服务开放平台确定与本次第一对话内容关联的第三方应用。例如,第一服务器可向HAG发送第一消息,第一消息中包括从第一消息中提取到的意图:order-takeout。这样,HAG根据第一消息中携带的意图,可确定与本次第一对话内容关联的第三方应用为外卖APP(例如美团APP等)。也就是说,与第一对话内容对应的第一服务资源是由外卖APP的服务器(例如第二服务器)提供的。进而,HAG可通知第一服务器从外卖APP的第二服务器中获取与第一对话内容对应的第一服务资源。后续,第一服务器可根据第一对话内容的意图和/或槽位信息,以及第一对话内容的会话标识,通过执行下述步骤S505从第二服务器中获取对应的第一服务资源。
或者,仍以第一对话内容为“我要点外卖”举例,第一服务器向HAG发送上述第一消息后,HAG可根据第一消息中携带的意图确定出与本次第一对话内容关联的第三方应用为外卖APP,进而,HAG可直接与外卖APP的第二服务器交互,从第二服务器中获取与第一对话内容对应的第一服务资源。
需要说明的是,上述实施例中是以第一服务器通过与HAG交互,确定与本次第一对话内容关联的第三方应用举例说明的。可以理解的是,第一服务器也可以通过其他方式确定与本次第一对话内容关联的第三方应用,本申请实施例对此不做任何限制。
S505、当第一服务资源由第一应用的服务器(即第二服务器)提供时,第一服务器向第二服务器发送第二请求消息,第二请求消息包括第一对话内容所属的会话标识。
示例性的,当第一服务器从HAG获取到与上述第一对话内容关联的第三方应用为外 卖APP后,第一服务器可向外卖APP的第二服务器发送第二请求消息,以请求获取与第一对话内容对应的第一服务资源(或第一服务资源的链接)。其中,第二请求消息中可以包括从第一对话内容中提取到的意图和/或槽位信息,以及第一对话内容的会话标识。
或者,HAG确定出与上述第一对话内容关联的第三方应用为外卖APP后,HAG可直接向外卖APP的第二服务器发送上述第二请求消息,以请求获取与第一对话内容对应的第一服务资源。同样,第二请求消息中可以包括从第一对话内容中提取到的意图和/或槽位信息,以及第一对话内容的会话标识。
无论采用哪种方式发送上述第二请求消息,第二服务器通过第二请求消息可以获取到本次对话内容(即第一对话内容)的会话标识,使得第一应用的第二服务器后续可建立第一对话内容的会话标识与第一对话内容的服务资源之间的对应关系。
S506、响应于第二请求消息,第二服务器将第一服务资源(或第一服务资源的链接)通过第一服务器发送至手机的语音助手APP。
示例性的,第二服务器接收到上述第二请求消息后,可根据第二请求消息中携带的意图和/或槽位信息确定对应的服务资源,即第一服务资源。例如,当意图为听歌、槽位信息为双节棍时,第二服务器可确定对应的服务资源为歌曲《双节棍》的音频资源。又例如,当意图为点外卖时,第二服务器可确定对应的服务资源为外卖APP中实时的外卖搜索结果。进而,第一服务资源可直接将第一服务资源通过第一服务器发送至手机的语音助手APP,也可以将第一服务资源的链接通过第一服务器发送至手机的语音助手APP,使得语音助手APP根据该连接获取对应的第一服务资源。
在一些实施例中,第二服务器接收到的第二请求消息中还可以包括手机的设备标识,例如UUID。此时,第二服务器接收到第二请求消息后,可先根据第二请求消息中携带的UUID判断用户是否已经登录外卖APP。
一般,当用户使用账号、密码等方式登录外卖APP后,外卖APP可将对应的UUID发送至第二服务器,第二服务器可将该UUID标记为登录状态。例如,第二服务器可分配与该UUID对应的token(令牌),使得该UUID被标记为登录状态。那么,当第二服务器接收到上述第二请求消息后,可查询是否存储有与第二请求消息中携带的UUID对应的token。如果存储有对应的token,则说明用户已经登录外卖APP;如果没有存储对应的token,则说明用户还未登录外卖APP。本发明实施例对第二服务器如何根据设备标识判断用户是否登录外卖APP,不做具体限定。
如果用户已经登录外卖APP,则第二服务器可以获取到用户账户里的优惠券、历史订单、口味偏好等账户信息。此时,第二服务器可基于这些信息,确定与第一对话内容中点外卖这一意图对应的第一服务资源。例如,如果用户拥有肯德基的优惠券,则第一服务资源中可优先包括肯德基的搜索结果。又例如,如果用户拥有肯德基的优惠券,则第二服务器还可以将优惠后的价格等信息携带在第一服务资源中。这样一来,第二服务器可结合用户登录后的账户信息,为用户确定更有针对性、更精准的服务资源,提高语音交互时用户的使用体验。
如果用户还未登录外卖APP,则第二服务器可将外卖APP中的登录页面作为本次与第一对话内容对应的第一服务资源。此时,第二服务器可将外卖APP中的登录页面的链接(即第一服务资源的链接)发送给手机的语音助手APP,以便用户后续登录外卖APP 后通过语音助手APP获取更准确、更丰富的服务资源。
示例性的,第二服务器可先将第一服务资源或第一服务资源的链接发送给第一服务器,再由第一服务器将第一服务资源或第一服务资源的链接发送至手机的语音助手APP。其中,第一服务资源的链接可以是H5(Hyper Text Markup Language,HTML5)页面的链接、快应用中相关页面的链接或第三方应用中相关页面的链接,这些链接可统一称为deeplink。并且,第一服务资源的链接中还可以包括第一对话内容的会话标识,从而建立第一对话内容的第一服务资源与第一对话内容的会话标识之间的对应关系。
例如,第二服务器确定出的第一服务资源的deeplink为:https://www.huawei.com/hag/accountLogin;第一对话内容的会话标识为:sessionId=xxxyyy;那么,第二服务器可将第一对话内容的会话标识拼接至第一服务资源的deeplink,最终得到第一服务资源的链接1为:https://www.huawei.com/hag/accountLogin&sessionId=xxxyyy,其中,链接1中既包括第一服务资源的deeplink,也包括第一对话内容的会话标识。进而,第二服务器可将链接1通过第一服务器发送给手机,手机可将链接1发送给语音助手APP。
在另一些实施例中,上述第一服务资源的链接1中还可以包括链接1的超时时间,例如2s等。该超时时间可用于在定时器到达或超过所述超时时间后,触发手机从链接1对应的应用的界面跳转回语音助手APP。
S507、手机的语音助手APP根据上述链接跳转至第一应用。
以第一服务资源为外卖APP中的登录页面举例,在步骤S507中,手机的语音助手APP获取到的链接中可以包括上述登录页面的deeplink,进而,语音助手APP可按照登录页面的deeplink拉起外卖APP,将外卖APP运行在手机前台。
此时,手机可将语音助手APP切换至手机后台运行。示例性的,语音助手APP切换至后台时,可将当前用户与语音助手APP之间的对话内容存储在预设的数据库中,避免语音助手APP被切换至后台或者被杀死(kill)后当前的对话内容丢失。
例如,如表1所示,上述预设的数据库可存储最近一段时间(例如10分钟)内用户与语音助手APP之间的对话内容、对话时间以及会话标识之间的对应关系。当然,表1中还可以包括设备标识、与对话内容对应的技能ID等上下文信息,本申请实施例对此不做任何限制。后续,语音助手APP通过表1可根据具体的会话标识恢复用户与语音助手APP之间的对话内容。当然,第一服务器的DM也可以维护不同用户与其语音助手APP之间的上下文信息,该上下文信息中可以包括每轮对话的对话内容以及会话标识等信息,语音助手APP也可以根据DM中记录的会话标识和上下文信息恢复用户与语音助手APP之间的对话内容。
表1
Figure PCTCN2022083479-appb-000001
Figure PCTCN2022083479-appb-000002
S508、手机显示与上述链接对应的第一应用的界面。
仍以第一服务资源为外卖APP中的登录页面举例,在步骤S508中,语音助手APP根据链接中的deeplink跳转至外卖APP后,如图7所示,外卖APP可显示外卖APP中的登录页面701。此时,用户可在登录页面701中输入授权语音助手APP或授权其他已登录APP的账号信息登录外卖APP的授权操作。例如,该授权操作可以为点击登录页面701中授权按钮702的操作。或者,外卖APP的登录页面701中也可以提示用户通过输入账户和密码等方式登录外卖APP,本申请实施例对此不做任何限制。
外卖APP检测到用户输入授权操作后,可向外卖APP的第二服务器发送授权消息,该授权消息中可以包括手机的设备标识,例如UUID。进而,第二服务器可响应该授权消息,将该UUID标记为登录状态。例如,第二服务器可分配与该UUID对应的token,使得该UUID被标记为登录状态。进而,第二服务器可向手机的外卖APP发送登录成功的响应消息。此时,如图8所示,外卖APP可响应该响应消息在登录页面701中显示登录成功消息703。当然,外卖APP也可以不显示登录成功消息703。
当然,如果第一服务资源的链接包括其他应用或其他页面的deeplink,则手机也可按照上述方法触发语音助手拉起并显示其他应用的显示界面,本申请实施例对此不做任何限制。
S509、在显示第一应用的界面后,手机还可以从显示第一应用的界面跳转回语音助手APP的会话界面。
其中,在显示第一应用的界面后,手机可退出显示第一应用的界面。例如,当用户将外卖APP(即第一应用)切换时后台运行时,手机可退出显示第一应用的界面。又例如,当手机在显示外卖APP的界面时如果检测到用户点击返回按钮,则手机可退出显示第一应用的界面。又例如,当外卖APP接收到用户登录成功的消息后,也可自动退出显示第一应用的界面。
又或者,外卖APP获取到的第一服务资源的链接1中可以设置超时时间。例如,外卖APP可启动一个时长为上述超时时间的定时器。此时,手机从语音助手APP跳转至外卖APP后,如果在上述超时时间内没有接收到用户对外卖APP输入的任何操作,则外卖APP的界面可自动退出显示。或者,如果第一服务资源的链接1中设置有超时时间,则无论是否接收到用户对外卖APP输入的操作,外卖APP均可以在运行上述超时时间之后自动退出。本申请实施例对手机退出显示第一应用的界面的情况不做限制。
当手机退出显示外卖APP的界面时,由于外卖APP获取到的第一服务资源的链接1中包括第一对话内容的会话标识,因此,外卖APP可根据该会话标识触发语音助手APP显示相应的会话界面。
例如,语音助手APP的会话界面的deeplink 1为:hivoice://com.huawei.vassistant/diaglog。第一应用,例如外卖APP,可以预设有语音助手APP的上述deeplink1以实现界面跳转。例如,外卖APP可将第一对话内容的会话标识:sessionId=xxxyyy添加至上述deeplink 1中,拼接后生成携带会话标识的语音助手APP的deeplink 2:hivoice://com.huawei.vassistant/diaglog&sessionId=xxxyyy。进而,外卖APP 可通过调用上述deeplink 2拉起语音助手APP,并将第一对话内容的会话标识通过deeplink 2传递给语音助手APP。又例如,当外卖APP显示登录成功的消息,或者,当外卖APP显示的界面超时时,可触发手机退出显示外卖APP的界面。此时,外卖APP可通过上述deeplink 2拉起语音助手APP,从而将deeplink 2中携带的会话标识传递给语音助手APP。
可以理解的是,第一应用预设有需要跳转的应用的链接(例如上述语音助手APP的deeplink1),可以是第一应用对应的H5页面、快应用或本地安装应用中预设有需要跳转的语音应用的链接,本发明实施例不做具体限定。在一种可选的方式中,第一应用也可以不预设需要跳转的应用的链接(例如上述语音助手APP的deeplink1)。第一应用可在跳转前从第一应用对应的服务器获取,或者从需要跳转的应用获取(例如,从语音助手APP跳转到第一应用时或之后,从语音助手APP获取语音助手APP的deeplink1)。本发明实施例对此不做具体限定。
语音助手APP被拉起后可运行在手机的前台,此时,语音助手APP可根据外卖APP传递的会话标识获取对应的对话内容。例如,语音助手APP可在表1所示的手机内的数据库中查询与上述会话标识对应的对话内容。如果手机内的数据库中没有查询到对应的对话内容,则语音助手APP可将上述会话标识发送给第一服务器,在第一服务器中查询与上述会话标识对应的对话内容。当然,语音助手APP也可以直接在第一服务器中查询与上述会话标识对应的对话内容,本申请实施例对此不做任何限制。
S510、手机的语音助手APP显示与上述会话标识对应的会话界面。
语音助手APP获取到与上述会话标识对应的对话内容后,语音助手APP可将获取到的对话内容(获取到的对话内容包括上述第一对话内容)显示在会话界面中,从而恢复显示图6所示的会话界面601,即恢复语音助手APP跳转至外卖APP之前的会话界面601。
这样,用户在与语音助手APP对话时,语音助手APP可根据用户输入的对话内容跳转至其他应用中向用户提供对应的服务资源,并且,手机还可以从其他应用重新跳转回语音助手APP并恢复之前的会话界面,使得用户可以继续与语音助手APP接续本次会话过程,提高语音交互时多轮会话的连续性,同时提高了用户的使用体验。
另外,手机在重新跳转到语音助手APP后,还可以向第一服务器发送会话恢复消息,以使得第一服务器获知语音助手APP已经恢复跳转至外卖APP之前的会话界面601。
示例性的,语音助手APP可将第一对话内容的会话标识和手机的设备标识携带在上述会话恢复消息中。这样,第一服务器获取到上述对话恢复消息后,可将对话恢复消息中的会话标识和设备标识发送至外卖APP的第二服务器。例如,第一服务器可先将上述话恢复消息中的会话标识和设备标识发送至HAG,由HAG确定与本次对话对应的第三方应用为外卖APP。进而,HAG可将上述会话标识和设备标识发送至外卖APP的第二服务器。第二服务器根据接收到的设备标识(例如UUID)可确定用户是否登录外卖APP。由于在步骤508中第二服务器已经将手机的UUID标记为登录状态,因此,第二服务器根据会话恢复消息中的设备标识可确定用户已经登录外卖APP。
在上述步骤S505中第二服务器通过第二请求消息已经获取到第一对话内容的意图和/或槽位信息,以及第一对话内容的会话标识。因此,当确定用户已经登录外卖APP后,第二服务器可以根据当前接收到的第一对话内容的会话标识,确定出对应的意图和/或槽 位信息。进而,第二服务器可基于用户账户里的优惠券、历史订单、口味偏好等账户信息,确定与上述意图和/或槽位信息对应的第一服务资源。例如,当第一对话内容的意图为点外卖时,第二服务器可根据用户的账户信息为用户推荐附近的一个或多个餐厅(或美食)的价位、位置、评价等搜索结果。进而,第二服务器可将得到的搜索结果通过第一服务器发送给手机的语音助手APP。
此时,如图9所示,手机的语音助手APP可将接收到的搜索结果以卡片901的形式展示在会话界面601中。其中,卡片901中的搜索结果是与用户登录外卖APP后的账户信息关联的,从而能够为用户推荐更有针对性、更精准的服务资源,提高语音交互时用户的使用体验。
或者,第二服务器也可将基于用户的账户信息得到的搜索结果的链接通过第一服务器发送给手机的语音助手APP。此时,语音助手APP可根据该链接获取到上述搜索结果,并将搜索结果以卡片901的形式展示在会话界面601中。
后续,用户可通过语音、触摸等方式在语音助手APP展示的卡片901中选择相应的选项,使得语音助手APP跳转至外卖APP后,不仅可以从外卖APP跳转回语音助手APP,还可以继续向用户提供本次会话的语音接续功能。
在另一些实施例中,第一服务器接收到“我要点外卖”的第一对话内容后,如果用户没有登录外卖APP的账号,则第一服务器也可以从第二服务器获取用户在未登录状态下的第一服务资源,例如未登录状态下对附近餐厅(或美食)的搜索结果。并且,第一服务器还可以从第二服务器获取外卖APP中登录页面的链接。也就是说,在步骤S506中,第二服务器可将未登录状态下的搜索结果和登录页面的链接发送至语音助手APP。当然,第二服务器也可将未登录状态下搜索结果的链接(即第一服务资源的链接)和上述登录页面的链接发送至语音助手APP,本申请实施例对此不做任何限制。其中,第二服务器发送的登录页面的链接中仍然可携带上述会话标识、UUID等参数。
如果语音助手APP接收到未登录状态下的搜索结果和登录页面的链接,如图10所示,可将未登录状态下的搜索结果以卡片1001的形式显示在会话界面601中。如果语音助手APP接收到未登录状态下的搜索结果的链接,则语音助手APP可先根据该链接获取未登录状态下的搜索结果,再将未登录状态下的搜索结果以卡片1001的形式显示在会话界面601中。由于语音助手APP此时得到的搜索结果是第二服务器在用户未登录的状态下搜索得到的,因此,卡片1001中的搜索结果可能与图9所示的用户在登录状态下的搜索结果不同。并且,仍如图10所示,语音助手APP可将登录页面的链接1002显示在会话界面601中,提醒用户登录外卖APP后获得更精准的搜索结果。
后续,用户可以通过点击上述链接1002跳转至外卖APP的登录页面进行登录,也可以通过语音输入跳转至外卖APP的登录页面进行登录。例如,如果检测到用户点击上述会话界面601中的链接1002,则与上述步骤S507-S510类似的,语音助手APP可跳转至外卖APP的登录页面,并将对应的会话标识传递给外卖APP。这样,后续手机可从外卖APP直接跳转回语音助手APP,并按照上述会话标识重新显示如图10所示的语音助手APP的会话界面。
又例如,如果检测到用户输入“登录外卖APP”的语音输入(即第二语音输入),则与上述步骤S501-S510类似的,语音助手APP可将“登录外卖APP”的第二对话内容 发送至第一服务器。第一服务器提取到第二对话内容的意图为登录(login)时,可请求第二服务器登录用户在外卖APP中的账号。进而,第二服务器可将外卖APP的登录页面和会话标识作为本次链接发送至语音助手APP,由语音助手APP根据该链接跳转至外卖APP。后续,与上述实施例类似的,手机可从显示的外卖APP的界面跳转回语音助手APP的界面,并按照上述会话标识重新显示如图10所示的语音助手APP的会话界面。
示例性的,当手机跳转回语音助手APP后,与上述实施例类似的,还可以向第一服务器发送会话恢复消息,触发第一服务器从第二服务器获取用户在登录状态下对附近餐厅(或美食)的搜索结果,并将该搜索结果发送至语音助手APP。此时,如图11所示,语音助手APP可将接收到的搜索结果以卡片1101的形式显示在会话界面601中。由于语音助手APP此时接收到的搜索结果是第二服务器在用户登录的状态下搜索得到的,因此,卡片1101中的搜索结果是与用户登录外卖APP后的账户信息关联的,相较于卡片1001中的搜索结果,搜索结果能够为用户推荐更有针对性、更精准的服务资源,提高语音交互时用户的使用体验。
可以看出,在本申请实施例提供的语音交互方法中,语音助手APP、语音助手APP的服务器(即第一服务器)以及第三方应用的服务器(即第二服务器)在交互时,可将本次对话内容的会话标识作为携带参数在各个交互过程中传递。这样,第二服务器在向语音助手APP提供与对话内容对应的服务资源时可建立该服务资源与对应会话标识之间的对应关系,使得语音助手APP跳转至提供上述服务资源的第三方应用时,第三方应用也可获得与本次服务资源与对应会话标识,从而在第三方应用退出时能够按照该会话标识重新拉起语音助手APP并恢复相应的会话界面,使得语音助手APP跳转至第三方应用之后仍然能够跳转回语音助手APP实现语音接续功能,提高用户的使用体验。
需要说明的是,上述实施例中是以在手机中安装语音助手APP举例说明的,可以理解的是,在语音交互场景中,安装语音助手APP的电子设备还可以是车载设备、平板电脑、手表等,这些设备均可用于实现上述实施例中的语音交互方法,本申请实施例对此不做任何限制。
如图12所示,本申请实施例公开了一种电子设备,该电子设备可以为上述手机。该电子设备具体可以包括:触摸屏1201,所述触摸屏1201包括触摸传感器1206和显示屏1207;一个或多个处理器1202;存储器1203;一个或多个应用程序(未示出);以及一个或多个计算机程序1204,上述各器件可以通过一个或多个通信总线1205连接。其中,上述一个或多个计算机程序1204被存储在上述存储器1203中并被配置为被该一个或多个处理器1202执行,该一个或多个计算机程序1204包括指令,该指令可以用于执行上述实施例中手机执行的相关步骤。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或处理器执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:快闪存储器、移动硬盘、只读存储器、随机存取存储器、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请实施例的具体实施方式,但本申请实施例的保护范围并不局限于此,任何在本申请实施例揭露的技术范围内的变化或替换,都应涵盖在本申请实施例的保护范围之内。因此,本申请实施例的保护范围应以所述权利要求的保护范围为准。

Claims (34)

  1. 一种语音交互方法,其特征在于,包括:
    电子设备显示语音应用的会话界面,所述会话界面用于显示用户与所述语音应用之间的对话内容;
    所述电子设备检测用户输入的第一语音,并将所述第一语音转换为第一对话内容;
    所述电子设备根据所述第一对话内容获取第一链接;
    所述电子设备根据所述第一链接从所述语音应用的所述会话界面跳转至第一应用的界面;
    在显示所述第一应用的界面之后,所述电子设备根据与所述第一对话内容对应的会话标识跳转回所述会话界面。
  2. 根据权利要求1所述的方法,其特征在于,在所述电子设备将所述第一语音转换为第一对话内容之后,还包括:
    所述电子设备向第一服务器发送第一请求消息,所述第一请求消息中包括所述第一对话内容,以使得所述第一服务器响应于所述第一请求消息确定所述第一对话内容的会话标识和所述第一链接;
    其中,所述电子设备根据第一对话内容获取第一链接,包括:
    所述电子设备接收所述第一服务器发送的第一响应消息,所述第一响应消息中包括所述第一链接和所述会话标识。
  3. 根据权利要求1或2所述的方法,其特征在于,所述电子设备根据与所述第一对话内容对应的会话标识跳转回所述会话界面,包括:
    所述电子设备中的所述第一应用拉起所述语音应用;
    在所述语音应用被拉起后,所述电子设备中的所述语音应用显示与所述会话标识对应的对话内容。
  4. 根据权利要求3所述的方法,其特征在于,所述电子设备中的所述第一应用拉起所述语音应用,包括:
    所述电子设备中的所述第一应用根据第二链接拉起所述语音应用,所述第二链接包括所述语音应用的链接和所述会话标识。
  5. 根据权利要求3所述的方法,其特征在于,所述电子设备的所述语音应用显示与所述会话标识对应的对话内容,包括:
    所述语音应用查询所述电子设备中是否存储有与所述会话标识对应的对话内容;
    若所述电子设备存储有与所述会话标识对应的对话内容,则所述语音应用将与所述会话标识对应的对话内容显示在所述会话界面中;
    若所述电子设备没有存储与所述会话标识对应的对话内容,则所述语音应用从第一服务器获取与所述会话标识对应的对话内容,并将与所述会话标识对应的对话内容显示在所述会话界面中。
  6. 根据权利要求2所述的方法,其特征在于,所述第一请求消息中还包括设备标识,所述设备标识用于确认所述电子设备是否登录所述第一应用。
  7. 根据权利要求6所述的方法,其特征在于,若所述电子设备没有登录所述第一应用,则所述第一链接包括所述第一应用中登录页面的链接。
  8. 根据权利要求7所述的方法,其特征在于,所述电子设备根据所述第一链接从所述语音应用的所述会话界面跳转至第一应用的界面,包括:
    所述电子设备的语音应用根据所述第一链接拉起所述第一应用,并显示所述第一应用的登录页面。
  9. 根据权利要求8所述的方法,其特征在于,在所述电子设备显示所述第一应用的登录页面之后,还包括:
    所述电子设备接收用户在所述登录页面中输入的登录操作,所述登录操作用于授权所述电子设备登录所述第一应用的用户账号;
    其中,所述电子设备根据与所述第一对话内容对应的会话标识跳转回所述会话界面,包括:
    若接收到与所述登录操作对应的登录成功消息,则所述电子设备根据所述会话标识从所述第一应用的界面跳转回所述会话界面。
  10. 根据权利要求8或9所述的方法,其特征在于,在所述电子设备根据所述会话标识跳转回所述会话界面之后,还包括:
    所述电子设备在所述会话界面中显示与所述第一对话内容对应的第一服务内容,所述第一服务内容与登录所述第一应用后用户的账户信息关联。
  11. 根据权利要求7所述的方法,其特征在于,若所述电子设备没有登录所述第一应用,则所述方法还包括:
    所述电子设备获取与所述第一对话内容对应的第二服务内容,所述第二服务内容为没有登录所述第一应用时与所述第一对话内容对应的服务内容;
    在所述电子设备根据所述第一链接从所述语音应用的所述会话界面跳转至第一应用的界面之前,还包括:
    所述电子设备在所述会话界面中显示所述第二服务内容和所述第一应用中登录页面的链接。
  12. 根据权利要求11所述的方法,其特征在于,所述电子设备根据所述第一链接从所述语音应用的所述会话界面跳转至第一应用的界面,包括:
    若检测到用户在所述会话界面中选择所述登录页面的链接,则所述电子设备根据所述登录页面的链接从所述会话界面跳转至所述第一应用的登录页面;或者,
    若用户输入的第二语音用于指示登录所述第一应用,则所述电子设备根据所述登录页面的链接从所述会话界面跳转至所述第一应用的登录页面。
  13. 根据权利要求1-12中任一项所述的方法,其特征在于,所述电子设备根据所述第一链接从所述语音应用的所述会话界面跳转至第一应用的界面之后,还包括:
    所述电子设备启动预设的定时器;
    当所述定时器超时时,所述电子设备从所述第一应用的界面跳转回所述会话界面。
  14. 根据权利要求1-13中任一项所述的方法，其特征在于，在所述电子设备根据所述第一链接从所述语音应用的所述会话界面跳转至第一应用的界面之后，还包括：
    所述电子设备将所述语音应用切换至后台运行。
  15. 一种语音交互方法,其特征在于,包括:
    第一服务器接收电子设备发送的第一请求消息，所述第一请求消息中包括第一对话内容；
    响应于所述第一请求消息,所述第一服务器获取所述第一对话内容的会话标识和第一链接;
    所述第一服务器向所述电子设备发送第一响应消息,所述第一响应消息包括所述会话标识和所述第一链接,所述会话标识用于所述电子设备从所述第一链接对应的第一应用的界面跳转回显示所述第一对话内容的语音应用的界面。
  16. 根据权利要求15所述的方法,其特征在于,所述第一服务器获取所述第一对话内容的会话标识和第一链接,包括:
    所述第一服务器为所述第一对话内容分配会话标识;
    所述第一服务器从所述第一应用的第二服务器中获取与所述第一对话内容对应的第一链接。
  17. 根据权利要求16所述的方法,其特征在于,所述第一服务器为所述第一对话内容分配会话标识,包括:
    所述第一服务器识别所述第一对话内容的语义;
    所述第一服务器根据所述第一对话内容的语义为所述第一对话内容分配会话标识。
  18. 根据权利要求16所述的方法,其特征在于,所述第一服务器从所述第一应用的第二服务器中获取与所述第一对话内容对应的第一链接,包括:
    所述第一服务器向所述第二服务器发送第一消息,所述第一消息中包括所述会话标识和所述第一对话内容的语义,以使得第二服务器根据所述第一对话内容的语义确定所述第一链接,并建立所述第一链接与所述会话标识之间的对应关系;
    所述第一服务器接收所述第二服务器发送的第二消息,所述第二消息包括所述会话标识以及所述第一链接。
  19. 根据权利要求18所述的方法,其特征在于,所述第一请求消息中包括所述电子设备的设备标识;
    其中,所述第一消息中包括所述设备标识,以使得所述第二服务器根据所述设备标识确定所述电子设备是否登录所述第一应用;若所述电子设备没有登录所述第一应用,则所述第一链接为所述第一应用中登录页面的链接。
  20. 根据权利要求19所述的方法,其特征在于,在所述第一服务器向所述电子设备发送第一响应消息之后,还包括:
    所述第一服务器接收所述电子设备发送的会话恢复消息,所述会话恢复消息中包括所述第一对话内容的会话标识和所述设备标识;
    响应于所述会话恢复消息,所述第一服务器根据所述设备标识在所述第二服务器中查询所述电子设备是否登录所述第一应用;
    若所述电子设备登录所述第一应用,则所述第一服务器从第二服务器中获取与所述第一对话内容对应的第一服务内容,所述第一服务内容与登录所述第一应用后用户的账户信息关联;
    所述第一服务器将所述第一服务内容发送至所述电子设备。
  21. 根据权利要求19所述的方法，其特征在于，所述第一响应消息还包括第二服务内容，所述第二服务内容为没有登录所述第一应用时与所述第一对话内容对应的服务内容。
  22. 根据权利要求15-21中任一项所述的方法,其特征在于,所述第一响应消息中还包括所述第一链接的超时时间。
  23. 一种语音交互方法,其特征在于,包括:
    电子设备显示语音应用的会话界面,所述会话界面用于显示用户与所述语音应用之间的对话内容;
    所述电子设备检测用户输入的第一语音,并将所述第一语音转换为第一对话内容;
    所述电子设备向第一服务器发送第一请求消息,所述第一请求消息中包括所述第一对话内容;
    响应于所述第一请求消息,所述第一服务器获取所述第一对话内容的会话标识和第一链接;
    所述第一服务器向所述电子设备发送第一响应消息,所述第一响应消息包括所述会话标识和所述第一链接;
    所述电子设备根据所述第一链接从所述会话界面跳转至第一应用的界面;
    在显示所述第一应用的界面之后,所述电子设备根据所述会话标识跳转回所述会话界面。
  24. 根据权利要求23所述的方法,其特征在于,所述第一服务器获取所述第一链接,包括:
    所述第一服务器根据所述第一对话内容从第二服务器获取所述第一链接,所述第二服务器为所述第一应用对应的服务器。
  25. 根据权利要求24所述的方法,其特征在于,所述第一请求消息中还包括所述电子设备的设备标识;所述方法还包括:
    所述第一服务器向所述第二服务器发送所述设备标识;
    所述第二服务器根据所述设备标识确定所述电子设备是否登录所述第一应用;若所述电子设备没有登录所述第一应用,则所述第一链接包括所述第一应用中登录页面的链接。
  26. 根据权利要求25所述的方法,其特征在于,在所述电子设备根据所述第一链接从所述会话界面跳转至第一应用的界面之后,还包括:
    响应于用户在所述第一应用的界面输入的登录操作,所述电子设备向所述第一服务器发送登录请求,所述登录请求中包括所述设备标识;
    响应于所述登录请求,所述第一服务器请求所述第二服务器将所述设备标识标记为登录状态;
    所述第一服务器向所述电子设备发送登录成功消息;
    其中,所述电子设备根据所述会话标识跳转回所述会话界面,包括:
    响应于所述登录成功消息,所述电子设备根据所述会话标识从所述第一应用的界面跳转回所述会话界面。
  27. 根据权利要求25或26所述的方法,其特征在于,在所述电子设备根据所述会话标识从所述第一应用的界面跳转回所述会话界面之后还包括:
    所述电子设备向所述第一服务器发送会话恢复消息，所述会话恢复消息中包括所述会话标识和所述设备标识；
    若所述电子设备登录所述第一应用,响应于所述会话恢复消息,所述第一服务器从所述第二服务器中获取与所述第一对话内容对应的第一服务内容,并将所述第一服务内容发送给所述电子设备,所述第一服务内容与登录所述第一应用后用户的账户信息关联;
    所述电子设备在所述会话界面中显示所述第一服务内容。
  28. 根据权利要求25所述的方法,其特征在于,若所述电子设备没有登录所述第一应用,则所述方法还包括:
    所述第一服务器从所述第二服务器获取第二服务内容,所述第二服务内容为没有登录所述第一应用时与所述第一对话内容对应的服务内容;
    在所述电子设备根据所述第一链接从所述会话界面跳转至第一应用的界面之前,还包括:
    所述电子设备在所述会话界面中显示所述第二服务内容。
  29. 根据权利要求23-28中任一项所述的方法,其特征在于,所述电子设备根据所述会话标识跳转回所述会话界面,包括:
    所述电子设备中的所述第一应用根据第二链接拉起所述语音应用,所述第二链接包括所述语音应用的链接和所述会话标识;
    在所述语音应用被拉起后,所述电子设备中的所述语音应用显示与所述会话标识对应的对话内容。
  30. 一种电子设备,其特征在于,包括:
    触摸屏,所述触摸屏包括触摸传感器和显示屏;
    一个或多个处理器;
    存储器;
    通信模块;
    其中,所述存储器中存储有一个或多个计算机程序,所述一个或多个计算机程序包括指令,当所述指令被所述电子设备执行时,使得所述电子设备执行如权利要求1-14中任一项所述的语音交互方法。
  31. 一种服务器,其特征在于,包括:
    一个或多个处理器;
    存储器;
    通信模块;
    其中,所述存储器中存储有一个或多个计算机程序,所述一个或多个计算机程序包括指令,当所述指令被所述服务器执行时,使得所述服务器执行如权利要求15-22中任一项所述的语音交互方法。
  32. 一种语音交互系统,其特征在于,包括如权利要求30所述的电子设备,以及如权利要求31所述的服务器。
  33. 一种计算机可读存储介质,所述计算机可读存储介质中存储有指令,其特征在于,当所述指令在电子设备上运行时,使得所述电子设备执行如权利要求1-14中任一项所述的语音交互方法;或者,当所述指令在服务器上运行时,使得所述服务器执行如权利要求15-22中任一项所述的语音交互方法。
  34. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在电子设备上运行时,使得所述电子设备执行如权利要求1-14中任一项所述的语音交互方法;或者,当所述计算机程序产品在服务器上运行时,使得所述服务器执行如权利要求15-22中任一项所述的语音交互方法。
PCT/CN2022/083479 2021-03-30 2022-03-28 一种语音交互方法及电子设备 WO2022206704A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110343786.7A CN115150501A (zh) 2021-03-30 2021-03-30 一种语音交互方法及电子设备
CN202110343786.7 2021-03-30

Publications (1)

Publication Number Publication Date
WO2022206704A1 true WO2022206704A1 (zh) 2022-10-06

Family

ID=83404172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083479 WO2022206704A1 (zh) 2021-03-30 2022-03-28 一种语音交互方法及电子设备

Country Status (2)

Country Link
CN (1) CN115150501A (zh)
WO (1) WO2022206704A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116841661A (zh) * 2022-03-24 2023-10-03 华为技术有限公司 服务调用方法及电子设备
CN116049372B (zh) * 2023-02-07 2023-11-28 北京百度网讯科技有限公司 人机对话方法、装置及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110892382A (zh) * 2017-05-16 2020-03-17 谷歌有限责任公司 用于经由自动化助理恢复对话会话的系统、方法和装置
CN111724775A (zh) * 2019-03-22 2020-09-29 华为技术有限公司 一种语音交互方法及电子设备
CN111819566A (zh) * 2018-03-05 2020-10-23 谷歌有限责任公司 通过自动助理在先前对话场境之间的转变
CN111989650A (zh) * 2018-03-12 2020-11-24 谷歌有限责任公司 用于管理未完成的自动助理动作的系统、方法和装置
US20220094650A1 (en) * 2020-09-21 2022-03-24 Google Llc Asynchronous resumption of dialog session(s) between a user and an automated assistant based on intermediate user interaction(s)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9767794B2 (en) * 2014-08-11 2017-09-19 Nuance Communications, Inc. Dialog flow management in hierarchical task dialogs
CN110334352B (zh) * 2019-07-08 2023-07-07 腾讯科技(深圳)有限公司 引导信息显示方法、装置、终端及存储介质
CN110336840B (zh) * 2019-08-12 2022-05-13 思必驰科技股份有限公司 用于语音对话平台的第三方账号注册方法及系统

Also Published As

Publication number Publication date
CN115150501A (zh) 2022-10-04

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22778886

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22778886

Country of ref document: EP

Kind code of ref document: A1