WO2020192456A1 - Voice interaction method and electronic device (一种语音交互方法及电子设备) - Google Patents

Voice interaction method and electronic device (一种语音交互方法及电子设备)

Info

Publication number
WO2020192456A1
Authority
WO
WIPO (PCT)
Prior art keywords
electronic device
candidate
server
slot information
user
Prior art date
Application number
PCT/CN2020/079385
Other languages
English (en)
French (fr)
Inventor
罗红枫
赵玉锡
张文
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to US17/442,024 priority Critical patent/US20220172717A1/en
Priority to EP20777455.5A priority patent/EP3923274A4/en
Publication of WO2020192456A1 publication Critical patent/WO2020192456A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482 Interaction with lists of selectable items, e.g. menus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/451 Execution arrangements for user interfaces
    • G06F9/453 Help systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72484 User interfaces specially adapted for cordless or mobile telephones wherein functions are triggered by incoming communication events
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/725 Cordless telephones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M2250/00 Details of telephonic subscriber devices
    • H04M2250/74 Details of telephonic subscriber devices with voice recognition means

Definitions

  • This application relates to the field of terminal technology, and in particular to a voice interaction method and an electronic device.
  • In addition to human-computer interaction (HCI) through a graphical user interface (GUI), voice assistants such as Siri, Xiao Ai, and Xiao E have been added to many electronic devices to help users complete the human-computer interaction process with the electronic device.
  • Taking Siri as an example of a voice assistant: after the user wakes up Siri on the mobile phone, Siri and the user can communicate with each other through a voice user interface (VUI), and Siri can answer each query sent by the user.
  • However, if the voice communication between the user and Siri is interrupted, for example, if the user suddenly receives an incoming call during a conversation with Siri, the mobile phone will automatically exit the voice conversation with Siri. If the user wants to continue voice communication with Siri, the user needs to wake up the voice assistant in the phone again. That is to say, after the user's dialogue with the voice assistant in the mobile phone is interrupted, the voice assistant cannot resume the voice dialogue with the user, resulting in low working efficiency of the voice assistant in the mobile phone.
  • In view of this, the present application provides a voice interaction method and an electronic device, so that after a conversation between the user and the voice assistant is interrupted, the voice assistant can resume the conversation with the user, thereby improving the use efficiency and experience of the voice assistant in the electronic device.
  • In a first aspect, this application provides a voice interaction method, including: in response to a user operation that wakes up the voice assistant application, the electronic device runs the voice assistant application in the foreground and displays a first interface, where the first interface is used to display the content of the conversation between the user and the voice assistant application. Further, the user can input a first voice input to the electronic device, the first voice input including first slot information.
  • If the semantics of the first slot information are unclear, for example, the first slot information is departure information and the departure location corresponds to multiple locations on the map, the electronic device can display a first card on the first interface. The first card includes N (N ≥ 1) candidate options for the first slot information, and the N candidate options correspond one-to-one to N query requests; each of the N query requests carries the corresponding candidate option of the first slot information.
  • In this way, whenever the user selects a candidate option (for example, a first candidate option) in the first card, the electronic device can send the first query request corresponding to the first candidate option to the first server, so that the first server can update the first slot information in the first voice input according to the first candidate option carried in the first query request, thereby providing the user with a service result corresponding to the first voice input.
  • Since each candidate option corresponds to a query request, no matter when the user selects a candidate option in the card, the first server can support the voice assistant in continuing the conversation with the user based on the corresponding query request, thereby improving the use efficiency and experience of the voice assistant in the electronic device.
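The one-to-one mapping between candidate options and pre-built query requests described above can be sketched on the device side as follows. This is only an illustrative model, not the patent's actual implementation; names such as `QueryRequest` and `CandidateCard` and the example locations are assumptions.

```python
from dataclasses import dataclass

@dataclass
class QueryRequest:
    """A query request carrying the candidate option chosen for one slot."""
    intent: str   # e.g. a hypothetical "navigate" intent
    slots: dict   # slot name -> chosen value

@dataclass
class CandidateCard:
    """A card shown in the first interface: N candidates, N query requests."""
    slot_name: str
    candidates: list   # N candidate option strings
    requests: list     # N QueryRequest objects, one per candidate

    def select(self, index):
        # Selecting the i-th candidate yields the i-th pre-built query
        # request, which already carries that candidate's slot value.
        return self.requests[index]

# Example: ambiguous departure information with N = 3 candidate locations.
candidates = ["East Gate of the park", "West Gate of the park", "North Gate of the park"]
card = CandidateCard(
    slot_name="departure",
    candidates=candidates,
    requests=[QueryRequest("navigate", {"departure": c}) for c in candidates],
)

# Whenever the user taps (or speaks) the first candidate, the device can
# send the corresponding pre-built request to the first server.
req = card.select(0)
print(req.slots["departure"])  # East Gate of the park
```

Because every candidate already has its query request attached, the selection can happen at any time, even long after the original voice input, without re-running recognition.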
  • In a possible implementation, after the electronic device displays the first card in the first interface, the method further includes: when the electronic device switches the voice assistant application from the foreground to the background, the electronic device may display a second interface of another application; at this time, the voice assistant application has not been killed. Therefore, when the electronic device switches the voice assistant application to the foreground again, the electronic device can redisplay the first interface, and the candidate options in the first card in the first interface are still valid.
  • In a possible implementation, the operation of selecting the first candidate option from the above N candidate options may include: a touch operation of tapping the first candidate option in the first card; or a second voice input that inputs the first candidate option to the electronic device.
  • That is, the user can select an option in the card either through a touch operation or through voice, and both interaction methods can be used in multiple rounds of dialogue, enriching the user's ways of interacting with the voice assistant and improving the interaction experience.
  • In a possible implementation, the first voice input may also include second slot information, for example, destination information. If the semantics of the second slot information are unclear, then after the electronic device sends the first query request corresponding to the first candidate option to the first server, the method further includes: the electronic device displays a second card in the first interface, where the second card includes M (M ≥ 1) candidate options that correspond one-to-one to M query requests for the second slot information. Each of the M query requests carries the first candidate option selected by the user, and each of the M query requests also carries the corresponding candidate option of the second slot information. Then, after the user selects a second candidate option from the M candidate options at any time, the electronic device can send the second query request corresponding to the second candidate option to the first server.
  • Since the query request corresponding to each candidate option in the second card carries the first slot information selected by the user in the previous round of dialogue, even if the dialogue between the user and the voice assistant application is interrupted, the user can continue to select a candidate for the second slot information in the second card without having to re-enter the already selected first slot information into the voice assistant application. In this way, the user can resume and complete the interrupted conversation with the voice assistant application at any time, thereby improving the working efficiency and use experience of the voice assistant application in the mobile phone.
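The key detail in this implementation is that each second-card query request already embeds the first-round selection, so the dialog survives interruption. A minimal sketch, assuming hypothetical names (`build_second_card_requests`, the slot names, and the example places are all illustrative):

```python
from dataclasses import dataclass

@dataclass
class QueryRequest:
    intent: str
    slots: dict

def build_second_card_requests(first_choice, second_candidates):
    """Build the M query requests for the second slot.

    Every request carries the first candidate already selected by the
    user, so selecting any option later yields a complete request and
    the user never has to repeat the first slot.
    """
    return [
        QueryRequest("navigate", {"departure": first_choice, "destination": c})
        for c in second_candidates
    ]

requests = build_second_card_requests(
    first_choice="East Gate of the park",
    second_candidates=["Central Station", "Central Mall"],  # M = 2
)

# Even if the dialogue was interrupted in between, selecting a
# destination later still produces a request with both slots filled.
print(requests[1].slots)
# {'departure': 'East Gate of the park', 'destination': 'Central Mall'}
```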
  • In a possible implementation, after the electronic device displays the second card in the first interface, the method further includes: when the electronic device switches the voice assistant application from the foreground to the background, the electronic device displays the second interface; after the electronic device switches the voice assistant application to the foreground again, the electronic device displays the first interface again, and the candidate options in the second card in the first interface are still valid.
  • In a possible implementation, the operation of selecting the second candidate option from the above M candidate options includes: a touch operation of tapping the second candidate option in the second card; or a third voice input that inputs the second candidate option to the electronic device.
  • Again, the user can select an option in the card either through a touch operation or through voice, and both interaction methods can be used in multiple rounds of dialogue, enriching the user's ways of interacting with the voice assistant and improving the interaction experience.
  • In a possible implementation, the method further includes: the electronic device sends the first voice input to the first server, so that the first server extracts the first slot information from the first voice input, obtains the N candidate options of the first slot information, and establishes the one-to-one correspondence between the N candidate options and the N query requests; the electronic device then receives the N candidate options and their one-to-one correspondence to the N query requests sent by the first server.
  • In a possible implementation, the first server may also extract the second slot information from the first voice input, obtain the M candidate options of the second slot information, and establish the one-to-one correspondence between the M candidate options and the M query requests. In that case, after the electronic device sends the first query request corresponding to the first candidate option to the first server, the method further includes: the electronic device receives the M candidate options and their one-to-one correspondence to the M query requests sent by the first server.
  • In a possible implementation, the method further includes: the electronic device receives a fourth voice input of the user, where the fourth voice input includes a screening condition for the above N candidate options; in response, the electronic device displays a third card in the first interface, where the third card includes the one or more candidate options that meet the screening condition, thereby helping the user filter the options in the card.
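Screening the card by a spoken condition can be sketched as below. The substring match stands in for whatever matching the real system would use (the patent does not specify one), and the shop names are invented examples:

```python
def filter_candidates(candidates, condition):
    """Return only the candidate options that meet a screening condition
    taken from the user's voice input (e.g. a fourth voice input such as
    "the ones near the subway"). A simple substring check is used here
    purely for illustration."""
    return [c for c in candidates if condition in c]

candidates = [
    "No. 1 Coffee Shop (near the subway)",
    "No. 2 Coffee Shop (city center)",
    "No. 3 Coffee Shop (near the subway)",
]

# The third card shows only the options that satisfy the condition.
third_card = filter_candidates(candidates, "near the subway")
print(len(third_card))  # 2
```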
  • In a second aspect, the present application provides a voice interaction method, including: a first server receives a first voice input sent by an electronic device; the first server extracts first slot information from the first voice input; when the semantics of the first slot information are unclear, the first server can obtain N (N ≥ 1) candidate options for the first slot information and establish a one-to-one correspondence between the N candidate options and N query requests, where each of the N query requests carries the corresponding candidate option of the first slot information. The first server may send the above N candidate options to the electronic device, or the first server may send the N candidate options together with their correspondence to the N query requests to the electronic device. After the user selects a first candidate option among the N candidate options on the electronic device, the electronic device can send the first candidate option to the first server, and the first server can then update the first slot information in the first voice input according to the first query request corresponding to the first candidate option; the first server may determine the service result corresponding to the first voice input based on the updated first slot information.
  • In a possible implementation, the above first voice input further includes second slot information. When the semantics of the second slot information are unclear, after the first server receives the first candidate option sent by the electronic device, the method further includes: the first server obtains M (M ≥ 1) candidate options for the second slot information and establishes a one-to-one correspondence between the M candidate options and M query requests. Each of the M query requests carries the first candidate option selected by the user, and each of the M query requests also carries the corresponding candidate option of the second slot information. The first server sends the M candidate options to the electronic device. After the user selects one candidate option (for example, a second candidate option) among the M candidate options, the electronic device can send the second candidate option to the first server. Because the second query request corresponding to the second candidate option includes both the first slot information selected by the user (that is, the first candidate option) and the second slot information selected by the user (that is, the second candidate option), the first server can update both the first slot information and the second slot information according to the second query request.
  • Finally, the first server may determine the service result corresponding to the first voice input based on the updated first slot information and second slot information.
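The server-side bookkeeping described above, updating ambiguous slots from the values carried in a query request and only producing a service result once every slot is resolved, might look like this. The function names, the list-means-ambiguous convention, and the route-style service result are all assumptions for illustration:

```python
def handle_query_request(pending, request_slots):
    """Update the pending slots of a voice input with the values carried
    in a query request, then decide whether a service result can be built.

    Convention (illustrative): an unresolved slot holds a list of
    candidate options; a resolved slot holds a single chosen string.
    """
    pending.update(request_slots)
    unresolved = [k for k, v in pending.items() if isinstance(v, list)]
    if unresolved:
        # Still ambiguous: the server would send candidate options for
        # the remaining slots instead of a service result.
        return None, unresolved
    return {"route": (pending["departure"], pending["destination"])}, []

# First voice input: both slots are ambiguous (N and M candidates each).
pending = {
    "departure": ["East Gate", "West Gate"],             # N = 2 candidates
    "destination": ["Central Station", "Central Mall"],  # M = 2 candidates
}

# The second query request carries both user-selected candidates, so the
# server can update both slots at once and return the service result.
result, unresolved = handle_query_request(
    pending, {"departure": "East Gate", "destination": "Central Mall"}
)
print(result)  # {'route': ('East Gate', 'Central Mall')}
```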
  • In a third aspect, this application provides a voice interaction system, in which: in response to a user operation that wakes up the voice assistant, the electronic device runs the voice assistant application in the foreground; the electronic device receives a first voice input of the user and sends the first voice input to the first server; the first server extracts first slot information from the first voice input; when the semantics of the first slot information are unclear, the first server can obtain N (N ≥ 1) candidate options and establish a one-to-one correspondence between the N candidate options and N query requests, where each of the N query requests carries the corresponding candidate option of the first slot information; the first server may send the above N candidate options to the electronic device; the electronic device displays a first card containing the N candidate options of the first slot information; and, in response to the user's operation of selecting a first candidate option from the N candidate options, the electronic device may send the first query request corresponding to the first candidate option to the first server, or the electronic device may send the first candidate option to the first server, so that the first server determines the first query request corresponding to the first candidate option.
  • In a possible implementation, the first voice input also includes second slot information. When the semantics of the second slot information are unclear, after the electronic device sends the first query request corresponding to the first candidate option to the first server, the first server obtains M (M ≥ 1) candidate options for the second slot information and establishes a one-to-one correspondence between the M candidate options and M query requests, where the M query requests all carry the first candidate option selected by the user, and each of the M query requests carries the corresponding candidate option of the second slot information; the first server sends the M candidate options to the electronic device; and the electronic device displays a second card containing the M candidate options of the second slot information.
  • In a possible implementation, the method further includes: in response to the user's operation of selecting a second candidate option from the above M candidate options, the electronic device sends the second query request corresponding to the second candidate option to the first server; or, the electronic device sends the second candidate option to the first server, so that the first server determines the second query request corresponding to the second candidate option.
  • In a possible implementation, the above voice interaction system further includes a second server configured to send the N candidate options of the first slot information and/or the M candidate options of the second slot information to the first server.
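The division of labor in this system, with an optional second server supplying candidate options that the first server pairs with query requests, can be sketched as follows. The class names, the map-style lookup, and the place names are illustrative assumptions, not the patent's architecture:

```python
class SecondServer:
    """Hypothetical content provider that resolves an ambiguous slot value
    into concrete candidate options (e.g. a map service resolving a
    departure name into the matching places)."""
    PLACES = {
        "the park": ["East Gate of the park", "West Gate of the park"],
    }

    def candidates_for(self, slot_value):
        # An unambiguous value resolves to itself.
        return self.PLACES.get(slot_value, [slot_value])

class FirstServer:
    """Hypothetical dialog server: asks the second server for candidates
    and pairs each candidate with a ready-made query request."""
    def __init__(self, second_server):
        self.second = second_server

    def options_for(self, slot_name, slot_value):
        cands = self.second.candidates_for(slot_value)
        # One query request per candidate option: the one-to-one mapping
        # the electronic device renders as a card.
        return [(c, {slot_name: c}) for c in cands]

first = FirstServer(SecondServer())
options = first.options_for("departure", "the park")
print(len(options))  # 2 candidate options, each paired with a query request
```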
  • In a fourth aspect, this application provides an electronic device, including: a touch screen, a communication module, one or more processors, one or more memories, one or more microphones, and one or more computer programs, where the processor is coupled to the touch screen, the communication module, the microphone, and the memory. The above one or more computer programs are stored in the memory; when the processor executes the one or more computer programs stored in the memory, the electronic device is caused to execute the voice interaction method described in any one of the above aspects.
  • In a fifth aspect, the present application provides a computer storage medium including computer instructions, where the computer instructions, when run on an electronic device, cause the electronic device to execute the voice interaction method described in any one of the first aspect.
  • In a sixth aspect, the present application provides a computer program product, which, when run on an electronic device, causes the electronic device to execute the voice interaction method described in any one of the first aspect.
  • It can be understood that the electronic device described in the fourth aspect, the computer storage medium described in the fifth aspect, and the computer program product described in the sixth aspect are all used to execute the corresponding methods provided above; for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding methods, which will not be repeated here.
  • FIG. 1 is a first schematic structural diagram of an electronic device according to an embodiment of this application;
  • FIG. 2 is a schematic diagram of the architecture of an operating system in an electronic device provided by an embodiment of this application;
  • FIG. 3 is a schematic structural diagram of a voice interaction system provided by an embodiment of this application;
  • FIG. 4 is a schematic diagram of a first scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 5 is an interaction schematic diagram of a voice interaction method provided by an embodiment of this application;
  • FIG. 6 is a schematic diagram of a second scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 7 is a schematic diagram of a third scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 8 is a schematic diagram of a fourth scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 9 is a schematic diagram of a fifth scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 10 is a schematic diagram of a sixth scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 11 is a schematic diagram of a seventh scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 12 is a schematic diagram of an eighth scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 13 is a schematic diagram of a ninth scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 14 is a schematic diagram of a tenth scenario of a voice interaction method provided by an embodiment of this application;
  • FIG. 15 is a second schematic structural diagram of an electronic device according to an embodiment of this application;
  • FIG. 16 is a third schematic structural diagram of an electronic device provided by an embodiment of this application.
  • The voice interaction method provided by the embodiments of this application can be applied to mobile phones, tablet computers, notebook computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), wearable electronic devices, virtual reality devices, and other electronic devices with voice assistant functions; the embodiments of this application do not impose any restrictions on this.
  • FIG. 1 shows a schematic structural diagram of an electronic device 100.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the electronic device 100.
  • the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • The different processing units may be independent devices or integrated into one or more processors.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 to store instructions and data.
  • the memory in the processor 110 is a cache memory.
  • The memory can store instructions or data that the processor 110 has just used or cyclically used. If the processor 110 needs to use the instructions or data again, they can be retrieved directly from the memory, which avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
  • The processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • the I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 110 may include multiple sets of I2C buses.
  • the processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc. through different I2C bus interfaces.
  • the processor 110 may couple the touch sensor 180K through an I2C interface, so that the processor 110 and the touch sensor 180K communicate through an I2C bus interface to implement the touch function of the electronic device 100.
  • the I2S interface can be used for audio communication.
  • the processor 110 may include multiple sets of I2S buses.
  • the processor 110 may be coupled with the audio module 170 through an I2S bus to realize communication between the processor 110 and the audio module 170.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through an I2S interface, so as to realize the function of answering calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
  • the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a two-way communication bus that converts the data to be transmitted between serial and parallel forms.
  • the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
  • the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function.
  • the audio module 170 may transmit audio signals to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
  • the MIPI interface includes camera serial interface (camera serial interface, CSI), display serial interface (display serial interface, DSI), etc.
  • the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
  • the processor 110 and the display screen 194 communicate through a DSI interface to realize the display function of the electronic device 100.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
  • GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely illustrative and does not constitute a structural limitation of the electronic device 100.
  • the electronic device 100 may also adopt an interface connection manner different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
  • the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, it can also supply power to the electronic device through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 can receive input from the battery 142 and/or the charging management module 140, and supply power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can be used to monitor performance parameters such as battery capacity, battery cycle times, battery charging voltage, battery discharging voltage, and battery health status (such as leakage, impedance). In some other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
  • the mobile communication module 150 may include one or more filters, switches, power amplifiers, low noise amplifiers (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves for radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110.
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.), or displays an image or video through the display screen 194.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, etc.
  • the wireless communication module 160 may be one or more devices integrating one or more communication processing modules.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic wave radiation via the antenna 2.
  • the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a Beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite-based augmentation system (SBAS).
  • the electronic device 100 implements a display function through a GPU, a display screen 194, and an application processor.
  • the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
  • the electronic device 100 can implement a shooting function through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
  • the ISP is used to process the data fed back by the camera 193. For example, when taking a picture, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the light signal is converted into an electrical signal, and the photosensitive element of the camera transfers the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
  • ISP can also optimize the image noise, brightness, and skin color. ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
  • the ISP may be provided in the camera 193.
  • the camera 193 is used to capture still images or videos.
  • the mobile phone 100 may include 1 or N cameras, and N is a positive integer greater than 1.
  • the camera 193 may be a front camera or a rear camera.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the electronic device 100 selects the frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in a variety of encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • the NPU can realize applications such as intelligent cognition of the electronic device 100, such as image recognition, face recognition, voice recognition, text understanding, and so on.
  • the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
  • the internal memory 121 may be used to store one or more computer programs, and the one or more computer programs include instructions.
  • the processor 110 can execute the above-mentioned instructions stored in the internal memory 121 to enable the electronic device 100 to execute the method for intelligently recommending contacts provided in some embodiments of the present application, as well as various functional applications and data processing.
  • the internal memory 121 may include a storage program area and a storage data area. Among them, the storage program area can store the operating system; the storage program area can also store one or more application programs (such as a gallery, contacts, etc.) and so on.
  • the data storage area can store data (such as photos, contacts, etc.) created during the use of the electronic device 100.
  • the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, universal flash storage (UFS), etc.
  • the processor 110 executes the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor, so that the electronic device 100 executes the methods provided in the embodiments of the present application, as well as various functional applications and data processing.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the earphone interface 170D, and the application processor. For example, music playback, recording, etc.
  • the audio module 170 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 170 can also be used to encode and decode audio signals.
  • the audio module 170 may be provided in the processor 110, or part of the functional modules of the audio module 170 may be provided in the processor 110.
  • the speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • when the electronic device 100 answers a call or receives a voice message, the receiver 170B can be brought close to the human ear to receive the voice.
  • the microphone 170C, also called a "mic", is used to convert sound signals into electrical signals.
  • when making a sound, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C.
  • the electronic device 100 may be provided with one or more microphones 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the electronic device 100 may also be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 170D is used to connect wired earphones.
  • the earphone interface 170D may be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
  • the sensor 180 may include a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc.
  • the electronic device 100 provided in the embodiment of the present application may further include one or more devices such as the button 190, the motor 191, the indicator 192, and the SIM card interface 195, which is not limited in the embodiment of the present application.
  • the software system of the above electronic device 100 may adopt a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
  • the embodiment of the present application takes a layered Android system as an example to illustrate the software structure of the electronic device 100.
  • FIG. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present application.
  • the layered architecture divides the software into several layers, and each layer has a clear role and division of labor. The layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, from top to bottom, the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
  • the application layer can include a series of application packages.
  • the application package may include APPs (applications) such as call, memo, browser, contact, camera, gallery, calendar, map, Bluetooth, music, video, short message, etc.
  • the application layer may also include a voice assistant APP.
  • the user may call the voice assistant APP Siri, Xiao E, or Xiao Ai.
  • after the voice assistant APP is started, it can collect the user's voice input and convert the voice input into a corresponding voice task. Furthermore, the voice assistant APP can call the interfaces of related applications to complete the voice task, so that the user can control the electronic device by voice.
  • the application framework layer provides application programming interfaces (application programming interface, API) and programming frameworks for applications in the application layer.
  • the application framework layer includes some predefined functions.
  • the application framework layer can include a window manager, a content provider, a view system, a phone manager, a resource manager, and a notification manager.
  • the window manager is used to manage window programs.
  • the window manager can obtain the size of the display, determine whether there is a status bar, lock the screen, take a screenshot, etc.
  • the content provider is used to store and retrieve data and make these data accessible to applications.
  • the data may include video, image, audio, phone calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls that display text and controls that display pictures.
  • the view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface that includes a short message notification icon may include a view that displays text and a view that displays pictures.
  • the phone manager is used to provide the communication function of the electronic device 100. For example, the management of the call status (including connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, etc.
  • the notification manager enables the application to display notification information in the status bar, which can be used to convey notification-type messages, and it can disappear automatically after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, and so on.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of a graph or scroll bar text, such as notifications of applications running in the background, or display notifications on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt sound is emitted, the electronic device vibrates, and the indicator light flashes.
  • the application framework layer also includes a VUI (voice user interface, voice user interface) manager.
  • the VUI manager can monitor the running status of the voice assistant APP, and can also serve as a bridge between the voice assistant APP and other apps, and transfer the voice tasks obtained by the voice assistant APP to the relevant APP for execution.
  • the electronic device 100 may use a microphone to detect the user's voice input. If it is detected that the user inputs the wake-up voice of "Hello, Little E", the VUI manager can start the voice assistant APP in the application layer. At this time, as shown in (a) of FIG. 3, the electronic device 100 may display a dialogue interface 301 of the voice assistant APP. The electronic device 100 can display the content of the dialogue between the user and the voice assistant APP in the dialogue interface 301.
  • the voice assistant APP can continue to use the microphone to detect the user's voice input. Taking the voice input 302 this time as an example of "navigating to the Big Wild Goose Pagoda", the voice assistant APP can display the text information corresponding to the voice input 302 in the dialogue interface 301. In addition, the voice assistant APP can send the voice input 302 to the server 200, and the server 200 recognizes and responds to the voice input 302 this time.
  • the server 200 may include a voice recognition module, a voice understanding module, and a dialogue management module.
  • the voice recognition module may first convert the voice input 302 into corresponding text information.
  • the speech understanding module in the server 200 may use a natural language understanding (NLU) algorithm to extract the user intent and slot information from the text information.
  • the dialog management module can request the corresponding service content from the server of the relevant third-party application according to the extracted user intention and slot information.
  • the dialogue management module can request a navigation service whose destination is the Big Wild Goose Pagoda from the server of the Baidu Maps APP.
  • the server of Baidu Maps APP can send the navigation route whose destination is the Big Wild Goose Pagoda to the server 200, and the server 200 can send the navigation route to the electronic device 100.
  • the voice assistant APP in the electronic device 100 can display the above-mentioned navigation route in the dialogue interface 301 in the form of a card 303 or the like, so that the voice assistant APP completes the response to this voice input 302.
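The round trip described above (voice recognition, voice understanding, dialogue management, then a card displayed by the voice assistant APP) can be sketched as a pipeline of three stand-in functions. Everything below is an illustrative assumption: the function names, the pre-transcribed input, and the keyword-based understanding step are placeholders, not the actual modules of the server 200.

```python
# Illustrative sketch of the server 200 pipeline: recognition -> understanding -> dialogue management.
# The rule-based "understanding" below is a stand-in; a real system would use a trained NLU model.

def recognize(audio_text: str) -> str:
    # Stand-in for the voice recognition module: assume the audio is already transcribed.
    return audio_text

def understand(text: str) -> dict:
    # Stand-in for the voice understanding (NLU) module: extract intent and slots by pattern.
    if text.startswith("navigate to "):
        return {"intent": "navigation", "slots": {"destination": text[len("navigate to "):]}}
    return {"intent": "unknown", "slots": {}}

def manage_dialogue(semantics: dict) -> dict:
    # Stand-in for the dialogue management module: request service content from a third party
    # and return something the voice assistant APP can render as a card.
    if semantics["intent"] == "navigation":
        return {"card": f"route to {semantics['slots']['destination']}"}
    return {"card": "sorry, not understood"}

result = manage_dialogue(understand(recognize("navigate to the Big Wild Goose Pagoda")))
print(result["card"])  # route to the Big Wild Goose Pagoda
```

The three stages are kept as separate functions to mirror the module split in the document; any of them could be replaced independently.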
  • the voice assistant APP acts as a foreground application and presents corresponding visual output to the user through the display screen 194. If the electronic device 100 detects that another event interrupts the conversation, for example, an incoming call event or the user opening another application, the electronic device 100 can switch the voice assistant APP originally running in the foreground to the background to continue running, and run the new application that interrupted this conversation in the foreground. After being switched to the background, the voice assistant APP will not provide the user with visual output related to the voice assistant APP, and the user cannot interact with the voice assistant APP.
  • the electronic device 100 can switch the voice assistant APP to the foreground again, and continue to display the aforementioned dialogue interface 301 and the historical dialogue content in the dialogue interface 301 .
  • the electronic device 100 can continue to display the dialogue interface 301, and the user can continue to operate the options in the card 303 in the dialogue interface 301 to carry out the next round of dialogue based on the previous round of voice input 302, without having to re-enter the voice input "Navigate to the Big Wild Goose Pagoda", thereby improving the efficiency and experience of using the voice assistant APP in the electronic device 100.
  • the system library can include multiple functional modules. For example: surface manager (surface manager), media library (Media Libraries), three-dimensional graphics processing library (for example: OpenGL ES), 2D graphics engine (for example: SGL), etc.
  • the surface manager is used to manage the display subsystem and provides a combination of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support multiple audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to realize 3D graphics drawing, image rendering, synthesis, and layer processing.
  • the 2D graphics engine is a drawing engine for 2D drawing.
  • Android Runtime includes core libraries and virtual machines. Android runtime is responsible for the scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in a virtual machine.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer includes at least a display driver, a camera driver, an audio driver, a sensor driver, etc., which are not limited in the embodiment of the present application.
  • Each voice input of the user corresponds to an intent of the user.
  • the intent is a collection of multiple sentences; for example, "I want to watch a movie" and "I want to watch an action movie shot by Andy Lau in 2001" can both belong to the same video playback intent.
  • Slot information refers to the key information used to express intent in the user's voice input, and the slot information directly determines whether the electronic device (or server) can match the correct intent.
  • a slot corresponds to keywords of one type of attribute, and the information in this slot can be filled with keywords of the same type.
  • the query corresponding to the intention of playing a song is "I want to listen to {song} of {singer}", where {singer} is the singer slot and {song} is the song slot.
  • the electronic device can extract from the voice input the information in the {singer} slot as: Faye Wong, and the information in the {song} slot as: Red Bean. In this way, the electronic device (or server) can recognize, based on the two pieces of slot information, that the user's intention of this voice input is to play Faye Wong's song Red Bean.
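A minimal sketch of how a sentence template like the one above can be used to fill slots, assuming a simple regular-expression match; the NLU algorithm mentioned in the document is not specified to be a regex, and the example utterance below is hypothetical (the template filled with the values from the text).

```python
import re

# Minimal sketch of template-based slot filling for the "play a song" intent.
# Named groups play the role of the {song} and {singer} slots.
TEMPLATE = re.compile(r"I want to listen to (?P<song>.+) of (?P<singer>.+)")

def fill_slots(query: str):
    # Return the slot information extracted from the query, or None if the
    # query does not match the sentence template for this intent.
    m = TEMPLATE.fullmatch(query)
    return m.groupdict() if m else None

print(fill_slots("I want to listen to Red Bean of Faye Wong"))
# {'song': 'Red Bean', 'singer': 'Faye Wong'}
```

A query that matches no template yields no slot information, which is exactly the case where the device (or server) cannot match the correct intent.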
  • a voice interaction method provided by an embodiment of the present application will be specifically introduced with reference to the accompanying drawings.
  • a mobile phone is used as an example of the above-mentioned electronic device 100.
  • FIG. 5 is a schematic flowchart of a voice interaction method provided by an embodiment of this application. As shown in Figure 5, the voice interaction method may include:
  • S501. When the user wants to talk to the voice assistant in the mobile phone, he or she can trigger the mobile phone to start running the voice assistant APP in the foreground by inputting a wake-up voice including a wake-up word. For example, after the mobile phone detects that the user has entered the wake-up voice "Hello, Little E", it can open the voice assistant APP in the foreground and display the dialogue interface of the voice assistant APP. As shown in (a) in FIG. 6, the mobile phone can display the dialogue interface 601 of the voice assistant APP in full screen, and the dialogue interface 601 can display the dialogue content between the user and the voice assistant "Little E" in real time.
  • the mobile phone can display the dialogue interface of the voice assistant APP in the form of a floating window, as shown in (b) in FIG. 6, the mobile phone can display the dialogue content between the user and the voice assistant "Little E" in the floating window 602 in real time.
  • the user can also use a preset gesture or button to wake up the mobile phone to run the voice assistant APP in the foreground.
  • the dialogue interface 601 is provided with a voice collection button 603. If it is detected that the user clicks the voice collection button 603, the voice assistant APP can call the microphone in the mobile phone to collect the user's voice input (ie, the first voice input). For example, the first voice input 604 input by the user to the mobile phone is "I want to take a taxi from People's Park to the front door". Alternatively, after the mobile phone displays the dialogue interface 601 of the voice assistant APP, it can automatically turn on the microphone to collect the user's first voice input, which is not limited in this embodiment of the application.
  • S502. The mobile phone sends the first voice input to the first server, so that the first server extracts the user intention and slot information from the first voice input, where the first voice input includes the first slot and the second slot.
  • the mobile phone can send the first voice input 604 to the first server for voice recognition and understanding, so as to extract the user intention and slot information in the first voice input 604.
  • the first server may use a voice recognition algorithm to convert the first voice input 604 into corresponding text information, that is, "I want to take a taxi from People's Park to the front door". Furthermore, the first server may use a preset NLU algorithm to extract user intention and slot information from the text information of the first voice input 604.
  • the first voice input input by the user includes multiple slot information.
  • the first voice input 604 includes two slots, one is the slot of the departure place when the taxi is taken (ie the first slot), and the other is the slot of the destination when the taxi is taken (ie the second slot).
  • the first server can extract the slot information in the first slot (that is, the first slot information) from "I want to take a taxi from the People's Park to the front door" as: People's Park, and the slot information in the second slot (that is, the second slot information) as: front door.
  • the first server may extract the user intention corresponding to the first voice input 604 from "I want to take a taxi from People's Park to the front door” as: taxi.
  • the first server may save the content of each conversation between the user and the voice assistant APP, and generate a record of the conversation between the user and the voice assistant APP.
  • the first server may set the conversation record to be saved to a preset size. Then, when the conversation record between the user and the voice assistant APP exceeds the preset size, the first server may delete the oldest conversation content.
  • the first server may be set to save the conversation record between the user and the voice assistant APP for a certain period of time. If the first server does not receive a new voice input within the preset time, the first server may delete this conversation record.
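The two retention policies above (a preset record size with oldest-first deletion, and deletion of the whole record after a preset idle time) can be sketched together; the class name, method names, and parameter values below are illustrative assumptions, not part of the described system.

```python
import time
from collections import deque

# Illustrative sketch of the conversation-record retention policy: cap the record
# at a preset number of turns (dropping the oldest), and clear the whole record
# when no new voice input arrives within a preset time window.
class ConversationRecord:
    def __init__(self, max_turns: int, timeout_s: float):
        self.turns = deque(maxlen=max_turns)  # oldest turns discarded automatically
        self.timeout_s = timeout_s
        self.last_input = time.monotonic()

    def add(self, turn: str):
        if time.monotonic() - self.last_input > self.timeout_s:
            self.turns.clear()  # record expired: delete this conversation record
        self.turns.append(turn)
        self.last_input = time.monotonic()

record = ConversationRecord(max_turns=3, timeout_s=600.0)
for utterance in ["hello", "navigate to the Big Wild Goose Pagoda",
                  "avoid congestion", "stop"]:
    record.add(utterance)
print(list(record.turns))  # the oldest turn "hello" has been dropped
```

`deque(maxlen=...)` gives the size cap for free; the timeout check runs lazily on the next input, which is enough for a sketch.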
  • S503. The first server requests the second server to respond to the first voice input according to the user intention and the slot information.
  • the first server extracts from the first voice input 604 that the user's intention is taking a taxi, the first slot information is People's Park, and the second slot information is the front door.
  • the first server can determine the third-party APP corresponding to the user's intention, for example, the Didi Dache APP.
  • the first server may send a first service request to the server (ie, the second server) of the Didi Dache APP.
  • the first service request includes the user's intention, the first slot information, and the second slot information extracted by the first server.
  • after the second server receives the first service request sent by the first server, the second server can determine, according to the user's intention, that the user needs to use the taxi-hailing service, and can determine the specific addresses of the departure place and the destination according to the first slot information and the second slot information.
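The first service request can be pictured as a small structured message carrying the extracted intent and the two slot values; the JSON shape and field names below are hypothetical, since the document does not define a wire format between the two servers.

```python
import json

# Hypothetical shape of the first service request sent by the first server to the
# second server (field names are illustrative, not from the document).
first_service_request = {
    "intent": "taxi",
    "slots": {
        "departure": "People's Park",  # first slot information
        "destination": "front door",   # second slot information
    },
}

# Serialize for transmission, then parse as the second server would.
payload = json.dumps(first_service_request)
received = json.loads(payload)
print(received["slots"]["departure"])  # People's Park
```

The second server only needs the intent to pick the service (taxi-hailing) and the two slot values to resolve concrete addresses.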
  • if the second server finds multiple addresses associated with the aforementioned first slot information (that is, People's Park), it means that the departure place entered by the user in the first voice input 604 is not accurate.
  • the second server may send the multiple found addresses associated with the first slot information (that is, People's Park) to the first server as candidate options.
  • the candidate options include: the detailed address of the North Gate of People’s Park, the detailed address of the East Gate of People’s Park, the detailed address of the West Gate of People’s Park, and the detailed address of the South Gate of People’s Park.
  • After receiving the N candidate options of the departure information (that is, the first slot information) sent by the server of the Didi Dache APP, the first server can establish a correspondence between each of the N candidate options and a corresponding query request, and each query request contains the corresponding candidate option.
  • each query request includes a taxi-hailing sentence template corresponding to the candidate option.
  • the taxi-hailing sentence template is text content of a fixed sentence pattern associated with the user's intention of the first voice input.
  • The taxi-hailing sentence template associated with the taxi-hailing intention is "Take a taxi from {first slot information} to {second slot information}". Since the above N candidate options are candidate options of the first slot information, each query request corresponding to these N candidate options includes a corresponding taxi-hailing sentence template, and the first slot information in the taxi-hailing sentence template is the corresponding candidate option.
  • the first candidate item of the first slot information is the detailed address of the north gate of People's Park, and the first candidate item corresponds to the first query request.
  • The first query request may include the first taxi-hailing sentence template: "Take a taxi from {People's Park North Gate} to {Qianmen}". In the first taxi-hailing sentence template, the first slot information is changed from {People's Park} to {People's Park North Gate}.
  • the second candidate item of the first slot information is the detailed address of the south gate of People's Park, and the second candidate item corresponds to the second query request.
  • The second query request may include the second taxi-hailing sentence template: "Take a taxi from {People's Park South Gate} to {Qianmen}". In the second taxi-hailing sentence template, the first slot information is changed from {People's Park} to {People's Park South Gate}.
  • the third candidate of the first slot information is the detailed address of the west gate of People's Park, and the third candidate corresponds to the third query request.
  • The third query request may include the third taxi-hailing sentence template: "Take a taxi from {People's Park West Gate} to {Qianmen}". In the third taxi-hailing sentence template, the first slot information is changed from {People's Park} to {People's Park West Gate}.
  • the fourth candidate of the first slot information is the detailed address of the east gate of People's Park, and the fourth candidate corresponds to the fourth query request.
  • The fourth query request may include the fourth taxi-hailing sentence template: "Take a taxi from {People's Park East Gate} to {Qianmen}". In the fourth taxi-hailing sentence template, the first slot information is changed from {People's Park} to {People's Park East Gate}.
  • the first server may also update the first slot information in the first voice input to the corresponding candidate option and then carry it in the query request.
  • For example, the first voice input 604 is "I want to take a taxi from People's Park to the front door". When the first candidate is the North Gate of People's Park, the first server can update the first voice input 604 to "I want to take a taxi from the North Gate of People's Park to the front door", and carry the updated first voice input 604 in the first query request corresponding to the first candidate option.
  • When the second candidate is the South Gate of People's Park, the first server can update the first voice input 604 to "I want to take a taxi from the South Gate of People's Park to the front door", and carry the updated first voice input 604 in the second query request corresponding to the second candidate option.
  • When the third candidate is the West Gate of People's Park, the first server can update the first voice input 604 to "I want to take a taxi from the West Gate of People's Park to the front door", and carry the updated first voice input 604 in the third query request corresponding to the third candidate option.
  • When the fourth candidate is the East Gate of People's Park, the first server can update the first voice input 604 to "I want to take a taxi from the East Gate of People's Park to the front door", and carry the updated first voice input 604 in the fourth query request corresponding to the fourth candidate option.
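The candidate-to-query-request mapping described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the patent's implementation; the template string, the slot names, and the `build_query_requests` helper are assumptions made for the example:

```python
# Hypothetical sketch: for each candidate of the ambiguous slot, fill the
# sentence template so that selecting the candidate later yields an
# unambiguous query request. Names and addresses are illustrative.

TEMPLATE = "Take a taxi from {departure} to {destination}"

def build_query_requests(candidates, fixed_slots, open_slot):
    """Map each candidate option to a query request whose sentence template
    has the ambiguous slot replaced by that candidate."""
    requests = {}
    for candidate in candidates:
        slots = dict(fixed_slots)       # slots already known (e.g. destination)
        slots[open_slot] = candidate    # fill the ambiguous slot
        requests[candidate] = TEMPLATE.format(**slots)
    return requests

candidates = [
    "People's Park North Gate",
    "People's Park South Gate",
    "People's Park West Gate",
    "People's Park East Gate",
]
requests = build_query_requests(candidates, {"destination": "Qianmen"}, "departure")
```

When a candidate is later selected, the server only needs to look up `requests[candidate]` to recover a fully specified sentence, which is what makes the conversation resumable after an interruption.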
  • the first server may send the aforementioned four candidate options of the first slot information to the mobile phone.
  • Alternatively, the first server may send the above four candidate options of the first slot information and the corresponding query requests to the mobile phone, so that the user can subsequently choose accurate departure information to complete the taxi service.
  • The server of the Didi Dache APP can also send relevant information about the candidate options to the first server, for example, the distance between each specific address and the user's current location, the user rating of each specific address, and opening hours. Then, the first server can send this information to the mobile phone.
  • S505 The mobile phone displays a first card in the dialogue interface of the voice assistant APP, and the first card includes the above N candidate options.
  • the N candidate options can be displayed to the user in the dialogue interface 601 of the voice assistant APP in the form of a card or the like for the user to select.
  • After the mobile phone receives the above four candidate options of the first slot information sent by the first server, it can use JS (JavaScript) rendering to load the first card 701 in the dialogue interface 601 being displayed in the foreground.
  • The first card 701 includes four specific departure places related to "People's Park" found by the server of the Didi Dache APP, namely the detailed addresses of the North Gate, East Gate, West Gate, and South Gate of People's Park. These departure places are also the candidate options of the first slot information in the first voice input 604 described above.
  • the specific name of the first candidate option 702 is "East Gate of People's Park".
  • the mobile phone can also display the specific address of the East Gate of People's Park in the first candidate option 702: for example, No. 11, Daqing First Road.
  • The mobile phone can also display in the first candidate option 702 that the distance between the East Gate of People's Park and the user's current location is, for example, 560 meters.
  • the opening hours of the East Gate of People’s Park are 8:00-18:00.
  • The mobile phone can also establish the correspondence between the candidate option 702 in the first card 701 and the query request, that is, the correspondence between the candidate option 702 and "Take a taxi from {People's Park East Gate} to {Qianmen}". Subsequently, if it is detected that the user clicks the candidate option 702 in the first card 701, the mobile phone can send the corresponding query request to the first server so that the slot information is extracted again.
  • S506 In response to the user's operation of selecting the first candidate option in the first card, the mobile phone instructs the first server to update the first slot information, where the updated first slot information is the first candidate option.
  • The mobile phone may send the first candidate option 702, that is, the East Gate of People's Park, to the first server. Since the query request corresponding to the East Gate of People's Park is stored in the first server and contains the fourth taxi-hailing sentence template "Take a taxi from {People's Park East Gate} to {Qianmen}", the first server may use the NLU algorithm to re-extract the user's intention and slot information corresponding to the first voice input 604 from this sentence template. The difference from step S502 is that the first slot information extracted by the first server this time is the first candidate option 702 selected by the user, that is, the East Gate of People's Park.
  • Alternatively, the mobile phone can send the query request corresponding to the first candidate option 702 to the first server. The query request contains the fourth taxi-hailing sentence template "Take a taxi from {People's Park East Gate} to {Qianmen}". The first server uses the NLU algorithm to re-extract from the query request the first candidate option 702 selected by the user as the first slot information, that is, the East Gate of People's Park.
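The re-extraction of slot information from a stored sentence template can be illustrated with a short sketch. Here a regular expression stands in for the NLU algorithm the text mentions; the pattern, the `extract_slots` name, and the brace-delimited format are assumptions for illustration:

```python
# Hypothetical sketch: recover unambiguous slot values from a filled
# taxi-hailing sentence template of the form used in the examples above.
import re

PATTERN = re.compile(
    r"Take a taxi from \{(?P<departure>[^}]+)\} to \{(?P<destination>[^}]+)\}"
)

def extract_slots(template_text):
    """Return the slot values carried in a filled sentence template,
    or None if the text does not match the template shape."""
    m = PATTERN.fullmatch(template_text)
    return m.groupdict() if m else None

slots = extract_slots("Take a taxi from {People's Park East Gate} to {Qianmen}")
```

Because the template itself carries the chosen candidate, the server can recompute both the intention and the slots without needing the original audio or an intact conversation record.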
  • After the mobile phone displays the first card 701 in the dialogue interface 601, if the mobile phone collects the user's second voice input 901, for example "select the East Gate of People's Park", the mobile phone can send the second voice input 901 to the first server.
  • the first server may perform voice recognition on the second voice input 901 in combination with the conversation record between the user and the voice assistant, and recognize that the user has selected the first candidate option 702 in the first card 701. Then, through the first query request corresponding to the first candidate option 702, the first server can extract the new first slot information as the East Gate of People’s Park from the fourth taxi-hailing sentence template in the first query request.
  • the user can use natural language to select candidate options in the first card 701 in the form of voice.
  • For example, the second voice input 901 can be "I choose the East Gate of People's Park"; when the first server detects that the second voice input 901 includes the candidate "East Gate of People's Park", it can recognize that the user selected the first candidate option 702 in the first card 701.
  • Alternatively, the second voice input 901 can be "select the first location", and the first server can recognize, in combination with the conversation record between the user and the voice assistant, that the option corresponding to this voice input is the first candidate option 702 in the first card 701.
  • The second voice input 901 may also simply be "East Gate". The first server can recognize, in combination with the conversation record between the user and the voice assistant, that "East Gate" refers to the "East Gate of People's Park", and can further determine that the user has selected the first candidate option 702 in the first card 701.
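The three ways of selecting a candidate by voice described above (full name, ordinal phrase such as "the first location", and a unique partial mention such as "East Gate") can be sketched as a simple matching routine. The rules, the `resolve_selection` name, and the candidate list below are illustrative assumptions, not the patent's recognition algorithm:

```python
# Hypothetical sketch: map a follow-up utterance to one of the displayed
# candidate options using three fallback rules.

ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3}

def resolve_selection(utterance, candidates):
    text = utterance.lower()
    # Rule 1: the full candidate name appears in the utterance.
    for c in candidates:
        if c.lower() in text:
            return c
    # Rule 2: an ordinal reference, e.g. "select the first location".
    for word, idx in ORDINALS.items():
        if word in text and idx < len(candidates):
            return candidates[idx]
    # Rule 3: the whole utterance is a unique partial mention, e.g. "East Gate".
    matches = [c for c in candidates if text.strip() in c.lower()]
    return matches[0] if len(matches) == 1 else None

EXAMPLE_CANDIDATES = [
    "North Gate of People's Park",
    "East Gate of People's Park",
]
```

A real system would combine this with the stored conversation record, as the text notes, so that ambiguous fragments resolve against the options currently on screen.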
  • the user can also filter the candidate options in the first card 701.
  • the user can input the third voice input 1001 of "What are the departure places within 500 meters" into the mobile phone.
  • The mobile phone can send the third voice input 1001 to the first server. Since the first server records the detailed information of each candidate option in the first card 701, the first server can recognize and understand the third voice input 1001 and, according to the distance between each candidate option and the user, filter out one or more candidate options within 500 meters of the user from the above four candidate options. For example, the first server may send the filtered "West Gate of People's Park" and "North Gate of People's Park" to the mobile phone.
  • the mobile phone can display a card 1002 in response to the third voice input 1001 in the dialogue interface 601 described above.
  • the card 1002 includes candidate options selected by the first server for the user within 500 meters from the user. In this way, the user can continue to select the corresponding candidate option in the card 1002 as the first slot information in the first voice input 604.
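The distance-based filtering step described above can be sketched as follows. The candidate records, the `distance_m` field, and the example distances are illustrative assumptions (only the 560 m figure for the East Gate appears in the text):

```python
# Hypothetical sketch: keep only candidate options whose recorded distance
# from the user's current location is within the requested limit.

def filter_by_distance(candidates, max_meters):
    return [c["name"] for c in candidates if c["distance_m"] <= max_meters]

candidates = [
    {"name": "West Gate of People's Park", "distance_m": 320},
    {"name": "North Gate of People's Park", "distance_m": 450},
    {"name": "East Gate of People's Park", "distance_m": 560},
    {"name": "South Gate of People's Park", "distance_m": 780},
]
nearby = filter_by_distance(candidates, 500)  # "within 500 meters"
```

This is why the server must keep the per-candidate detail (distance, rating, opening hours) it received from the third-party server: the follow-up query is answered from that cached detail, not by a new search.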
  • S507 The first server requests the second server to respond to the first voice input according to the updated first slot information.
  • The first server can send a second service request to the server of the Didi Dache APP (that is, the second server). The second service request includes the user's intention (taking a taxi), the updated first slot information (the East Gate of People's Park), and the second slot information (Qianmen) extracted in step S506.
  • After the server of the Didi Dache APP receives the second service request sent by the first server, similar to step S503, the second server needs to determine clear departure information (that is, the first slot information) and destination information (that is, the second slot information) before providing the user with this taxi service. The second server can determine, according to the updated first slot information, that the departure place of this taxi service is the East Gate of People's Park.
  • If the second server finds multiple addresses associated with the foregoing second slot information (that is, Qianmen), it indicates that the destination entered by the user in the first voice input 604 is not accurate. The second server may send the multiple addresses associated with the second slot information (that is, Qianmen) as candidate options to the first server.
  • the candidate options include: the detailed address of Qianmen subway station, the detailed address of Qianmen Street, and the detailed address of Qianmen Building.
  • The first server may establish a correspondence between each of the M candidate options and a corresponding query request. Each query request includes not only the first candidate option selected by the user for the first slot information in step S506, but also the corresponding candidate option of the second slot information.
  • The taxi-hailing sentence template is "Take a taxi from {first slot information} to {second slot information}". Since the first slot information has been determined to be the East Gate of People's Park, the taxi-hailing sentence template in the M query requests corresponding to the second slot information at this time is "Take a taxi from {People's Park East Gate} to {second slot information}", where {second slot information} is filled with the corresponding candidate option of the second slot information.
  • the first candidate item of the second slot information is the detailed address of the Qianmen subway station, and the first candidate item corresponds to the first query request.
  • The first query request may include the first taxi-hailing sentence template: "Take a taxi from {People's Park East Gate} to {Qianmen Subway Station}". The first slot information is {People's Park East Gate} determined in step S506, and the second slot information is {Qianmen Subway Station}.
  • the second candidate item of the second slot information is the detailed address of Qianmen Street, and the second candidate item corresponds to the second query request.
  • The second query request may include the second taxi-hailing sentence template: "Take a taxi from {People's Park East Gate} to {Qianmen Street}". The first slot information is {People's Park East Gate} determined in step S506, and the second slot information is {Qianmen Street}.
  • the third candidate of the second slot information is the detailed address of Qianmen Building, and the third candidate corresponds to the third query request.
  • The third query request may include the third taxi-hailing sentence template: "Take a taxi from {People's Park East Gate} to {Qianmen Building}". The first slot information is {People's Park East Gate} determined in step S506, and the second slot information is {Qianmen Building}.
  • the first server can also update the second slot information in the first voice input 604 to the corresponding candidate option and then carry it in the query request.
  • At this time, the first slot information in the first voice input 604 is the East Gate of People's Park selected by the user in step S506, which is not limited in this embodiment of the application.
  • the first server may send the three candidate options of the second slot information to the mobile phone.
  • Alternatively, the first server may send the 3 candidate options of the second slot information and the corresponding query requests to the mobile phone, so that the user can subsequently choose accurate destination information to complete the taxi service.
  • S509 The mobile phone displays a second card in the dialogue interface of the voice assistant APP, and the second card includes the above M candidate options.
  • After the mobile phone receives the M candidate options of the second slot information sent by the first server, similar to step S505, as shown in FIG. 11, the mobile phone can continue to display the second card 1101 in the dialogue interface 601 of the voice assistant APP.
  • The second card 1101 includes 3 specific destination addresses related to "Qianmen" found by the server of the Didi Dache APP, namely the detailed address of Qianmen subway station, the detailed address of Qianmen Street, and the detailed address of Qianmen Building. These destination addresses are also candidate options for the second slot information in the first voice input 604 described above.
  • The mobile phone can establish the correspondence between the candidate option 1102 in the second card 1101 and the taxi-hailing sentence template "Take a taxi from {People's Park East Gate} to {Qianmen Building}".
  • the mobile phone may also first display the second card 1101 for the user to select the destination information, and then display the first card 701 for the user to select the departure information.
  • Alternatively, the mobile phone may also display the above-mentioned first card 701 and second card 1101 in the dialogue interface 601 at the same time, which is not limited in the embodiment of the present application.
  • After the mobile phone displays the second card 1101 in the dialogue interface 601 of the voice assistant APP, if a preset event that interrupts the dialogue between the user and the voice assistant APP is detected, the mobile phone does not end the process of the voice assistant APP (that is, does not kill the voice assistant APP), but switches the voice assistant APP to the background to continue running.
  • the aforementioned preset event may be actively triggered by the user.
  • the preset event may be an operation of the user clicking the return button or the home button, or the preset event may be an operation of the user opening a notification message, pulling up a menu, or pulling down a menu.
  • Alternatively, the foregoing preset event may be an event passively received by the mobile phone. For example, as shown in FIG. 12, after the mobile phone displays the second card 1101 in the dialogue interface 601 of the voice assistant APP, if an incoming call event is received, the mobile phone can display the incoming call interface 1201 of the call application. At this time, the mobile phone can switch the voice assistant APP to run in the background.
  • Assume that the mobile phone detects that the user has ended the call after 5 minutes. Then, the mobile phone can automatically switch the voice assistant APP running in the background back to the foreground. At this time, as shown in FIG. 13, the mobile phone can redisplay the dialogue interface 601 of the voice assistant APP, and the dialogue interface 601 still displays the dialogue content between the user and the voice assistant APP from before the voice assistant APP was switched to the background. For example, the dialogue interface 601 still displays the aforementioned second card 1101, and the second card 1101 includes the M candidate options of the second slot information.
  • the user can also find various applications running in the background in the multitasking window of the mobile phone, and then switch the voice assistant APP running in the background to the foreground.
  • Alternatively, the user can wake up the voice assistant APP again by a wake-up word or by pressing a button to switch the voice assistant APP from the background to the foreground, and this embodiment of the application does not impose any restriction on this.
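The suspend-instead-of-kill behaviour described above can be illustrated with a minimal state sketch: on a preset interrupting event the assistant session is moved to the background, not destroyed, so its cards and already-confirmed slot values survive until the user resumes. The class, event names, and fields are assumptions made for this illustration:

```python
# Hypothetical sketch of the foreground/background lifecycle of the
# voice assistant session described in the text.

class AssistantSession:
    def __init__(self):
        self.state = "foreground"
        self.cards = []          # cards already shown in the dialogue interface
        self.chosen_slots = {}   # slot values the user has already confirmed

    def show_card(self, card):
        self.cards.append(card)

    def on_preset_event(self, event):
        # e.g. incoming call, home/back button, notification: suspend, don't kill.
        if event in {"incoming_call", "home_button", "back_button"}:
            self.state = "background"

    def resume(self):
        # Re-entering the foreground redisplays the same dialogue content.
        self.state = "foreground"
        return self.cards

session = AssistantSession()
session.show_card({"id": "second_card", "options": ["Qianmen Subway Station"]})
session.chosen_slots["departure"] = "East Gate of People's Park"
session.on_preset_event("incoming_call")   # call interrupts the dialogue
restored = session.resume()                # call ends; dialogue is restored
```

The key design point is that interruption changes only the session's visibility, never its contents, which is what allows the later card selections to still map onto valid query requests.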
  • S512 In response to the user's operation of selecting the second candidate option in the second card, the mobile phone instructs the first server to update the first slot information and the second slot information, where the updated first slot information is the first candidate option, and the updated second slot information is the second candidate option.
  • the user can select one of the multiple candidate options in the second card 1101 by clicking.
  • the user may also select one or more of the multiple candidate options in the second card 1101 by voice.
  • The mobile phone may send the second candidate option 1102, that is, Qianmen Building, to the first server. Since the first server stores the taxi-hailing sentence template corresponding to the second candidate option 1102, "Take a taxi from {People's Park East Gate} to {Qianmen Building}", the first server can use the NLU algorithm to re-extract from this taxi-hailing sentence template that the first slot information is People's Park East Gate and the second slot information is Qianmen Building.
  • Alternatively, the mobile phone can send the corresponding query request to the first server. The query request includes the taxi-hailing sentence template "Take a taxi from {People's Park East Gate} to {Qianmen Building}", from which the first server can extract that the first slot information is the East Gate of People's Park and the second slot information is Qianmen Building.
  • Even if the first server deletes the corresponding taxi-hailing sentence template or conversation record because this session between the user and the voice assistant APP timed out, the mobile phone has recorded the first slot information previously selected by the user and established the correspondence between the candidate options of the second slot information and the taxi-hailing sentence templates. Therefore, when the mobile phone re-runs the voice assistant APP in the foreground, if the user selects an option in the second card 1101, the first server can still extract the first slot information and the second slot information in the first voice input 604, realizing the continuation of the conversation between the user and the voice assistant APP.
  • The user can also select candidate options in the second card 1101 by voice. For example, if the user's third voice input "Select Qianmen Building" is collected, the mobile phone may send the third voice input to the first server.
  • The first server may perform voice recognition on the third voice input in combination with the conversation record between the user and the voice assistant, and recognize that the user has selected the second candidate option 1102 in the second card 1101. Then, through the query request corresponding to the second candidate option 1102, the first server can extract, according to the taxi-hailing sentence template in the query request, that the first slot information is the East Gate of People's Park and the second slot information is Qianmen Building.
  • At this time, the mobile phone (or the first server) has already recorded the first slot information previously selected by the user. Therefore, when the voice assistant APP is switched to the foreground again and the mobile phone sends the second slot information selected by the user to the first server, the first server can still determine the first slot information selected by the user before the voice assistant APP was switched to the background, thereby realizing the continuation of the dialogue between the user and the voice assistant APP.
  • S513 The first server requests the second server to respond to the first voice input according to the updated first slot information and second slot information.
  • The first server can send a third service request to the server of the Didi Dache APP (that is, the second server). The third service request contains the user's intention (taking a taxi), the updated first slot information (the East Gate of People's Park), and the updated second slot information (Qianmen Building).
  • The second server can determine, according to the user's intention in the third service request, that a taxi service is to be provided to the user, with the departure information being the updated first slot information (the East Gate of People's Park) and the destination information being the updated second slot information (Qianmen Building). Since the East Gate of People's Park and Qianmen Building are both place names with clear addresses, the server of the Didi Dache APP can generate a taxi order in response to the first voice input 604, with the East Gate of People's Park as the departure place and Qianmen Building as the destination. Furthermore, the server of the Didi Dache APP may send the generated taxi order to the first server.
  • S515 The mobile phone displays the response result in the dialogue interface of the voice assistant APP.
  • the first server can send the ride-hailing order to the mobile phone.
  • the mobile phone can display a third card 1401 in the dialogue interface 601 of the voice assistant APP, and the third card 1401 includes the taxi order sent by the first server.
  • the third card 1401 also includes a confirmation button 1402 for the ride-hailing order and a cancel button 1403 for the ride-hailing order.
  • If the user clicks the cancel button 1403, the mobile phone can send an order cancellation instruction to the first server. After receiving the order cancellation instruction, the first server can send an order cancellation response message to the server of the Didi Dache APP, and the server of the Didi Dache APP can then cancel this taxi service.
  • If the user clicks the confirmation button 1402, the mobile phone can send an order confirmation instruction to the first server. After receiving the order confirmation instruction, the first server can send an order confirmation response message to the server of the Didi Dache APP, and the server of the Didi Dache APP can then start to provide the user with this taxi service.
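The confirm/cancel relay between the mobile phone, the first server, and the third-party server can be sketched as follows. The message names, the relay function, and the fake ride-hailing server are illustrative assumptions, not the patent's protocol:

```python
# Hypothetical sketch: the first server forwards the phone's confirm/cancel
# instruction for a generated taxi order to the third-party server and
# returns that server's response message.

def relay_order_instruction(instruction, order, third_party):
    if instruction == "confirm":
        return third_party.confirm(order)
    if instruction == "cancel":
        return third_party.cancel(order)
    raise ValueError(f"unknown instruction: {instruction}")

class FakeRideHailingServer:
    """Stand-in for the third-party (e.g. ride-hailing) server."""
    def confirm(self, order):
        return {"order": order, "status": "service_started"}
    def cancel(self, order):
        return {"order": order, "status": "cancelled"}

server = FakeRideHailingServer()
result = relay_order_instruction("confirm", "order-001", server)
```

The sketch mirrors the text's division of labour: the phone only expresses the user's choice, while the first server owns the conversation with the third-party service.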
  • In addition, the mobile phone can automatically open the Didi Dache APP in the foreground, and the user can see the relevant information of this taxi order in the Didi Dache APP. At this time, the mobile phone can switch the voice assistant APP to run in the background.
  • The mobile phone can help the user determine the relevant information of this taxi order through multiple rounds of dialogue between the user and the voice assistant in the dialogue interface of the voice assistant APP. The mobile phone does not need to jump to the interface of the Didi Dache APP to help the user determine the relevant information of this taxi service, improving the intelligent voice interaction experience.
  • the mobile phone or the server can record the slot information selected by the user each time in a voice task.
  • The user does not need to re-input the selected slot information into the voice assistant APP, so that the user can continue the interrupted conversation with the voice assistant APP at any time, thereby improving the work efficiency and use experience of the voice assistant APP in the mobile phone.
  • an embodiment of the present application discloses an electronic device, which can be used to implement the methods described in the above method embodiments.
  • the electronic device may specifically include: a receiving unit 1501, a sending unit 1502, a display unit 1503, and a switching unit 1504.
  • the receiving unit 1501 is used to support the electronic device to perform the processes S501, S504, S508, and S514 in FIG. 5;
  • the sending unit 1502 is used to support the electronic device to perform the processes S502, S506, and S512 in FIG. 5;
  • the display unit 1503 is used to support
  • the electronic device executes the processes S505, S509 and S511, S515 in FIG. 5;
  • The switching unit 1504 is used to support the electronic device to execute the process S510 in FIG. 5.
  • all relevant content of each step involved in the above method embodiment can be cited in the function description of the corresponding function module, and will not be repeated here.
  • An embodiment of the present application discloses an electronic device, including: a touch screen 1601, where the touch screen 1601 includes a touch-sensitive surface 1606 and a display screen 1607; one or more processors 1602; a memory 1603; a communication module 1608; and one or more computer programs 1604.
  • the above devices may be connected through one or more communication buses 1605.
  • The aforementioned one or more computer programs 1604 are stored in the aforementioned memory 1603 and configured to be executed by the one or more processors 1602. The one or more computer programs 1604 include instructions, and the instructions can be used to execute each step in the foregoing embodiments.
  • The foregoing processor 1602 may specifically be the processor 110 shown in FIG. 1.
  • The foregoing memory 1603 may specifically be the internal memory 121 and/or the external memory 120 shown in FIG. 1.
  • The foregoing display screen 1607 may specifically be the display screen shown in FIG. 1.
  • The communication module 1608 may specifically be the mobile communication module 150 and/or the wireless communication module 160 shown in FIG. 1.
  • The touch-sensitive surface 1606 may specifically be the touch sensor in the sensor module 180 shown in FIG. 1; the embodiments of this application do not impose any restrictions on this.
  • the functional units in the various embodiments of the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer readable storage medium.
  • a computer readable storage medium includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage media include: flash memory, mobile hard disk, read-only memory, random access memory, magnetic disk or optical disk and other media that can store program codes.


Abstract

A voice interaction method and an electronic device, relating to the field of terminal technologies. After a dialogue between a user and a voice assistant is interrupted, the voice assistant can resume the dialogue with the user, improving the efficiency and experience of using the voice assistant. The method includes: in response to an operation by which a user wakes up a voice assistant application, displaying a first interface, where the first interface is used to display dialogue content between the user and the voice assistant application; receiving a first voice input of the user, where the first voice input includes first slot information; displaying a first card in the first interface, where the first card includes N candidate options for the first slot information, the N candidate options are in one-to-one correspondence with N query requests, and each of the N query requests carries a corresponding candidate option for the first slot information; and in response to an operation by which the user selects a first candidate option from the N candidate options, sending a first query request corresponding to the first candidate option to a first server.

Description

Voice Interaction Method and Electronic Device
This application claims priority to Chinese Patent Application No. 201910224332.0, filed with the China National Intellectual Property Administration on March 22, 2019 and entitled "Voice Interaction Method and Electronic Device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of terminal technologies, and in particular, to a voice interaction method and an electronic device.
Background
Human-computer interaction (HCI) refers to the process of information exchange between a person and a computer in which, using some dialogue language and a particular interaction mode, the two complete a given task. At present, electronic devices such as mobile phones largely rely on a graphical user interface (GUI) to implement human-computer interaction with the user.
With the development of speech recognition technologies, voice assistants (for example, Siri, Xiao Ai, and Xiao E) have been added to many electronic devices to help users complete human-computer interaction with the devices. Taking Siri as an example of a voice assistant, after the user wakes up Siri on a mobile phone, Siri and the user can communicate through a voice user interface (VUI). During voice communication, Siri can answer each query issued by the user.
However, when the voice communication between the user and Siri is interrupted (for example, when a call suddenly comes in while the user is talking to Siri), the mobile phone automatically exits the current voice dialogue with Siri. If the user wants to continue communicating with Siri by voice, the user has to wake up the voice assistant on the mobile phone again. In other words, after the dialogue between the user and the voice assistant on the mobile phone is interrupted, the voice assistant cannot resume the current voice dialogue with the user, so the voice assistant works inefficiently on the mobile phone.
Summary
This application provides a voice interaction method and an electronic device. After a dialogue between a user and a voice assistant is interrupted, the voice assistant can resume the dialogue with the user, improving the efficiency and experience of using the voice assistant on the electronic device.
To achieve the foregoing objective, this application uses the following technical solutions.
According to a first aspect, this application provides a voice interaction method, including: in response to an operation by which a user wakes up a voice assistant application, an electronic device starts running the voice assistant application in the foreground and displays a first interface, where the first interface is used to display dialogue content between the user and the voice assistant application. The user can then input voice to the electronic device. Taking a first voice input received by the electronic device as an example, the first voice input includes first slot information. If the semantics of the first slot information are ambiguous (for example, the first slot information is departure-place information corresponding to multiple related locations on a map), then, in response to the first voice input, the electronic device can display a first card in the first interface. The first card includes N (N≥1) candidate options for the first slot information, the N candidate options are in one-to-one correspondence with N query requests, and each of the N query requests carries a corresponding candidate option for the first slot information. In this way, whenever the user selects a candidate option in the first card (for example, a first candidate option), the electronic device can send the first query request corresponding to the first candidate option to a first server, so that the first server updates the first slot information in the first voice input based on the first candidate option carried in the first query request and thereby provides the user with a service result corresponding to the first voice input. That is, even if the dialogue between the user and the voice assistant application is interrupted, once the electronic device has displayed the first card, the correspondence set between each candidate option in the first card and its query request means that, after the user selects a candidate option in the first card, the first server can support the voice assistant in resuming the dialogue with the user based on the corresponding query request, improving the efficiency and experience of using the voice assistant on the electronic device.
In a possible implementation, after the electronic device displays the first card in the first interface, the method further includes: after the electronic device switches the voice assistant application from the foreground to the background, the electronic device can display a second interface of another application. At this point, the voice assistant application has not been killed; therefore, after the electronic device switches the voice assistant application back to the foreground, the electronic device can redisplay the first interface, and the candidate options in the first card remain valid.
In a possible implementation, the operation of selecting the first candidate option from the N candidate options may include: a touch operation of tapping the first candidate option in the first card; or inputting to the electronic device a second voice input that contains the first candidate option. That is, the user can select an option in the card either by touch or by voice, and the two interaction modes can be mixed across multiple dialogue turns, enriching the interaction modes and the experience when the user interacts with the voice assistant.
In a possible implementation, the first voice input may further include second slot information, for example, destination information. If the semantics of the second slot information are also ambiguous, then after the electronic device sends the first query request corresponding to the first candidate option to the first server, the method further includes: the electronic device displays a second card in the first interface. The second card includes M (M≥1) candidate options for the second slot information, the M candidate options are in one-to-one correspondence with M query requests, each of the M query requests carries the first candidate option already selected by the user, and each of the M query requests carries a corresponding candidate option for the second slot information. Then, whenever the user selects a second candidate option from the M candidate options, the electronic device can send the second query request corresponding to the second candidate option to the first server.
It can be seen that, because the query request corresponding to each candidate option in the second card carries the first slot information selected by the user in the previous turn, even if the dialogue between the user and the voice assistant application is interrupted, when the mobile phone runs the voice assistant application in the foreground again, the user can continue selecting a candidate option for the second slot information in the second card without re-entering the already selected first slot information. The user can thus resume the interrupted dialogue with the voice assistant application at any time, improving the working efficiency and experience of the voice assistant application on the mobile phone.
In a possible implementation, after the electronic device displays the second card in the first interface, the method further includes: after the electronic device switches the voice assistant application from the foreground to the background, the electronic device displays a second interface; and after the electronic device switches the voice assistant application back to the foreground, the electronic device redisplays the first interface, and the candidate options in the second card remain valid.
In a possible implementation, the operation of selecting the second candidate option from the M candidate options includes: a touch operation of tapping the second candidate option in the second card; or inputting to the electronic device a third voice input that contains the second candidate option. That is, the user can select an option in the card either by touch or by voice, and the two interaction modes can be mixed across multiple dialogue turns, enriching the interaction modes and the experience when the user interacts with the voice assistant.
In a possible implementation, after the electronic device receives the first voice input of the user, the method further includes: the electronic device sends the first voice input to the first server, so that the first server extracts the first slot information from the first voice input, obtains the N candidate options for the first slot information, and establishes the one-to-one correspondence between the N candidate options and the N query requests; and the electronic device receives, from the first server, the one-to-one correspondence between the N candidate options and the N query requests.
In a possible implementation, the first server may further extract the second slot information from the first voice input, obtain the M candidate options for the second slot information, and establish the one-to-one correspondence between the M candidate options and the M query requests. Then, after the electronic device sends the first query request corresponding to the first candidate option to the first server, the method further includes: the electronic device receives, from the first server, the one-to-one correspondence between the M candidate options and the M query requests.
In a possible implementation, after the electronic device displays the first card in the first interface, the method further includes: the electronic device receives a fourth voice input of the user, where the fourth voice input includes a filtering condition for the N candidate options; and in response to the fourth voice input, the electronic device displays a third card in the first interface, where the third card includes one or more candidate options that satisfy the filtering condition, thereby helping the user filter the options in the card.
According to a second aspect, this application provides a voice interaction method, including: a first server receives a first voice input sent by an electronic device; the first server extracts first slot information from the first voice input; when the semantics of the first slot information are ambiguous, the first server can obtain N (N≥1) candidate options for the first slot information and establish a one-to-one correspondence between the N candidate options and N query requests, where each of the N query requests carries a corresponding candidate option for the first slot information; the first server can send the N candidate options to the electronic device, or send the correspondence between the N candidate options and the N query requests to the electronic device; after the user selects a first candidate option from the N candidate options on the electronic device, the electronic device can send the first candidate option to the first server, and the first server can then update the first slot information in the first voice input based on the first query request corresponding to the first candidate option; and the first server can determine, based on the updated first slot information, a service result corresponding to the first voice input. Because the first server records the correspondence between the N candidate options for the first slot information and the N query requests, after the electronic device sends the first candidate option to the first server, the first server resumes the current dialogue with the user, improving the efficiency and experience of using the voice assistant on the electronic device.
In a possible implementation, the first voice input further includes second slot information. When the semantics of the second slot information are also ambiguous, after the first server receives the first candidate option sent by the electronic device, the method further includes: the first server obtains M (M≥1) candidate options for the second slot information and establishes a one-to-one correspondence between the M candidate options and M query requests, where each of the M query requests carries the first candidate option already selected by the user and a corresponding candidate option for the second slot information; and the first server sends the M candidate options to the electronic device. After the user selects one of the M candidate options (for example, a second candidate option), the electronic device can send the second candidate option to the first server. Because the second query request corresponding to the second candidate option includes both the first slot information selected by the user (that is, the first candidate option) and the second slot information selected by the user (that is, the second candidate option), the first server can update both the first slot information and the second slot information in the first voice input based on the second query request. The first server can then determine, based on the updated first slot information and second slot information, a service result corresponding to the first voice input.
According to a third aspect, this application provides a voice interaction system, in which: in response to an operation by which a user wakes up a voice assistant, an electronic device starts running a voice assistant application in the foreground; the electronic device receives a first voice input of the user; the electronic device sends the first voice input to a first server; the first server extracts first slot information from the first voice input; when the semantics of the first slot information are ambiguous, the first server can obtain N (N≥1) candidate options for the first slot information and establish a one-to-one correspondence between the N candidate options and N query requests, where each of the N query requests carries a corresponding candidate option for the first slot information; the first server can send the N candidate options to the electronic device; the electronic device displays a first card containing the N candidate options for the first slot information; in response to an operation by which the user selects a first candidate option from the N candidate options, the electronic device can send the first query request corresponding to the first candidate option to the first server, or the electronic device sends the first candidate option to the first server so that the first server determines the first query request corresponding to the first candidate option; and the first server can then update the first slot information in the first voice input based on the first query request, thereby determining a service result corresponding to the first voice input.
In a possible implementation, the first voice input further includes second slot information. When the semantics of the second slot information are also ambiguous, after the electronic device sends the first query request corresponding to the first candidate option to the first server, the method further includes: the first server obtains M (M≥1) candidate options for the second slot information and establishes a one-to-one correspondence between the M candidate options and M query requests, where each of the M query requests carries the first candidate option selected by the user and a corresponding candidate option for the second slot information; the first server sends the M candidate options to the electronic device; and the electronic device displays a second card containing the M candidate options for the second slot information.
In a possible implementation, after the electronic device displays the second card, the method further includes: in response to an operation by which the user selects a second candidate option from the M candidate options, the electronic device sends the second query request corresponding to the second candidate option to the first server; or the electronic device sends the second candidate option to the first server, so that the first server determines the second query request corresponding to the second candidate option.
In a possible implementation, the voice interaction system further includes a second server, and the second server is configured to send the N candidate options for the first slot information and/or the M candidate options for the second slot information to the first server.
According to a fourth aspect, this application provides an electronic device, including a touchscreen, a communication module, one or more processors, one or more memories, one or more microphones, and one or more computer programs, where the processor is coupled to the touchscreen, the communication module, the microphone, and the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory so that the electronic device performs the voice interaction method according to any one of the foregoing aspects.
According to a fifth aspect, this application provides a computer storage medium including computer instructions that, when run on an electronic device, cause the electronic device to perform the voice interaction method according to any one of the first aspect.
According to a sixth aspect, this application provides a computer program product that, when run on an electronic device, causes the electronic device to perform the voice interaction method according to any one of the first aspect.
It can be understood that the electronic device of the fourth aspect, the computer storage medium of the fifth aspect, and the computer program product of the sixth aspect are all configured to perform the corresponding methods provided above. Therefore, for their beneficial effects, refer to the beneficial effects of the corresponding methods provided above, and details are not described here again.
Brief Description of Drawings
FIG. 1 is a first schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 2 is a schematic architectural diagram of an operating system in an electronic device according to an embodiment of this application;
FIG. 3 is a schematic architectural diagram of a voice interaction system according to an embodiment of this application;
FIG. 4 is a first schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 5 is a schematic interaction diagram of a voice interaction method according to an embodiment of this application;
FIG. 6 is a second schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 7 is a third schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 8 is a fourth schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 9 is a fifth schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 10 is a sixth schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 11 is a seventh schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 12 is an eighth schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 13 is a ninth schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 14 is a tenth schematic scenario diagram of a voice interaction method according to an embodiment of this application;
FIG. 15 is a second schematic structural diagram of an electronic device according to an embodiment of this application;
FIG. 16 is a third schematic structural diagram of an electronic device according to an embodiment of this application.
Detailed Description
The following describes the implementations of the embodiments in detail with reference to the accompanying drawings.
For example, the voice interaction method provided in the embodiments of this application may be applied to electronic devices with a voice assistant function, such as mobile phones, tablet computers, laptop computers, ultra-mobile personal computers (UMPC), handheld computers, netbooks, personal digital assistants (PDA), wearable electronic devices, and virtual reality devices; the embodiments of this application impose no restriction on this.
For example, FIG. 1 is a schematic structural diagram of the electronic device 100.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display 194, a subscriber identification module (SIM) card interface 195, and the like.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 100. In other embodiments of this application, the electronic device 100 may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors.
The controller may generate operation control signals based on instruction operation codes and timing signals to control instruction fetching and execution.
A memory may further be disposed in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache that holds instructions or data the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can call them directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 110, and thus improving system efficiency.
In some embodiments, the processor 110 may include one or more interfaces, such as an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor 110 may contain multiple groups of I2C buses and may be separately coupled to the touch sensor 180K, a charger, a flash, the camera 193, and the like through different I2C bus interfaces. For example, the processor 110 may be coupled to the touch sensor 180K through an I2C interface, so that the processor 110 communicates with the touch sensor 180K through the I2C bus interface to implement the touch function of the electronic device 100.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may contain multiple groups of I2S buses and may be coupled to the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the I2S interface to implement answering a call through a Bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing, and encoding an analog signal. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit an audio signal to the wireless communication module 160 through the PCM interface to implement answering a call through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communication. The bus may be a bidirectional communication bus that converts the data to be transmitted between serial and parallel communication. In some embodiments, the UART interface is typically used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to implement the Bluetooth function. In some embodiments, the audio module 170 may transmit an audio signal to the wireless communication module 160 through the UART interface to implement playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor 110 to peripheral components such as the display 194 and the camera 193, and includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor 110 communicates with the camera 193 through the CSI interface to implement the photographing function of the electronic device 100, and the processor 110 communicates with the display 194 through the DSI interface to implement the display function of the electronic device 100.
The GPIO interface may be configured through software, either as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 to the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment are only schematic and do not constitute a limitation on the structure of the electronic device 100. In other embodiments of this application, the electronic device 100 may also use interface connection modes different from those in the foregoing embodiment, or a combination of multiple interface connection modes.
The charging management module 140 is configured to receive a charging input from a charger, which may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device 100. While charging the battery 142, the charging management module 140 may also supply power to the electronic device through the power management module 141.
The power management module 141 is configured to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 may receive input from the battery 142 and/or the charging management module 140 and supply power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and the like.
The power management module 141 may be used to monitor performance parameters such as battery capacity, battery cycle count, battery charging voltage, battery discharging voltage, and battery health (for example, leakage and impedance). In some other embodiments, the power management module 141 may also be disposed in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same component.
The wireless communication function of the electronic device 100 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 may cover one or more communication frequency bands, and different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 may provide wireless communication solutions, including 2G/3G/4G/5G, applied to the electronic device 100. The mobile communication module 150 may include one or more filters, switches, power amplifiers, low noise amplifiers (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves via the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify a signal modulated by the modem processor and convert it into an electromagnetic wave for radiation via the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110, or in the same component as at least some modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator modulates a to-be-sent low-frequency baseband signal into a medium- or high-frequency signal. The demodulator demodulates a received electromagnetic wave signal into a low-frequency baseband signal and transmits the demodulated low-frequency baseband signal to the baseband processor for processing; after processing, the signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A and the receiver 170B) or displays an image or video through the display 194. In some embodiments, the modem processor may be an independent component. In other embodiments, the modem processor may be independent of the processor 110 and disposed in the same component as the mobile communication module 150 or other functional modules.
The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), the global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies. The wireless communication module 160 may be one or more components integrating one or more communication processing modules. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency-modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a to-be-sent signal from the processor 110, frequency-modulate and amplify it, and convert it into an electromagnetic wave for radiation via the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include the global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 100 implements the display function through the GPU, the display 194, the application processor, and the like. The GPU is a microprocessor for image processing that connects the display 194 and the application processor, and performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is configured to display images, videos, and the like, and includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N displays 194, where N is a positive integer greater than 1.
The electronic device 100 may implement the photographing function through the ISP, the camera 193, the video codec, the GPU, the display 194, the application processor, and the like.
The ISP is configured to process data fed back by the camera 193. For example, when a photo is taken, the shutter opens, light is transferred through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing and conversion into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin color of the image through algorithms, as well as parameters such as exposure and color temperature of the photographing scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is configured to capture still images or videos. In some embodiments, the mobile phone 100 may include 1 or N cameras, where N is a positive integer greater than 1. The camera 193 may be a front camera or a rear camera.
The digital signal processor is configured to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 100 performs frequency point selection, the digital signal processor is configured to perform Fourier transform and the like on the frequency point energy.
The video codec is configured to compress or decompress digital video. The electronic device 100 may support one or more video codecs, so that the electronic device 100 can play or record videos in multiple coding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor that, by drawing on the structure of biological neural networks (for example, the transmission mode between neurons in the human brain), rapidly processes input information and can also continuously self-learn. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, for example, image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 120 may be used to connect an external memory card, for example, a Micro SD card, to extend the storage capability of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement data storage, for example, saving files such as music and videos on the external memory card.
The internal memory 121 may be configured to store one or more computer programs, and the one or more computer programs include instructions. By running the foregoing instructions stored in the internal memory 121, the processor 110 can cause the electronic device 100 to perform the methods provided in some embodiments of this application, as well as various functional applications, data processing, and the like. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system; the program storage area may also store one or more applications (for example, Gallery and Contacts). The data storage area may store data created during use of the electronic device 101 (for example, photos and contacts). In addition, the internal memory 121 may include high-speed random access memory and may also include nonvolatile memory, for example, one or more magnetic disk storage devices, flash memory devices, or universal flash storage (UFS). In other embodiments, the processor 110 runs instructions stored in the internal memory 121 and/or instructions stored in a memory disposed in the processor, to cause the electronic device 100 to perform the methods provided in the embodiments of this application, as well as various functional applications and data processing.
The electronic device 100 may implement audio functions, for example, music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like.
The audio module 170 is configured to convert digital audio information into an analog audio signal output and to convert an analog audio input into a digital audio signal. The audio module 170 may also be configured to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is configured to convert an audio electrical signal into a sound signal. The electronic device 100 can play music or answer hands-free calls through the speaker 170A.
The receiver 170B, also called an "earpiece", is configured to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or a voice message, the receiver 170B can be placed close to the ear to listen to the voice.
The microphone 170C, also called a "mic" or "mouthpiece", is configured to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with one or more microphones 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which can implement noise reduction in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
The headset jack 170D is configured to connect a wired headset. The headset jack 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The sensor 180 may include a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like; the embodiments of this application impose no restriction on this.
Of course, the electronic device 100 provided in the embodiments of this application may further include one or more components such as the button 190, the motor 191, the indicator 192, and the SIM card interface 195; the embodiments of this application impose no restriction on this.
The software system of the electronic device 100 may use a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. The embodiments of this application take the Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
FIG. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of this application.
The layered architecture divides the software into several layers, each with a clear role and division of labor, and the layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer, Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in FIG. 2, the application packages may include APPs (applications) such as Phone, Memo, Browser, Contacts, Camera, Gallery, Calendar, Maps, Bluetooth, Music, Video, and Messages.
In the embodiments of this application, the application layer may further include a voice assistant APP; for example, the user may call the voice assistant APP Siri, Xiao E, Xiao Ai, or the like.
After the voice assistant APP is started, it can collect the user's voice input and convert the voice input into a corresponding voice task. The voice assistant APP can then call the interface of a related application to complete the voice task, allowing the user to control the electronic device by voice.
The application framework layer provides application programming interfaces (API) and a programming framework for the applications in the application layer, and includes some predefined functions.
As shown in FIG. 2, the application framework layer may include a window manager, content providers, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is configured to manage window programs. The window manager can obtain the display size, determine whether there is a status bar, lock the screen, take screenshots, and so on.
The content providers are configured to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, the phone book, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures. The view system may be used to build applications. A display interface may consist of one or more views; for example, a display interface including an SMS notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is configured to provide the communication functions of the electronic device 100, for example, management of call status (including connecting, hanging up, and so on).
The resource manager provides various resources for applications, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar, and can be used to convey informational messages that disappear automatically after a short stay without requiring user interaction. For example, the notification manager is used to notify of completed downloads, message reminders, and the like. The notification manager may also present a notification appearing in the system top status bar as a chart or scroll-bar text, for example a notification of an application running in the background, or a notification appearing on the screen as a dialog window, for example text prompted in the status bar, a prompt tone, vibration of the electronic device, or a blinking indicator light.
In the embodiments of this application, the application framework layer further includes a VUI (voice user interface) manager. The VUI manager can monitor the running status of the voice assistant APP and can also act as a bridge between the voice assistant APP and other APPs, passing the voice task obtained by the voice assistant APP to the relevant APP for execution.
For example, the electronic device 100 may use the microphone to detect the user's voice input. If the wake-up voice "Hello, Xiao E" input by the user is detected, the VUI manager can start the voice assistant APP in the application layer. At this point, as shown in (a) of FIG. 3, the electronic device 100 may display the dialogue interface 301 of the voice assistant APP, in which the electronic device 100 can display the dialogue content between the user and the voice assistant APP.
Still as shown in (a) of FIG. 3, after the user wakes up the voice assistant APP, the voice assistant APP can continue using the microphone to detect the user's voice input. Taking the current voice input 302 "导航去大雁塔" ("Navigate to the Big Wild Goose Pagoda") as an example, the voice assistant APP can display the text information corresponding to the voice input 302 in the dialogue interface 301. In addition, the voice assistant APP can send the voice input 302 to the server 200, which recognizes and responds to the voice input 302.
As shown in (b) of FIG. 3, the server 200 may include a speech recognition module, a language understanding module, and a dialogue management module. After receiving the voice input 302, the server 200 can first have the speech recognition module convert the voice input 302 into corresponding text information. The language understanding module of the server 200 can then use a natural language understanding (NLU) algorithm to extract the user intent and slot information (slot) from the text information. For example, the user intent in the voice input 302 is: navigation; the slot information in the voice input 302 is: Big Wild Goose Pagoda. The dialogue management module can then, based on the extracted user intent and slot information, request the corresponding service content from the server of a relevant third-party application. For example, the dialogue management module can request a navigation service with the Big Wild Goose Pagoda as the destination from the server of the Baidu Maps APP. The server of the Baidu Maps APP can then send the navigation route with the Big Wild Goose Pagoda as the destination to the server 200, and the server 200 can send the navigation route to the electronic device 100. Still as shown in (a) of FIG. 3, the voice assistant APP in the electronic device 100 can display the navigation route in the dialogue interface 301 in a form such as the card 303, so that the voice assistant APP completes the response to the voice input 302.
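The three server-side modules described above (speech recognition, language understanding, and dialogue management) can be sketched as a minimal pipeline. This is an illustrative toy, not the actual implementation of the server 200: the module internals and the single string-prefix rule used for intent matching are assumptions made only for demonstration.

```python
# Toy sketch of the server 200 pipeline:
# speech recognition -> language understanding -> dialogue management.

def speech_recognition(audio: bytes) -> str:
    # Stand-in for a real ASR model: here the "audio" is already UTF-8 text.
    return audio.decode("utf-8")

def language_understanding(text: str) -> dict:
    # Toy NLU rule: "导航去X" -> intent=navigate, slot destination=X.
    prefix = "导航去"
    if text.startswith(prefix):
        return {"intent": "navigate", "slots": {"destination": text[len(prefix):]}}
    return {"intent": "unknown", "slots": {}}

def dialogue_management(parsed: dict) -> str:
    # Dispatch to a (hypothetical) third-party service based on the intent.
    if parsed["intent"] == "navigate":
        return "route to " + parsed["slots"]["destination"]
    return "unsupported"

def handle_voice_input(audio: bytes) -> str:
    return dialogue_management(language_understanding(speech_recognition(audio)))
```

For the voice input 302, `handle_voice_input("导航去大雁塔".encode())` would dispatch a navigation request whose destination slot is 大雁塔.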
In the embodiments of this application, as shown in FIG. 4, when the user and the voice assistant APP are in dialogue in the dialogue interface 301, the voice assistant APP, as a foreground application, presents the corresponding visual output to the user through the display 194. If the electronic device 100 detects that another event interrupts the dialogue, for example, an incoming-call event or the user opening another application, the electronic device 100 can switch the voice assistant APP, originally running in the foreground, to the background to continue running, and run the new application that interrupted the dialogue in the foreground. Once the voice assistant APP is switched to the background, it provides no visual output related to the voice assistant APP to the user, and the user cannot interact with it.
When the user wakes up the voice assistant APP again or the user exits the foregoing new application, still as shown in FIG. 4, the electronic device 100 can switch the voice assistant APP back to the foreground and continue displaying the dialogue interface 301 and the historical dialogue content in the dialogue interface 301. Still taking the dialogue interface 301 shown in (a) of FIG. 3 as an example, after the voice assistant APP is switched from the background to the foreground, the electronic device 100 can continue displaying the dialogue interface 301, and the user can continue operating the options in the card 303 in the dialogue interface 301, carrying on the next dialogue turn from the previous voice input 302 without re-entering the voice input "导航去大雁塔", thereby improving the efficiency and experience of using the voice assistant APP on the electronic device 100.
The system libraries may include multiple functional modules, for example, a surface manager, media libraries, a three-dimensional graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is configured to manage the display subsystem and provides fusion of 2D and 3D layers for multiple applications. The media libraries support playback and recording of multiple common audio and video formats, as well as static image files, and can support multiple audio and video coding formats, for example, MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG. The three-dimensional graphics processing library is configured to implement three-dimensional graphics drawing, image rendering, compositing, layer processing, and the like. The 2D graphics engine is a drawing engine for 2D drawing.
Android Runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
The core libraries include two parts: one part is the function functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is configured to perform functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver; the embodiments of this application impose no restriction on this.
To facilitate a clear understanding of the following embodiments, a brief introduction of the related technologies is first given.
User intent (intent): each voice input of the user corresponds to one user intent. An intent is a collection of multiple forms of expression; for example, "我要看电影" ("I want to watch a movie") and "我想看2001年刘德华拍摄的动作电影" ("I want to watch the action movie Andy Lau shot in 2001") may both belong to the same video-playback intent.
Slot information: slot information is the key information in the user's voice input that expresses the intent, and it directly determines whether the electronic device (or server) can match the correct intent. One slot corresponds to one class of keywords, and the information in the slot can be filled with keywords of that type. For example, the query pattern corresponding to the song-playback intent is "我想听{singer}的{song}" ("I want to listen to {singer}'s {song}"), where {singer} is the singer slot and {song} is the song slot. Then, if the user utters the voice input "我想听王菲的红豆" ("I want to listen to Faye Wong's Red Bean"), the electronic device (or server) can extract from this voice input the information in the {singer} slot: 王菲 (Faye Wong), and the information in the {song} slot: 红豆 (Red Bean). Based on these two pieces of slot information, the electronic device (or server) can identify the user intent of this voice input as: play the song Red Bean by Faye Wong.
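A minimal sketch of this slot-filling idea, assuming the query pattern is treated as a literal template (the actual NLU algorithm used by the electronic device or server is not specified in this application): placeholders such as {singer} can be compiled into named capture groups, and an utterance that matches the pattern yields one value per slot.

```python
import re

def compile_template(template: str) -> re.Pattern:
    # Turn each {slot} placeholder into a lazy named capture group,
    # then anchor the pattern at the end of the utterance.
    pattern = re.sub(r"\{(\w+)\}", r"(?P<\1>.+?)", template) + "$"
    return re.compile(pattern)

def extract_slots(template: str, utterance: str):
    # Return a slot-name -> value dict if the utterance matches, else None.
    match = compile_template(template).match(utterance)
    return match.groupdict() if match else None

slots = extract_slots("我想听{singer}的{song}", "我想听王菲的红豆")
# slots == {"singer": "王菲", "song": "红豆"}
```

An utterance that does not fit the pattern (for example, "导航去大雁塔") simply yields no match, so the device would try other intent templates instead.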
The following describes the voice interaction method provided in the embodiments of this application in detail with reference to the accompanying drawings. In the following embodiments, a mobile phone is used as the electronic device 100 for illustration.
FIG. 5 is a schematic flowchart of a voice interaction method according to an embodiment of this application. As shown in FIG. 5, the voice interaction method may include the following steps.
S501. While running the voice assistant APP in the foreground, the mobile phone receives a first voice input of the user.
For example, when the user wants to talk with the voice assistant on the mobile phone, the user can trigger the mobile phone to start running the voice assistant APP in the foreground by inputting a wake-up voice containing the wake-up word. For example, after detecting the wake-up voice "Hello, Xiao E" input by the user, the mobile phone can open the voice assistant APP in the foreground and display the dialogue interface of the voice assistant APP. As shown in (a) of FIG. 6, the mobile phone may display the dialogue interface 601 of the voice assistant APP in full screen, and the dialogue content between the user and the voice assistant "Xiao E" can be displayed in the dialogue interface 601 in real time. Alternatively, the mobile phone may display the dialogue interface of the voice assistant APP in the form of a floating window; as shown in (b) of FIG. 6, the mobile phone can display the dialogue content between the user and the voice assistant "Xiao E" in real time in the floating window 602.
Of course, besides using a wake-up voice to open the voice assistant APP on the mobile phone, the user can also use a preset gesture or button to wake up the mobile phone to run the voice assistant APP in the foreground; the embodiments of this application impose no restriction on this.
After the mobile phone starts running the voice assistant APP in the foreground, taking the dialogue interface 601 shown in (a) of FIG. 6 as an example, a voice collection button 603 is provided in the dialogue interface 601. If it is detected that the user taps the voice collection button 603, the voice assistant APP can call the microphone in the mobile phone to collect the user's voice input (that is, the first voice input). For example, the first voice input 604 entered by the user is "我要从人民公园打车去前门" ("I want to take a taxi from People's Park to Qianmen"). Alternatively, after displaying the dialogue interface 601 of the voice assistant APP, the mobile phone may automatically turn on the microphone to collect the user's first voice input; the embodiments of this application impose no restriction on this.
S502. The mobile phone sends the first voice input to the first server, so that the first server extracts the user intent and slot information from the first voice input, where the first voice input includes a first slot and a second slot.
Still taking the first voice input 604 as an example, after receiving the user's first voice input 604 "我要从人民公园打车去前门", the mobile phone can send the first voice input 604 to the first server for speech recognition and understanding, thereby extracting the user intent and slot information from the first voice input 604.
For example, after receiving the first voice input 604 from the mobile phone, the first server can use a speech recognition algorithm to convert the first voice input 604 into the corresponding text information, that is, "我要从人民公园打车去前门". The first server can then use a preset NLU algorithm to extract the user intent and slot information from the text information of the first voice input 604. In the embodiments of this application, the first voice input entered by the user contains multiple pieces of slot information. For example, the first voice input 604 contains two slots: one is the departure-place slot for the taxi ride (that is, the first slot), and the other is the destination slot for the taxi ride (that is, the second slot). The first server can then extract from "我要从人民公园打车去前门" the slot information in the first slot (that is, the first slot information): People's Park (人民公园), and the slot information in the second slot (that is, the second slot information): Qianmen (前门). In addition, the first server can extract from "我要从人民公园打车去前门" the user intent corresponding to the first voice input 604: taking a taxi.
In addition, the first server can save the content of each dialogue between the user and the voice assistant APP to generate a dialogue record between the user and the voice assistant APP. For example, the first server may fix the size of the saved dialogue record; when the dialogue record between the user and the voice assistant APP exceeds the preset size, the earliest dialogue content can be deleted. As another example, the first server may save the dialogue record between the user and the voice assistant APP for a certain period of time; if the first server receives no new voice input within the preset time, the first server can delete the current dialogue record.
S503. The first server requests the second server to respond to the first voice input based on the foregoing user intent and slot information.
Still taking the first voice input 604 as an example, after extracting the user intent of taking a taxi, the first slot information of People's Park, and the second slot information of Qianmen from the first voice input 604, the first server can determine the third-party APP corresponding to the taxi-hailing intent (for example, the Didi Chuxing APP). The first server can then send a first service request to the server of the Didi Chuxing APP (that is, the second server), where the first service request contains the user intent, the first slot information, and the second slot information extracted by the first server.
After receiving the first service request sent by the first server, the second server can determine from the user intent that the user needs the taxi-hailing service, and the second server can then determine the specific addresses of the departure place and the destination based on the first slot information and the second slot information.
If the second server finds multiple addresses associated with the first slot information (that is, People's Park), the departure place entered by the user in the first voice input 604 is not precise. To continue providing the current taxi-hailing service to the user, the second server can send the multiple addresses it found that are associated with the first slot information (People's Park) to the first server as candidate options. For example, the candidate options include: the detailed address of the north gate of People's Park, the detailed address of the east gate of People's Park, the detailed address of the west gate of People's Park, and the detailed address of the south gate of People's Park.
S504. After receiving the N candidate options for the first slot information sent by the second server, the first server sends the N candidate options to the mobile phone.
Still taking the server of the Didi Chuxing APP as the second server, after the first server receives the N candidate options for the departure-place information (that is, the first slot information) sent by the server of the Didi Chuxing APP, the first server can establish a correspondence between each of the N candidate options and a corresponding query request, where each query request contains the corresponding candidate option.
For example, it can be preset that each query request contains a taxi sentence template corresponding to the candidate option. The taxi sentence template is fixed-pattern text content associated with the user intent of the first voice input. For example, the taxi sentence template associated with the taxi-hailing intent is "从{第一槽位信息}打车去{第二槽位信息}" ("take a taxi from {first slot information} to {second slot information}"). Because the foregoing N candidate options are candidate options for the first slot information, each of the N corresponding query requests includes a corresponding taxi sentence template, and the first slot information in the taxi sentence template is the corresponding candidate option.
For example, the first candidate option for the first slot information is the detailed address of the north gate of People's Park, and the first candidate option corresponds to the first query request. The first query request can contain the first taxi sentence template: "从{人民公园北门}打车去{前门}" ("take a taxi from {People's Park North Gate} to {Qianmen}"). In the first taxi sentence template, the first slot information has changed from {People's Park} to {People's Park North Gate}.
For example, the second candidate option for the first slot information is the detailed address of the south gate of People's Park, and the second candidate option corresponds to the second query request. The second query request can contain the second taxi sentence template: "从{人民公园南门}打车去{前门}". In the second taxi sentence template, the first slot information has changed from {People's Park} to {People's Park South Gate}.
For example, the third candidate option for the first slot information is the detailed address of the west gate of People's Park, and the third candidate option corresponds to the third query request. The third query request can contain the third taxi sentence template: "从{人民公园西门}打车去{前门}". In the third taxi sentence template, the first slot information has changed from {People's Park} to {People's Park West Gate}.
For example, the fourth candidate option for the first slot information is the detailed address of the east gate of People's Park, and the fourth candidate option corresponds to the fourth query request. The fourth query request can contain the fourth taxi sentence template: "从{人民公园东门}打车去{前门}". In the fourth taxi sentence template, the first slot information has changed from {People's Park} to {People's Park East Gate}.
In some embodiments, the first server can also update the first slot information in the first voice input to the corresponding candidate option and carry the updated input in the query request. For example, the first voice input 604 is "我要从人民公园打车去前门". When the first candidate option is People's Park North Gate, the first server can update the first voice input 604 to "我要从人民公园北门打车去前门" and carry the updated first voice input 604 in the first query request corresponding to the first candidate option. When the second candidate option is People's Park South Gate, the first server can update the first voice input 604 to "我要从人民公园南门打车去前门" and carry the updated input in the second query request corresponding to the second candidate option. The same applies to the west-gate and east-gate candidate options and their respective third and fourth query requests.
The first server can then send the foregoing four candidate options for the first slot information to the mobile phone, or send the four candidate options for the first slot information together with the corresponding query requests to the mobile phone, so that the user can subsequently select the precise departure-place information to complete the taxi-hailing service.
Of course, in addition to the multiple specific addresses of People's Park (that is, the foregoing N candidate options), the server of the Didi Chuxing APP can also send related information about the candidate options to the first server, for example, the distance between each specific address and the user's current location, the user ratings of each specific address, and the opening hours. The first server can forward this information to the mobile phone together.
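The pairing of each candidate option with a query request whose sentence template has that candidate filled in can be sketched as follows. The function name and data shapes here are illustrative assumptions, not the actual server implementation; only the template pattern and example values come from the description above.

```python
def build_query_requests(template, slot_name, slots, candidates):
    """For an ambiguous slot, build a one-to-one mapping from each
    candidate option to a query request whose sentence template has
    that candidate filled in (other slots keep their current values)."""
    requests = {}
    for candidate in candidates:
        filled = dict(slots)
        filled[slot_name] = candidate
        requests[candidate] = template.format(**filled)
    return requests

requests = build_query_requests(
    "从{origin}打车去{destination}",   # taxi sentence template
    "origin",                          # the ambiguous first slot
    {"origin": "人民公园", "destination": "前门"},
    ["人民公园北门", "人民公园南门", "人民公园西门", "人民公园东门"],
)
# requests["人民公园东门"] == "从人民公园东门打车去前门"
```

Whichever option the user later taps, the mapping already holds a complete query request for it, which is what allows the dialogue to be resumed after an interruption.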
S505. The mobile phone displays a first card in the dialogue interface of the voice assistant APP, where the first card includes the foregoing N candidate options.
After receiving the N candidate options for the first slot information from the first server, the mobile phone can display the N candidate options to the user in the dialogue interface 601 of the voice assistant APP in a form such as a card for the user to choose from.
For example, as shown in FIG. 7, after receiving the foregoing four candidate options for the first slot information from the first server, the mobile phone can use JS (JavaScript) rendering to load the first card 701 in the dialogue interface 601 currently displayed in the foreground. The first card 701 includes the four specific pieces of departure-place information related to "People's Park" found by the server of the Didi Chuxing APP, namely the detailed addresses of the north gate, east gate, west gate, and south gate of People's Park. These pieces of departure-place information are also the candidate options for the first slot information in the first voice input 604.
Still as shown in FIG. 7, taking the first candidate option 702 in the first card 701 as an example, the specific name of the first candidate option 702 is "People's Park East Gate". The mobile phone can also display in the first candidate option 702 the specific address of People's Park East Gate, for example, No. 11, Daqing 1st Road; the distance between People's Park East Gate and the user's current location, for example, 560 meters; and the opening hours of People's Park East Gate, 8:00-18:00.
In addition, if the first server, when sending the candidate option 702 "People's Park East Gate", also sends the query request corresponding to the candidate option 702 to the mobile phone, where the taxi sentence template in that query request is "从{人民公园东门}打车去{前门}", the mobile phone can also establish in the first card 701 the correspondence between the candidate option 702 and that query request, that is, the correspondence between the candidate option 702 and the taxi sentence template "从{人民公园东门}打车去{前门}". Subsequently, if it is detected that the user taps the candidate option 702 in the first card 701, the mobile phone can send the corresponding query request to the first server to re-extract the slot information.
S506. In response to the user's operation of selecting the first candidate option in the first card, the mobile phone instructs the first server to update the first slot information, where the updated first slot information is the first candidate option.
As shown in (a) of FIG. 8, after the mobile phone displays the first card 701 in the dialogue interface 601 of the voice assistant APP, the user can select one of the multiple candidate options in the first card 701 by tapping. Alternatively, as shown in (b) of FIG. 8, after the mobile phone displays the first card 701 in the dialogue interface 601 of the voice assistant APP, the user can also select one of the multiple candidate options in the first card 701 by voice.
For example, if it is detected that the user taps the first candidate option 702 in the first card 701, the mobile phone can send the first candidate option 702, People's Park East Gate, to the first server. Because the first server stores the query request corresponding to People's Park East Gate, which contains the fourth taxi sentence template "从{人民公园东门}打车去{前门}", the first server can use the NLU algorithm to re-extract from the taxi sentence template "从{人民公园东门}打车去{前门}" the user intent and slot information corresponding to the first voice input 604. Unlike in step S502, the first slot information extracted by the first server this time is the first candidate option 702 selected by the user, that is, People's Park East Gate.
Alternatively, if the mobile phone has already established the correspondence between the first candidate option 702 and the query request containing the foregoing fourth taxi sentence template, then after detecting that the user taps the first candidate option 702 in the first card 701, the mobile phone can send the query request corresponding to the first candidate option 702 (for example, the first query request) to the first server, where the first query request contains the fourth taxi sentence template "从{人民公园东门}打车去{前门}". Likewise, the first server can use the NLU algorithm to re-extract from the first query request the first slot information as the first candidate option 702 selected by the user, that is, People's Park East Gate.
Or, as shown in FIG. 9, after the mobile phone displays the first card 701 in the dialogue interface 601, if the mobile phone collects the user's second voice input 901, which may be "选择人民公园东门" ("Select People's Park East Gate"), the mobile phone can send the second voice input 901 to the first server. The first server can perform speech recognition on the second voice input 901 in combination with the dialogue record between the user and the voice assistant, recognizing that the user has selected the first candidate option 702 in the first card 701. Then, through the first query request corresponding to the first candidate option 702, the first server can extract from the fourth taxi sentence template in the first query request the new first slot information: People's Park East Gate.
It should be noted that the user can select a candidate option in the first card 701 by voice using natural language. For example, the second voice input 901 may be "我选择人民公园东门" ("I select People's Park East Gate"); when the first server detects that the second voice input 901 includes the candidate option "People's Park East Gate", it can recognize that the user has selected the first candidate option 702 in the first card 701. As another example, the second voice input 901 may be "选择第1个地点" ("Select the first place"); combining the dialogue record between the user and the voice assistant, the first server can recognize that the option in the first card 701 corresponding to this voice input is the first candidate option 702. As yet another example, the second voice input 901 may be "东门" ("East Gate"); combining the dialogue record, the first server can recognize that "East Gate" refers to "People's Park East Gate", and the first server can thus determine that the user has selected the first candidate option 702 in the first card 701.
In other embodiments, after the mobile phone displays the first card 701 in the dialogue interface 601, the user can also filter the candidate options in the first card 701. As shown in (a) of FIG. 10, the user can input to the mobile phone the third voice input 1001 "500米内的出发地有哪些" ("Which departure places are within 500 meters"). The mobile phone can then send the third voice input 1001 to the first server. Because the first server records the detailed information of each candidate option in the first card 701, after performing speech recognition and understanding on the third voice input 1001, the first server can, based on the distance between each candidate option and the user, filter out from the foregoing four candidate options the one or more candidate options within 500 meters of the user. For example, the first server can send the filtered "People's Park West Gate" and "People's Park North Gate" to the mobile phone. At this point, as shown in (b) of FIG. 10, the mobile phone can display in the dialogue interface 601 the card 1002 responding to the third voice input 1001. The card 1002 includes the candidate options within 500 meters of the user filtered out by the first server. The user can thus continue selecting a corresponding candidate option in the card 1002 as the first slot information in the first voice input 604.
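The distance-based filtering just described can be sketched as a simple predicate over the metadata the second server attached to each candidate option (distance, ratings, opening hours). The field names below are assumptions made for illustration only.

```python
def filter_candidates(candidates, max_distance_m):
    """Keep only the candidate options within the given distance of the
    user, mirroring the '500米内的出发地有哪些' filtering dialogue."""
    return [c for c in candidates if c["distance_m"] <= max_distance_m]

candidates = [
    {"name": "人民公园北门", "distance_m": 420},
    {"name": "人民公园东门", "distance_m": 560},
    {"name": "人民公园西门", "distance_m": 300},
    {"name": "人民公园南门", "distance_m": 780},
]
nearby = filter_candidates(candidates, 500)
# nearby keeps 人民公园北门 and 人民公园西门
```

The filtered list would then be rendered as the new card 1002, and the user continues selecting from it exactly as before; the distances other than the 560 meters mentioned above are invented sample values.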
S507. The first server requests the second server to respond to the first voice input based on the updated first slot information.
Taking the first server updating the first slot information to "People's Park East Gate" as an example, the first server can send a second service request to the server of the Didi Chuxing APP (that is, the second server), where the second service request contains the user intent extracted in step S506 (that is, taking a taxi), the updated first slot information (that is, People's Park East Gate), and the second slot information (that is, Qianmen).
After the server of the Didi Chuxing APP receives the second service request sent by the first server, similarly to step S503, the second server needs to determine unambiguous departure-place information (that is, the first slot information) and destination information (that is, the second slot information) before providing the current taxi-hailing service to the user. Based on the updated first slot information, the second server can determine that the departure place of this taxi-hailing service is: People's Park East Gate.
However, if the second server finds multiple addresses associated with the foregoing second slot information (that is, Qianmen), the destination entered by the user in the first voice input 604 is not precise. Similarly, the second server can send the multiple addresses it found that are associated with the second slot information (Qianmen) to the first server as candidate options. For example, the candidate options include: the detailed address of Qianmen subway station, the detailed address of Qianmen Street, and the detailed address of Qianmen Mansion.
S508. After receiving the M candidate options for the second slot information sent by the second server, the first server sends the M candidate options to the mobile phone.
Similarly to step S504, after receiving the M candidate options for the second slot information sent by the server of the Didi Chuxing APP, the first server can establish a correspondence between each of the M candidate options and a corresponding query request. In addition to the first candidate option that the user selected for the first slot information in step S506, each query request also includes a corresponding candidate option for the second slot information.
Still taking the example in which each query request includes the taxi sentence template "从{第一槽位信息}打车去{第二槽位信息}": because the first slot information has already been determined as People's Park East Gate, the taxi sentence template in each of the M query requests corresponding to the second slot information at this point is "从{人民公园东门}打车去{第二槽位信息}", where {second slot information} can be filled with the corresponding candidate option for the second slot information.
For example, the first candidate option for the second slot information is the detailed address of Qianmen subway station, and the first candidate option corresponds to the first query request. The first query request can contain the first taxi sentence template: "从{人民公园东门}打车去{前门地铁站}". In this first taxi sentence template, the first slot information is {People's Park East Gate} determined in step S506, and the second slot information is {Qianmen subway station}.
For example, the second candidate option for the second slot information is the detailed address of Qianmen Street, and the second candidate option corresponds to the second query request. The second query request can contain the second taxi sentence template: "从{人民公园东门}打车去{前门大街}". In this second taxi sentence template, the first slot information is {People's Park East Gate} determined in step S506, and the second slot information is {Qianmen Street}.
For example, the third candidate option for the second slot information is the detailed address of Qianmen Mansion, and the third candidate option corresponds to the third query request. The third query request can contain the third taxi sentence template: "从{人民公园东门}打车去{前门大厦}". In this third taxi sentence template, the first slot information is {People's Park East Gate} determined in step S506, and the second slot information is {Qianmen Mansion}.
Of course, the first server can also update the second slot information in the first voice input 604 to the corresponding candidate option and carry the updated input in the query request; in this case, the first slot information of the first voice input 604 is the People's Park East Gate selected by the user in step S506. The embodiments of this application impose no restriction on this.
The first server can then send the three candidate options for the second slot information to the mobile phone, or send the three candidate options for the second slot information together with the corresponding query requests to the mobile phone, so that the user can subsequently select the precise destination information to complete the taxi-hailing service.
S509. The mobile phone displays a second card in the dialogue interface of the voice assistant APP, where the second card includes the foregoing M candidate options.
After receiving the M candidate options for the second slot information from the first server, similarly to step S505, as shown in FIG. 11, the mobile phone can continue displaying the second card 1101 in the dialogue interface 601 of the voice assistant APP. The second card 1101 includes the three specific pieces of destination information related to "Qianmen" found by the server of the Didi Chuxing APP, namely the detailed addresses of Qianmen subway station, Qianmen Street, and Qianmen Mansion. These pieces of destination information are also the candidate options for the second slot information in the first voice input 604.
Taking the candidate option 1102 "Qianmen Mansion" in the second card 1101 as an example, if the first server sent the candidate option 1102 together with its corresponding query request to the mobile phone, then, because that query request contains the taxi sentence template "从{人民公园东门}打车去{前门大厦}", the mobile phone can establish in the second card 1101 the correspondence between the candidate option 1102 and the taxi sentence template "从{人民公园东门}打车去{前门大厦}".
Of course, the mobile phone may also first display the second card 1101 for the user to select destination information and then display the foregoing first card 701 for the user to select departure-place information. Alternatively, the mobile phone may display the first card 701 and the second card 1101 in the dialogue interface 601 at the same time. The embodiments of this application impose no restriction on this.
S510. The mobile phone switches the voice assistant APP to run in the background.
After the mobile phone displays the second card 1101 in the dialogue interface 601 of the voice assistant APP, if a preset event that interrupts the dialogue between the user and the voice assistant APP is detected, the mobile phone does not end the process of the voice assistant APP (that is, does not kill the voice assistant APP), but switches the voice assistant APP to the background to continue running.
The preset event may be actively triggered by the user. For example, the preset event may be an operation of the user tapping the back button or the home button, or an operation such as the user opening a notification message, a swipe-up menu, or a pull-down menu. Alternatively, the preset event may be an event passively received by the mobile phone. For example, as shown in FIG. 12, after the mobile phone displays the second card 1101 in the dialogue interface 601 of the voice assistant APP, if an incoming-call event is received, the mobile phone can display the incoming-call interface 1201 of the phone application; at this point, the mobile phone can switch the voice assistant APP to run in the background.
S511. After switching the voice assistant APP from the background to the foreground, the mobile phone redisplays the foregoing dialogue interface, which includes the foregoing second card.
Still taking the foregoing incoming-call event as an example, after the user has been on the call for five minutes, the mobile phone detects that the user ends the current call. The mobile phone can then automatically switch the voice assistant APP running in the background back to the foreground. At this point, as shown in FIG. 13, the mobile phone can redisplay the dialogue interface 601 of the voice assistant APP, which still shows the dialogue content between the user and the voice assistant APP at the time the voice assistant APP was switched to the background. For example, the dialogue interface 601 still shows the foregoing second card 1101, which includes the M candidate options for the second slot information.
Of course, the user can also find the applications running in the background in the multitasking window of the mobile phone and switch the voice assistant APP running in the background to run in the foreground. Alternatively, after the voice assistant APP is switched to run in the background, the user can wake up the voice assistant APP again through a wake-up voice or a button to switch it from the background to the foreground. The embodiments of this application impose no restriction on this.
S512. In response to the user's operation of selecting the second candidate option in the second card, the mobile phone instructs the first server to update the first slot information and the second slot information, where the updated first slot information is the first candidate option and the updated second slot information is the second candidate option.
Still as shown in FIG. 13, the user can select one of the multiple candidate options in the second card 1101 by tapping. Alternatively, the user can select one or more of the multiple candidate options in the second card 1101 by voice.
For example, if it is detected that the user taps the second candidate option 1102 in the second card 1101, the mobile phone can send the second candidate option 1102, Qianmen Mansion, to the first server. Because the first server stores the taxi sentence template corresponding to the second candidate option 1102, "从{人民公园东门}打车去{前门大厦}", the first server can use the NLU algorithm to re-extract from this taxi sentence template the first slot information as People's Park East Gate and the second slot information as Qianmen Mansion.
Alternatively, if the mobile phone has already established the correspondence between the second candidate option 1102 and the query request containing the taxi sentence template "从{人民公园东门}打车去{前门大厦}", then, if it is detected that the user taps the second candidate option 1102 in the second card 1101, the mobile phone can send the corresponding query request to the first server, where the query request contains the taxi sentence template "从{人民公园东门}打车去{前门大厦}". In this way, whether or not the first server still stores the taxi sentence template, the first server can re-extract, based on the taxi sentence template "从{人民公园东门}打车去{前门大厦}" carried in the query request, the first slot information as People's Park East Gate and the second slot information as Qianmen Mansion.
That is, even if the first server has deleted the corresponding taxi sentence template or dialogue record because the current session between the user and the voice assistant APP timed out, the mobile phone has already recorded the first slot information previously selected by the user and established the correspondence between each candidate option for the second slot information and its taxi sentence template. Therefore, when the mobile phone runs the voice assistant APP in the foreground again, if the user selects an option in the second card 1101, the first server can still extract the first slot information and the second slot information of the first voice input 604, implementing session continuation between the user and the voice assistant APP.
Of course, after the mobile phone displays the second card 1101 in the dialogue interface 601, the user can also select a candidate option in the second card 1101 by voice. For example, if the third voice input "选择前门大厦" ("Select Qianmen Mansion") input by the user is collected, the mobile phone can send the third voice input to the first server. The first server can perform speech recognition on the third voice input in combination with the dialogue record between the user and the voice assistant, recognizing that the user has selected the second candidate option 1102 in the second card 1101. Then, through the query request corresponding to the second candidate option 1102, the first server can re-extract from the taxi sentence template in that query request the first slot information as People's Park East Gate and the second slot information as Qianmen Mansion.
It can be seen that when the user selects the second slot information, the mobile phone (or the first server) has already recorded the first slot information previously selected by the user. Therefore, after the voice assistant APP is switched back to the foreground and the mobile phone sends the second slot information selected by the user to the first server, the first server can still determine the first slot information the user selected before the voice assistant APP was switched to the background, implementing dialogue continuation between the user and the voice assistant APP.
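The continuation mechanism described in steps S510-S512 (the client keeps the option-to-query-request mapping, so a selection made after an interruption still yields a complete request) can be modeled with a toy session object. The class and method names here are hypothetical and only illustrate the behavior, not any actual implementation in this application.

```python
class AssistantSession:
    """Toy model of dialogue continuation: the client records each
    card's option -> query-request mapping, so tapping an option after
    an interruption still produces a complete query request, even if
    the server has since dropped its own dialogue record."""

    def __init__(self):
        self.option_to_query = {}

    def show_card(self, mapping):
        # Called when a card is rendered; the mapping survives while the
        # assistant runs in the background (the process is not killed).
        self.option_to_query.update(mapping)

    def select(self, option):
        # Returns the full query request to send to the first server.
        return self.option_to_query[option]

session = AssistantSession()
session.show_card({"前门大厦": "从人民公园东门打车去前门大厦",
                   "前门大街": "从人民公园东门打车去前门大街"})
# ... an incoming call sends the assistant to the background, then it returns ...
query = session.select("前门大厦")
# query carries both the earlier departure choice and the new destination
```

Because the stored query request already embeds the first slot value chosen in the earlier turn, the server can re-extract both slots from this single request.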
S513. The first server requests the second server to respond to the first voice input based on the updated first slot information and second slot information.
Taking the updated first slot information as People's Park East Gate and the updated second slot information as Qianmen Mansion as an example, the first server can send a third service request to the server of the Didi Chuxing APP (that is, the second server), where the third service request contains the user intent (that is, taking a taxi), the updated first slot information (that is, People's Park East Gate), and the updated second slot information (that is, Qianmen Mansion).
After the server of the Didi Chuxing APP receives the third service request sent by the first server, similarly to steps S503 and S507, the second server can determine from the user intent in the third service request that a taxi-hailing service is to be provided to the user, where the departure-place information of the taxi-hailing service is the first slot information (that is, People's Park East Gate) and the destination information is the second slot information (that is, Qianmen Mansion). Because both People's Park East Gate and Qianmen Mansion are place names with unambiguous addresses, the server of the Didi Chuxing APP can generate a taxi order responding to the first voice input 604, with People's Park East Gate as the departure place and Qianmen Mansion as the destination. The server of the Didi Chuxing APP can then send the generated taxi order to the first server.
S514. After receiving the response result for the foregoing first voice input sent by the second server, the first server sends the response result to the mobile phone.
S515. The mobile phone displays the response result in the dialogue interface of the voice assistant APP.
After receiving the taxi order with which the server of the Didi Chuxing APP responds to the first voice input 604, the first server can send the taxi order to the mobile phone. Then, as shown in FIG. 14, the mobile phone can display the third card 1401 in the dialogue interface 601 of the voice assistant APP. The third card 1401 includes the taxi order sent by the first server, as well as a confirm button 1402 and a cancel button 1403 for the taxi order.
If it is detected that the user taps the foregoing cancel button 1403, or it is detected that the user inputs the voice input "取消打车" ("Cancel the taxi"), the mobile phone can send a cancel-order indication to the first server. After receiving the cancel-order indication, the first server can send an order-cancellation response message to the server of the Didi Chuxing APP, and the server of the Didi Chuxing APP can cancel the current taxi-hailing service.
Correspondingly, if it is detected that the user taps the foregoing confirm button 1402, or it is detected that the user inputs the voice input "确认打车" ("Confirm the taxi"), the mobile phone can send a confirm-order indication to the first server. After receiving the confirm-order indication, the first server can send an order-confirmation response message to the server of the Didi Chuxing APP, and the server of the Didi Chuxing APP can then start providing the current taxi-hailing service to the user. Moreover, after detecting that the user taps the confirm button 1402 or inputs the voice input "确认打车", the mobile phone can automatically open the Didi Chuxing APP in the foreground, where the user can see the information about the current taxi order; at this point, the mobile phone can switch the voice assistant APP to run in the background. In other words, before the mobile phone receives the taxi order, the mobile phone can help the user determine the information of the current taxi order through multiple dialogue turns between the user and the voice assistant in the dialogue interface of the voice assistant APP, without jumping to the interface of the Didi Chuxing APP, improving the intelligent voice interaction experience.
It can be seen that, in the embodiments of this application, the mobile phone or the server can record the slot information selected by the user each time within one voice task. In this way, even if the dialogue between the user and the voice assistant APP is interrupted, when the mobile phone runs the voice assistant APP in the foreground again, the user does not need to re-enter the already selected slot information. The user can thus resume the interrupted dialogue with the voice assistant APP at any time, improving the working efficiency and experience of the voice assistant APP on the mobile phone.
As shown in FIG. 15, an embodiment of this application discloses an electronic device that can be used to implement the methods described in the foregoing method embodiments. The electronic device may specifically include: a receiving unit 1501, a sending unit 1502, a display unit 1503, and a switching unit 1504. The receiving unit 1501 is configured to support the electronic device in performing processes S501, S504, S508, and S514 in FIG. 5; the sending unit 1502 is configured to support the electronic device in performing processes S502, S506, and S512 in FIG. 5; the display unit 1503 is configured to support the electronic device in performing processes S505, S509, S511, and S515 in FIG. 5; and the switching unit 1504 is configured to support the electronic device in performing process S510 in FIG. 5. All related content of the steps in the foregoing method embodiments can be cited in the functional descriptions of the corresponding functional modules, and details are not described here again.
As shown in FIG. 16, an embodiment of this application discloses an electronic device, including: a touchscreen 1601, where the touchscreen 1601 includes a touch-sensitive surface 1606 and a display screen 1607; one or more processors 1602; a memory 1603; a communication module 1608; and one or more computer programs 1604. The foregoing components may be connected through one or more communication buses 1605. The one or more computer programs 1604 are stored in the memory 1603 and are configured to be executed by the one or more processors 1602, and the one or more computer programs 1604 include instructions that can be used to perform the steps in the foregoing embodiments.
For example, the processor 1602 may specifically be the processor 110 shown in FIG. 1, the memory 1603 may specifically be the internal memory 121 and/or the external memory 120 shown in FIG. 1, the display screen 1607 may specifically be the display 194 shown in FIG. 1, the communication module 1608 may specifically be the mobile communication module 150 and/or the wireless communication module 160 shown in FIG. 1, and the touch-sensitive surface 1606 may specifically be the touch sensor in the sensor module 180 shown in FIG. 1; the embodiments of this application impose no restriction on this.
From the foregoing description of the implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the foregoing functional modules is used as an example. In practical applications, the foregoing functions can be allocated to different functional modules as needed; that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the foregoing method embodiments, and details are not described here again.
The functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include any medium that can store program code, such as flash memory, a removable hard disk, read-only memory, random access memory, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the embodiments of this application, but the protection scope of the embodiments of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in the embodiments of this application shall fall within the protection scope of the embodiments of this application. Therefore, the protection scope of the embodiments of this application shall be subject to the protection scope of the claims.

Claims (18)

  1. A voice interaction method, comprising:
    in response to an operation by which a user wakes up a voice assistant application, displaying, by an electronic device, a first interface, wherein the first interface is used to display dialogue content between the user and the voice assistant application;
    receiving, by the electronic device, a first voice input of the user, wherein the first voice input comprises first slot information;
    in response to the first voice input, displaying, by the electronic device, a first card in the first interface, wherein the first card comprises N candidate options for the first slot information, the N candidate options are in one-to-one correspondence with N query requests, each of the N query requests carries a corresponding candidate option for the first slot information, and N≥1; and
    in response to an operation by which the user selects a first candidate option from the N candidate options, sending, by the electronic device, a first query request corresponding to the first candidate option to a first server, so as to provide the user with a service result corresponding to the first voice input.
  2. The method according to claim 1, wherein after the electronic device displays the first card in the first interface, the method further comprises:
    after the electronic device switches the voice assistant application from the foreground to the background, displaying, by the electronic device, a second interface; and
    after the electronic device switches the voice assistant application back to the foreground, redisplaying, by the electronic device, the first interface.
  3. The method according to claim 1 or 2, wherein the operation of selecting the first candidate option from the N candidate options comprises: a touch operation of tapping the first candidate option in the first card; or inputting to the electronic device a second voice input comprising the first candidate option.
  4. The method according to any one of claims 1 to 3, wherein the first voice input further comprises second slot information, and after the electronic device sends the first query request corresponding to the first candidate option to the first server, the method further comprises:
    displaying, by the electronic device, a second card in the first interface, wherein the second card comprises M candidate options for the second slot information, the M candidate options are in one-to-one correspondence with M query requests, each of the M query requests carries the first candidate option selected by the user, each of the M query requests carries a corresponding candidate option for the second slot information, and M≥1; and
    in response to an operation by which the user selects a second candidate option from the M candidate options, sending, by the electronic device, a second query request corresponding to the second candidate option to the first server.
  5. The method according to claim 4, wherein after the electronic device displays the second card in the first interface, the method further comprises:
    after the electronic device switches the voice assistant application from the foreground to the background, displaying, by the electronic device, a second interface; and
    after the electronic device switches the voice assistant application back to the foreground, redisplaying, by the electronic device, the first interface.
  6. The method according to claim 4 or 5, wherein the operation of selecting the second candidate option from the M candidate options comprises: a touch operation of tapping the second candidate option in the second card; or inputting to the electronic device a third voice input comprising the second candidate option.
  7. The method according to any one of claims 1 to 6, wherein after the electronic device receives the first voice input of the user, the method further comprises:
    sending, by the electronic device, the first voice input to the first server, so that the first server extracts the first slot information from the first voice input, obtains the N candidate options for the first slot information, and establishes the one-to-one correspondence between the N candidate options and the N query requests; and
    receiving, by the electronic device, the one-to-one correspondence between the N candidate options and the N query requests sent by the first server.
  8. The method according to any one of claims 4 to 6, wherein after the electronic device sends the first query request corresponding to the first candidate option to the first server, the method further comprises:
    receiving, by the electronic device, the one-to-one correspondence between the M candidate options and the M query requests sent by the first server.
  9. The method according to any one of claims 1 to 8, wherein after the electronic device displays the first card in the first interface, the method further comprises:
    receiving, by the electronic device, a fourth voice input of the user, wherein the fourth voice input comprises a filtering condition for the N candidate options; and
    in response to the fourth voice input, displaying, by the electronic device, a third card in the first interface, wherein the third card comprises one or more candidate options that satisfy the filtering condition.
  10. A voice interaction method, comprising:
    receiving, by a first server, a first voice input sent by an electronic device;
    extracting, by the first server, first slot information from the first voice input;
    obtaining, by the first server, N candidate options for the first slot information, and establishing a one-to-one correspondence between the N candidate options and N query requests, wherein each of the N query requests carries a corresponding candidate option for the first slot information, and N≥1;
    sending, by the first server, the N candidate options to the electronic device;
    if a first candidate option sent by the electronic device is received, updating, by the first server, the first slot information based on a first query request corresponding to the first candidate option, wherein the first candidate option is one of the N candidate options; and
    determining, by the first server based on the updated first slot information, a service result corresponding to the first voice input.
  11. The method according to claim 10, wherein the first voice input further comprises second slot information, and after the first server receives the first candidate option sent by the electronic device, the method further comprises:
    obtaining, by the first server, M candidate options for the second slot information, and establishing a one-to-one correspondence between the M candidate options and M query requests, wherein each of the M query requests carries the first candidate option, each of the M query requests carries a corresponding candidate option for the second slot information, and M≥1;
    sending, by the first server, the M candidate options to the electronic device; and
    if a second candidate option sent by the electronic device is received, updating, by the first server, the second slot information based on a second query request corresponding to the second candidate option, wherein the second candidate option is one of the M candidate options;
    wherein the determining, by the first server based on the updated first slot information, a service result corresponding to the first voice input comprises:
    determining, by the first server based on the updated first slot information and second slot information, the service result corresponding to the first voice input.
  12. A voice interaction system, wherein:
    in response to an operation by which a user wakes up a voice assistant, an electronic device starts running a voice assistant application in the foreground;
    the electronic device receives a first voice input of the user;
    the electronic device sends the first voice input to a first server;
    the first server extracts first slot information from the first voice input;
    the first server obtains N candidate options for the first slot information and establishes a one-to-one correspondence between the N candidate options and N query requests, wherein each of the N query requests carries a corresponding candidate option for the first slot information, and N≥1;
    the first server sends the N candidate options to the electronic device;
    the electronic device displays a first card containing the N candidate options for the first slot information;
    in response to an operation by which the user selects a first candidate option from the N candidate options, the electronic device sends a first query request corresponding to the first candidate option to the first server; or the electronic device sends the first candidate option to the first server, so that the first server determines the first query request corresponding to the first candidate option; and
    the first server determines, based on the first query request, a service result corresponding to the first voice input.
  13. The system according to claim 12, wherein the first voice input further comprises second slot information, and after the electronic device sends, to the first server, the first query request corresponding to the first candidate option:
    the first server obtains M candidate options of the second slot information, and establishes a one-to-one correspondence between the M candidate options and M query requests, wherein each of the M query requests carries the first candidate option, each of the M query requests carries a corresponding candidate option of the second slot information, and M≥1;
    the first server sends the M candidate options to the electronic device; and
    the electronic device displays a second card, wherein the second card comprises the M candidate options of the second slot information.
  14. The system according to claim 13, wherein after the electronic device displays the second card:
    in response to an operation of the user selecting a second candidate option from the M candidate options, the electronic device sends, to the first server, a second query request corresponding to the second candidate option; or the electronic device sends the second candidate option to the first server, so that the first server determines the second query request corresponding to the second candidate option.
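The end-to-end exchange of system claims 12 to 14 — the device forwards a voice input, shows the server's candidate options as a card, reports the user's selection, and the server answers with either the next slot's options or the final service result — can be sketched with a toy two-party model. The message-passing interface and the two-slot dialogue are assumptions for illustration; in the claims the "cards" are UI elements on the voice assistant's first interface.

```python
# Hypothetical sketch of the device-server round trip in claims 12-14.

class ToyServer:
    # Two slots filled in order, as in claims 12-13 (first slot, then second).
    SLOTS = [("contact", ["Li Hua", "Li Ming"]),
             ("number", ["mobile", "home"])]

    def __init__(self):
        self.filled = {}  # confirmed slot values

    def handle_voice(self, utterance):
        # Return the N candidate options of the first slot (claim 12).
        return self.SLOTS[0][1]

    def handle_selection(self, option):
        # Record the selection, then return either the next slot's M
        # candidate options (claim 13) or the service result (claim 12).
        name = self.SLOTS[len(self.filled)][0]
        self.filled[name] = option
        if len(self.filled) < len(self.SLOTS):
            return self.SLOTS[len(self.filled)][1]
        return f"calling {self.filled['contact']} ({self.filled['number']})"

class Device:
    def __init__(self, server):
        self.server = server
        self.cards = []  # candidate-option cards on the first interface

    def voice_input(self, utterance):
        options = self.server.handle_voice(utterance)
        self.cards.append(options)  # first card (claim 12)
        return options

    def pick(self, option):
        reply = self.server.handle_selection(option)
        if isinstance(reply, list):
            self.cards.append(reply)  # second card (claim 13)
        return reply
```

In this sketch the device sends only the chosen option and the server resolves the matching query request, which corresponds to the second alternative recited in claims 12 and 14.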
  15. The system according to claim 13, wherein the system further comprises a second server, and the second server is configured to send, to the first server, the N candidate options of the first slot information and/or the M candidate options of the second slot information.
  16. An electronic device, comprising:
    a touchscreen, wherein the touchscreen comprises a touch-sensitive surface and a display;
    a communication module;
    one or more processors;
    one or more memories;
    one or more microphones; and
    one or more computer programs, wherein the one or more computer programs are stored in the one or more memories and comprise instructions, and when the instructions are executed by the electronic device, the electronic device is enabled to perform the voice interaction method according to any one of claims 1 to 9.
  17. A computer-readable storage medium storing instructions, wherein when the instructions are run on an electronic device, the electronic device is enabled to perform the voice interaction method according to any one of claims 1 to 9.
  18. A computer program product comprising instructions, wherein when the computer program product is run on an electronic device, the electronic device is enabled to perform the voice interaction method according to any one of claims 1 to 9.
PCT/CN2020/079385 2019-03-22 2020-03-14 Voice interaction method and electronic device WO2020192456A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/442,024 US20220172717A1 (en) 2019-03-22 2020-03-14 Voice Interaction Method and Electronic Device
EP20777455.5A EP3923274A4 (en) 2019-03-22 2020-03-14 VOICE INTERACTION METHOD AND ELECTRONIC DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910224332.0A CN111724775B (zh) 2019-03-22 2019-03-22 Voice interaction method and electronic device
CN201910224332.0 2019-03-22

Publications (1)

Publication Number Publication Date
WO2020192456A1 true WO2020192456A1 (zh) 2020-10-01

Family

ID=72563753

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/079385 WO2020192456A1 (zh) 2019-03-22 2020-03-14 一种语音交互方法及电子设备

Country Status (4)

Country Link
US (1) US20220172717A1 (zh)
EP (1) EP3923274A4 (zh)
CN (1) CN111724775B (zh)
WO (1) WO2020192456A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114327349A (zh) * 2021-12-13 2022-04-12 青岛海尔科技有限公司 Method and apparatus for determining a smart card, storage medium, and electronic apparatus

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN111833868A (zh) * 2020-06-30 2020-10-27 北京小米松果电子有限公司 Voice assistant control method and apparatus, and computer-readable storage medium
CN112820285A (zh) * 2020-12-29 2021-05-18 北京搜狗科技发展有限公司 Interaction method and earphone device
CN113035191B (zh) * 2021-02-26 2023-11-10 光禹莱特数字科技(上海)有限公司 Voice interaction method and apparatus, storage medium, and computer device
CN115150501A (zh) * 2021-03-30 2022-10-04 华为技术有限公司 Voice interaction method and electronic device
CN113297359B (zh) * 2021-04-23 2023-11-28 阿里巴巴新加坡控股有限公司 Method and apparatus for information interaction
CN113470638B (zh) * 2021-05-28 2022-08-26 荣耀终端有限公司 Slot filling method, chip, electronic device, and readable storage medium
CN113923305B (zh) * 2021-12-14 2022-06-21 荣耀终端有限公司 Multi-screen collaborative call method, system, terminal, and storage medium
CN116841661A (zh) * 2022-03-24 2023-10-03 华为技术有限公司 Service invoking method and electronic device
CN116955758A (zh) * 2022-04-13 2023-10-27 华为技术有限公司 Search method and electronic device
CN114995726B (zh) * 2022-04-22 2023-07-21 青岛海尔科技有限公司 Method and apparatus for determining an interaction mode, storage medium, and electronic apparatus
CN117894307A (zh) * 2022-10-14 2024-04-16 华为技术有限公司 Voice interaction method, voice interaction apparatus, and electronic device
CN115457960B (zh) * 2022-11-09 2023-04-07 广州小鹏汽车科技有限公司 Voice interaction method, server, and computer-readable storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103888600A (zh) * 2014-02-17 2014-06-25 刘岩 Instant messaging client
US20150154953A1 (en) * 2013-12-02 2015-06-04 Spansion Llc Generation of wake-up words
CN105448293A (zh) * 2014-08-27 2016-03-30 北京羽扇智信息科技有限公司 Voice monitoring and processing method and device
CN107193396A (zh) * 2017-05-31 2017-09-22 维沃移动通信有限公司 Input method and mobile terminal
CN107680589A (zh) * 2017-09-05 2018-02-09 百度在线网络技术(北京)有限公司 Voice information interaction method, apparatus, and device

Family Cites Families (12)

Publication number Priority date Publication date Assignee Title
US9858925B2 (en) * 2009-06-05 2018-01-02 Apple Inc. Using context information to facilitate processing of commands in a virtual assistant
US10241752B2 (en) * 2011-09-30 2019-03-26 Apple Inc. Interface for a virtual digital assistant
US10276170B2 (en) * 2010-01-18 2019-04-30 Apple Inc. Intelligent automated assistant
CN106445940A (zh) * 2015-08-05 2017-02-22 阿里巴巴集团控股有限公司 Navigation method and device
US10331312B2 (en) * 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
KR102502220B1 (ko) * 2016-12-20 2023-02-22 삼성전자주식회사 Electronic device, method for determining user utterance intent thereof, and non-transitory computer-readable recording medium
US11204787B2 (en) * 2017-01-09 2021-12-21 Apple Inc. Application integration with a digital assistant
CN107122179A (zh) * 2017-03-31 2017-09-01 阿里巴巴集团控股有限公司 Voice function control method and apparatus
US10832666B2 (en) * 2017-04-19 2020-11-10 Verizon Patent And Licensing Inc. Advanced user interface for voice search and results display
US10269351B2 (en) * 2017-05-16 2019-04-23 Google Llc Systems, methods, and apparatuses for resuming dialog sessions via automated assistant
US10303715B2 (en) * 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
CN107318036A (zh) * 2017-06-01 2017-11-03 腾讯音乐娱乐(深圳)有限公司 Song search method, smart TV, and storage medium


Non-Patent Citations (1)

Title
See also references of EP3923274A4

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN114327349A (zh) * 2021-12-13 2022-04-12 青岛海尔科技有限公司 Method and apparatus for determining a smart card, storage medium, and electronic apparatus
CN114327349B (zh) * 2021-12-13 2024-03-22 青岛海尔科技有限公司 Method and apparatus for determining a smart card, storage medium, and electronic apparatus

Also Published As

Publication number Publication date
CN111724775B (zh) 2023-07-28
US20220172717A1 (en) 2022-06-02
EP3923274A4 (en) 2022-04-20
CN111724775A (zh) 2020-09-29
EP3923274A1 (en) 2021-12-15

Similar Documents

Publication Publication Date Title
WO2020192456A1 (zh) Voice interaction method and electronic device
RU2766255C1 (ru) Voice control method and electronic device
WO2021052263A1 (zh) Voice assistant display method and apparatus
CN110910872B (zh) Voice interaction method and apparatus
WO2020244495A1 (zh) Screen projection display method and electronic device
WO2020244492A1 (zh) Screen projection display method and electronic device
WO2020238774A1 (zh) Notification message preview method and electronic device
WO2020151387A1 (zh) Recommendation method based on user motion state, and electronic device
CN110138959B (zh) Method for displaying a prompt for a human-machine interaction instruction, and electronic device
WO2020207326A1 (zh) Dialogue message sending method and electronic device
WO2022052776A1 (zh) Human-machine interaction method, electronic device, and system
WO2020073288A1 (zh) Method for triggering an electronic device to execute a function, and electronic device
WO2020077540A1 (zh) Information processing method and electronic device
WO2020211705A1 (zh) Contact recommendation method and electronic device
WO2020006711A1 (zh) Message playing method and terminal
WO2020259554A1 (zh) Learning-capable keyword search method and electronic device
WO2023273543A1 (zh) Folder management method and apparatus
WO2022143258A1 (zh) Voice interaction processing method and related apparatus
CN112740148A (zh) Method for entering information into an input box, and electronic device
WO2021042881A1 (zh) Message notification method and electronic device
WO2023005711A1 (zh) Service recommendation method and electronic device
WO2022033355A1 (zh) Email processing method and electronic device
CN113380240B (zh) Voice interaction method and electronic device
WO2023197951A1 (zh) Search method and electronic device
WO2024012346A1 (zh) Task migration method, electronic device, and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20777455

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020777455

Country of ref document: EP

Effective date: 20210908