WO2024051611A1 - Human-machine interaction method and related apparatus - Google Patents

Human-machine interaction method and related apparatus

Info

Publication number
WO2024051611A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice input
user
wake
terminal
free
Prior art date
Application number
PCT/CN2023/116615
Other languages
French (fr)
Chinese (zh)
Inventor
李凌飞
沈波
任亮亮
张跃
徐平
吴奇强
吴雪晨
谭彬林
耿安峰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Publication of WO2024051611A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation

Definitions

  • the present application relates to the field of terminal technology, and in particular to human-computer interaction methods and related devices.
  • voice interaction has become one of the commonly used and important human-computer interaction methods.
  • most voice interactions require users to wake up the terminal through a preset wake-up word first, and then implement subsequent interactions. This method is cumbersome and results in poor user experience.
  • Some manufacturers also provide a wake-up-free function, that is, there is no need to wake up the terminal in advance, and you can directly enter the predefined wake-up-free instructions.
  • However, the predefined wake-up-free instructions are fixed and limited, and the terminal is easily triggered by accident while the user is chatting, which affects the user experience.
  • This application provides a human-computer interaction method and related devices, in order to improve the user's interactive experience.
  • This application provides a human-computer interaction method, which can be executed by a terminal, by a component (such as a chip or a chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal functions. This application does not limit this.
  • the method includes: receiving a first voice input from a user; and making a corresponding response to the first voice input if it is determined that the first voice input is semantically similar to a predefined first wake-up-free instruction.
  • the above-mentioned first wake-up-free instruction is used to instruct the terminal to perform the operation corresponding to the first wake-up-free instruction without inputting a preset wake-up word.
  • The terminal receives the first voice input from the user and, when the first voice input is semantically similar to the predefined first wake-up-free instruction, makes a corresponding response to the first voice input. That is, without the terminal being woken up in advance, the terminal can respond even if the sentence spoken by the user is not the predefined first wake-up-free instruction, as long as it is semantically similar to that instruction. This helps solve the problem of the terminal not responding because the predefined first wake-up-free instruction is fixed and limited, which in turn helps improve the user's interactive experience.
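  • The matching step described above can be sketched as follows. This is a minimal illustration, not the method of the application: the instruction table `PREDEFINED`, the function names, and the 0.5 threshold are hypothetical, and word-set (Jaccard) overlap is only a lexical stand-in for whatever semantic model a real terminal would use.

```python
import re

# Hypothetical predefined wake-up-free instructions mapped to operations.
PREDEFINED = {
    "open the window": "WINDOW_OPEN",
    "play music": "MUSIC_PLAY",
}


def tokens(text):
    """Lowercased word set; a crude stand-in for a real semantic representation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def similarity(a, b):
    """Jaccard overlap of the two word sets, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_instruction(voice_input, threshold=0.5):
    """Return the operation of the most similar predefined instruction, or None."""
    best = max(PREDEFINED, key=lambda instr: similarity(voice_input, instr))
    if similarity(voice_input, best) >= threshold:
        return PREDEFINED[best]
    return None
```

  • With these assumptions, an input such as "please open the window" maps to the same operation as the predefined instruction, while unrelated chat yields no response.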
  • Before making a corresponding response to the first voice input, the above method further includes: confirming the semantics of the first voice input with the user.
  • After the terminal receives the first voice input, it can confirm with the user whether the recognized semantics of the first voice input are correct. This improves accuracy and also prevents the terminal from responding when the user mentions the first voice input by accident: in that case, the user can give a negative reply when the terminal asks for confirmation, which stops the terminal from performing the corresponding operation and helps improve the user experience.
  • confirming the semantics of the first voice input to the user includes: confirming the semantics of the first voice input to the user through a prompt box and/or voice broadcast.
  • The terminal can confirm the semantics of the first voice input with the user through a prompt box that contains those semantics, through voice broadcast, or through a combination of the prompt box and voice broadcast.
  • the above method further includes: prompting the user with a first wake-up-free instruction.
  • the terminal can also prompt the user to directly use the predefined first wake-up-free command next time.
  • the terminal may prompt the user with the first wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.
  • Before receiving the first voice input from the user, the above method further includes: receiving a second voice input from the user; when the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
  • the terminal learns and generates a second wake-up-free instruction through the above method.
  • the second wake-up-free instruction can be used to instruct the terminal to perform a corresponding operation without inputting a preset wake-up word.
  • When the terminal receives the second voice input from the user and the second voice input is semantically similar to the first wake-up-free instruction, the terminal confirms with the user whether that is the intended meaning. If the user confirms that the semantics of the second voice input are correct, a corresponding second wake-up-free instruction is generated, so that the next time the terminal receives the second voice input without having been woken up in advance, it can respond to it.
  • the number of wake-up-free instructions that can be used to instruct the terminal to perform corresponding operations without inputting a preset wake-up word is greatly increased, which in turn helps improve the user's interactive experience.
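  • The learning flow above can be sketched as follows. This is a minimal sketch under stated assumptions: the class `WakeFreeLearner`, its method names, and the `is_similar` and `user_confirms` callbacks are hypothetical placeholders for the terminal's semantic analysis and its prompt-box or voice confirmation.

```python
class WakeFreeLearner:
    """Stores confirmed phrasings as learned (second) wake-up-free instructions."""

    def __init__(self, is_similar):
        # is_similar(voice_input, instruction) -> bool; assumed to be supplied
        # by the terminal's semantic analysis.
        self.is_similar = is_similar
        self.learned = {}  # learned phrase -> predefined instruction it maps to

    def offer(self, voice_input, predefined, user_confirms):
        """Learn voice_input if it resembles a predefined instruction and the user confirms."""
        for instruction in predefined:
            if self.is_similar(voice_input, instruction) and user_confirms(
                voice_input, instruction
            ):
                self.learned[voice_input] = instruction
                return instruction
        return None

    def lookup(self, voice_input):
        """Return the predefined instruction for an exact learned phrase, or None."""
        return self.learned.get(voice_input)
```

  • After a phrase has been confirmed once, `lookup` resolves it by exact match, which corresponds to the case in which the first voice input is the same as the generated second wake-up-free instruction.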
  • the first voice input is semantically similar to the predefined first wake-up-free instruction, including: the first voice input is the same as the second wake-up-free instruction.
  • After receiving the first voice input, the terminal determines whether the first voice input and the first wake-up-free instruction are semantically similar.
  • One way is to perform semantic analysis on the first voice input and the predefined first wake-up-free instruction to determine whether the two are semantically similar.
  • Another way is that the terminal can determine whether the first voice input and the generated second wake-up-free instruction are the same.
  • the second wake-up-free instruction is an instruction generated based on the second voice input that is semantically similar to the first wake-up-free instruction.
  • If the first voice input is the same as the generated second wake-up-free instruction, the first voice input is semantically similar to the first wake-up-free instruction, so the terminal can also respond to the first voice input.
  • the above two methods can be used in combination or separately, which greatly improves the terminal's flexibility in determining whether the first voice input and the first wake-up-free instruction are semantically similar.
  • receiving the second voice input from the user includes: receiving the second voice input multiple times continuously within a preset time range.
  • When the terminal receives the second voice input multiple times continuously within the preset time range, it confirms the semantics of the second voice input with the user. This effectively avoids the case in which the user mentions the second voice input by accident and the terminal mistakenly concludes that the user wants the corresponding operation performed, which helps improve the user's interactive experience.
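  • The repetition check above can be sketched as follows. The class name `RepeatDetector`, the default of three repeats, and the ten-second window are assumptions for illustration; the application does not specify values. "Continuously" is interpreted here as consecutive identical inputs, which is also an assumption.

```python
from collections import deque


class RepeatDetector:
    """Flags a phrase once it has been heard consecutively enough times within a window."""

    def __init__(self, min_repeats=3, window_seconds=10.0):
        self.min_repeats = min_repeats
        self.window_seconds = window_seconds
        self.phrase = None    # phrase currently being counted
        self.times = deque()  # timestamps of its consecutive occurrences

    def observe(self, phrase, timestamp):
        """Record one voice input; return True if confirmation should be requested."""
        if phrase != self.phrase:
            # A different input breaks the consecutive run.
            self.phrase = phrase
            self.times.clear()
        self.times.append(timestamp)
        # Drop occurrences that fall outside the preset time range.
        while timestamp - self.times[0] > self.window_seconds:
            self.times.popleft()
        return len(self.times) >= self.min_repeats
```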
  • This application provides a human-computer interaction method, which can be executed by a terminal, by a component (such as a chip or a chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal functions. This application does not limit this.
  • The method includes: receiving a first voice input from the user; and making a corresponding response to the first voice input if the preset wake-up word is not received but the first voice input contains a target object.
  • The target object is an object whose number of mentions in other voice inputs received before the first voice input reaches a preset threshold, and the preset wake-up word is used to wake up the terminal.
  • When the terminal has not been woken up in advance and receives the first voice input from the user, it responds if the first voice input contains an object whose number of mentions in previous voice inputs has reached the preset threshold. That is, the terminal learns from previous voice inputs and saves the target objects whose mention counts have reached the preset threshold; as long as a received voice input contains such a target object, the terminal can respond even if it has not been woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interactive experience.
  • Before receiving the first voice input from the user, the above method further includes: receiving a preset wake-up word from the user; receiving a second voice input from the user; and, when the number of times the first object contained in the second voice input has been mentioned in the second voice input and the voice inputs before it exceeds a preset threshold, determining the first object as the target object.
  • The terminal may record the number of times the first object is mentioned in voice inputs. If that number exceeds a preset threshold, the first object is determined as the target object, so that the user can subsequently issue a voice input containing the target object without waking up the terminal in advance. After the terminal receives such a voice input, it can respond. No prior wake-up is needed, which simplifies the interaction process and helps improve the user's interactive experience.
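  • The mention counting above can be sketched as follows. The class `TargetObjectTracker` and its method names are hypothetical, and extraction of objects from a voice input is assumed to happen upstream. The text uses both "reaches" and "exceeds" for the threshold; this sketch assumes "reaches" (`>=`).

```python
from collections import Counter


class TargetObjectTracker:
    """Promotes frequently mentioned objects to wake-up-free target objects."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.mentions = Counter()  # object -> number of mentions so far
        self.targets = set()       # objects that have reached the threshold

    def record(self, objects):
        """Count the objects mentioned in one voice input received after the wake-up word."""
        for obj in objects:
            self.mentions[obj] += 1
            if self.mentions[obj] >= self.threshold:
                self.targets.add(obj)

    def should_respond(self, objects, woken):
        """Respond if the terminal is awake or the input mentions a target object."""
        return woken or any(obj in self.targets for obj in objects)
```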
  • the above method further includes: based on the target object, generating a wake-up-free instruction including the target object; and prompting the user for the wake-up-free instruction.
  • the terminal can generate a wake-up-free instruction including the target object based on the target object, and prompt the user that the above wake-up-free instruction can be used directly next time, without waking up the terminal in advance, and the terminal can respond accordingly.
  • the terminal may prompt the user with the above wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.
  • This application provides a human-computer interaction method, which can be executed by a terminal, by a component (such as a chip or a chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal functions. This application does not limit this.
  • The method includes: receiving a first voice input from the user, the first voice input belonging to a first instruction set whose instructions are semantically similar to predefined wake-up-free instructions; and, when a preset condition is met, responding to the first voice input.
  • After the terminal receives a first voice input that is semantically similar to a predefined wake-up-free instruction, it responds to the first voice input only when the preset condition is met, rather than in all circumstances. This prevents the terminal from responding when the user mentions the first voice input by mistake. The first voice input may be more colloquial than the predefined wake-up-free instructions; if the terminal responded under any circumstances, its response would likely be triggered frequently during the user's conversations. Responding only when the preset condition is met therefore improves the user's interactive experience.
  • The above-mentioned preset conditions include at least one of the following: the number of users within a preset range from the terminal does not exceed a threshold; the user is in a predefined position; the user from whom the first voice input comes does not belong to a preset group; or the time when the first voice input is received falls within a preset period.
  • The number of users within the preset range from the terminal does not exceed the threshold; that is, when few users are within the preset range of the terminal, the terminal can respond to the first voice input. If few users are nearby, the user is less likely to have mentioned the first voice input by mistake and more likely to actually want the terminal to perform the corresponding operation. Conversely, the more users are nearby, the more likely it is that the first voice input was mentioned by mistake.
  • The user is in a predefined position. For example, the terminal responds to the first voice input from the user closest to the terminal; or, when the user is in a scenic spot, the user is more likely to want the terminal to provide a service, so the terminal responds to the first voice input from the user.
  • the user from whom the first voice input comes does not belong to the preset group, such as children, the elderly, etc. It is understandable that for the preset group, instructions issued by them may be dangerous, and the terminal may not respond to them.
  • the time when the first voice input is received falls within a preset period.
  • the preset period can be, for example, working hours. During these periods, the terminal can respond to the above-mentioned first voice input. If it is other time periods, the terminal can only respond to predefined wake-up-free instructions. In summary, the above preset conditions can effectively prevent the user from mistakenly mentioning the first voice input, causing the terminal to respond.
  • The above method is applied to a car. The number of users within a preset range from the terminal not exceeding a threshold includes: there is a passenger in the car. Alternatively, the user being in a predefined position includes: the user is in the main driving position.
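  • The preset conditions above can be sketched as a single gate. All names and default values here (`Context`, `meets_preset_conditions`, one nearby user, the driver position, the "child" group, an 08:00 to 18:00 period) are illustrative assumptions. The sketch requires every configured condition to hold, which is only one possible reading; an implementation may check any subset.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Context:
    nearby_users: int      # users within the preset range of the terminal
    speaker_position: str  # e.g. "driver" or "rear_left"
    speaker_group: str     # e.g. "adult" or "child"
    received_at: time      # time at which the voice input was received


def meets_preset_conditions(
    ctx,
    max_users=1,
    allowed_positions=("driver",),
    excluded_groups=("child",),
    period=(time(8, 0), time(18, 0)),
):
    """Return True only if every configured condition holds for this input."""
    checks = [
        ctx.nearby_users <= max_users,
        ctx.speaker_position in allowed_positions,
        ctx.speaker_group not in excluded_groups,
        period[0] <= ctx.received_at <= period[1],
    ]
    return all(checks)
```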
  • The present application provides a human-computer interaction method, which can be executed by a terminal, by a component (such as a chip or a chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal functions. This application does not limit this.
  • The method includes: without receiving a preset wake-up word from the user, determining, according to a first voice input from the user, that the first voice input is used to request navigation; asking the user for the destination of the requested navigation; and providing navigation services to the user based on the destination fed back by the user.
  • Without being woken up in advance, after the terminal receives the first voice input from the user and finds that its intention is to request navigation, it can ask the user for the navigation destination, determine the destination according to the user's feedback, and provide navigation services to the user. Because the terminal does not need to be woken up in advance, the interaction process is simplified, which helps improve the user's interactive experience.
  • the above method further includes: generating a wake-up-free instruction including a destination; and prompting the user for the wake-up-free instruction.
  • the terminal can generate a wake-up-free command including the above destination, and prompt the user that the above wake-up-free command can be used directly next time, and the terminal can respond accordingly.
  • the terminal may prompt the user with the above wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.
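  • The navigation flow above, including generation of a wake-up-free instruction that contains the destination, can be sketched as follows. The keyword list, the function names, and the returned dictionary are hypothetical; keyword spotting merely stands in for real intent recognition, and `ask_destination` stands in for the terminal's prompt box or voice dialogue.

```python
# Hypothetical cue words; a real terminal would use intent recognition instead.
NAV_KEYWORDS = ("navigate", "navigation", "directions")


def is_navigation_request(voice_input):
    """Crude keyword check for a navigation intent."""
    text = voice_input.lower()
    return any(word in text for word in NAV_KEYWORDS)


def handle_voice_input(voice_input, ask_destination):
    """Without a wake-up word: detect a navigation request, then ask for the destination."""
    if not is_navigation_request(voice_input):
        return None
    destination = ask_destination("Where would you like to go?")
    return {
        "navigate_to": destination,
        # Instruction the terminal could suggest for direct use next time.
        "suggested_instruction": f"Navigate to {destination}",
    }
```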
  • This application provides a human-computer interaction method, which can be executed by a terminal, by a component (such as a chip or a chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal functions. This application does not limit this.
  • The method includes: receiving a first voice input from the user, the first voice input not belonging to the predefined wake-up-free instructions; and, when the first voice input is semantically similar to a first wake-up-free instruction among the predefined wake-up-free instructions, guiding the user to input the first wake-up-free instruction.
  • The terminal receives the first voice input. The first voice input does not belong to the predefined wake-up-free instructions but is semantically similar to the first wake-up-free instruction among them. In this case, the terminal guides the user to input the corresponding first wake-up-free instruction, so that the terminal can respond after the user inputs it. Compared with the terminal neither responding nor prompting, this can greatly improve the user's interactive experience.
  • the above-mentioned guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.
  • The terminal can guide the user to input the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction, through voice broadcast, or through a combination of the prompt box and voice broadcast.
  • The above-mentioned guiding the user to input the first wake-up-free instruction through a prompt box and/or voice broadcast includes: prompting the user to input the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction; and, when the number of prompts through the prompt box reaches a preset threshold within a preset time period but the user has not issued the first wake-up-free instruction, guiding the user to input the first wake-up-free instruction through voice broadcast.
  • the present application provides a computer device, including a unit for implementing the method in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects. It should be understood that each unit can implement the corresponding function by executing a computer program.
  • the present application provides a computer device, including a processor configured to execute the method described in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects.
  • the computer device may further include a memory for storing computer readable instructions, and the processor reads the computer readable instructions so that the computer device can implement the methods described in the above aspects.
  • the computer device may also include a communication interface for the computer device to communicate with other devices.
  • the communication interface may be a transceiver, a circuit, a bus, a module or other types of communication interfaces.
  • this application provides a vehicle for implementing the method in any of the first to fifth aspects and any possible implementation manner of the first to fifth aspects, or including the sixth aspect or the seventh aspect. Any of the computer equipment described above.
  • The present application provides a chip system, which includes at least one processor and is used to support implementation of the functions involved in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects, for example, receiving or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory, the memory is used to store program instructions and data, and the memory is located within the processor or outside the processor.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the present application provides a computer-readable storage medium.
  • Computer-readable instructions are stored in the storage medium.
  • When the computer-readable instructions are executed by a computer, the computer implements the method in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects.
  • the present application provides a computer program product.
  • the computer program product includes: computer readable instructions.
  • When the computer-readable instructions are executed by a computer, the computer implements the method in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects.
  • Figure 1 is a schematic structural diagram of a terminal provided by an embodiment of the present application.
  • Figure 2 is a schematic diagram of a scenario applicable to the human-computer interaction method provided by the embodiment of the present application.
  • Figure 3 is a schematic diagram of a known human-computer interaction method.
  • Figure 4 is a schematic diagram of another known human-computer interaction method.
  • Figure 5 is a schematic flow chart of the first human-computer interaction method provided by the embodiment of the present application.
  • Figure 6 is an interaction schematic diagram of the first human-computer interaction method provided by the embodiment of the present application.
  • Figure 7 is a schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • Figure 8 is another schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • Figure 9 is a schematic flow chart of the second human-computer interaction method provided by the embodiment of the present application.
  • Figure 10 is an interaction schematic diagram of the second human-computer interaction method provided by the embodiment of the present application.
  • Figure 11 is a schematic flowchart of learning the first object in the second voice input provided by an embodiment of the present application.
  • Figure 12 is a schematic flow chart of the third human-computer interaction method provided by the embodiment of the present application.
  • Figure 13 is a schematic flowchart of determining whether to respond to the first voice input according to the scenario provided by the embodiment of the present application.
  • Figure 14 is a schematic flow chart of the fourth human-computer interaction method provided by the embodiment of the present application.
  • Figure 15 is an interaction schematic diagram for guiding a user to issue a first wake-up-free instruction provided by an embodiment of the present application.
  • Figure 16 is a schematic flow chart of the fifth human-computer interaction method provided by the embodiment of the present application.
  • Figure 17 is an interaction schematic diagram of the fifth human-computer interaction method provided by the embodiment of the present application.
  • The methods provided by the embodiments of this application can be applied to terminals such as mobile phones, tablet computers, smart watches, smart speakers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, personal computers (PCs), ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), and distributed devices.
  • FIG. 1 shows a schematic structural diagram of a terminal 100.
  • The terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, and a subscriber identification module (SIM) card interface 195, etc.
  • The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
  • the processor 110 may include one or more processing units.
  • The processor 110 may include an application processor (AP), a microcontroller unit (MCU), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural network processor.
  • The application processor outputs sound signals through the audio module 170 (such as the speaker 170A), or displays images or videos through the display screen 194.
  • the controller may be the nerve center and command center of the terminal 100.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • The memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • the processor 110 can perform different operations to implement different functions by executing instructions.
  • The instruction may be an instruction pre-stored in the memory before the device leaves the factory, or an instruction read from an application (APP) after the user installs the new APP during use. The embodiments of this application do not impose any limitation on this.
  • processor 110 may include one or more interfaces.
  • Interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a secure digital input and output (SDIO) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a universal synchronous asynchronous receiver/transmitter (USART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface, etc.
  • the USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transmit data between the terminal 100 and peripheral devices. It can also be used to connect headphones to play audio through them. This interface can also be used to connect other terminals.
  • The interface connection relationships between the modules illustrated in this application are only schematic and do not constitute a structural limitation on the terminal 100.
  • The terminal 100 may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through the wireless charging coil of the terminal 100 . While charging the battery 142, the charging management module 140 can also provide power to the terminal through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, etc.
  • the power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters.
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in terminal 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • Antenna 1 can be reused as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied to the terminal 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • a modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194.
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), fifth generation (5G) communication system, BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a BeiDou navigation satellite system (BDS), quasi-
  • the terminal 100 can implement display functions through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • Display 194 includes a display panel.
  • the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), etc.
  • terminal 100 may include one or more display screens 194.
  • the display screen 194 can be used to display a prompt box, which contains a predefined wake-up-free instruction.
  • the prompt box is used to prompt the user to directly use the above-mentioned wake-up-free instruction next time, that is, no need to wake up in advance.
  • the user can interact with the terminal by voice through the above wake-up-free instruction.
  • the terminal 100 can implement the shooting function through the ISP, camera 193, video codec, GPU, display screen 194, application processor, etc.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • terminal 100 may include one or more cameras 193.
  • Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.
  • Video codecs are used to compress or decompress digital video.
  • Terminal 100 may support one or more video codecs.
  • the terminal 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function. For example, files such as music and videos are saved on the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the terminal 100 .
  • the internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and at least one application program required for a function (such as a sound playback function, an image playback function, etc.).
  • the data storage area may store data created during use of the terminal 100 (such as audio data, a phone book, etc.).
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the terminal 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, the application processor, etc.
  • the audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • Speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the user can listen to music or to a hands-free call through the speaker 170A of the terminal 100.
  • Receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be heard by bringing the receiver 170B close to the human ear.
  • Microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal to the microphone 170C.
  • the terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which, in addition to collecting sound signals, may also implement a noise reduction function. In other embodiments, the terminal 100 can also be equipped with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, etc.
  • the microphone 170C can be used to receive voice input from the user, that is, can be used to collect sound signals from the user.
  • the headphone interface 170D is used to connect wired headphones.
  • the headphone interface 170D can be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the buttons 190 include a power-on button (also called a power button), a volume button, etc.
  • the button 190 may be a mechanical button or a touch button.
  • the terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • the motor 191 can also produce different vibration feedback effects for touch operations on different areas of the display screen 194 .
  • Different application scenarios (such as time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or separated from the terminal 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • the terminal 100 may support one or more SIM card interfaces.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different.
  • the SIM card interface 195 is also compatible with different types of SIM cards.
  • the SIM card interface 195 is also compatible with external memory cards.
  • the terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communications.
  • the terminal 100 adopts eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.
  • the terminal 100 may include more or fewer components than shown, or some components may be combined, or some components may be separated, or may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • FIG 2 is a schematic diagram of a scenario applicable to the method provided by the embodiment of this application.
  • the user can input operations that he wants the terminal to perform through voice to achieve interaction with the terminal (a mobile phone is used as an example in Figure 2).
  • voice interaction has become one of the important and commonly used human-computer interaction methods.
  • the user can first wake up the terminal through the preset wake-up word.
  • the user can first wake up the voice assistant (also called a smart assistant, intelligent assistant, etc., which is not limited in this application) through the preset wake-up word, and then carry out subsequent interactions.
  • Some manufacturers also provide a wake-up-free function, that is, users do not need to wake up the voice assistant in advance and can directly interact with the terminal through predefined wake-up-free instructions.
  • the predefined wake-up-free instructions are fixed and limited. If the wake-up-free instruction input by the user's voice is inaccurate, the terminal does not respond, and the user experience is poor.
  • Figure 3 shows a known human-computer interaction method.
  • the user wakes up the terminal through a preset wake-up word in advance. More specifically, the user first wakes up the voice assistant in the terminal through a preset wake-up word.
  • the wake-up word is "Xiaoyi Xiaoyi". In response to the user inputting "Xiaoyi Xiaoyi" through voice, the voice assistant replies "I am here".
  • the user inputs "Navigate to location A" through voice.
  • the voice assistant replies "Okay, starting navigation for you" and displays the route to location A through the user interface. It can be seen that the entire interaction process is relatively cumbersome, resulting in a poor user experience.
  • FIG 4 shows another known human-computer interaction method.
  • users can directly voice input predefined wake-up-free instructions to interact with the terminal.
  • the user voice inputs "Navigate to the company", and in response to the user's voice input of "Navigate to the company", the terminal displays the route to the company through the user interface, in which the location of the user's company is pre-stored on the terminal.
  • the voice assistant will not respond.
  • the predefined wake-up-free instructions are fixed and limited, so the voice assistant may be unable to respond to the user's voice input, resulting in a poor user experience.
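  • The limitation of the Figure 4 approach can be illustrated with a minimal Python sketch, assuming the predefined wake-up-free instructions are held in a lookup table and matched by exact text. The table contents, the names `PREDEFINED` and `handle_exact`, and the action strings are hypothetical and are not taken from this application; they only illustrate the behaviour described above.

```python
# Hypothetical table mapping predefined wake-up-free instructions to actions.
PREDEFINED = {
    "navigate to the company": "show_route:company",
}


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def handle_exact(voice_text: str):
    """Return the action for an exactly matching predefined instruction.

    Returns None when nothing matches, which models the assistant
    not responding at all.
    """
    return PREDEFINED.get(normalize(voice_text))


print(handle_exact("Navigate to the company"))  # predefined wording: an action is found
print(handle_exact("Go to work"))               # different wording: nothing is found
```

  • With exact matching, any wording outside the table is ignored, which is the unresponsiveness that the method of this application aims to alleviate.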
  • this application provides a human-computer interaction method.
  • the method includes: when the terminal receives a first voice input from the user and the first voice input is semantically similar to a predefined first wake-up-free instruction, the terminal responds accordingly to the first voice input. That is, without the terminal being woken up in advance, even if the sentence input by the user's voice is not the predefined first wake-up-free instruction, the terminal can recognize and respond to it as long as it is semantically similar to the predefined first wake-up-free instruction. This helps alleviate the problem of the terminal not responding because the predefined first wake-up-free instructions are fixed and limited, which in turn helps improve the user's voice interaction experience.
  • words such as “first” and “second” are used to distinguish identical or similar items with basically the same functions and effects.
  • the first voice input and the second voice input are only used to distinguish different voice inputs, and their order is not limited.
  • words such as "first" and "second" do not limit the number or position.
  • "at least one item” refers to one item or multiple items.
  • “And/or” describes the association of associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and B exists alone, where A, B can be singular or plural.
  • the character “/” generally indicates that the related objects are in an “or” relationship, but it does not exclude the situation that the related objects are in an “and” relationship. The specific meaning can be understood based on the context.
  • the embodiments shown below can be executed by the terminal, by components configured in the terminal (such as chips, chip systems, etc.), or by logic modules or software that can realize all or part of the terminal functions, which is not limited in the embodiments of this application.
  • the terminal may have a structure as shown in FIG. 1 , or may have more or fewer components than in FIG. 1 , which is not limited in the embodiments of the present application.
  • Figure 5 is a schematic flow chart of the first human-computer interaction method provided by the embodiment of the present application. As shown in Figure 5, the method 500 may include step 501 and step 502. Each step shown in Figure 5 will be described in detail below.
  • Step 501 Receive first voice input from the user.
  • the first voice input may be a voice input received by the terminal from the user without the user waking up the terminal in advance.
  • a first voice input is received from the user.
  • the first voice input may be, for example, "navigate to location A", "navigate to the company", "leave for work", "play song B", "I want to listen to song B", etc.
  • the embodiments of this application do not place any restrictions on the specific content of the first voice input.
  • Step 502 If the first voice input is semantically similar to the predefined first wake-up-free instruction, make a corresponding response to the first voice input.
  • the first wake-up-free instruction is used to instruct the terminal to perform an operation corresponding to the first wake-up-free instruction without inputting a preset wake-up word.
  • One possible implementation method is that after the terminal receives the first voice input from the user, it performs semantic analysis on the first voice input and the predefined first wake-up-free instruction based on natural language processing (NLP).
  • the terminal determines, based on learning from voice input, whether the first voice input belongs to the second wake-up-free instructions, wherein a second wake-up-free instruction is semantically similar to the predefined first wake-up-free instruction but uses different terms.
  • the predefined first wake-up-free command is "Navigate to the company”
  • the second wake-up-free command obtained based on voice input learning is "Go to work”.
  • the semantics of the two are similar, but the terms are different.
  • the second wake-up-free instruction is more colloquial, while the first wake-up-free instruction is a standard human-computer interaction term.
  • if the first voice input belongs to the second wake-up-free instructions, the terminal makes a corresponding response to the first voice input.
  • the terminal can first determine, based on learning from voice input, whether the first voice input belongs to the second wake-up-free instructions. If it does, the terminal responds to it. If it does not, the terminal further performs semantic analysis on the first voice input and the predefined first wake-up-free instruction based on NLP. If the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal makes a corresponding response to the first voice input; if they are not similar, the terminal does not respond to the first voice input.
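  • The two-stage decision described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the token-overlap (Jaccard) score is a crude placeholder for the NLP semantic analysis, which this application does not specify, and the threshold value, the instruction sets and all names (`should_respond`, `similarity`, `SIMILARITY_THRESHOLD`, etc.) are hypothetical.

```python
# Predefined first wake-up-free instructions (hypothetical content).
PREDEFINED_FIRST = {"navigate to the company"}
# Second wake-up-free instructions learned from earlier voice input (hypothetical).
LEARNED_SECOND = {"go to work"}
# Assumed threshold; the application does not give a value.
SIMILARITY_THRESHOLD = 0.5


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Token-overlap (Jaccard) score; a placeholder for real semantic analysis."""
    tokens_a = set(normalize(a).split())
    tokens_b = set(normalize(b).split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def should_respond(first_voice_input: str) -> bool:
    """Decide whether the terminal responds to a voice input without wake-up."""
    text = normalize(first_voice_input)
    # Stage 1: does the input belong to the learned second instructions?
    if text in LEARNED_SECOND:
        return True
    # Stage 2: otherwise compare it with each predefined first instruction.
    return any(
        similarity(text, instruction) >= SIMILARITY_THRESHOLD
        for instruction in PREDEFINED_FIRST
    )
```

  • Checking the learned instructions first keeps the common, already confirmed phrasings on a cheap lookup path, and only unseen phrasings reach the similarity comparison. A token-overlap score cannot relate "Go to work" to "Navigate to the company", which is why that phrasing is covered here by the learned set rather than by the placeholder score.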
  • the above-mentioned predefined first wake-up-free instruction and/or the second wake-up-free instruction obtained based on learning from voice input can be stored in the instruction library.
  • After receiving the first voice input, the terminal determines whether to respond accordingly based on the first wake-up-free instruction and the second wake-up-free instruction stored in the instruction library. If the first voice input is semantically similar to the first wake-up-free instruction, the terminal responds accordingly to the first voice input.
  • the above method further includes: confirming the semantics of the first voice input to the user.
  • After the terminal receives the first voice input from the user, if the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal asks the user whether the above semantics are correct. If the user replies that the semantics are correct, the terminal responds accordingly to the first voice input.
  • the terminal can ask the user whether the semantics are correct through voice broadcast, through a prompt box (such as a toast) that contains the semantics of the above-mentioned first voice input, or through a prompt box (such as a toast) plus voice broadcast.
  • the embodiments of this application do not limit the method used by the terminal to query the user for semantics.
  • the above method further includes: prompting the user with a first wake-up-free instruction. That is to say, in addition to performing the operation indicated by the first voice input (such as navigating to the company), the terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time to instruct the terminal to perform the corresponding operation.
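  • The confirmation and prompting behaviour described above can be sketched as follows, assuming the terminal has already matched the voice input to a predefined first wake-up-free instruction. The function name `respond_with_confirmation`, the `ask_user` callback and the output strings are hypothetical and stand in for the voice broadcast or prompt box.

```python
from typing import Callable, List


def respond_with_confirmation(
    voice_text: str,
    matched_instruction: str,
    ask_user: Callable[[str], bool],
) -> List[str]:
    """Return the terminal outputs for one interaction (illustrative only).

    ask_user stands in for the voice broadcast or prompt box; it returns
    True when the user confirms the semantics.
    """
    outputs: List[str] = []
    # Confirm the inferred semantics before acting on the voice input.
    if not ask_user(f"Do you want to {matched_instruction}?"):
        return outputs  # semantics rejected: no response
    outputs.append(f"execute: {matched_instruction}")
    # Suggest the predefined wording only when the user said something else.
    if voice_text.strip().lower() != matched_instruction.lower():
        outputs.append(f'hint: next time try "{matched_instruction}"')
    return outputs


# Example: the user says "Go to work" and confirms the question.
print(respond_with_confirmation(
    "Go to work", "navigate to the company", lambda question: True
))
```

  • Passing the confirmation channel in as a callback keeps the sketch independent of whether the question is delivered by voice, by a prompt box, or by both.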
  • Figure 6 is an interaction schematic diagram of the first human-computer interaction method provided by the embodiment of the present application.
  • in response to the user's voice input of "Go to work", the terminal asks "Do you want to navigate to the company?", the user replies "Yes" by voice, and in response to the user's reply, the terminal displays the route to the company via the user interface.
  • the terminal in Figure 6 asking the user "Do you want to navigate to the company?" through voice broadcast is only an example and should not constitute any limitation on the embodiments of the present application.
  • the terminal can also ask the user "Do you want to navigate to the company?" through a prompt box (such as a toast), or through a prompt box (such as a toast) plus a voice broadcast.
  • the terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time through a prompt box and/or a voice broadcast. As shown in Figure 6, the terminal prompts the user to "try navigating to the company next time” through voice broadcast.
  • before receiving the first voice input from the user, the above method further includes: receiving a second voice input from the user; if the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
  • the terminal determines whether the predefined first wake-up-free instructions contain an instruction that is semantically similar to the above-mentioned second voice input, for example by performing semantic analysis on the two based on NLP. If it is determined that the second voice input is semantically similar to a certain predefined first wake-up-free instruction, the terminal asks the user whether the above semantics are correct. If the user replies that the semantics are correct, the terminal generates a second wake-up-free instruction corresponding to the second voice input. In addition, the terminal can also save the second voice input in the instruction library as the second wake-up-free instruction.
  • receiving the second voice input from the user includes: receiving the above-mentioned second voice input multiple times continuously within a preset time range. That is, if the terminal receives the above-mentioned second voice input multiple times continuously within a preset time range, the terminal then confirms the semantics of the second voice input to the user. In this way, it can effectively prevent the user from mistakenly mentioning the second voice input in the chat conversation, causing the terminal to respond, thereby improving the user's experience.
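  • One way to detect that the same voice input has been received multiple times continuously within a preset time range is sketched below. The repeat count of 3 and the 10-second window are assumed values, since this application does not specify them, and the class name `RepeatDetector` and its methods are hypothetical.

```python
from collections import deque


class RepeatDetector:
    """Tracks consecutive repeats of one utterance inside a time window."""

    def __init__(self, min_repeats: int = 3, window_seconds: float = 10.0):
        self.min_repeats = min_repeats        # assumed value
        self.window_seconds = window_seconds  # assumed value
        self._last_text = None
        self._times = deque()

    def observe(self, text: str, timestamp: float) -> bool:
        """Record one voice input; return True once it has repeated enough."""
        text = " ".join(text.lower().split())
        if text != self._last_text:
            # A different utterance breaks the run of consecutive repeats.
            self._last_text = text
            self._times.clear()
        self._times.append(timestamp)
        # Drop occurrences that fall outside the preset time range.
        while timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.min_repeats
```

  • A single mention of "Go to work" in a conversation therefore never triggers the confirmation question; only a short run of identical inputs does.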
  • after continuously receiving the above voice input, the terminal asks the user "Do you want to navigate to the company?" through a prompt box and/or voice broadcast (for example, see Figure 6), and in response to the user's confirmation operation, displays the route to the company through the user interface.
  • the terminal can also prompt the user to directly use the first wake-up-free command next time through a prompt box and/or voice broadcast. As shown in Figure 6, the terminal prompts the user to "try navigating to the company next time” through voice broadcast.
  • FIG. 7 is a schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • Step 701 Receive second voice input from the user.
  • the terminal receives a second voice input from the user.
  • the second voice input includes: “Go to work”, “Is there a traffic jam on the road?", “Avoid the congested road”, “Choose a smooth road”, etc., which will not be listed here.
  • Step 702 Determine whether the second voice input is semantically similar to the predefined first wake-up-free instruction.
  • After receiving the second voice input from the user, the terminal determines whether the predefined first wake-up-free instructions contain an instruction semantically similar to the second voice input. If it is determined that they do not, step 703 is executed, that is, the second voice input is not responded to; if the second voice input is semantically similar to a first wake-up-free instruction, step 704 is executed, that is, the user is asked whether the second voice input has the above semantics.
  • Step 703 Do not respond to the second voice input.
  • Step 704 Ask the user whether the second voice input has the above semantics.
  • if the user replies that the second voice input does not have the above semantics, the terminal does not respond to the second voice input; if the user replies that it does, the terminal executes step 705.
  • the terminal can ask the user through voice broadcast, or through a prompt box (such as toast), or through a prompt box (such as toast) plus voice broadcast. This application does not place any restrictions on the terminal's query method.
  • Step 705 Generate a second wake-up-free instruction and respond to the second voice input.
  • the terminal determines the second voice input as the second wake-up-free command, saves it in the command library, and responds to the second voice input.
  • the terminal can also prompt the user to directly use the first wake-up-free instruction next time through a prompt box and/or voice broadcast.
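  • Steps 702 to 705 can be summarised in a short Python sketch. The semantic comparison and the user confirmation are passed in as callbacks because this application does not fix how either is implemented; the function name `learn_second_instruction`, the dictionary used as the instruction library, and the returned strings are all hypothetical.

```python
from typing import Callable, Dict, Iterable


def learn_second_instruction(
    second_input: str,
    predefined_first: Iterable[str],
    is_similar: Callable[[str, str], bool],
    confirm: Callable[[str], bool],
    library: Dict[str, str],
) -> str:
    """Run steps 702 to 705 for one second voice input (illustrative only)."""
    # Step 702: look for a semantically similar predefined first instruction.
    match = next(
        (inst for inst in predefined_first if is_similar(second_input, inst)),
        None,
    )
    if match is None:
        return "no_response"  # Step 703: do not respond
    # Step 704: ask the user whether the input has the matched semantics.
    if not confirm(match):
        return "no_response"
    # Step 705: store the input as a second wake-up-free instruction, then respond.
    library[second_input.lower()] = match
    return f"respond: {match}"
```

  • Storing the learned phrasing together with the first wake-up-free instruction it maps to lets a later lookup go straight to the corresponding operation.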
  • FIG. 8 is another schematic flowchart of learning speech input terms provided by an embodiment of the present application.
  • the method shown in Figure 8 is a method in which the terminal triggers an inquiry to the user after receiving the second voice input multiple times in succession.
  • Step 801 Receive second voice input from the user.
  • the terminal receives a second voice input from the user.
  • the second voice input includes: “Go to work”, “Is there a traffic jam on the road?", “Avoid the congested road”, “Choose a smooth road”, etc., which will not be listed here.
  • Step 802 Determine whether the second voice input is received multiple times continuously.
  • After receiving the second voice input from the user, the terminal determines whether the second voice input is received multiple times continuously within a preset time range. If so, step 804 is executed; otherwise, the terminal executes step 803, that is, it does not respond to the second voice input.
  • Step 803 Do not respond to the second voice input.
  • Step 804 Ask the user whether the second voice input has the above semantics.
  • if the user replies that the second voice input does not have the above semantics, the terminal does not respond to the second voice input; if the user replies that it does, the terminal executes step 805.
  • the terminal can ask the user through voice broadcast, or through a prompt box (such as toast), or through a prompt box (such as toast) plus voice broadcast. This application does not place any restrictions on the terminal's query method.
  • Step 805 Generate a second wake-up-free instruction and respond to the second voice input.
  • the terminal determines the second voice input as the second wake-up-free command, saves it in the command library, and responds to the second voice input.
  • the terminal can also prompt the user to directly use the predefined first wake-up-free phrase next time through a prompt box and/or a voice broadcast.
  • The terminal receives the first voice input from the user and, when the first voice input is semantically similar to the predefined first wake-up-free instruction, makes a corresponding response to the first voice input. That is, without the terminal being woken up in advance, even if the sentence spoken by the user is not the predefined first wake-up-free instruction, the terminal can respond as long as the sentence is semantically similar to that instruction. This helps solve the problem that the terminal does not respond because the predefined first wake-up-free instructions are fixed and limited, which in turn helps improve the user's interactive experience.
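  • The Figure 8 flow (steps 801 to 805) can be illustrated with a minimal Python sketch. The class name `RepeatLearner`, the values of `min_repeats` and `window_seconds`, and the `confirm` callback are assumptions made for illustration; the application does not specify how "multiple times" or the preset time range are chosen, nor how the inquiry to the user is implemented.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatLearner:
    """Sketch of the Figure 8 flow: learn a phrase the user repeats."""

    min_repeats: int = 3            # assumed meaning of "multiple times"
    window_seconds: float = 30.0    # assumed preset time range
    command_library: set = field(default_factory=set)
    last_text: str = ""
    times: list = field(default_factory=list)

    def on_voice_input(self, text, now, confirm):
        """Handle one utterance received at time `now` (seconds)."""
        if text in self.command_library:
            return "respond"                 # already a wake-up-free instruction
        if text != self.last_text:           # a different phrase breaks the run
            self.last_text, self.times = text, []
        # Step 802: keep only repeats that fall inside the time window.
        self.times = [t for t in self.times if now - t <= self.window_seconds]
        self.times.append(now)
        if len(self.times) < self.min_repeats:
            return "ignore"                  # step 803
        self.times = []
        if not confirm(text):                # step 804: ask the user
            return "ignore"
        self.command_library.add(text)       # step 805: save and respond
        return "learn_and_respond"
```

  • In this sketch the `confirm` callback stands in for the inquiry of step 804 (prompt box and/or voice broadcast); a real terminal would compare recognized semantics rather than exact strings.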
  • Figure 9 is a schematic flow chart of the second human-computer interaction method provided by the embodiment of the present application.
  • the method 900 may include step 901 and step 902. Each step shown in Figure 9 will be described in detail below.
  • Step 901 Receive first voice input from the user.
  • the first voice input may be a voice input received by the terminal from the user without the user waking up the terminal in advance.
  • a first voice input from the user is received.
  • the first voice input may be, for example, “navigate to location A”, “departure to location A”, “I want to go to location A”, “Play song B”, “I want to listen to song B”, etc., the embodiment of the present application does not place any limitation on the specific content of the first voice input.
  • Step 902 If the preset wake-up word is not received but the first voice input contains the target object, make a corresponding response to the first voice input.
  • the above-mentioned target object is an object whose number of mentions reaches a preset threshold in other voice inputs received before the first voice input, and the above-mentioned preset wake-up word is used to wake up the terminal. More specifically, the preset wake-up word is used to wake up the voice assistant (or smart assistant, etc.; this application does not limit this) in the terminal.
  • when the terminal receives the first voice input without being awakened in advance, if the first voice input contains the target object, the terminal responds accordingly to the first voice input; if the first voice input does not contain the target object, the terminal does not respond to the first voice input.
  • the above-mentioned target object may be, for example, a location, a media name (such as a song title), or an artist name, etc.
  • This application does not limit the specific content of the target object.
  • before receiving the first voice input from the user, the above method further includes: receiving a preset wake-up word from the user; receiving a second voice input from the user; and, when the number of mentions of the first object included in the second voice input reaches the preset threshold, determining the first object as the target object.
  • the first object may be, for example, a location, a media name (such as a song title), an artist name, etc. This application does not limit the specific content of the target object.
  • the terminal determines whether the second voice input contains the first object.
  • If the second voice input contains the first object, the terminal determines the number of mentions of the first object. If the number of mentions of the first object in the current second voice input and the previously received voice inputs exceeds a preset threshold, the first object is determined as the target object, so that the next time the user directly speaks a voice input containing the target object, the terminal can make a corresponding response. For example, if the terminal determines location A as the target object, there is no need to wake up the terminal in advance next time: the user directly inputs “navigate to location A” by voice, and after the terminal receives the voice input and determines that it contains location A, the user interface displays the route to location A. In this way, the user does not need to wake up the terminal in advance next time, which simplifies the interaction process and helps improve the user experience.
  • the terminal may record the number of times the first object is mentioned in the voice input, and each time the first object is mentioned, the corresponding number is incremented by 1.
  • the terminal can also generate a wake-up-free instruction including the target object based on the target object; prompt the user with the wake-up-free instruction, so that the user can directly use the above wake-up-free instruction next time to control the terminal to perform the corresponding operation.
  • the terminal can prompt the user with the above-mentioned wake-up-free instruction through a prompt box and/or voice broadcast.
  • the terminal gives a voice prompt to the user, "Next time, try direct navigation to location A", where location A is the target object.
  • Figure 10 is an interaction schematic diagram of the second human-computer interaction method provided by the embodiment of the present application.
  • the terminal in response to the user's voice input operation of "Xiaoyi Xiaoyi", the terminal replies "I am here", that is, the terminal is awakened. Further, in response to the user's voice input operation of "navigate to location A”, the terminal replies "OK, let's start navigating for you", and the terminal displays the route to location A through the user interface.
  • the terminal can prompt the user through voice broadcasting to "try next time and just use navigation to go to location A.” In other words, next time the user does not need to wake up the terminal in advance and directly inputs "Navigation to location A" by voice, the terminal can display the route to location A through the user interface.
  • FIG. 11 is a schematic flowchart of learning the first object in the second voice input provided by an embodiment of the present application.
  • Step 1101 Receive a preset wake-up word from the user.
  • the above preset wake-up words are used to wake up the terminal, and more specifically, are used to wake up the voice assistant in the terminal.
  • Step 1102 Receive second voice input from the user.
  • the terminal After the terminal is awakened, it receives the second voice input from the user.
  • the second voice input includes: “Navigate to location A”, “Leave to location A”, “I want to go to location A”, etc., which are not listed here one by one.
  • Step 1103 Determine whether the second voice input includes the first object.
  • the first object includes, for example, but is not limited to: location, media name (such as song title) or artist name, etc.
  • the terminal determines whether the second voice input contains the first object (such as location A). If the second voice input does not contain the first object, step 1104 is executed; if the second voice input contains the first object, step 1105 is executed.
  • Step 1104 Respond to the second voice input.
  • Step 1105 Determine whether the number of times the first object is mentioned in the current voice input and its previously received voice input exceeds a preset threshold.
  • If the number of times the first object is mentioned in the current voice input and the previously received voice inputs does not exceed the preset threshold, step 1104 is executed, that is, the terminal responds to the second voice input; if the number of mentions exceeds the preset threshold, step 1106 is executed.
  • Step 1106 Determine the first object as the target object.
  • the terminal can also generate a wake-up-free instruction based on the target object, and prompt the user to directly use the above wake-up-free instruction next time.
  • the terminal can prompt the user to directly use the above wake-up-free instruction next time through a prompt box and/or voice broadcast.
  • when the terminal is not awakened in advance, after receiving the first voice input from the user, the terminal responds accordingly if the first voice input contains an object whose number of mentions in previous voice inputs reaches the preset threshold. That is, the terminal learns from previous voice inputs and saves the target objects whose number of mentions has reached the preset threshold; as long as a received voice input contains such a target object, the terminal can respond even if it has not been woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interaction experience.
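  • The learning flow of Figure 11 and the check of step 902 can be sketched as follows. This is a minimal illustration only: the class `TargetObjectLearner`, its method names, and the threshold value are hypothetical, and the extraction of the first object from speech (for example, a location or song title) is assumed to happen elsewhere. The sketch uses "exceeds the threshold", as in step 1105.

```python
from collections import Counter


class TargetObjectLearner:
    """Counts mentions of objects and promotes frequent ones to target objects."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # assumed preset threshold
        self.mentions = Counter()    # object -> number of mentions so far
        self.targets = set()         # learned target objects

    def on_awake_input(self, first_object):
        """Steps 1103 to 1106 for one voice input received after the wake-up word.

        `first_object` is the object found in the input, or None if there is none.
        Returns True only when the object has just become a target object.
        """
        if first_object is None:
            return False                      # step 1104: just respond
        self.mentions[first_object] += 1      # each mention adds 1
        if first_object in self.targets:
            return False
        if self.mentions[first_object] > self.threshold:
            self.targets.add(first_object)    # step 1106
            return True
        return False

    def should_respond_without_wake(self, objects_in_input):
        """Step 902: respond only if the input contains a target object."""
        return any(obj in self.targets for obj in objects_in_input)
```

  • When `on_awake_input` returns True, the terminal could additionally prompt the user with a wake-up-free instruction that contains the new target object.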
  • Figure 12 is a schematic flow chart of the third human-computer interaction method provided by the embodiment of the present application.
  • the method 1200 may include step 1201 and step 1202. Each step shown in Figure 12 will be described in detail below.
  • Step 1201 Receive a first voice input from a user.
  • the first voice input belongs to a first instruction set, and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions.
  • the above-mentioned predefined wake-up-free command is used to instruct the terminal to perform operations corresponding to the wake-up-free command without inputting a preset wake-up word.
  • the instruction library pre-stores a predefined first instruction set and a second instruction set. The instructions in the first instruction set are semantically similar to the predefined wake-up-free instructions, and the instructions in the second instruction set are the predefined wake-up-free instructions.
  • the terminal receives the first voice input and determines that the first voice input belongs to the first instruction set.
  • the instruction library pre-stores a predefined second instruction set and a first instruction set that is learned from voice input and corresponds to the instructions in the second instruction set, and the terminal receives the first voice input and determines that the first voice input belongs to the first instruction set.
  • Table 1 is an example of the first instruction set and the second instruction set pre-stored in the instruction library.
  • the instructions in the second instruction set are predefined wake-up-free instructions, such as “check whether there is congestion”, “slide down”, “reduce the page”, “navigate to the company”, “navigate home”, etc.
  • the instructions in the first instruction set are semantically similar to the instructions in the second instruction set, such as “Is there a traffic jam on the road?”, “Scroll down”, “Zoom out”, “Go to work”, “I want to go home”, etc. It can be seen that the instructions in the first instruction set are semantically similar to the instructions in the second instruction set but use different terms: the instructions in the first instruction set are more colloquial, while the instructions in the second instruction set are standard human-computer interaction instructions.
  • the first instruction set may be further divided into first instruction sub-set 1 and first instruction sub-set 2.
  • the instructions in the first instruction sub-set 2 are more colloquial than the instructions in the first instruction sub-set 1.
  • the conditions for the terminal to respond to the instructions in the first instruction sub-set 2 are stricter than the conditions for responding to the instructions in the first instruction sub-set 1.
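  • One possible way to represent the instruction library and to determine which set a voice input belongs to is sketched below. The pairing of each colloquial phrase with a predefined instruction is an assumption inferred from the order of the examples above, and the exact-match lookup in `classify` (a hypothetical helper) merely stands in for the semantic matching that a real terminal would perform.

```python
# Second instruction set: the predefined wake-up-free instructions.
SECOND_SET = {
    "check whether there is congestion",
    "slide down",
    "reduce the page",
    "navigate to the company",
    "navigate home",
}

# First instruction set: colloquial phrase -> predefined instruction.
# The pairing is assumed from the order in which the examples are listed.
FIRST_SET = {
    "is there a traffic jam on the road?": "check whether there is congestion",
    "scroll down": "slide down",
    "zoom out": "reduce the page",
    "go to work": "navigate to the company",
    "i want to go home": "navigate home",
}


def classify(utterance):
    """Return (set name, predefined instruction) for a recognized utterance."""
    key = utterance.strip().lower()
    if key in SECOND_SET:
        return ("second", key)
    if key in FIRST_SET:
        return ("first", FIRST_SET[key])
    return ("unknown", None)
```

  • An input classified as "second" can be answered directly, whereas an input classified as "first" is subject to the preset conditions described in step 1202.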
  • Step 1202 If the preset conditions are met, respond to the above-mentioned first voice input.
  • the above preset conditions include at least one of the following: the number of users within a preset range from the terminal does not exceed a threshold; the user is in a predefined position; the user from whom the first voice input comes does not belong to the preset group; or, the time when the first voice input is received falls within a preset period.
  • the number of users within the preset range from the terminal does not exceed the threshold; that is, when the number of users within the preset range from the terminal is small, the terminal can respond to the first voice input. It is understood that if the number of surrounding users is small, the possibility of the user mentioning the first voice input by mistake is smaller, that is, the user may really want the terminal to perform the corresponding operation. In contrast, if the number of surrounding users is large, the possibility of the user mentioning the first voice input by mistake is greater. Therefore, this preset condition can effectively prevent the terminal from responding when the user mentions the first voice input by mistake.
  • the user is in a predefined position. For example, the terminal responds to the first voice input from the user closest to the terminal; or, if the user is in a scenic spot, where the user is more likely to want the terminal to provide services, the terminal responds to the first voice input from the user.
  • the user from whom the first voice input comes does not belong to the preset group, such as children, the elderly, etc. It is understandable that for the preset group, instructions issued by them may be dangerous, and the terminal may not respond to them.
  • the time when the first voice input is received falls within a preset period.
  • the preset period may be, for example, working hours (or commuting hours). During these periods, the terminal can respond to the above-mentioned first voice input. During other periods, the terminal responds only to the predefined wake-up-free instructions.
  • the following takes the above method applied to a car as an example (for example, the terminal uses a car machine as an example), and enumerates the response of the terminal to the first voice input in the above scenarios.
  • Scenario 1 When there is one passenger in the car, the car machine responds to the first voice input; or, when there are multiple passengers in the car, the car machine does not respond to the first voice input.
  • the car machine can determine the number of people currently in the car based on the camera in the car. When there is one passenger in the car, that is, when only the driver is in the car, the car machine responds to the first voice input. When there are multiple passengers in the car, the car machine does not respond to the first voice input.
  • the car machine can respond to the instructions in the second instruction set regardless of whether there are one or more passengers in the car. In this way, when there are multiple passengers in the car, the possibility of accidentally waking up the car machine during a conversation can be greatly reduced.
  • Scenario 2 When the voice input comes from the main driver, the car machine responds to the first voice input; or when the voice input comes from other passengers other than the main driver, the car machine does not respond to the first voice input.
  • the car machine can determine, based on interaction with the seats, whether the first voice input comes from the driver or from other passengers. If the first voice input comes from the driver, the car machine responds to the first voice input; if the first voice input comes from other passengers, the car machine does not respond to the first voice input. In addition, the car machine can respond to instructions in the second instruction set whether they come from the main driver or from other passengers.
  • Scenario 3 When the user from whom the first voice input comes does not belong to the preset group, the car machine responds to the first voice input; or, when the user from whom the first voice input comes belongs to the preset group, the car machine does not respond to the first voice input.
  • the car machine can determine whether the first voice input comes from a preset group of people. Taking a child as an example, if the first voice input comes from a child, the car machine does not respond to the first voice input; if the first voice input does not come from a child, the car machine responds to the first voice input. In this way, it can effectively avoid the situation where a child speaks the first voice input by mistake and causes the car machine to respond.
  • Scenario 4 When the time when the voice input is received falls within the preset time period, the car machine responds to the first voice input; or, when the time when the voice input is received does not fall within the preset time period, the car machine does not respond to the first voice input.
  • the preset time period is, for example, the working period. If the car machine receives the first voice input during the working period, it can respond to the first voice input; if the car machine receives the first voice input outside the working period, it may not respond to the first voice input.
  • when the terminal determines to respond to the first voice input, it can first confirm the semantics of the voice input with the user and, in response to the user's operation confirming the semantics, respond to the first voice input.
  • The above scenarios can also be combined. For example, the car machine responds to the first voice input when the first voice input comes from the driver and the time when the voice input is received falls within a preset period.
  • For another example, the car machine responds to the first voice input when there is only one passenger in the car and the time when the first voice input is received falls within a preset time period. For the sake of brevity, the combinations are not listed one by one here.
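  • The four scenarios and their combinations can be expressed as composable checks, as in the following sketch. All names (`InputContext`, `few_occupants`, `is_driver`, `not_preset_group`, `in_period`, `should_respond`) and the default values for the occupant limit and the time period are assumptions chosen for illustration; how the car machine obtains these facts (camera, seat interaction, and so on) is outside the sketch.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class InputContext:
    occupants: int                  # number of people in the car
    from_driver: bool               # True if the main driver spoke
    speaker_in_preset_group: bool   # e.g. the speaker is a child
    received_at: time               # when the voice input was received


def few_occupants(ctx, limit=1):
    return ctx.occupants <= limit                   # scenario 1


def is_driver(ctx):
    return ctx.from_driver                          # scenario 2


def not_preset_group(ctx):
    return not ctx.speaker_in_preset_group          # scenario 3


def in_period(ctx, start=time(7, 0), end=time(9, 30)):
    return start <= ctx.received_at <= end          # scenario 4


def should_respond(ctx, in_second_set, conditions):
    """Decide whether to respond to a voice input.

    Instructions in the second instruction set are answered without further
    checks in this sketch; first-set inputs must pass every configured condition.
    """
    if in_second_set:
        return True
    return all(condition(ctx) for condition in conditions)
```

  • For example, `should_respond(ctx, False, [is_driver, in_period])` corresponds to the combination of scenario two and scenario four.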
  • Figure 13 is a schematic flowchart of determining whether to respond to the first voice input according to the scenario provided by the embodiment of the present application.
  • the method described in Figure 13 is a combination of scenario two and scenario four.
  • Step 1301 Receive first voice input from the user.
  • Without the car machine being woken up in advance, in response to the user's operation of inputting the first voice input, the car machine receives the first voice input from the user.
  • the first voice input belongs to a first instruction set, and instructions in the first instruction set are semantically similar to predefined wake-up-free instructions.
  • Step 1302 Determine whether the first voice input comes from the driver.
  • After the car machine receives the first voice input, it determines whether the first voice input comes from the driver. If the first voice input does not come from the driver, the car machine executes step 1303; if the first voice input comes from the driver, the car machine executes step 1304.
  • Step 1303, do not respond to the first voice input.
  • the vehicle machine does not respond to the first voice input.
  • the vehicle machine can respond to instructions in the second instruction set from the user.
  • Step 1304 Determine whether the time when the first voice input is received falls within a preset period.
  • the car machine continues to determine whether the time when the first voice input is received falls within the preset period. If the time when the first voice input is received falls within the preset period, the car machine may execute step 1305; if the time when the first voice input is received does not fall within the preset period, the car machine may execute step 1306.
  • Step 1305, respond to the first voice input.
  • the vehicle machine can respond to the first voice input.
  • Step 1306 Respond to the first voice input after confirming with the user.
  • the car machine can respond to the first voice input, but before responding to the first voice input, the user needs to confirm the semantics of the first voice input. When the semantics are confirmed, respond to the first voice input.
  • After the terminal receives the first voice input that is semantically similar to the predefined wake-up-free instruction, it responds to the first voice input when the preset conditions are met. That is to say, for a first voice input that is semantically similar to a predefined wake-up-free instruction, the terminal responds only when the preset conditions are met, rather than in all circumstances. This can prevent the terminal from responding when the user mentions the first voice input by mistake. It is conceivable that the first voice input may be more colloquial than the predefined wake-up-free instructions; if the terminal responded under any circumstances, its response would likely be triggered frequently during the user's conversation. Therefore, by setting the preset conditions so that the terminal responds only when they are met, the user's interactive experience can be greatly improved.
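  • The decision flow of Figure 13 (steps 1302 to 1306) can be sketched as a small function. The enum `Decision`, the function `decide`, and the default period boundaries are hypothetical; the sketch only mirrors the branching described above and does not cover instructions in the second instruction set.

```python
from datetime import time
from enum import Enum


class Decision(Enum):
    NO_RESPONSE = "no_response"              # step 1303
    RESPOND = "respond"                      # step 1305
    CONFIRM_THEN_RESPOND = "confirm_first"   # step 1306


def decide(from_driver, received_at, start=time(7, 0), end=time(9, 30)):
    """Mirror the Figure 13 branches for a first-instruction-set voice input."""
    # Step 1302: inputs that do not come from the driver are ignored.
    if not from_driver:
        return Decision.NO_RESPONSE
    # Step 1304: inside the preset period the car machine responds directly.
    if start <= received_at <= end:
        return Decision.RESPOND
    # Outside the period, the semantics are confirmed with the user first.
    return Decision.CONFIRM_THEN_RESPOND
```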
  • Figure 14 is a schematic flowchart of the fourth human-computer interaction method provided by an embodiment of the present application.
  • the method 1400 may include step 1401 and step 1402. Each step shown in Figure 14 will be described in detail below.
  • Step 1401 Receive the first voice input from the user.
  • the first voice input does not belong to the predefined wake-up-free instructions.
  • the first voice input may be a voice input received by the terminal from the user without the user waking up the terminal in advance.
  • the first voice input from the user is received.
  • the first voice input may be, for example, “leave to the company”, “navigate to the company”, “leave to work”, etc.
  • The embodiments of this application do not limit the specific content of the voice input in any way.
  • Step 1402 If the first voice input is semantically similar to the first no-wake-up instruction among the predefined no-wake-up instructions, guide the user to input the first no-wake-up instruction.
  • the terminal determines that the first voice input and the first wake-up-free instruction have similar semantics, and then guides the user to input the first wake-up-free instruction so that the terminal responds to the first wake-up-free instruction.
  • the terminal may determine based on semantic analysis in natural language processing that the first voice input is semantically similar to the first wake-up-free instruction.
  • For example, the above voice input is “Go to work”, and the first wake-up-free instruction with similar semantics is “Navigate to the company”. After the terminal receives the voice input “Go to work”, it determines that the voice input does not belong to the predefined wake-up-free instructions and recognizes that its semantics are similar to “Navigate to the company”. Therefore, the terminal can guide the user to say “Navigate to the company”.
  • the above-mentioned guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or voice broadcast.
  • the terminal can guide the user to issue the first wake-up-free instruction through a prompt box. For example, after the terminal determines that the voice input has similar semantics to the first wake-up-free instruction in the command library, the terminal displays the first wake-up-free instruction on the user interface through a prompt box. The terminal can also guide the user to issue the first wake-up-free instruction through voice broadcast. For example, after the terminal determines that the above-mentioned voice input has similar semantics to the first no-wake-up command in the command library, the terminal reminds the user by voice to use the above-mentioned first no-wake-up command. The terminal can guide the user to issue the first wake-up-free command through a prompt box and voice broadcast.
  • For example, after the terminal determines that the above-mentioned voice input has similar semantics to the first wake-up-free instruction in the command library, the terminal first displays the first wake-up-free instruction on the user interface through a prompt box. If the user has not issued the first wake-up-free instruction within the preset time range, the terminal then prompts the user by voice to use the first wake-up-free instruction. Alternatively, the terminal displays the first wake-up-free instruction on the user interface through a prompt box and, at the same time, prompts the user by voice to use the first wake-up-free instruction.
  • This application does not limit the terminal's guidance method.
  • guiding the user to issue the first wake-up-free instruction through a prompt box and/or voice broadcast includes: prompting the user to issue the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction; and, if the user still has not issued the first wake-up-free instruction after being prompted through the prompt box, guiding the user to issue the first wake-up-free instruction through a voice broadcast.
  • For example, the terminal prompts the user to issue the first wake-up-free instruction through a prompt box for the first time, the prompt box containing the first wake-up-free instruction, and prompts the user through the prompt box again a second time. If, within 1 minute, the number of prompts through the prompt box reaches two but the user has not issued the first wake-up-free instruction, the user is guided to issue the first wake-up-free instruction through a voice broadcast.
  • Figure 15 is an interactive schematic diagram for guiding a user to issue a first wake-up-free instruction provided by an embodiment of the present application.
  • the terminal determines that the voice input (“leave to work”) has similar semantics to the first wake-up-free instruction “Navigate to the company” in the command library. Therefore, the first time, the terminal prompts the user through a prompt box to try using “Navigate to the company”. The second time, the user still says “leave to work”, and the terminal again prompts the user through a prompt box to try using “Navigate to the company”. The third time, the user still says “leave to work”, and the terminal prompts the user both through a prompt box and by voice to try using “Navigate to the company”.
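  • The escalation from prompt box to voice broadcast can be sketched as follows. The class `GuidancePrompter` and its parameters are hypothetical; the limit of two prompt-box prompts and the 1-minute window follow the example above, and the third prompt uses both channels as in Figure 15.

```python
class GuidancePrompter:
    """Chooses how to guide the user toward the first wake-up-free instruction."""

    def __init__(self, max_box_prompts=2, window_seconds=60.0):
        self.max_box_prompts = max_box_prompts   # prompt-box prompts before escalating
        self.window_seconds = window_seconds     # assumed 1-minute window
        self.prompt_times = []

    def next_channels(self, now):
        """Return the channels to use for a prompt issued at time `now` (seconds)."""
        # Only prompts inside the window count toward escalation.
        recent = [t for t in self.prompt_times if now - t <= self.window_seconds]
        self.prompt_times = recent + [now]
        if len(recent) >= self.max_box_prompts:
            return ("prompt_box", "voice")
        return ("prompt_box",)

    def reset(self):
        """Call when the user issues the first wake-up-free instruction."""
        self.prompt_times = []
```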
  • the terminal receives the first voice input.
  • the first voice input does not belong to the predefined wake-up-free instructions, but the first voice input is semantically similar to the first wake-up-free instruction among the predefined wake-up-free instructions.
  • the terminal guides the user to input the corresponding first wake-up-free instruction, so that after the user inputs the first wake-up-free instruction, the terminal responds accordingly, which can greatly improve the user's interactive experience.
  • Figure 16 is a schematic flow chart of the fifth human-computer interaction method provided by the embodiment of the present application.
  • the method may include steps 1601 to 1605. Each step shown in Figure 16 will be described in detail below.
  • Step 1601 Receive first voice input from the user.
  • the terminal In response to the user's operation of inputting the first voice input, the terminal receives the first voice input from the user.
  • the first voice input is received without receiving a preset wake-up word from the user.
  • the first voice input includes: “Navigate to location A”, “I want to go to location A”, “Where is location A”, etc., which are not listed one by one here.
  • Step 1602 Determine whether the first voice input is used to request navigation.
  • the terminal determines whether the intention of the first voice input is to request navigation. If the first voice input is not used to request navigation, the terminal performs step 1603; if the first voice input is used to request navigation, the terminal performs step 1604.
  • Step 1603 Do not respond to the first voice input.
  • Step 1604 Ask the user for the destination requesting navigation.
  • the terminal asks the user for the destination of the requested navigation. For example, if the voice input is “Navigate to location A”, the terminal receives the voice input and determines that it is used to request navigation. Further, the terminal asks the user for the navigation destination, for example by asking “where do you want to go”. The user replies “Place A”, and after receiving the user's reply, the terminal obtains the route to location A from the cloud.
  • the terminal can ask the user through voice broadcast, or through a prompt box (such as toast), or through a prompt box (such as toast) plus voice broadcast.
  • This application does not place any restrictions on the terminal's query method. For example, the terminal asks the user through a prompt box (such as toast) for the first two times, and the third time uses a prompt box (such as toast) plus voice broadcast to ask the user.
  • Step 1605 Provide navigation services to the user based on the destination fed back by the user.
  • the terminal After obtaining the route to the above destination, the terminal provides navigation services to the user. For example, display directions to a destination through the user interface.
  • the terminal can also generate a wake-up-free instruction including the above destination based on the destination.
  • the terminal can also prompt the user through prompt boxes and/or voice broadcasts that the above-mentioned wake-up-free instruction can be used directly next time.
  • FIG 17 is an interaction schematic diagram of the fifth human-computer interaction method provided by the embodiment of the present application.
  • the terminal in response to the user's voice input operation of "navigate to location A", the terminal asks the user "where do you want to go” through voice broadcast.
  • the user replies "Place A", and in response to the user's reply, the terminal displays the route to the location A to the user through the user interface.
  • without the terminal being woken up in advance, after the terminal receives the first voice input from the user and finds that its intention is to request navigation, it can ask the user for the navigation destination and determine the destination according to the user's feedback, so as to provide navigation services to the user. There is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interaction experience.
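  • The Figure 16 flow (steps 1601 to 1605) can be sketched as follows. The keyword list `NAVIGATION_CUES` is a stand-in assumption for real intent recognition, and `ask_user` and `get_route` are hypothetical callbacks representing the inquiry (prompt box and/or voice broadcast) and the route lookup from the cloud.

```python
# Assumed keyword cues; a real terminal would use semantic intent recognition.
NAVIGATION_CUES = ("navigate", "i want to go", "where is")


def is_navigation_request(text):
    """Step 1602: decide whether the voice input is used to request navigation."""
    lowered = text.lower()
    return any(cue in lowered for cue in NAVIGATION_CUES)


def handle_first_voice_input(text, ask_user, get_route):
    """Handle a voice input received without a preset wake-up word."""
    if not is_navigation_request(text):
        return None                                      # step 1603: no response
    destination = ask_user("Where do you want to go?")   # step 1604
    return get_route(destination)                        # step 1605
```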
  • An embodiment of the present application also provides a terminal, which includes corresponding modules for performing the steps performed by the terminal in any one of the embodiments described in FIGS. 5 to 17 .
  • the terminal can be used to implement the method described in any of the embodiments described in Figures 5 to 17.
  • the modules included in the terminal can be implemented by software and/or hardware.
  • An embodiment of the present application also provides a terminal, which includes a memory and a processor, wherein the memory is used to store a computer program, and the processor is used to call and execute the computer program, so that the terminal implements the method described in any one of the embodiments described in FIGS. 5 to 17.
  • An embodiment of the present application also provides a vehicle, on which a terminal as described above is deployed.
  • the terminal may be a vehicle machine, for example.
  • This application also provides a chip system, which includes at least one processor and is used to implement the method described in any one of the embodiments described in FIGS. 5 to 17 .
  • the chip system further includes a memory, the memory is used to store program instructions and data, and the memory is located within the processor or outside the processor.
  • the chip system can be composed of chips or include chips and other discrete devices.
  • the computer program product includes computer-readable instructions.
  • When the computer-readable instructions are run by a computer, the method described in any one of the embodiments described in FIGS. 5 to 17 can be implemented.
  • This application also provides a computer-readable storage medium that stores computer-readable instructions.
  • When the computer-readable instructions are executed by a computer, the method described in any one of the embodiments described in FIGS. 5 to 17 is implemented.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method embodiment can be completed through an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • The above-mentioned processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • The memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • Non-volatile memory can be read-only memory (ROM), programmable ROM (PROM), erasable programmable read-only memory (erasable PROM, EPROM), electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or flash memory.
  • Volatile memory can be random access memory (RAM), which is used as an external cache. Many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct Rambus RAM (DR RAM).
  • unit may be used to refer to computer-related entities, hardware, firmware, a combination of hardware and software, software, or software in execution.
  • Multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as discrete components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • each functional unit may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions (programs). When the computer program instructions (program) are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable device.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center through wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more available media integrated.
  • The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVD)), or semiconductor media (e.g., solid state disks (SSD)), etc.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in various embodiments of this application.
  • The aforementioned storage media include: USB flash drive, removable hard disk, ROM, RAM, magnetic disk, optical disk, and other media that can store program code.

Abstract

Provided are a human-machine interaction method and a related apparatus. The method comprises: a terminal receiving a first speech input from a user, and when the first speech input is semantically similar to a predefined first wake-up-free instruction, making a corresponding response to the first speech input. That is, it is not necessary to wake up the terminal in advance, and as long as the received first speech input is semantically similar to the predefined first wake-up-free instruction, the terminal can execute an operation corresponding to the first speech input. Therefore, the problem of a terminal having no response due to the fact that a first wake-up-free instruction is fixed and limited is solved, and compared with an approach in which the terminal only responds to the predefined first wake-up-free instruction, the method greatly improves the interaction experience of a user.

Description

Human-computer interaction method and related apparatus
This application claims priority to Chinese patent application No. 202211079452.4, filed with the China Patent Office on September 5, 2022 and entitled "Human-computer interaction method and related apparatus", the entire content of which is incorporated into this application by reference.
Technical field
This application relates to the field of terminal technology, and in particular to a human-computer interaction method and related apparatus.
Background
As smart terminals become increasingly widespread, voice interaction has become one of the common and important modes of human-computer interaction. At present, most voice interaction requires the user to first wake up the terminal with a preset wake-up word before any subsequent interaction can take place, which is cumbersome and results in a poor user experience. Some manufacturers also provide a wake-up-free function: the user does not need to wake up the terminal in advance and can directly speak a predefined wake-up-free instruction. However, the predefined wake-up-free instructions are fixed and limited in number, and they are easily triggered by mistake while the user is chatting, which harms the user experience.
Therefore, it is desirable to provide a human-computer interaction method that improves the user's interaction experience.
Summary of the invention
This application provides a human-computer interaction method and related apparatus, with a view to improving the user's interaction experience.
In a first aspect, this application provides a human-computer interaction method. The method can be executed by a terminal, by a component configured in the terminal (such as a chip or a chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: receiving a first voice input from a user; and, when it is determined that the first voice input is semantically similar to a predefined first wake-up-free instruction, making a corresponding response to the first voice input. The first wake-up-free instruction is used to instruct the terminal, without a preset wake-up word being input, to perform the operation corresponding to the first wake-up-free instruction.
Based on the above technical solution, the terminal receives the first voice input from the user and, when the first voice input is semantically similar to the predefined first wake-up-free instruction, makes a corresponding response to it. That is, even when the terminal has not been woken up in advance and the sentence spoken by the user is not the predefined first wake-up-free instruction itself, the terminal can respond as long as the sentence is semantically similar to that instruction. This helps solve the problem of the terminal not responding because the predefined first wake-up-free instructions are fixed and limited, and in turn helps improve the user's interaction experience.
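The matching step of the first aspect can be sketched as follows. The application does not say how semantic similarity is computed, so this sketch substitutes a simple word-overlap (Jaccard) score for a real semantic model. The instruction table, the operation names, the function names and the 0.5 threshold are all assumptions made for illustration.

```python
# Hypothetical table: predefined wake-up-free instruction -> operation name.
WAKE_FREE_INSTRUCTIONS = {
    "open the window": "window.open",
    "play music": "media.play",
}


def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets; a crude stand-in for semantic similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_wake_free(utterance: str, threshold: float = 0.5):
    """Return the operation of the most similar predefined instruction, or None."""
    best = max(WAKE_FREE_INSTRUCTIONS, key=lambda instr: token_overlap(utterance, instr))
    if token_overlap(utterance, best) >= threshold:
        return WAKE_FREE_INSTRUCTIONS[best]
    return None  # not similar enough: the terminal stays silent
```

Under these assumptions, `match_wake_free("please open the window")` resolves to the hypothetical `window.open` operation although the sentence is not the predefined instruction itself, while unrelated chatter returns `None`.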
With reference to the first aspect, in some possible implementations of the first aspect, before making the corresponding response to the first voice input, the method further includes: confirming the semantics of the first voice input with the user.
After receiving the first voice input, the terminal can confirm with the user whether the recognized semantics of the first voice input are correct. On the one hand, this improves accuracy; on the other hand, it prevents the terminal from responding when the user mentions the first voice input by mistake. For example, if the user mentioned the first voice input by mistake, the user can give a negative reply when the terminal asks for confirmation, so that the terminal does not go on to perform the corresponding operation, which helps improve the user's experience.
Optionally, confirming the semantics of the first voice input with the user includes: confirming the semantics of the first voice input with the user through a prompt box and/or a voice broadcast.
The terminal can confirm the semantics of the first voice input with the user through a prompt box that contains those semantics, through a voice broadcast, or through a combination of a prompt box and a voice broadcast. Providing these multiple confirmation methods greatly increases the terminal's flexibility in confirming semantics with the user.
With reference to the first aspect, in some possible implementations of the first aspect, the method further includes: prompting the user with the first wake-up-free instruction.
The terminal can also prompt the user to use the predefined first wake-up-free instruction directly next time. For example, the terminal may prompt the user with the first wake-up-free instruction through a prompt box and/or a voice broadcast. This application does not limit the prompting method.
With reference to the first aspect, in some possible implementations of the first aspect, before receiving the first voice input from the user, the method further includes: receiving a second voice input from the user; when the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
The terminal learns and generates the second wake-up-free instruction through the above method. The second wake-up-free instruction can be used to instruct the terminal to perform the corresponding operation without a preset wake-up word being input. Specifically, after receiving the second voice input from the user, and when the second voice input is semantically similar to the first wake-up-free instruction, the terminal asks the user to confirm that this is the intended meaning. If the user confirms that the semantics of the second voice input are correct, the terminal generates the corresponding second wake-up-free instruction, so that the next time it receives the second voice input without having been woken up in advance, it can respond. In other words, this greatly increases the number of wake-up-free instructions that can instruct the terminal to perform corresponding operations without a preset wake-up word, which in turn helps improve the user's interaction experience.
With reference to the first aspect, in some possible implementations of the first aspect, the first voice input being semantically similar to the predefined first wake-up-free instruction includes: the first voice input being the same as the second wake-up-free instruction.
After receiving the first voice input, the terminal determines whether the first voice input is semantically similar to the first wake-up-free instruction. One way is to perform semantic analysis on the first voice input and the predefined first wake-up-free instruction to determine whether the two are semantically similar. Another way is for the terminal to determine whether the first voice input is the same as a generated second wake-up-free instruction. The second wake-up-free instruction is an instruction that was generated from the second voice input and is semantically similar to the first wake-up-free instruction, so if the first voice input is the same as the generated second wake-up-free instruction, the first voice input is semantically similar to the first wake-up-free instruction, and the terminal can respond to it. The two ways can be used together or separately, which greatly increases the terminal's flexibility in determining whether the first voice input is semantically similar to the first wake-up-free instruction.
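The learning of a second wake-up-free instruction, and the later exact-match lookup, can be sketched as a small registry. This is an illustrative sketch only. The class name `WakeFreeRegistry`, its methods, the `normalize` helper and the operation names are hypothetical, and the similarity judgment and the user's confirmation are assumed to happen outside the registry.

```python
def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so repeated utterances compare equal."""
    return " ".join(text.lower().split())


class WakeFreeRegistry:
    """Hypothetical store of predefined and learned wake-up-free instructions."""

    def __init__(self, predefined: dict):
        # predefined instruction text -> operation name
        self.predefined = {normalize(k): v for k, v in predefined.items()}
        # learned (second) instruction text -> predefined instruction text
        self.learned = {}

    def learn(self, utterance: str, similar_to: str, user_confirmed: bool) -> bool:
        """Record the utterance as a new instruction only after user confirmation."""
        target = normalize(similar_to)
        if not user_confirmed or target not in self.predefined:
            return False
        self.learned[normalize(utterance)] = target
        return True

    def resolve(self, utterance: str):
        """Exact match against predefined or learned instructions, else None."""
        key = normalize(utterance)
        if key in self.predefined:
            return self.predefined[key]
        target = self.learned.get(key)
        return self.predefined[target] if target is not None else None
```

In this design a learned instruction points back at the predefined instruction it resembles, so a later identical utterance resolves by a dictionary lookup without repeating the semantic analysis.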
With reference to the first aspect, in some possible implementations of the first aspect, receiving the second voice input from the user includes: receiving the second voice input multiple times in succession within a preset duration.
In other words, the terminal confirms the semantics of the second voice input with the user only after receiving the second voice input multiple times in succession within the preset duration. This effectively prevents the terminal from mistaking an accidental mention of the second voice input for a request to perform the corresponding operation, which helps improve the user's interaction experience.
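One way to detect "multiple times in succession within a preset duration" is a small sliding-window counter that resets whenever a different utterance arrives. The sketch below rests on assumptions: the class name, the default of three repeats and the ten-second window are illustrative values, not figures from the application.

```python
from collections import deque


class RepeatDetector:
    """Reports when the same utterance repeats consecutively within a time window."""

    def __init__(self, min_repeats: int = 3, window_s: float = 10.0):
        self.min_repeats = min_repeats  # assumed meaning of "multiple times"
        self.window_s = window_s        # assumed "preset duration", in seconds
        self._last = None
        self._times = deque()

    def observe(self, utterance: str, timestamp: float) -> bool:
        """Return True once the utterance has repeated often enough in the window."""
        if utterance != self._last:
            # A different utterance breaks the run of consecutive repeats.
            self._last = utterance
            self._times.clear()
        self._times.append(timestamp)
        # Drop repeats that fell out of the window.
        while timestamp - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.min_repeats
```

A terminal following this sketch would ask for confirmation only when `observe` returns `True`, so a single stray mention during a conversation does not trigger the confirmation prompt.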
In a second aspect, this application provides a human-computer interaction method. The method can be executed by a terminal, by a component configured in the terminal (such as a chip or a chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: receiving a first voice input from a user; and, when no preset wake-up word has been received but the first voice input contains a target object, making a corresponding response to the first voice input. The target object is an object whose number of mentions in other voice inputs received before the first voice input has reached a preset threshold, and the preset wake-up word is used to wake up the terminal.
Based on the above technical solution, when the terminal has not been woken up in advance and receives a first voice input from the user, it makes a corresponding response if the first voice input contains an object whose number of mentions in earlier voice inputs has reached the preset threshold. That is, by learning from earlier voice inputs and saving the target objects whose mention counts have reached the preset threshold, the terminal can respond to any received voice input that contains such a target object even if it has not been woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interaction experience.
With reference to the second aspect, in some possible implementations of the second aspect, before receiving the first voice input from the user, the method further includes: receiving the preset wake-up word from the user; receiving a second voice input from the user; and, when the number of times a first object contained in the second voice input has been mentioned in the second voice input and earlier voice inputs exceeds the preset threshold, determining the first object as the target object.
The terminal can record the number of times the first object is mentioned in voice inputs. If that number exceeds the preset threshold, the terminal determines the first object as the target object, so that the user can later issue a voice input containing the target object without waking up the terminal in advance, and the terminal can respond on receiving that voice input. That is, the terminal does not need to be woken up in advance, which simplifies the interaction process and helps improve the user's interaction experience.
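The mention counting of the second aspect can be sketched with a counter over a list of known object names. All names and values here are assumptions for illustration: object detection is reduced to substring matching, the threshold defaults to 3, and the sketch promotes an object once its count reaches the threshold (the text above uses both "reaches" and "exceeds").

```python
from collections import Counter


class TargetObjectTracker:
    """Counts object mentions in woken-up inputs and promotes frequent ones."""

    def __init__(self, known_objects, threshold: int = 3):
        self.known_objects = [obj.lower() for obj in known_objects]
        self.threshold = threshold  # assumed "preset threshold"
        self.mentions = Counter()
        self.targets = set()

    def record_woken_input(self, utterance: str) -> None:
        """Call for each voice input received after the preset wake-up word."""
        text = utterance.lower()
        for obj in self.known_objects:
            if obj in text:
                self.mentions[obj] += 1
                if self.mentions[obj] >= self.threshold:
                    self.targets.add(obj)

    def should_respond_without_wake_word(self, utterance: str) -> bool:
        """True if the input mentions an object already promoted to a target."""
        text = utterance.lower()
        return any(obj in text for obj in self.targets)
```

For example, once "air conditioner" has been mentioned the configured number of times in woken-up sessions, a later "air conditioner a bit cooler" would be handled without the wake-up word.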
With reference to the second aspect, in some possible implementations of the second aspect, the method further includes: generating, based on the target object, a wake-up-free instruction that contains the target object; and prompting the user with the wake-up-free instruction.
The terminal can generate a wake-up-free instruction that contains the target object and prompt the user that this instruction can be used directly next time, so that the terminal responds without being woken up in advance. The terminal may prompt the user with the wake-up-free instruction through a prompt box and/or a voice broadcast. This application does not limit the prompting method.
In a third aspect, this application provides a human-computer interaction method. The method can be executed by a terminal, by a component configured in the terminal (such as a chip or a chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: receiving a first voice input from a user, where the first voice input belongs to a first instruction set and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions; and responding to the first voice input when a preset condition is met.
Based on the above technical solution, after receiving a first voice input that is semantically similar to a predefined wake-up-free instruction, the terminal responds to it only when the preset condition is met. That is, for a first voice input that is semantically similar to a predefined wake-up-free instruction, the terminal does not respond in every situation, which prevents it from responding when the user mentions the first voice input by accident. The first voice input may be more colloquial than the predefined wake-up-free instructions, so if the terminal responded in every situation, its responses would likely be triggered frequently during the user's conversations. Having the terminal respond only when the preset condition is met therefore helps greatly improve the user's interaction experience.
With reference to the third aspect, in some possible implementations of the third aspect, the preset condition includes at least one of the following: the number of users within a preset range of the terminal does not exceed a threshold; the user is at a predefined position; the user from whom the first voice input comes does not belong to a preset group; or the time at which the first voice input is received falls within a preset period.
The number of users within a preset range of the terminal does not exceed a threshold: that is, the terminal can respond to the first voice input when few users are within the preset range. If few users are nearby, the user is less likely to have mentioned the first voice input by accident, meaning that the user probably does want the terminal to perform the corresponding operation; conversely, if many users are nearby, an accidental mention is more likely. The user is at a predefined position: for example, the terminal responds to a first voice input from the user nearest to it, or the user is at a scenic spot, where the user is more likely to want service from the terminal. The user from whom the first voice input comes does not belong to a preset group: the preset group may be, for example, children or the elderly. Instructions issued by members of the preset group may be dangerous, so the terminal may decline to respond to them. The time at which the first voice input is received falls within a preset period: the preset period may be, for example, working hours. During such a period the terminal can respond to the first voice input; in other periods it may respond only to predefined wake-up-free instructions. In summary, the above preset conditions effectively prevent the terminal from responding when the user mentions the first voice input by accident.
With reference to the third aspect, in some possible implementations of the third aspect, the method is applied to a vehicle. The number of users within a preset range of the terminal not exceeding a threshold includes: there is one passenger in the vehicle; or, the user being at a predefined position includes: the user is in the driver's seat.
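The preset conditions of the third aspect can be sketched as a set of independent checks from which a deployment enables a subset. Everything concrete below is an assumption made for illustration: the `VoiceContext` fields, the defaults (one occupant, driver's seat, children excluded, 08:00 to 18:00), and the choice to require all enabled checks to pass.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class VoiceContext:
    """Hypothetical context captured alongside a first voice input."""
    nearby_users: int      # users within the preset range of the terminal
    speaker_position: str  # e.g. "driver", "front_passenger", "rear"
    speaker_group: str     # e.g. "adult", "child"
    received_at: time      # time of day the input was received


def evaluate_conditions(ctx, max_users=1, allowed_positions=("driver",),
                        excluded_groups=("child",),
                        period=(time(8, 0), time(18, 0))):
    """Evaluate each preset condition separately and return the results by name."""
    start, end = period
    return {
        "few_users": ctx.nearby_users <= max_users,
        "position": ctx.speaker_position in allowed_positions,
        "group": ctx.speaker_group not in excluded_groups,
        "period": start <= ctx.received_at <= end,
    }


def should_respond(ctx, enabled=("few_users", "position")):
    """Respond only if every enabled preset condition holds."""
    results = evaluate_conditions(ctx)
    return all(results[name] for name in enabled)
```

Keeping the checks separate from the `enabled` selection reflects the "at least one of the following" wording: a vehicle deployment might enable only the occupant-count and driver's-seat checks, while another terminal might enable only the time-period check.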
In a fourth aspect, this application provides a human-computer interaction method. The method can be executed by a terminal, by a component configured in the terminal (such as a chip or a chip system), or by a logic module or software capable of implementing all or part of the terminal's functions; this application does not limit this.
Exemplarily, the method includes: when no preset wake-up word has been received from the user, determining, from a first voice input from the user, that the first voice input is a request for navigation; asking the user for the destination of the requested navigation; and providing a navigation service to the user based on the destination fed back by the user.
Based on the above technical solution, when the terminal has not been woken up in advance and, after receiving the first voice input from the user, finds that the user's intention is to request navigation, it can ask the user for the navigation destination and provide a navigation service based on the destination fed back by the user. The terminal does not need to be woken up in advance, which simplifies the interaction process and helps improve the user's interaction experience.
With reference to the fourth aspect, in some possible implementations of the fourth aspect, the method further includes: generating a wake-up-free instruction that contains the destination; and prompting the user with the wake-up-free instruction.
The terminal can generate a wake-up-free instruction that contains the destination and prompt the user that this instruction can be used directly next time, so that the terminal can respond to it. The terminal may prompt the user with the wake-up-free instruction through a prompt box and/or a voice broadcast. This application does not limit the prompting method.
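The navigation flow of the fourth aspect (detect the intent, ask for the destination, provide the route, then generate a wake-up-free instruction containing the destination) can be sketched as below. The keyword list stands in for real intent recognition, and `ask_user` and `plan_route` are hypothetical callbacks supplied by the caller; none of these names come from the application.

```python
# Assumed keywords; a real terminal would use proper intent recognition.
NAVIGATION_KEYWORDS = ("navigate", "navigation", "directions")


def is_navigation_request(utterance: str) -> bool:
    """Crude intent check: does the utterance mention navigation?"""
    text = utterance.lower()
    return any(keyword in text for keyword in NAVIGATION_KEYWORDS)


def handle_without_wake_word(utterance, ask_user, plan_route):
    """Handle a voice input received without the wake-up word.

    ask_user(prompt) returns the user's reply as a string, and
    plan_route(destination) returns a route object.
    """
    if not is_navigation_request(utterance):
        return None
    destination = ask_user("Where do you want to go?").strip()
    return {
        "route": plan_route(destination),
        # Instruction the terminal could suggest for direct use next time.
        "wake_free_instruction": f"navigate to {destination}",
    }
```

Passing the prompt and routing functions in as callbacks keeps the sketch independent of whether the terminal asks through a prompt box, a voice broadcast, or both.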
In a fifth aspect, this application provides a human-machine interaction method. The method may be performed by a terminal, by a component configured in the terminal (such as a chip or a chip system), or by a logic module or software capable of implementing all or some of the terminal's functions. This application imposes no limitation in this regard.
For example, the method includes: receiving a first voice input from the user, where the first voice input is not one of the predefined wake-up-free instructions; and, when the first voice input is semantically similar to a first wake-up-free instruction among the predefined wake-up-free instructions, guiding the user to input the first wake-up-free instruction.
Based on the foregoing technical solution, the terminal receives a first voice input that is not one of the predefined wake-up-free instructions but is semantically similar to a first wake-up-free instruction among them. The terminal then guides the user to input the corresponding first wake-up-free instruction, so that after the user inputs it, the terminal can respond accordingly. Compared with a terminal that neither responds nor prompts, this can greatly improve the user's interaction experience.
With reference to the fifth aspect, in some possible implementations of the fifth aspect, guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.
The terminal may guide the user to input the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction, through a voice broadcast, or through a combination of the two. Offering these options gives the terminal considerably more flexibility in how it guides the user to input the first wake-up-free instruction.
With reference to the fifth aspect, in some possible implementations of the fifth aspect, guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast includes: prompting the user, through a prompt box, to input the first wake-up-free instruction, where the prompt box contains the first wake-up-free instruction; and, when the number of prompts given through the prompt box within a preset duration reaches a preset threshold but the user has not issued the first wake-up-free instruction, guiding the user through a voice broadcast to input the first wake-up-free instruction.
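The guidance logic of the fifth aspect can be sketched as below. The instruction list, the 0.6 threshold, the class name `GuidancePolicy`, and its parameters are all hypothetical. Note also that `difflib.SequenceMatcher` measures character-level similarity, so it is only a stand-in for the semantic similarity the application describes; a real system would presumably compare meanings with a language model.

```python
import difflib
from typing import List, Optional

# Hypothetical predefined wake-up-free instructions.
PREDEFINED = ("open the window", "play music", "turn on the air conditioner")


def similar_instruction(utterance: str, threshold: float = 0.6) -> Optional[str]:
    """Return the predefined instruction closest to the utterance, if close enough."""
    text = utterance.lower().strip()
    if text in PREDEFINED:
        return None  # already a wake-up-free instruction, so no guidance is needed

    def score(instruction: str) -> float:
        # Lexical similarity used as a placeholder for semantic similarity.
        return difflib.SequenceMatcher(None, text, instruction).ratio()

    best = max(PREDEFINED, key=score)
    return best if score(best) >= threshold else None


class GuidancePolicy:
    """Use the prompt box first; escalate to voice once it has been shown too often."""

    def __init__(self, max_box_prompts: int = 3, window_s: float = 600.0) -> None:
        self.max_box_prompts = max_box_prompts  # preset threshold
        self.window_s = window_s                # preset duration, in seconds
        self._shown_at: List[float] = []        # times of recent prompt-box prompts

    def next_channel(self, now: float) -> str:
        # Keep only the prompts that fall inside the preset duration.
        self._shown_at = [t for t in self._shown_at if now - t <= self.window_s]
        if len(self._shown_at) >= self.max_box_prompts:
            return "voice"
        self._shown_at.append(now)
        return "prompt_box"

    def instruction_spoken(self) -> None:
        """The user issued the instruction, so the prompt count starts over."""
        self._shown_at.clear()
```

A caller would first check `similar_instruction(...)`; if it returns an instruction, `next_channel(...)` decides whether that instruction is shown in a prompt box or read out by voice.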
In a sixth aspect, this application provides a computer device, including units configured to implement the method in any one of the first to fifth aspects or in any possible implementation of the first to fifth aspects. It should be understood that each unit may implement its function by executing a computer program.
In a seventh aspect, this application provides a computer device, including a processor configured to perform the method in any one of the first to fifth aspects or in any possible implementation of the first to fifth aspects.
The computer device may further include a memory configured to store computer-readable instructions; the processor reads the computer-readable instructions so that the computer device can implement the methods described in the foregoing aspects. The computer device may further include a communication interface through which the computer device communicates with other devices. For example, the communication interface may be a transceiver, a circuit, a bus, a module, or another type of communication interface.
In an eighth aspect, this application provides a vehicle that is configured to implement the method in any one of the first to fifth aspects or in any possible implementation of the first to fifth aspects, or that includes any computer device described in the sixth aspect or the seventh aspect.
In a ninth aspect, this application provides a chip system. The chip system includes at least one processor configured to support implementation of the functions involved in any one of the first to fifth aspects or in any possible implementation of the first to fifth aspects, for example, receiving or processing the data and/or information involved in the foregoing methods.
In a possible design, the chip system further includes a memory configured to store program instructions and data. The memory is located inside or outside the processor.
The chip system may consist of a chip, or may include a chip and other discrete devices.
In a tenth aspect, this application provides a computer-readable storage medium that stores computer-readable instructions. When the computer-readable instructions are executed by a computer, the computer implements the method in any one of the first to fifth aspects or in any possible implementation of the first to fifth aspects.
In an eleventh aspect, this application provides a computer program product that includes computer-readable instructions. When the computer-readable instructions are run by a computer, the computer implements the method in any one of the first to fifth aspects or in any possible implementation of the first to fifth aspects.
It should be understood that the sixth to eleventh aspects of this application correspond to the technical solutions of the first to fifth aspects, and that the beneficial effects achieved by each aspect and its corresponding feasible implementations are similar. Details are not repeated here.
Description of the Drawings
Figure 1 is a schematic structural diagram of a terminal according to an embodiment of this application;
Figure 2 is a schematic diagram of a scenario to which the human-machine interaction method provided in the embodiments of this application is applicable;
Figure 3 is a schematic diagram of a known human-machine interaction method;
Figure 4 is a schematic diagram of another known human-machine interaction method;
Figure 5 is a schematic flowchart of a first human-machine interaction method according to an embodiment of this application;
Figure 6 is an interaction diagram of the first human-machine interaction method according to an embodiment of this application;
Figure 7 is a schematic flowchart of learning the wording of a voice input according to an embodiment of this application;
Figure 8 is another schematic flowchart of learning the wording of a voice input according to an embodiment of this application;
Figure 9 is a schematic flowchart of a second human-machine interaction method according to an embodiment of this application;
Figure 10 is an interaction diagram of the second human-machine interaction method according to an embodiment of this application;
Figure 11 is a schematic flowchart of learning a first object in a second voice input according to an embodiment of this application;
Figure 12 is a schematic flowchart of a third human-machine interaction method according to an embodiment of this application;
Figure 13 is a schematic flowchart of determining, based on the scenario, whether to respond to a first voice input according to an embodiment of this application;
Figure 14 is a schematic flowchart of a fourth human-machine interaction method according to an embodiment of this application;
Figure 15 is an interaction diagram of guiding a user to issue a first wake-up-free instruction according to an embodiment of this application;
Figure 16 is a schematic flowchart of a fifth human-machine interaction method according to an embodiment of this application;
Figure 17 is an interaction diagram of the fifth human-machine interaction method according to an embodiment of this application.
Detailed Description
The technical solutions in this application are described below with reference to the accompanying drawings.
The methods provided in the embodiments of this application can be applied to terminals such as mobile phones, tablet computers, smart watches, smart speakers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, personal computers (PCs), ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), and distributed devices.
It should be noted that the embodiments of this application place no limitation on the specific type of terminal.
For example, Figure 1 shows a schematic structural diagram of a terminal 100. As shown in Figure 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
The processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (AP), a microcontroller unit (MCU), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated into one or more processors.
The application processor outputs a sound signal through the audio module 170 (for example, the speaker 170A) or displays an image or video through the display screen 194.
The controller may be the nerve center and command center of the terminal 100. The controller can generate operation control signals based on instruction operation codes and timing signals to control instruction fetching and instruction execution.
A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory can hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency.
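The benefit of such a cache can be illustrated with a small software analogy. The sketch below is not the hardware design of the processor 110: the class `SmallCache`, the dictionary used as the backing store, and the least-recently-used replacement policy are all assumptions made for illustration, since the application does not state how the cache is organized.

```python
from collections import OrderedDict


class SmallCache:
    """Keeps recently used values so repeated reads skip the slower backing store."""

    def __init__(self, capacity: int, backing_store: dict) -> None:
        self.capacity = capacity
        self.backing_store = backing_store  # stands in for slower main memory
        self.lines = OrderedDict()          # address -> value, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            # Hit: the value was used recently and is served from the cache.
            self.hits += 1
            self.lines.move_to_end(address)
            return self.lines[address]
        # Miss: fetch from the backing store and remember the value.
        self.misses += 1
        value = self.backing_store[address]
        self.lines[address] = value
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used entry
        return value
```

Each hit corresponds to an access that did not have to go back to the slower memory, which is the waiting time the paragraph above refers to.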
The processor 110 can perform different operations by executing instructions, thereby implementing different functions. The instructions may be, for example, instructions stored in the memory before the device leaves the factory, or instructions read from an application (APP) after the user installs a new APP during use. The embodiments of this application place no limitation on this.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a secure digital input and output (SDIO) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a universal synchronous asynchronous receiver/transmitter (USART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface.
The USB interface 130 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 130 can be used to connect a charger to charge the terminal 100, or to transfer data between the terminal 100 and peripheral devices. It can also be used to connect a headset and play audio through the headset. The interface can further be used to connect other terminals.
It can be understood that the interface connection relationships between the modules illustrated in this application are merely illustrative and do not constitute a structural limitation on the terminal 100. In other embodiments, the terminal 100 may use interface connection manners different from those in the foregoing embodiments, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through a wireless charging coil of the terminal 100. While charging the battery 142, the charging management module 140 can also supply power to the terminal through the power management module 141.
The power management module 141 is configured to connect the battery 142, the charging management module 140, and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle count, and battery health status (leakage, impedance). In some other embodiments, the power management module 141 may be provided in the processor 110. In still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same device.
The wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the terminal 100 may be used to cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In some other embodiments, an antenna may be used in combination with a tuning switch.
The mobile communication module 150 can provide wireless communication solutions, including 2G/3G/4G/5G, that are applied on the terminal 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform processing such as filtering and amplification on the received electromagnetic waves, and pass them to the modem processor for demodulation. The mobile communication module 150 can also amplify a signal modulated by the modem processor and convert it into electromagnetic waves that are radiated through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be provided in the same device.
The modem processor may include a modulator and a demodulator. The modulator modulates a low-frequency baseband signal to be sent into a medium- or high-frequency signal. The demodulator demodulates a received electromagnetic wave signal into a low-frequency baseband signal and then passes the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied on the terminal 100, including wireless local area network (WLAN) (for example, a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves through the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive a signal to be sent from the processor 110, perform frequency modulation and amplification on it, and convert it into electromagnetic waves that are radiated through the antenna 2.
In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), fifth generation (5G) communication systems, BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), GLONASS, the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The terminal 100 can implement display functions through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 194 and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal 100 may include one or more display screens 194.
In this application, the display screen 194 can be used to display a prompt box that contains a predefined wake-up-free instruction. The prompt box prompts the user that the wake-up-free instruction can be used directly next time; that is, voice interaction with the terminal can be achieved through the wake-up-free instruction without waking up the terminal in advance.
The terminal 100 can implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The camera 193 is used to capture still images or video. An object generates an optical image through the lens, and the image is projected onto a photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. In some embodiments, the terminal 100 may include one or more cameras 193.
The digital signal processor is used to process digital signals. In addition to digital image signals, it can process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor performs a Fourier transform or similar operation on the frequency-point energy.
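As an illustration of the kind of computation involved, the following sketch computes the energy of each frequency bin with a direct discrete Fourier transform and picks the strongest one. It is a plain-Python illustration only; the function names `dft_energy` and `strongest_bin` are hypothetical, the application does not describe the DSP's actual algorithm, and a real DSP would use an optimized fast Fourier transform.

```python
import cmath
import math
from typing import List, Sequence


def dft_energy(samples: Sequence[float]) -> List[float]:
    """Return the energy |X[k]|^2 of every DFT bin of a real-valued signal."""
    n = len(samples)
    energies = []
    for k in range(n):
        # Direct O(n^2) evaluation of the DFT definition for bin k.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(samples)
        )
        energies.append(abs(coeff) ** 2)
    return energies


def strongest_bin(samples: Sequence[float]) -> int:
    """Return the index of the highest-energy bin in the non-redundant half."""
    energies = dft_energy(samples)
    half = energies[: len(samples) // 2 + 1]  # a real signal has a mirrored spectrum
    return max(range(len(half)), key=half.__getitem__)
```

For a pure cosine with three cycles across a 16-sample window, the energy concentrates in bin 3, so `strongest_bin` selects that bin.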
The video codec is used to compress or decompress digital video. The terminal 100 may support one or more video codecs, so that the terminal 100 can play or record video in multiple encoding formats, for example, moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos on the external memory card.
The internal memory 121 can be used to store computer-executable program code, which includes instructions. The processor 110 runs the instructions stored in the internal memory 121 to execute the various functional applications and data processing of the terminal 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area can store data created during use of the terminal 100 (such as audio data and a phone book). In addition, the internal memory 121 may include a high-speed random access memory and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The terminal 100 can implement audio functions, such as music playback and recording, through the audio module 170 (together with the speaker 170A, the receiver 170B, the microphone 170C, and the headset jack 170D), the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some functional modules of the audio module 170 may be provided in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call on the terminal 100 through the speaker 170A.
The receiver 170B, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the terminal 100 answers a call or plays a voice message, the user can hear the voice by holding the receiver 170B close to the ear.
The microphone 170C, also called a "mic" or "sound transducer", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C so that the sound signal is input to the microphone 170C. The terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which can implement noise reduction in addition to collecting sound signals. In still other embodiments, the terminal 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify the sound source, implement directional recording, and so on.
In this application, the microphone 170C may be used to receive voice input from the user, that is, to collect sound signals from the user.
The headphone interface 170D is used to connect wired headphones. The headphone interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The buttons 190 include a power-on button (also called the power button), volume buttons, and the like. A button 190 may be a mechanical button or a touch button. The terminal 100 may receive button input and generate key signal input related to the user settings and function control of the terminal 100.
The motor 191 can generate vibration alerts. The motor 191 may be used for incoming-call vibration alerts and for touch vibration feedback. For example, touch operations in different applications (such as photographing or audio playback) may correspond to different vibration feedback effects. Touch operations on different areas of the display screen 194 may also cause the motor 191 to produce different vibration feedback effects. Different application scenarios (such as time reminders, message reception, alarm clocks, and games) may likewise correspond to different vibration feedback effects. The touch vibration feedback effect may also be customized.
The indicator 192 may be an indicator light, which may be used to indicate the charging status and battery level changes, and may also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be brought into contact with or separated from the terminal 100 by inserting it into or removing it from the SIM card interface 195. The terminal 100 may support one or more SIM card interfaces. The SIM card interface 195 may support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface 195 at the same time, and these cards may be of the same type or of different types. The SIM card interface 195 may also be compatible with different types of SIM cards and with external memory cards. The terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal 100 uses an eSIM, that is, an embedded SIM card. The eSIM card may be embedded in the terminal 100 and cannot be separated from it.
The structure illustrated in this application does not constitute a specific limitation on the terminal 100. In other embodiments, the terminal 100 may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
To facilitate understanding of the human-machine interaction method provided by the embodiments of this application, the scenarios to which the method applies are described below. It should be understood that the application scenarios described here are intended to explain the technical solutions of the embodiments more clearly and do not limit those technical solutions.
Figure 2 is a schematic diagram of a scenario to which the method provided by the embodiments of this application applies. As shown in Figure 2, the user can say aloud the operation that the terminal should perform and thereby interact with the terminal (a mobile phone is used as the example in Figure 2). In some scenarios, voice interaction has become one of the important and commonly used modes of human-machine interaction. For example, while driving a vehicle, the user can interact with the in-vehicle head unit (one example of a terminal) by voice. At present, the user can first wake up the terminal with a preset wake-up word. More specifically, the user first uses the preset wake-up word to wake up the voice assistant in the terminal (also called a smart assistant, intelligent assistant, and so on, which this application does not limit) and then carries out the subsequent interaction. This approach is cumbersome and results in a poor user experience. Some manufacturers also provide a wake-up-free function: the user does not need to wake up the voice assistant in advance and can interact with the terminal directly through predefined wake-up-free instructions. However, the predefined wake-up-free instructions are fixed and limited in number. If the wake-up-free instruction spoken by the user is not exact, the terminal does not respond, and the user experience is poor.
The two known human-machine interaction methods mentioned above are described in detail below with reference to Figure 3 and Figure 4.
Figure 3 shows one known human-machine interaction method. As shown in Figure 3, the user first wakes up the terminal with a preset wake-up word. More specifically, the user first wakes up the voice assistant in the terminal with the preset wake-up word. In Figure 3 the wake-up word is "Xiaoyi Xiaoyi": in response to the user saying "Xiaoyi Xiaoyi", the voice assistant replies "I'm here". The user then says "Navigate to location A". In response, the voice assistant replies "OK, starting navigation for you" and displays the route to location A on the user interface. As can be seen, the whole interaction is cumbersome, which leads to a poor user experience.
Figure 4 shows another known human-machine interaction method. As shown in Figure 4, the user can speak a predefined wake-up-free instruction directly to interact with the terminal. For example, the user says "Navigate to the company". In response, the terminal displays the route to the company on the user interface; the location of the user's company is stored on the terminal in advance. If the user says another sentence with a similar intent (or meaning), such as "Set off for work", "I want to go to work", or "I want to go to the company", the voice assistant does not respond. In short, the predefined wake-up-free instructions are fixed and limited in number, so the voice assistant may well fail to respond to the user's voice input, which leads to a poor user experience.
To improve the user's human-machine interaction experience, this application provides a human-machine interaction method. In the method, when a first voice input received from the user is semantically similar to a predefined first wake-up-free instruction, the terminal makes a corresponding response to the first voice input. In other words, without the terminal having been woken up in advance, even if the sentence spoken by the user is not the predefined first wake-up-free instruction, the terminal can recognize and respond to it as long as it is semantically similar to that instruction. This helps alleviate the problem of the terminal not responding because the predefined first wake-up-free instructions are fixed and limited, and in turn helps improve the user's voice interaction experience.
To describe the technical solutions of the embodiments of this application clearly, the following points are noted first.
First, in the embodiments of this application, words such as "first" and "second" are used to distinguish between identical or similar items that have essentially the same function and effect. For example, the first voice input and the second voice input are named only to distinguish different voice inputs, and no order is implied. Those skilled in the art will understand that words such as "first" and "second" do not limit quantity or position, and do not require that the items so labelled be different.
Second, in this application, "at least one" means one or more. "And/or" describes the relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A alone, both A and B, or B alone, where A and B may each be singular or plural. The character "/" generally indicates an "or" relationship between the objects before and after it, but it may also indicate an "and" relationship; the specific meaning should be understood from the context.
Third, in this application, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product, or device.
The human-machine interaction method provided by this application is described in detail below with reference to specific embodiments.
It should be understood that the embodiments described below may be performed by a terminal, by a component configured in the terminal (such as a chip or a chip system), or by a logic module or software capable of implementing all or part of the terminal's functions, which the embodiments of this application do not limit. The terminal may have the structure shown in Figure 1, or may have more or fewer structures than shown in Figure 1, which the embodiments of this application do not limit.
Figure 5 is a schematic flowchart of the first human-machine interaction method provided by an embodiment of this application. As shown in Figure 5, the method 500 may include step 501 and step 502. Each step shown in Figure 5 is described in detail below.
Step 501: Receive a first voice input from the user.
The first voice input may be a voice input that the terminal receives from the user when the user has not woken up the terminal in advance.
For example, in response to the user's voice operation, the terminal receives the first voice input from the user. The first voice input may be, for example, "Navigate to location A", "Navigate to the company", "Set off for work", "Play song B", or "I want to listen to song B". The embodiments of this application place no limitation on the specific content of the first voice input.
Step 502: When the first voice input is semantically similar to a predefined first wake-up-free instruction, make a corresponding response to the first voice input.
The first wake-up-free instruction is used to instruct the terminal, without the preset wake-up word being input, to perform the operation corresponding to the first wake-up-free instruction.
In one possible implementation, after receiving the first voice input from the user, the terminal performs semantic analysis on the first voice input and the predefined first wake-up-free instruction based on natural language processing (NLP). When the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal makes a corresponding response to the first voice input.
In another possible implementation, after receiving the first voice input from the user, the terminal determines whether the first voice input is one of the second wake-up-free instructions obtained by learning from voice input. A second wake-up-free instruction obtained in this way is semantically similar to a predefined first wake-up-free instruction but is worded differently. For example, the predefined first wake-up-free instruction is "Navigate to the company", and the second wake-up-free instruction learned from voice input is "Set off for work". The two have similar meanings but different wording: the second wake-up-free instruction is more colloquial, whereas the first wake-up-free instruction is standard human-machine interaction phrasing. When the first voice input is one of the second wake-up-free instructions learned from voice input (that is, when the first voice input is semantically similar to the predefined first wake-up-free instruction), the terminal makes a corresponding response to the first voice input.
Either of the two possible implementations above may be used alone, or the two may be combined. When they are combined, for example, after receiving the first voice input the terminal may first determine whether the first voice input is one of the second wake-up-free instructions learned from voice input. If it is, the terminal responds to it. If it is not, the terminal goes on to perform NLP-based semantic analysis on the first voice input and the predefined first wake-up-free instruction. If the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal makes a corresponding response to the first voice input; if it is not, the terminal does not respond to the first voice input.
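As a non-limiting illustration, the combined check described above can be sketched as follows. The function names, the dictionary used for the learned instructions, the threshold value, and in particular `toy_similarity` are assumptions made for this sketch; the application does not specify the NLP model, and a character-level ratio from Python's `difflib` merely stands in for real semantic analysis.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional


def toy_similarity(a: str, b: str) -> float:
    """Stand-in for a semantic model: character-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_instruction(
    utterance: str,
    predefined: List[str],
    learned: Dict[str, str],
    similarity: Callable[[str, str], float] = toy_similarity,
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the application
) -> Optional[str]:
    """Return the predefined instruction to execute, or None for no response."""
    # First check: is the utterance a learned (second) wake-up-free instruction?
    # Each learned phrase maps to the predefined instruction it stands for.
    if utterance in learned:
        return learned[utterance]
    # Otherwise compare the utterance with every predefined (first) instruction.
    best = max(predefined, key=lambda p: similarity(utterance, p), default=None)
    if best is not None and similarity(utterance, best) >= threshold:
        return best
    # Not similar to anything: the terminal does not respond.
    return None
```

In this sketch a real deployment would pass its own semantic model as `similarity`; the lookup in `learned` is tried first because it is cheap and exact.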
It should be understood that the predefined first wake-up-free instructions and/or the second wake-up-free instructions learned from voice input may be stored in an instruction library. After receiving the first voice input, the terminal determines whether to respond to it based on the first wake-up-free instructions and second wake-up-free instructions stored in the instruction library. If the first voice input is semantically similar to a first wake-up-free instruction, the terminal makes a corresponding response to the first voice input.
Optionally, before making the corresponding response to the first voice input, the method further includes: confirming the semantics of the first voice input with the user.
For example, after receiving the first voice input from the user, and when the first voice input is semantically similar to a predefined first wake-up-free instruction, the terminal asks the user whether that interpretation is correct. If the user replies that it is correct, the terminal makes the corresponding response to the first voice input.
The terminal may ask the user whether the interpretation is correct by voice announcement, by a prompt box (such as a toast) that contains the interpreted meaning of the first voice input, or by a prompt box (such as a toast) together with a voice announcement. The embodiments of this application do not limit the way in which the terminal asks the user about the semantics.
Optionally, the method further includes: prompting the user with the first wake-up-free instruction. That is, in addition to performing the operation indicated by the first voice input (such as navigating to the company), the terminal may tell the user that next time the predefined first wake-up-free instruction can be used directly to instruct the terminal to perform the corresponding operation.
Figure 6 is a schematic diagram of the interaction in the first human-machine interaction method provided by an embodiment of this application. As shown in Figure 6, in response to the user saying "Set off for work", the terminal asks "Do you want to navigate to the company?" The user replies "Yes" by voice, and in response to that reply the terminal displays the route to the company on the user interface. The voice announcement by which the terminal in Figure 6 asks "Do you want to navigate to the company?" is only an example and does not limit the embodiments of this application. In other embodiments, the terminal may ask the same question through a prompt box (such as a toast), or through a prompt box (such as a toast) together with a voice announcement.
Optionally, the terminal may also use a prompt box and/or a voice announcement to tell the user that the predefined first wake-up-free instruction can be used directly next time. As shown in Figure 6, the terminal prompts the user by voice announcement: "Next time, try saying 'Navigate to the company'".
The process by which the terminal obtains a second wake-up-free instruction by learning from voice input is described in detail below.
Optionally, before receiving the first voice input from the user, the method further includes: receiving a second voice input from the user; when the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
For example, after receiving the second voice input from the user, the terminal determines whether the predefined first wake-up-free instructions include an instruction that is semantically similar to the second voice input, for instance by performing NLP-based semantic analysis on the two. If the terminal determines that the second voice input is semantically similar to one of the predefined first wake-up-free instructions, it asks the user whether that interpretation is correct. If the user replies that it is correct, the terminal generates the second wake-up-free instruction corresponding to the second voice input. The terminal may also save the second voice input in the instruction library.
Optionally, receiving the second voice input from the user includes: receiving the second voice input multiple times in succession within a preset duration. That is, the terminal confirms the semantics of the second voice input with the user only after it has received the second voice input multiple times in succession within the preset duration. This effectively prevents the terminal from responding when the user merely mentions the second voice input in passing during a conversation, which helps improve the user experience.
For example, the user says "Set off for work" twice in succession within one minute, and "Set off for work" is semantically similar to the predefined first wake-up-free instruction "Navigate to the company". After receiving this voice input twice in succession, the terminal asks the user "Do you want to navigate to the company?" through a prompt box and/or a voice announcement (see, for example, Figure 6) and, in response to the user's confirmation, displays the route to the company on the user interface. The terminal may also use a prompt box and/or a voice announcement to tell the user that the first wake-up-free instruction can be used directly next time. As shown in Figure 6, the terminal prompts the user by voice announcement: "Next time, try saying 'Navigate to the company'".
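One way the "multiple times in succession within a preset duration" condition might be checked is sketched below. The class name `RepeatGate`, the default 60-second window, and the default count of two are assumptions chosen to mirror the example above; the application does not fix these values or prescribe this data structure. Timestamps are passed in explicitly so that the sketch is easy to exercise; in practice they could come from a monotonic clock.

```python
from collections import deque
from typing import Deque, Optional


class RepeatGate:
    """Signals when the same utterance is heard consecutively within a time window."""

    def __init__(self, window_s: float = 60.0, min_count: int = 2) -> None:
        self.window_s = window_s      # preset duration, in seconds (assumed value)
        self.min_count = min_count    # required number of consecutive repeats (assumed value)
        self._last_text: Optional[str] = None
        self._times: Deque[float] = deque()

    def observe(self, text: str, now: float) -> bool:
        """Record one utterance; return True if the repeat condition is met."""
        # A different utterance breaks the run, so start counting again.
        if text != self._last_text:
            self._last_text = text
            self._times.clear()
        self._times.append(now)
        # Drop occurrences that fall outside the preset duration.
        while self._times and now - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.min_count
```

With the defaults, saying the same phrase at 0 s and again at 30 s satisfies the condition, whereas repeating it only after 90 s, or with a different phrase in between, does not.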
Figure 7 is a schematic flowchart, provided by an embodiment of this application, of learning the wording of voice input.
Step 701: Receive a second voice input from the user.
In response to the user's voice operation, the terminal receives the second voice input from the user. Examples of the second voice input include "Set off for work", "Is there traffic on the way?", "Avoid congested roads", and "Choose a clear road"; they are not listed exhaustively here.
Step 702: Determine whether the second voice input is semantically similar to a predefined first wake-up-free instruction.
After receiving the second voice input from the user, the terminal determines whether the predefined first wake-up-free instructions include an instruction that is semantically similar to the second voice input. If they do not, the terminal performs step 703, that is, it does not respond to the second voice input. If the second voice input is semantically similar to one of the first wake-up-free instructions, the terminal performs step 704, that is, it asks the user whether the second voice input has that meaning.
Step 703: Do not respond to the second voice input.
Step 704: Ask the user whether the second voice input has that meaning.
If the user replies that the second voice input does not have that meaning, the terminal does not respond to the second voice input. If the user replies that it does, the terminal performs step 705.
The terminal may ask the user by voice announcement, by a prompt box (such as a toast), or by a prompt box (such as a toast) together with a voice announcement. This application places no limitation on the way the terminal asks.
Step 705: Generate a second wake-up-free instruction and respond to the second voice input.
If the user replies that the second voice input has that meaning, the terminal determines the second voice input to be a second wake-up-free instruction, saves it in the instruction library, and responds to the second voice input.
Optionally, the terminal may also use a prompt box and/or a voice announcement to tell the user that the first wake-up-free instruction can be used directly next time. For examples of the method shown in Figure 7, refer to the examples given for step 502; they are not repeated here.
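Steps 702 to 705 of Figure 7 can be sketched roughly as follows. This is only an illustration: the function name and all of its parameters (`find_similar`, `confirm`, `execute`, `hint`) are hypothetical callbacks introduced here so that the semantic analysis, the user dialogue, and the terminal's response can be supplied separately, and the instruction library is modelled as a plain dictionary.

```python
from typing import Callable, Dict, List, Optional


def learn_from_utterance(
    utterance: str,
    predefined: List[str],
    library: Dict[str, str],
    find_similar: Callable[[str, List[str]], Optional[str]],
    confirm: Callable[[str], bool],
    execute: Callable[[str], None],
    hint: Callable[[str], None] = lambda instruction: None,
) -> bool:
    """Return True if a second wake-up-free instruction was learned and executed."""
    # Step 702: look for a semantically similar predefined (first) instruction.
    target = find_similar(utterance, predefined)
    if target is None:
        return False  # Step 703: no response.
    # Step 704: ask the user whether this interpretation is what they meant.
    if not confirm(target):
        return False  # The user rejected the interpretation: no response.
    # Step 705: store the utterance as a learned instruction and respond to it.
    library[utterance] = target
    execute(target)
    # Optional: remind the user that the predefined wording also works.
    hint(target)
    return True
```

Keeping the dialogue behind `confirm` and `hint` leaves the choice between a voice announcement, a prompt box, or both to the caller, in line with the options described above.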
Figure 8 is another schematic flowchart, provided by an embodiment of this application, of learning the wording of voice input. In the method shown in Figure 8, the terminal asks the user only after it has received the second voice input multiple times in succession.
Step 801: Receive a second voice input from the user.
In response to the user's voice operation, the terminal receives the second voice input from the user. Examples of the second voice input include "Set off for work", "Is there traffic on the way?", "Avoid congested roads", and "Choose a clear road"; they are not listed exhaustively here.
Step 802: Determine whether the second voice input has been received multiple times in succession.
After receiving the second voice input from the user, the terminal determines whether it has received the second voice input multiple times in succession within a preset duration. If it has, the terminal performs step 804. Otherwise, the terminal performs step 803, that is, it does not respond to the second voice input.
Step 803: Do not respond to the second voice input.
Step 804: Ask the user whether the second voice input has that meaning.
If the user replies that the second voice input does not have that meaning, the terminal does not respond to the second voice input. If the user replies that it does, the terminal performs step 805.
The terminal may ask the user by voice announcement, by a prompt box (such as a toast), or by a prompt box (such as a toast) together with a voice announcement. This application places no limitation on the way the terminal asks.
步骤805,生成第二免唤醒指令,并响应上述第二语音输入。Step 805: Generate a second wake-up-free instruction and respond to the second voice input.
如果用户回复上述第二语音输入是上述语义,则终端将上述第二语音输入确定为第二免唤醒指令,保存在指令库中,并响应上述第二语音输入。If the user replies that the second voice input has the above semantics, the terminal determines the second voice input as the second wake-up-free command, saves it in the command library, and responds to the second voice input.
可选地,终端还可以通过提示框和/语音播报的方式,向用户提示下一次可以直接使用预定义的第一免唤醒用语。Optionally, the terminal can also prompt the user to directly use the predefined first wake-up-free phrase next time through a prompt box and/or a voice broadcast.
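Steps 801 to 805 can be sketched as follows. This is a minimal illustration, not the implementation of the application: the class name, the `ask_user` callback, the repeat count of 3, and the 30-second window are all hypothetical, and "multiple times in succession" is assumed to mean identical utterances received back to back within the window.

```python
from collections import deque


class RepeatedInputLearner:
    """Sketch of steps 801-805: query the user only after repeated input."""

    def __init__(self, ask_user, min_repeats=3, window_seconds=30.0):
        self.ask_user = ask_user              # hypothetical callback: utterance -> bool (step 804)
        self.min_repeats = min_repeats        # assumed meaning of "multiple times"
        self.window_seconds = window_seconds  # assumed preset duration
        self.recent = deque()                 # (timestamp, utterance) pairs
        self.instruction_library = set()      # learned wake-up-free instructions

    def on_voice_input(self, utterance, now):
        # Step 801: record the input and drop entries older than the window.
        self.recent.append((now, utterance))
        while self.recent and now - self.recent[0][0] > self.window_seconds:
            self.recent.popleft()

        # Step 802: count how many of the latest inputs repeat this utterance back to back.
        repeats = 0
        for _, past in reversed(self.recent):
            if past != utterance:
                break
            repeats += 1
        if repeats < self.min_repeats:
            return "ignored"                  # step 803

        # Step 804: ask the user whether the utterance has the assumed semantics.
        if not self.ask_user(utterance):
            return "ignored"

        # Step 805: save the utterance as a wake-up-free instruction and respond.
        self.instruction_library.add(utterance)
        self.recent.clear()
        return "responded"
```

With `min_repeats=3`, the first two occurrences of "Go to work" are ignored and the third triggers the query; the utterance is stored only if the user confirms.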
Based on the above technical solution, the terminal receives a first voice input from the user and, when the first voice input is semantically similar to a predefined first wake-up-free instruction, responds to the first voice input accordingly. That is, even if the terminal has not been woken up in advance and the sentence spoken by the user is not the predefined first wake-up-free instruction, the terminal can respond as long as the sentence is semantically similar to that instruction. This helps solve the problem that the terminal fails to respond because the predefined first wake-up-free instructions are fixed and limited in number, and thereby helps improve the user's interaction experience.
FIG. 9 is a schematic flowchart of a second human-machine interaction method according to an embodiment of this application.
As shown in FIG. 9, the method 900 may include step 901 and step 902. The steps shown in FIG. 9 are described in detail below.
Step 901: Receive a first voice input from the user.
The first voice input may be a voice input that the terminal receives from the user when the user has not woken up the terminal in advance.
For example, in response to the user's voice operation, the terminal receives a first voice input from the user. The first voice input may be, for example, "navigate to location A", "set off for location A", "I want to go to location A", "play song B", or "I want to listen to song B". The embodiments of this application do not limit the specific content of the first voice input.
Step 902: When the preset wake-up word has not been received but the first voice input contains a target object, respond to the first voice input accordingly.
The target object is an object whose number of mentions in other voice inputs received before the first voice input has reached a preset threshold. The preset wake-up word is used to wake up the terminal or, more specifically, the voice assistant in the terminal (also called a smart assistant or intelligent assistant; this application does not limit the name).
In other words, when the terminal receives the first voice input without having been woken up in advance, it responds to the first voice input if the first voice input contains the target object, and does not respond if the first voice input does not contain the target object.
Optionally, the target object may be, for example, a location, a media name (such as a song title), or an artist name. This application does not limit the specific content of the target object.
The process of determining the target object, that is, the process of learning the first object in a voice input, is described in detail below.
Optionally, before receiving the first voice input from the user, the method further includes: receiving the preset wake-up word from the user; receiving a second voice input from the user; and, when the number of times a first object contained in the second voice input has been mentioned in the second voice input and in earlier voice inputs exceeds a preset threshold, determining the first object to be the target object.
The first object may be, for example, a location, a media name (such as a song title), or an artist name. This application does not limit the specific content of the target object.
For example, after being woken up by the preset wake-up word received from the user, the terminal receives the second voice input and determines whether it contains a first object. If it does, the terminal determines how many times the first object has been mentioned. If the number of times the first object has been mentioned in the current second voice input and in earlier voice inputs exceeds the preset threshold, the terminal determines the first object to be the target object, so that the terminal can respond the next time the user directly speaks a voice input containing the target object. For example, if the terminal determines location A to be the target object, the user can next time say "navigate to location A" without waking up the terminal first; after receiving this voice input and determining that it contains location A, the terminal displays the route to location A on the user interface. The user thus no longer needs to wake up the terminal in advance, which simplifies the interaction and helps improve the user experience.
In addition, the terminal may record the number of times the first object is mentioned in voice inputs, incrementing the corresponding count by 1 each time the object is mentioned.
Optionally, the terminal may also generate, based on the target object, a wake-up-free instruction containing the target object, and prompt the user with that instruction so that next time the user can use it directly to control the terminal to perform the corresponding operation.
The terminal may prompt the user with the wake-up-free instruction through a prompt box and/or a voice broadcast.
For example, the terminal gives the user the voice prompt "Next time, try saying 'navigate to location A' directly", where location A is the target object.
FIG. 10 is a schematic interaction diagram of the second human-machine interaction method according to an embodiment of this application. As shown in FIG. 10, in response to the user saying "Xiaoyi Xiaoyi", the terminal replies "I'm here", that is, the terminal is woken up. Then, in response to the user saying "navigate to location A", the terminal replies "OK, starting navigation for you" and displays the route to location A on the user interface. If the number of times location A has appeared in the current voice input and in earlier voice inputs exceeds the preset threshold, the terminal may prompt the user by voice broadcast: "Next time, try saying 'navigate to location A' directly". That is, next time the user can say "navigate to location A" without waking up the terminal first, and the terminal displays the route to location A on the user interface.
FIG. 11 is a schematic flowchart of learning the first object in the second voice input according to an embodiment of this application.
Step 1101: Receive the preset wake-up word from the user.
The preset wake-up word is used to wake up the terminal or, more specifically, the voice assistant in the terminal.
Step 1102: Receive a second voice input from the user.
After being woken up, the terminal receives a second voice input from the user. Examples of the second voice input include "navigate to location A", "set off for location A", and "I want to go to location A"; the examples are not listed exhaustively here.
Step 1103: Determine whether the second voice input contains a first object.
The first object includes, for example but without limitation, a location, a media name (such as a song title), or an artist name.
For example, after receiving the second voice input, the terminal determines whether the second voice input contains a first object (such as location A). If the second voice input does not contain a first object, step 1104 is performed; if it does, step 1105 is performed.
Step 1104: Respond to the second voice input.
Step 1105: Determine whether the number of times the first object has been mentioned in the current voice input and in previously received voice inputs exceeds a preset threshold.
If the number of times the first object has been mentioned in the current voice input and in the other voice inputs received before it does not exceed the preset threshold, step 1104 is performed, that is, the terminal responds to the second voice input; if that number exceeds the preset threshold, step 1106 is performed.
Step 1106: Determine the first object to be the target object.
In addition, the terminal may generate a wake-up-free instruction based on the target object and prompt the user to use that instruction directly next time.
Optionally, the terminal may use a prompt box and/or a voice broadcast to inform the user that the wake-up-free instruction can be used directly next time.
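The counting logic of steps 1103 to 1106, together with the wake-up-free check it enables, can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the threshold value is arbitrary, and extracting objects (locations, song titles, and so on) from an utterance is assumed to be done by an upstream language-understanding component that is not shown.

```python
from collections import Counter


class TargetObjectLearner:
    """Sketch of mention counting and target-object promotion."""

    def __init__(self, threshold=3):
        self.threshold = threshold   # assumed preset threshold
        self.mentions = Counter()    # object -> number of mentions so far
        self.targets = set()         # objects promoted to target objects

    def learn(self, objects_in_input):
        """Process an input received after the wake-up word (steps 1103-1106).

        Returns the objects newly promoted to target objects, for which the
        terminal could then prompt a wake-up-free instruction.
        """
        promoted = []
        for obj in objects_in_input:
            self.mentions[obj] += 1  # each mention adds 1 to the count
            if self.mentions[obj] > self.threshold and obj not in self.targets:
                self.targets.add(obj)
                promoted.append(obj)
        return promoted

    def should_respond_without_wake_word(self, objects_in_input):
        """Respond to an un-woken input only if it contains a target object."""
        return any(obj in self.targets for obj in objects_in_input)
```

The comparison uses "exceeds" (`>`), matching step 1105; a variant that promotes an object once the count reaches the threshold would use `>=` instead.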
Based on the above technical solution, when the terminal has not been woken up in advance and receives a first voice input from the user, the terminal responds to the first voice input if it contains an object whose number of mentions in earlier voice inputs has reached the preset threshold. That is, by learning from earlier voice inputs and saving the target objects whose number of mentions has reached the preset threshold, the terminal can respond to any received voice input that contains such a target object even without being woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction flow, and helps improve the user's interaction experience.
FIG. 12 is a schematic flowchart of a third human-machine interaction method according to an embodiment of this application.
As shown in FIG. 12, the method 1200 may include step 1201 and step 1202. The steps shown in FIG. 12 are described in detail below.
Step 1201: Receive a first voice input from the user, where the first voice input belongs to a first instruction set, and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions.
A predefined wake-up-free instruction is used to instruct the terminal to perform the operation corresponding to that instruction without the preset wake-up word being input.
In one possible implementation, a predefined first instruction set and a predefined second instruction set are pre-stored in the instruction library. The instructions in the first instruction set are semantically similar to the predefined wake-up-free instructions, and the instructions in the second instruction set are the predefined wake-up-free instructions. The terminal receives the first voice input and determines that it belongs to the first instruction set.
In another possible implementation, the instruction library pre-stores a predefined second instruction set and a first instruction set that is learned from voice inputs and corresponds to the instructions in the second instruction set. The terminal receives the first voice input and determines that it belongs to the first instruction set. For the method by which the terminal learns, from voice inputs, the first instruction set corresponding to the instructions in the second instruction set, refer to the related descriptions of FIG. 5 and FIG. 9; details are not repeated here.
Table 1 is an example of the first instruction set and the second instruction set pre-stored in the instruction library.
Table 1
As shown in Table 1, the instructions in the second instruction set are predefined wake-up-free instructions, such as "check whether there is congestion", "slide down", "reduce the page", "navigate to the company", and "navigate home". The instructions in the first instruction set are semantically similar to those in the second instruction set, such as "Is there a traffic jam on the road?", "Scroll down", "Zoom out", "Go to work", and "I want to go home". The two sets are semantically similar but worded differently: the instructions in the first instruction set are more colloquial, whereas those in the second instruction set are standard human-machine interaction instructions.
It should be understood that the above division of instructions is only an example and does not limit the embodiments of this application. Other embodiments may divide the instructions differently. For example, the first instruction set may be further divided into first instruction subset 1 and first instruction subset 2, where the instructions in first instruction subset 2 are more colloquial than those in first instruction subset 1, and the conditions under which the terminal responds to instructions in first instruction subset 2 are stricter than those for first instruction subset 1.
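The two instruction sets can be pictured as a small lookup structure. The sketch below is hypothetical: the names `SECOND_SET`, `FIRST_SET`, and `classify` are not from the application, exact string matching stands in for whatever semantic matching the terminal actually uses, and the pairing of each colloquial instruction with a predefined one is an assumption inferred from the order in which the examples are listed.

```python
# Second instruction set: predefined wake-up-free instructions (examples from Table 1).
SECOND_SET = {
    "check whether there is congestion",
    "slide down",
    "reduce the page",
    "navigate to the company",
    "navigate home",
}

# First instruction set: colloquial variants, each mapped to the predefined
# instruction it is assumed to correspond to.
FIRST_SET = {
    "is there a traffic jam on the road": "check whether there is congestion",
    "scroll down": "slide down",
    "zoom out": "reduce the page",
    "go to work": "navigate to the company",
    "i want to go home": "navigate home",
}


def classify(utterance):
    """Return (set_name, predefined_instruction) for an utterance."""
    text = utterance.strip().rstrip("?").lower()
    if text in SECOND_SET:
        return ("second", text)            # always eligible for a response
    if text in FIRST_SET:
        return ("first", FIRST_SET[text])  # eligible only under preset conditions
    return ("unknown", None)
```

Keeping the two sets separate lets the terminal apply different response conditions to each, which is what step 1202 relies on.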
Step 1202: When a preset condition is met, respond to the first voice input.
Optionally, the preset condition includes at least one of the following: the number of users within a preset distance of the terminal does not exceed a threshold; the user is at a predefined position; the user from whom the first voice input comes does not belong to a preset group; or the time at which the first voice input is received falls within a preset period.
The first condition means that the terminal may respond to the first voice input when few users are within the preset distance of the terminal. With few users nearby, it is less likely that a user uttered the first voice input by mistake, that is, the user probably does want the terminal to perform the corresponding operation. Conversely, with many users nearby, it is more likely that the first voice input was uttered by mistake. This condition therefore effectively prevents the terminal from responding to a first voice input that a user uttered unintentionally.
The second condition concerns the user's position. For example, the terminal responds to a first voice input from the user closest to it; or, when the user is in a scenic area, where the user is more likely to want the terminal to provide services, the terminal responds to the first voice input from the user.
The third condition is that the user from whom the first voice input comes does not belong to a preset group, such as children or elderly people. Instructions issued by members of a preset group may be dangerous, so the terminal may choose not to respond to them.
The fourth condition is that the time at which the first voice input is received falls within a preset period, for example working hours (also called commuting hours). During such a period the terminal may respond to the first voice input; at other times the terminal may respond only to predefined wake-up-free instructions.
The following takes application of the method to a vehicle as an example (that is, the terminal is an in-vehicle head unit) and describes how the terminal responds to the first voice input in each of the above scenarios.
Scenario 1: When there is one occupant in the vehicle, the head unit responds to the first voice input; when there are multiple occupants in the vehicle, the head unit does not respond to the first voice input.
For example, the head unit may determine the number of people currently in the vehicle by using the in-vehicle camera. When there is one occupant, that is, only the driver, the head unit responds to the first voice input. When there are multiple occupants, the head unit does not respond to the first voice input. In addition, the head unit can respond to instructions in the second instruction set whether there is one occupant or several. This greatly reduces the likelihood that the head unit is woken up by mistake during conversation when there are multiple occupants in the vehicle.
Scenario 2: When the voice input comes from the driver, the head unit responds to the first voice input; when the voice input comes from an occupant other than the driver, the head unit does not respond to the first voice input.
For example, after receiving the first voice input, the head unit may determine, based on interaction with the seats, whether the first voice input comes from the driver or from another occupant. If it comes from the driver, the head unit responds to the first voice input; if it comes from another occupant, the head unit does not respond. In addition, the head unit can respond to instructions in the second instruction set whether they come from the driver or from another occupant.
Scenario 3: When the user from whom the first voice input comes does not belong to the preset group, the head unit responds to the first voice input; when that user belongs to the preset group, the head unit does not respond to the first voice input.
For example, the head unit may determine whether the first voice input comes from the preset group. Taking children as an example, if the first voice input comes from a child, the head unit does not respond to it; if it does not come from a child, the head unit responds to it. This effectively prevents the head unit from responding when a child utters the first voice input by mistake.
Scenario 4: When the time at which the voice input is received falls within the preset period, the head unit responds to the first voice input; when that time does not fall within the preset period, the head unit does not respond to the first voice input.
For example, taking working hours as the preset period, if the head unit receives the first voice input during working hours, it may respond to the first voice input; if it receives the first voice input outside working hours, it may not respond.
It should be understood that, in the scenarios described above, when the terminal determines to respond to the first voice input, it may first confirm the semantics of the voice input with the user and then respond to the first voice input once the user confirms those semantics.
It should also be understood that the above scenarios may be combined. For example, the head unit responds to the first voice input when the first voice input comes from the driver and the time at which it is received falls within the preset period. As another example, the head unit responds to the first voice input when there is only one occupant in the vehicle and the time at which the first voice input is received falls within the preset period. For brevity, the combinations are not listed exhaustively here.
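One way to picture how the scenarios combine is as a set of named predicates, any subset of which can be required at once. The sketch below is illustrative only: the `Context` fields, the condition names, and the 07:00 to 09:30 window are hypothetical, and how the head unit actually obtains occupant count, speaker position, or speaker group is outside its scope.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Context:
    occupants: int                  # e.g. estimated from the in-vehicle camera
    speaker_is_driver: bool         # e.g. derived from seat interaction
    speaker_in_preset_group: bool   # e.g. the speaker is a child
    received_at: time               # time the first voice input was received


# Each predicate returns True when its scenario allows a response.
CONDITIONS = {
    "single_occupant": lambda c: c.occupants <= 1,                          # scenario 1
    "from_driver": lambda c: c.speaker_is_driver,                           # scenario 2
    "not_preset_group": lambda c: not c.speaker_in_preset_group,            # scenario 3
    "in_preset_period": lambda c: time(7, 0) <= c.received_at <= time(9, 30),  # scenario 4
}


def meets_preset_conditions(ctx, enabled):
    """Return True only if every enabled condition holds for this context."""
    return all(CONDITIONS[name](ctx) for name in enabled)
```

For instance, enabling `("from_driver", "in_preset_period")` reproduces the first combined example: the head unit responds only to the driver and only during the preset period.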
FIG. 13 is a schematic flowchart of determining, according to the scenario, whether to respond to the first voice input according to an embodiment of this application. The method shown in FIG. 13 combines scenario 2 and scenario 4.
Step 1301: Receive a first voice input from the user.
Without having been woken up in advance, the head unit receives the first voice input from the user in response to the user's operation of speaking it. The first voice input belongs to the first instruction set, and the instructions in the first instruction set are semantically similar to the predefined wake-up-free instructions.
Step 1302: Determine whether the first voice input comes from the driver.
After receiving the first voice input, the head unit determines whether it comes from the driver. If the first voice input does not come from the driver, the head unit performs step 1303; if it comes from the driver, the head unit performs step 1304.
Step 1303: Do not respond to the first voice input.
If the first voice input does not come from the driver, the head unit does not respond to it. In addition, the head unit can respond to instructions in the second instruction set from the user.
Step 1304: Determine whether the time at which the first voice input is received falls within the preset period.
If the first voice input comes from the driver, the head unit goes on to determine whether the time at which the first voice input is received falls within the preset period. If it does, the head unit may perform step 1305; if it does not, the head unit may perform step 1306.
Step 1305: Respond to the first voice input.
If the time at which the first voice input is received falls within the preset period, the head unit may respond to the first voice input.
Step 1306: Respond to the first voice input, but only after querying the user.
If the time at which the first voice input is received does not fall within the preset period, the head unit may still respond to the first voice input, but it must first confirm the semantics of the first voice input with the user and respond only after the user confirms them.
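The decision flow of steps 1302 to 1306 has three outcomes rather than two, which the following sketch makes explicit. The function name, the `Decision` enum, and the default 07:00 to 09:30 period are hypothetical; determining whether the speaker is the driver is assumed to happen elsewhere.

```python
from datetime import time
from enum import Enum


class Decision(Enum):
    IGNORE = "ignore"                         # step 1303
    RESPOND = "respond"                       # step 1305
    CONFIRM_THEN_RESPOND = "confirm_first"    # step 1306


def decide(from_driver, received_at,
           period_start=time(7, 0), period_end=time(9, 30)):
    """Decide how to handle a first-instruction-set input (FIG. 13 flow)."""
    # Step 1302: only the driver's colloquial instructions are considered.
    if not from_driver:
        return Decision.IGNORE
    # Step 1304: inside the preset period, respond directly.
    if period_start <= received_at <= period_end:
        return Decision.RESPOND
    # Outside the preset period, confirm the semantics with the user first.
    return Decision.CONFIRM_THEN_RESPOND
```

Note that, unlike scenario 4 taken alone, this combined flow does not drop an out-of-period input from the driver; it falls back to confirmation.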
Based on the above technical solution, after receiving a first voice input that is semantically similar to a predefined wake-up-free instruction, the terminal responds to the first voice input when a preset condition is met. That is, for a first voice input that is semantically similar to a predefined wake-up-free instruction, the terminal responds only when the preset condition is met, not in all circumstances, which prevents the terminal from responding when the user utters the first voice input by mistake. The first voice input may be more colloquial than the predefined wake-up-free instructions, so if the terminal responded in all circumstances, ordinary conversation would probably trigger it frequently. Setting a preset condition, and responding only when it is met, therefore helps greatly improve the user's interaction experience.
FIG. 14 is a schematic flowchart of a fourth human-machine interaction method according to an embodiment of this application.
As shown in FIG. 14, the method 1400 may include step 1401 and step 1402. The steps shown in FIG. 14 are described in detail below.
Step 1401: Receive a first voice input from the user, where the first voice input is not a predefined wake-up-free instruction.
The first voice input may be a voice input that the terminal receives from the user when the user has not woken up the terminal in advance.
For example, in response to the user's voice operation, the terminal receives a first voice input from the user. The first voice input may be, for example, "set off for the company", "navigate to the company", or "Go to work". The embodiments of this application do not limit the specific content of the voice input.
Step 1402: When the first voice input is semantically similar to a first wake-up-free instruction among the predefined wake-up-free instructions, guide the user to input the first wake-up-free instruction.
When the terminal determines that the first voice input and the first wake-up-free instruction have similar semantics, it guides the user to input the first wake-up-free instruction so that the terminal can respond to that instruction.
The terminal may determine that the first voice input is semantically similar to the first wake-up-free instruction based on semantic analysis in natural language processing.
For example, the voice input is "Go to work", and the first wake-up-free instruction with similar semantics is "Navigate to the company". After receiving the voice input "Go to work", the terminal determines that it is not a predefined wake-up-free instruction and recognizes that its semantics are similar to those of "Navigate to the company". The terminal can therefore guide the user to say "Navigate to the company".
Optionally, guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.
The terminal may guide the user to issue the first wake-up-free instruction through a prompt box. For example, after determining that the voice input has similar semantics to the first wake-up-free instruction in the instruction library, the terminal displays the first wake-up-free instruction on the user interface in a prompt box. The terminal may also guide the user to issue the first wake-up-free instruction through a voice broadcast. For example, after determining that the voice input has similar semantics to the first wake-up-free instruction in the instruction library, the terminal reminds the user by voice to use the first wake-up-free instruction. The terminal may also combine a prompt box with a voice broadcast to guide the user to issue the first wake-up-free instruction.
For example, after determining that the voice input has similar semantics to the first wake-up-free instruction in the instruction library, the terminal first displays the first wake-up-free instruction on the user interface in a prompt box; if the user still has not issued the first wake-up-free instruction within a preset duration, the terminal reminds the user by voice to use it. Alternatively, the terminal displays the first wake-up-free instruction in a prompt box on the user interface and, at the same time, reminds the user by voice to use it. This application does not limit the manner in which the terminal guides the user.
Optionally, guiding the user to issue the first wake-up-free instruction through a prompt box and/or a voice broadcast includes: prompting the user through a prompt box to issue the first wake-up-free instruction, where the prompt box contains the first wake-up-free instruction; and, when the number of prompts given through the prompt box within a preset duration reaches a preset threshold but the user has not issued the first wake-up-free instruction, guiding the user through a voice broadcast to issue the first wake-up-free instruction.
Exemplarily, the terminal prompts the user through a prompt box to issue the first wake-up-free instruction a first time, with the prompt box containing the first wake-up-free instruction, and prompts the user through the prompt box again a second time. When the number of prompts given through the prompt box within 1 minute reaches two but the user has not issued the first wake-up-free instruction, the terminal guides the user through a voice broadcast to issue the first wake-up-free instruction.
Figure 15 is a schematic interaction diagram, provided by an embodiment of this application, of guiding a user to issue the first wake-up-free instruction.
In response to the user's voice input "Go to work", the terminal determines that the voice input has similar semantics to the first wake-up-free instruction "Navigate to the company" in the instruction library. The first time, the terminal therefore prompts the user through a prompt box: "Try saying 'Navigate to the company'". The second time, the user again says "Go to work", and the terminal again prompts the user through the prompt box: "Try saying 'Navigate to the company'". The third time, the user still says "Go to work", and the terminal prompts the user through the prompt box and also by voice: "Try saying 'Navigate to the company'".
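Exemplarily, the escalation from prompt box to voice broadcast may be sketched in Python as follows. This is only an illustration under stated assumptions: the class `GuidanceEscalator`, its method names, and the channel labels `"prompt_box"` and `"voice"` are hypothetical, and the defaults (a threshold of two prompts within 60 seconds) simply mirror the example values in the text. Timestamps are passed in explicitly so that the logic is easy to exercise; a terminal might supply them from a monotonic clock.

```python
from collections import deque
from typing import Deque, List


class GuidanceEscalator:
    """Chooses guidance channels for repeated near-miss voice inputs.

    Hypothetical helper: the prompt box is always shown, and a voice
    broadcast is added once the prompt box has already been shown
    `threshold` times within the last `window_s` seconds without the
    user issuing the wake-up-free instruction.
    """

    def __init__(self, threshold: int = 2, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._prompt_times: Deque[float] = deque()

    def channels_for_near_miss(self, now: float) -> List[str]:
        """Call when the user again says a similar, non-predefined phrase."""
        # Forget prompt-box displays that fall outside the time window.
        while self._prompt_times and now - self._prompt_times[0] > self.window_s:
            self._prompt_times.popleft()
        channels = ["prompt_box"]
        if len(self._prompt_times) >= self.threshold:
            channels.append("voice")
        self._prompt_times.append(now)
        return channels

    def reset(self) -> None:
        """Call when the user finally issues the wake-up-free instruction."""
        self._prompt_times.clear()
```

Under these assumptions, three near-miss inputs in quick succession produce a prompt box, a prompt box, and then a prompt box together with a voice broadcast, which matches the three rounds described for Figure 15.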
Based on the above technical solution, when the terminal receives a first voice input that is not one of the predefined wake-up-free instructions but is semantically similar to a first wake-up-free instruction among them, the terminal guides the user to input the corresponding first wake-up-free instruction, so that the terminal can respond accordingly once the user does so. Compared with a terminal that neither responds nor prompts, this can greatly improve the user's interaction experience.
Figure 16 is a schematic flowchart of a fifth human-machine interaction method provided by an embodiment of this application.
As shown in Figure 16, the method may include steps 1601 to 1605. Each step shown in Figure 16 is described in detail below.
Step 1601: Receive a first voice input from the user.
In response to the user's operation of providing the first voice input, the terminal receives the first voice input from the user. The first voice input is received without a preset wake-up word having been received from the user. For example, the first voice input may be "Navigate to location A", "I want to go to location A", or "Where is location A"; further examples are not listed here.
Step 1602: Determine whether the first voice input is a request for navigation.
In other words, after receiving the first voice input, the terminal determines whether the intent of the first voice input is to request navigation. If the first voice input is not a request for navigation, the terminal performs step 1603; if it is a request for navigation, the terminal performs step 1604.
Step 1603: Do not respond to the first voice input.
Step 1604: Ask the user for the destination of the requested navigation.
If the first voice input is a request for navigation, the terminal asks the user for the destination of the requested navigation. For example, if the voice input is "Navigate to location A", the terminal determines that the first voice input is a request for navigation and then asks the user for the navigation destination, for example by asking "Where do you want to go?". If the user replies "Location A", then after receiving the reply the terminal obtains the route to location A from the cloud.
The terminal may ask the user through a voice broadcast, through a prompt box (such as a toast), or through a prompt box (such as a toast) combined with a voice broadcast. This application places no limitation on the manner in which the terminal asks. For example, the terminal asks the user through a prompt box (such as a toast) the first two times, and through a prompt box (such as a toast) combined with a voice broadcast the third time.
Step 1605: Provide a navigation service to the user based on the destination given by the user.
After obtaining the route to the destination, the terminal provides the navigation service to the user, for example by displaying the route to the destination on the user interface.
Optionally, the terminal may also generate, based on the destination, a wake-up-free instruction that contains the destination, and may prompt the user, through a prompt box and/or a voice broadcast, that this wake-up-free instruction can be used directly next time.
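Exemplarily, steps 1601 to 1605 and the optional generation of a wake-up-free instruction may be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of this application: the keyword list `NAVIGATION_CUES` is a toy stand-in for intent recognition, and the callbacks `ask_user` and `get_route` are hypothetical placeholders for the terminal's inquiry channel (prompt box and/or voice broadcast) and for route retrieval (for example, from the cloud).

```python
from typing import Callable, List, Optional

# Toy stand-in for intent recognition; a real terminal would use an NLP model.
NAVIGATION_CUES = ("navigate", "i want to go", "where is")


def is_navigation_request(utterance: str) -> bool:
    """Step 1602: decide whether the input is a request for navigation."""
    text = utterance.lower()
    return any(cue in text for cue in NAVIGATION_CUES)


def handle_first_voice_input(
    utterance: str,
    ask_user: Callable[[str], str],
    get_route: Callable[[str], List[str]],
) -> Optional[dict]:
    """Steps 1601 to 1605, for an input received without a wake-up word."""
    if not is_navigation_request(utterance):
        return None  # Step 1603: do not respond
    destination = ask_user("Where do you want to go?")  # Step 1604
    route = get_route(destination)  # Step 1605: obtain the route to display
    # Optional: a wake-up-free instruction containing the destination,
    # which the terminal could suggest to the user for next time.
    suggestion = f"Navigate to {destination}"
    return {"destination": destination, "route": route, "suggestion": suggestion}
```

Passing the inquiry and route lookup in as callbacks keeps the sketch independent of any particular user interface or map service, neither of which the text specifies.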
Figure 17 is a schematic interaction diagram of the fifth human-machine interaction method provided by an embodiment of this application. As shown in Figure 17, in response to the user's voice input "Navigate to location A", the terminal asks the user "Where are you going?" through a voice broadcast. The user replies "Location A", and in response to the reply the terminal shows the user the route to location A on the user interface.
Based on the above technical solution, when the terminal has not been woken up in advance and, after receiving a first voice input from the user, finds that the intent of the input is to request navigation, the terminal can ask the user for the navigation destination and provide the navigation service based on the destination given by the user. The terminal does not need to be woken up in advance, which simplifies the interaction flow and helps improve the user's interaction experience.
An embodiment of this application further provides a terminal that includes modules for performing the steps performed by the terminal in any one of the embodiments described with reference to Figures 5 to 17. The terminal can be used to implement the method described in any one of those embodiments. The modules included in the terminal may be implemented in software and/or hardware.
An embodiment of this application further provides a terminal that includes a memory and a processor, where the memory is configured to store a computer program and the processor is configured to call and execute the computer program, so that the terminal implements the method described in any one of the embodiments described with reference to Figures 5 to 17.
An embodiment of this application further provides a vehicle on which the terminal described above is deployed. The terminal may be, for example, an in-vehicle head unit.
This application further provides a chip system that includes at least one processor configured to implement the method described in any one of the embodiments described with reference to Figures 5 to 17.
In one possible design, the chip system further includes a memory configured to store program instructions and data, and the memory is located inside or outside the processor.
The chip system may consist of chips, or may include chips and other discrete devices.
This application further provides a computer program product that includes computer-readable instructions. When the computer-readable instructions are run by a computer, the method described in any one of the embodiments described with reference to Figures 5 to 17 is implemented.
This application further provides a computer-readable storage medium that stores computer-readable instructions. When the computer-readable instructions are run by a computer, the method described in any one of the embodiments described with reference to Figures 5 to 17 is implemented.
It should be understood that the processor in the embodiments of this application may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and it can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of this application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed with reference to the embodiments of this application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the foregoing methods in combination with its hardware.
It should also be understood that the memory in the embodiments of this application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct rambus random access memory (direct rambus RAM, DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.
The terms "unit", "module", and the like used in this specification may denote a computer-related entity, hardware, firmware, a combination of hardware and software, software, or software in execution.
A person of ordinary skill in the art will appreciate that the illustrative logical blocks and steps described with reference to the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application. In the several embodiments provided in this application, it should be understood that the disclosed apparatuses, devices, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
In the above embodiments, the functions of the functional units may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). When the computer program instructions (programs) are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a digital versatile disc (DVD)), a semiconductor medium (for example, a solid state disk (SSD)), or the like.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (23)

  1. A human-machine interaction method, characterized by comprising:
    receiving a first voice input from a user;
    when it is determined that the first voice input is semantically similar to a predefined first wake-up-free instruction, responding accordingly to the first voice input, wherein the first wake-up-free instruction is used to instruct a terminal, without a preset wake-up word being input, to perform an operation corresponding to the first wake-up-free instruction.
  2. The method according to claim 1, characterized in that, before the receiving a first voice input from a user, the method further comprises:
    receiving a second voice input from the user;
    when the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user;
    in response to an operation by which the user confirms the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.
  3. The method according to claim 2, characterized in that the first voice input being semantically similar to the predefined first wake-up-free instruction comprises:
    the first voice input being the same as the second wake-up-free instruction.
  4. The method according to claim 2 or 3, characterized in that the receiving a second voice input from the user comprises:
    receiving the second voice input multiple times in succession within a preset duration.
  5. The method according to any one of claims 1 to 4, characterized in that, before the responding accordingly to the first voice input, the method further comprises:
    confirming the semantics of the first voice input with the user.
  6. The method according to claim 5, characterized in that the confirming the semantics of the first voice input with the user comprises:
    confirming the semantics of the first voice input with the user through a prompt box and/or a voice broadcast.
  7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    prompting the user with the first wake-up-free instruction.
  8. A human-machine interaction method, characterized by comprising:
    receiving a first voice input from a user;
    when a preset wake-up word has not been received but the first voice input contains a target object, responding accordingly to the first voice input, wherein the target object is an object whose number of mentions in other voice inputs received before the first voice input reaches a preset threshold, and the preset wake-up word is used to wake up a terminal.
  9. The method according to claim 8, characterized in that, before the receiving a first voice input from a user, the method further comprises:
    receiving the preset wake-up word from the user;
    receiving a second voice input from the user;
    when the number of times a first object contained in the second voice input is mentioned in the second voice input and in voice inputs preceding it exceeds the preset threshold, determining the first object as the target object.
  10. The method according to claim 9, characterized in that the method further comprises:
    generating, based on the target object, a wake-up-free instruction containing the target object;
    prompting the user with the wake-up-free instruction.
  11. A human-machine interaction method, characterized by comprising:
    receiving a first voice input from a user, wherein the first voice input belongs to a first instruction set, and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions;
    when a preset condition is met, responding to the first voice input.
  12. The method according to claim 11, characterized in that the preset condition comprises at least one of the following:
    the number of users whose distance from a terminal is within a preset range does not exceed a threshold;
    the user is in a predefined position;
    the user from whom the first voice input comes does not belong to a preset group of people; or,
    the time at which the first voice input is received falls within a preset period.
  13. The method according to claim 12, characterized in that the method is applied to a vehicle, and the number of users whose distance from the terminal is within a preset range not exceeding a threshold comprises: there being one occupant in the vehicle; or,
    the user being in a predefined position comprises: the user being in the driver's seat.
  14. A human-machine interaction method, characterized by comprising:
    when a preset wake-up word has not been received from a user, determining, according to a first voice input from the user, that the first voice input is a request for navigation;
    asking the user for the destination of the requested navigation;
    providing a navigation service to the user based on the destination given by the user.
  15. The method according to claim 14, characterized in that the method further comprises:
    generating a wake-up-free instruction containing the destination;
    prompting the user with the wake-up-free instruction.
  16. A human-machine interaction method, characterized by comprising:
    receiving a first voice input from a user, wherein the first voice input is not one of the predefined wake-up-free instructions;
    when it is determined that the first voice input is semantically similar to a first wake-up-free instruction among the predefined wake-up-free instructions, guiding the user to input the first wake-up-free instruction.
  17. The method according to claim 16, characterized in that the guiding the user to input the first wake-up-free instruction comprises:
    guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.
  18. The method according to claim 17, characterized in that the guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast comprises:
    prompting the user through the prompt box to input the first wake-up-free instruction, wherein the prompt box contains the first wake-up-free instruction;
    when the number of prompts given through the prompt box within a preset duration reaches a preset threshold but the user has not issued the first wake-up-free instruction, guiding the user through the voice broadcast to input the first wake-up-free instruction.
  19. A computer device, characterized by comprising units for performing the method according to any one of claims 1 to 18.
  20. A computer device, characterized by comprising a processor and a memory, wherein
    the memory is configured to store computer-readable instructions;
    the processor is configured to read the computer-readable instructions, so that the computer device implements the method according to any one of claims 1 to 18.
  21. A vehicle, characterized in that it is configured to implement the method according to any one of claims 1 to 18, or comprises the computer device according to claim 19 or 20.
  22. A computer-readable storage medium, characterized in that the storage medium stores computer-readable instructions, and when the computer-readable instructions are executed by a computer, the method according to any one of claims 1 to 18 is implemented.
  23. A computer program product, characterized in that the computer program product comprises computer-readable instructions, and when the computer-readable instructions are run by a computer, the method according to any one of claims 1 to 18 is implemented.
PCT/CN2023/116615 2022-09-05 2023-09-01 Human-machine interaction method and related apparatus WO2024051611A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211079452.4A CN117690423A (en) 2022-09-05 2022-09-05 Man-machine interaction method and related device
CN202211079452.4 2022-09-05

Publications (1)

Publication Number Publication Date
WO2024051611A1 (en) 2024-03-14

Family

ID=90133973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/116615 WO2024051611A1 (en) 2022-09-05 2023-09-01 Human-machine interaction method and related apparatus

Country Status (2)

Country Link
CN (1) CN117690423A (en)
WO (1) WO2024051611A1 (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509225A (en) * 2018-03-28 2018-09-07 联想(北京)有限公司 A kind of information processing method and electronic equipment
CN108520748A (en) * 2018-02-01 2018-09-11 百度在线网络技术(北京)有限公司 A kind of smart machine functional guide and system
CN108735216A (en) * 2018-06-12 2018-11-02 广东小天才科技有限公司 A kind of voice based on semantics recognition searches topic method and private tutor's equipment
WO2020073288A1 (en) * 2018-10-11 2020-04-16 华为技术有限公司 Method for triggering electronic device to execute function and electronic device
CN111028846A (en) * 2019-12-25 2020-04-17 北京梧桐车联科技有限责任公司 Method and device for registration of wake-up-free words
CN111354360A (en) * 2020-03-17 2020-06-30 北京百度网讯科技有限公司 Voice interaction processing method and device and electronic equipment
CN111816192A (en) * 2020-07-07 2020-10-23 云知声智能科技股份有限公司 Voice equipment and control method, device and equipment thereof
CN112802465A (en) * 2019-11-14 2021-05-14 北京安云世纪科技有限公司 Voice control method and system
US20210183386A1 (en) * 2019-08-15 2021-06-17 Huawei Technologies Co., Ltd. Voice Interaction Method and Apparatus, Terminal, and Storage Medium
CN114155855A (en) * 2021-12-17 2022-03-08 海信视像科技股份有限公司 Voice recognition method, server and electronic equipment
CN114594923A (en) * 2022-02-16 2022-06-07 北京梧桐车联科技有限责任公司 Control method, device and equipment of vehicle-mounted terminal and storage medium
CN115662410A (en) * 2022-08-12 2023-01-31 安徽讯飞寰语科技有限公司 Vehicle-mounted machine voice interaction method and vehicle-mounted machine
CN115705844A (en) * 2021-08-12 2023-02-17 上海擎感智能科技有限公司 Voice interaction configuration method, electronic device and computer readable medium

Also Published As

Publication number Publication date
CN117690423A (en) 2024-03-12

Similar Documents

Publication Publication Date Title
WO2021051989A1 (en) Video call method and electronic device
WO2021078284A1 (en) Content continuation method and electronic device
WO2020078337A1 (en) Translation method and electronic device
WO2021052282A1 (en) Data processing method, bluetooth module, electronic device, and readable storage medium
WO2020073288A1 (en) Method for triggering electronic device to execute function and electronic device
WO2020239013A1 (en) Interaction method and terminal device
JP7234379B2 (en) Methods and associated devices for accessing networks by smart home devices
WO2021000817A1 (en) Ambient sound processing method and related device
CN113133095A (en) Method for reducing power consumption of mobile terminal and mobile terminal
WO2023024852A1 (en) Short message notification method and electronic terminal device
WO2020006711A1 (en) Message playing method and terminal
CN114115770A (en) Display control method and related device
CN113488042B (en) Voice control method and electronic equipment
EP4221172A1 (en) Control method and apparatus for electronic device
CN113301544B (en) Method and equipment for voice intercommunication between audio equipment
WO2024051611A1 (en) Human-machine interaction method and related apparatus
CN113449068A (en) Voice interaction method and electronic equipment
WO2022161077A1 (en) Speech control method, and electronic device
WO2022135254A1 (en) Text editing method, electronic device and system
CN114327198A (en) Control function pushing method and device
CN113141665B (en) Method and device for receiving system-on-demand message and user equipment
WO2022068654A1 (en) Interaction method and apparatus for terminal device
WO2024067110A1 (en) Card updating method and related apparatus
WO2024022154A1 (en) Method for determining device user, and related apparatus
WO2022143048A1 (en) Dialogue task management method and apparatus, and electronic device

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23862305

Country of ref document: EP

Kind code of ref document: A1