WO2024051611A1 - Human-machine interaction method and related apparatus

Info

Publication number: WO2024051611A1
Application number: PCT/CN2023/116615
Authority: WO
Inventors: 李凌飞; 沈波; 任亮亮; 张跃; 徐平; 吴奇强; 吴雪晨; 谭彬林; 耿安峰
Original assignee: 华为技术有限公司
Priority date: 2022-09-05
Filing date: 2023-09-01
Publication date: 2024-03-14
Also published as: CN117690423A

Abstract

Provided are a human-machine interaction method and a related apparatus. The method comprises: a terminal receiving a first speech input from a user, and when the first speech input is semantically similar to a predefined first wake-up-free instruction, making a corresponding response to the first speech input. That is, it is not necessary to wake up the terminal in advance, and as long as the received first speech input is semantically similar to the predefined first wake-up-free instruction, the terminal can execute an operation corresponding to the first speech input. Therefore, the problem of a terminal having no response due to the fact that a first wake-up-free instruction is fixed and limited is solved, and compared with an approach in which the terminal only responds to the predefined first wake-up-free instruction, the method greatly improves the interaction experience of a user.

Description

Human-computer interaction method and related devices

This application claims priority to Chinese patent application No. 202211079452.4, filed with the China Patent Office on September 5, 2022 and titled "Human-computer interaction method and related devices", the entire content of which is incorporated into this application by reference.

Technical field

The present application relates to the field of terminal technology, and in particular to human-computer interaction methods and related devices.

Background

With the increasing popularity of smart terminals, voice interaction has become one of the most common and important human-computer interaction methods. At present, most voice interactions require the user to first wake up the terminal with a preset wake-up word before any further interaction. This is cumbersome and results in a poor user experience. Some manufacturers also provide a wake-up-free function: the terminal does not need to be woken up in advance, and the user can directly speak a predefined wake-up-free instruction. However, the predefined wake-up-free instructions are fixed and limited, and the terminal is easily triggered by accident while the user is chatting, which degrades the user experience.

Therefore, it is hoped to provide human-computer interaction methods to improve the user's interactive experience.

Summary of the invention

This application provides a human-computer interaction method and related devices, in order to improve the user's interactive experience.

In a first aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, by a component (such as a chip or chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal's functions. This application does not limit this.

Exemplarily, the method includes: receiving a first voice input from a user; and, if it is determined that the first voice input is semantically similar to a predefined first wake-up-free instruction, making a corresponding response to the first voice input. The first wake-up-free instruction is used to instruct the terminal to perform the operation corresponding to the first wake-up-free instruction without a preset wake-up word being input.

Based on the above technical solution, the terminal receives the first voice input from the user and, when the first voice input is semantically similar to the predefined first wake-up-free instruction, makes a corresponding response to the first voice input. That is, without the terminal being woken up in advance, even if the sentence spoken by the user is not the predefined first wake-up-free instruction, the terminal can respond as long as the sentence is semantically similar to that instruction. This helps solve the problem of the terminal not responding because the predefined first wake-up-free instruction is fixed and limited, which in turn helps improve the user's interactive experience.
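The matching step of the first aspect can be sketched as follows. This is only an illustration under stated assumptions, not the method of this application: the application does not say how semantic similarity is computed, so a token-overlap (Jaccard) score stands in for a real semantic model, and the names `PREDEFINED`, `SIMILARITY_THRESHOLD`, `jaccard` and `match_wake_free`, as well as the example instructions and operation names, are hypothetical.

```python
from typing import Optional

# Hypothetical predefined wake-up-free instructions mapped to operations.
PREDEFINED = {
    "open the window": "window.open",
    "play music": "media.play",
}

# Assumed threshold; the application does not specify a scoring method.
SIMILARITY_THRESHOLD = 0.5


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, a stand-in for a real semantic model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def match_wake_free(utterance: str) -> Optional[str]:
    """Return the operation of the most similar predefined instruction,
    or None if no instruction is similar enough."""
    best_op, best_score = None, 0.0
    for instruction, op in PREDEFINED.items():
        score = jaccard(utterance, instruction)
        if score > best_score:
            best_op, best_score = op, score
    return best_op if best_score >= SIMILARITY_THRESHOLD else None
```

With these assumptions, "please open the window" would be mapped to the same operation as the predefined "open the window", while an unrelated sentence would produce no response.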

In conjunction with the first aspect, in some possible implementations of the first aspect, before making a corresponding response to the first voice input, the above method further includes: confirming the semantics of the first voice input to the user.

After the terminal receives the first voice input, it can confirm with the user whether the recognized semantics of the first voice input are correct. On the one hand, this can improve accuracy; on the other hand, it can prevent the terminal from responding when the user mentions the first voice input by accident. For example, if the user mentions the first voice input by mistake, the user can give a negative reply when the terminal asks for confirmation, which prevents the terminal from continuing to perform the corresponding operation and helps improve the user's experience.

Optionally, confirming the semantics of the first voice input to the user includes: confirming the semantics of the first voice input to the user through a prompt box and/or voice broadcast.

The terminal can confirm the semantics of the first voice input with the user through a prompt box that contains the semantics of the first voice input, through voice broadcast, or through a combination of the prompt box and voice broadcast. Providing these multiple confirmation methods greatly improves the terminal's flexibility in confirming semantics with the user.

In conjunction with the first aspect, in some possible implementations of the first aspect, the above method further includes: prompting the user with a first wake-up-free instruction.

The terminal can also prompt the user to directly use the predefined first wake-up-free command next time. For example, the terminal may prompt the user with the first wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.

In conjunction with the first aspect, in some possible implementations of the first aspect, before receiving the first voice input from the user, the above method further includes: receiving a second voice input from the user; when the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and, in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.

The terminal learns and generates a second wake-up-free instruction through the above method. The second wake-up-free instruction can be used to instruct the terminal to perform a corresponding operation without a preset wake-up word being input. Specifically, after the terminal receives the second voice input from the user, if the second voice input is semantically similar to the first wake-up-free instruction, the terminal confirms with the user whether that is the intended meaning. If the user confirms that the semantics of the second voice input are correct, the terminal generates a corresponding second wake-up-free instruction, so that the next time it receives the second voice input without having been woken up in advance, it can respond. In other words, the number of wake-up-free instructions that can instruct the terminal to perform corresponding operations without a preset wake-up word is greatly increased, which in turn helps improve the user's interactive experience.

In conjunction with the first aspect, in some possible implementations of the first aspect, the first voice input is semantically similar to the predefined first wake-up-free instruction, including: the first voice input is the same as the second wake-up-free instruction.

After receiving the first voice input, the terminal determines whether the first voice input and the first wake-up-free instruction are semantically similar. One way is to perform semantic analysis on the first voice input and the predefined first wake-up-free instruction to determine whether the two are semantically similar. Another way is for the terminal to determine whether the first voice input is the same as a previously generated second wake-up-free instruction. Because the second wake-up-free instruction was generated from a second voice input that is semantically similar to the first wake-up-free instruction, a first voice input that is the same as the second wake-up-free instruction is also semantically similar to the first wake-up-free instruction, so the terminal can respond to it. The two ways can be used in combination or separately, which greatly improves the terminal's flexibility in determining whether the first voice input and the first wake-up-free instruction are semantically similar.
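The learning step and the two ways of matching described above can be sketched together. This is a minimal illustration, assuming that the similarity check and the user confirmation are supplied by the caller as functions; `WakeFreeStore`, `normalize`, `learn` and `resolve` are hypothetical names, not identifiers from this application.

```python
from typing import Callable, Dict, Optional


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so identical sentences compare equal."""
    return " ".join(text.lower().split())


class WakeFreeStore:
    """Holds predefined instructions plus instructions learned from the user."""

    def __init__(self, predefined: Dict[str, str],
                 is_similar: Callable[[str, str], bool]) -> None:
        self.predefined = {normalize(k): v for k, v in predefined.items()}
        self.learned: Dict[str, str] = {}
        self.is_similar = is_similar  # assumed semantic-similarity check

    def learn(self, utterance: str,
              user_confirms: Callable[[str, str], bool]) -> bool:
        """Second voice input: store it as a new wake-up-free instruction
        only if it is similar to a predefined one and the user confirms."""
        spoken = normalize(utterance)
        for instruction, operation in self.predefined.items():
            if self.is_similar(spoken, instruction) and user_confirms(spoken, instruction):
                self.learned[spoken] = operation
                return True
        return False

    def resolve(self, utterance: str) -> Optional[str]:
        """First voice input: return the operation to perform, or None."""
        spoken = normalize(utterance)
        if spoken in self.predefined:
            return self.predefined[spoken]
        if spoken in self.learned:  # same as a learned instruction
            return self.learned[spoken]
        for instruction, operation in self.predefined.items():
            if self.is_similar(spoken, instruction):  # semantic analysis
                return operation
        return None
```

Checking the learned instructions before running semantic analysis is a design choice of this sketch: an exact lookup is cheap and reuses a meaning the user has already confirmed.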

In conjunction with the first aspect, in some possible implementations of the first aspect, receiving the second voice input from the user includes: receiving the second voice input multiple times continuously within a preset time range.

In other words, the terminal confirms the semantics of the second voice input with the user only when it receives the second voice input multiple times in succession within the preset time range. This effectively avoids the situation in which the user mentions the second voice input by accident and the terminal mistakenly concludes that the user wants the corresponding operation performed, which helps improve the user's interactive experience.
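One way to detect that the same input has been received several times in succession within a time range is a small sliding window, sketched below. The repeat count, the window length and the name `RepeatDetector` are assumptions for illustration; the application does not give concrete values.

```python
from collections import deque
from typing import Deque, Optional


class RepeatDetector:
    """Reports when one utterance is heard `min_repeats` times in a row
    within `window_seconds` (both values are assumed, not from the source)."""

    def __init__(self, min_repeats: int = 3, window_seconds: float = 60.0) -> None:
        self.min_repeats = min_repeats
        self.window_seconds = window_seconds
        self._current: Optional[str] = None
        self._times: Deque[float] = deque()

    def observe(self, utterance: str, timestamp: float) -> bool:
        """Record one voice input; return True if confirmation should start."""
        if utterance != self._current:
            # A different utterance breaks the run of consecutive repeats.
            self._current = utterance
            self._times.clear()
        self._times.append(timestamp)
        # Drop repeats that fall outside the preset time range.
        while timestamp - self._times[0] > self.window_seconds:
            self._times.popleft()
        return len(self._times) >= self.min_repeats
```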

In a second aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, by a component (such as a chip or chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal's functions. This application does not limit this.

Exemplarily, the method includes: receiving a first voice input from the user; and making a corresponding response to the first voice input if the preset wake-up word is not received but the first voice input contains a target object. The target object is an object whose number of mentions in other voice inputs received before the first voice input reaches a preset threshold, and the preset wake-up word is used to wake up the terminal.

Based on the above technical solution, when the terminal has not been woken up in advance and receives the first voice input from the user, it responds if the first voice input contains an object whose number of mentions in previous voice inputs reaches the preset threshold. That is, the terminal learns from previous voice inputs and saves the target objects whose number of mentions has reached the preset threshold; as long as a received voice input contains such a target object, the terminal can respond even if it has not been woken up in advance. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interactive experience.

Combined with the second aspect, in some possible implementations of the second aspect, before receiving the first voice input from the user, the above method further includes: receiving the preset wake-up word from the user; receiving a second voice input from the user; and, when the number of times a first object contained in the second voice input is mentioned in the second voice input and in earlier voice inputs exceeds a preset threshold, determining the first object as the target object.

The terminal may record the number of times the first object is mentioned in voice inputs. If that number exceeds the preset threshold, the first object is determined as the target object, so that the user can subsequently issue a voice input containing the target object without waking up the terminal in advance, and the terminal can respond after receiving that voice input. That is, there is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interactive experience.
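The counting logic of the second aspect can be sketched as below. It assumes that objects have already been extracted from each voice input by some upstream component, and it treats an object as a target once its count reaches the threshold (the text above uses both "reaches" and "exceeds"; this sketch picks "reaches"). `TargetObjectLearner` and its methods are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Set


class TargetObjectLearner:
    """Counts object mentions and promotes frequent objects to targets."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold  # assumed value, not given in the source
        self.mentions: Counter = Counter()
        self.targets: Set[str] = set()

    def record(self, objects: Iterable[str]) -> None:
        """Record the objects mentioned in a voice input received after wake-up."""
        for obj in objects:
            self.mentions[obj] += 1
            if self.mentions[obj] >= self.threshold:
                self.targets.add(obj)

    def should_respond(self, objects: Iterable[str], woken: bool) -> bool:
        """Respond if the terminal is awake, or if the input names a target object."""
        return woken or any(obj in self.targets for obj in objects)
```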

Combined with the second aspect, in some possible implementations of the second aspect, the above method further includes: based on the target object, generating a wake-up-free instruction including the target object; and prompting the user for the wake-up-free instruction.

The terminal can generate a wake-up-free instruction including the target object based on the target object, and prompt the user that the above wake-up-free instruction can be used directly next time, without waking up the terminal in advance, and the terminal can respond accordingly. The terminal may prompt the user with the above wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.

In a third aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, by a component (such as a chip or chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal's functions. This application does not limit this.

Exemplarily, the method includes: receiving a first voice input from the user, the first voice input belonging to a first instruction set, where the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions; and responding to the first voice input when a preset condition is met.

Based on the above technical solution, after the terminal receives a first voice input that is semantically similar to a predefined wake-up-free instruction, it responds to the first voice input when the preset condition is met. That is, for a first voice input that is semantically similar to a predefined wake-up-free instruction, the terminal responds only if the preset condition is met, rather than in all circumstances. This can prevent the terminal from responding when the user mentions the first voice input by mistake. The first voice input may be more colloquial than the predefined wake-up-free instructions, so if the terminal responded under all circumstances, its response would likely be triggered frequently during the user's conversations. Therefore, by setting a preset condition and responding only when it is met, the terminal can greatly improve the user's interactive experience.

Combined with the third aspect, in some possible implementations of the third aspect, the above-mentioned preset conditions include at least one of the following: the number of users within a preset range of the terminal does not exceed a threshold; the user is in a predefined position; the user from whom the first voice input comes does not belong to a preset group; or the time when the first voice input is received falls within a preset period.

The number of users within the preset range of the terminal not exceeding the threshold means that the terminal can respond to the first voice input when few users are nearby. If few users are nearby, it is less likely that the user has mentioned the first voice input by mistake, that is, the user probably does want the terminal to perform the corresponding operation. Conversely, the more users are nearby, the more likely it is that the first voice input was mentioned by mistake. The user being in a predefined position means, for example, that the terminal responds to the first voice input from the user closest to it, or that the user is at a scenic spot, where the user is more likely to want the terminal to provide a service, so the terminal responds to the first voice input from that user. The user from whom the first voice input comes not belonging to the preset group means that the user is not, for example, a child or an elderly person. Instructions issued by members of the preset group may be dangerous, so the terminal may not respond to them. The time at which the first voice input is received falling within a preset period means, for example, that the input is received during working hours. During such periods the terminal can respond to the first voice input; in other periods the terminal responds only to the predefined wake-up-free instructions. In summary, the above preset conditions can effectively prevent the terminal from responding when the user mentions the first voice input by mistake.

Combined with the third aspect, in some possible implementations of the third aspect, the above method is applied to a car, and the number of users within the preset range of the terminal not exceeding the threshold includes: there is a passenger in the car; or the user being in a predefined position includes: the user is in the driver's seat.
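The preset conditions of the third aspect can be expressed as a small set of independent checks. The sketch below is illustrative only: the field names, default values and the rule that every enabled check must pass are assumptions, since the application only states that the preset conditions include at least one of the four listed items.

```python
from dataclasses import dataclass
from datetime import time
from typing import Dict, FrozenSet, Tuple


@dataclass
class Context:
    """What the terminal is assumed to know when the input arrives."""
    nearby_users: int
    user_position: str
    user_group: str
    received_at: time


@dataclass
class Policy:
    """Assumed example thresholds; none of these values come from the source."""
    max_nearby_users: int = 1
    allowed_positions: FrozenSet[str] = frozenset({"driver_seat"})
    excluded_groups: FrozenSet[str] = frozenset({"child"})
    period_start: time = time(9, 0)
    period_end: time = time(18, 0)


def evaluate(ctx: Context, policy: Policy) -> Dict[str, bool]:
    """Evaluate each of the four listed conditions separately."""
    return {
        "few_users": ctx.nearby_users <= policy.max_nearby_users,
        "position": ctx.user_position in policy.allowed_positions,
        "group": ctx.user_group not in policy.excluded_groups,
        "period": policy.period_start <= ctx.received_at <= policy.period_end,
    }


def should_respond(ctx: Context, policy: Policy,
                   enabled: Tuple[str, ...] = ("few_users",)) -> bool:
    """Respond only if every enabled condition holds (an assumed combination rule)."""
    results = evaluate(ctx, policy)
    return all(results[name] for name in enabled)
```

For the in-car case, a context with one occupant in the driver's seat would pass the `few_users` and `position` checks under these example defaults.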

In a fourth aspect, the present application provides a human-computer interaction method. The method may be executed by a terminal, by a component (such as a chip or chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal's functions. This application does not limit this.

Exemplarily, the method includes: without receiving a preset wake-up word from the user, determining, according to a first voice input from the user, that the first voice input is used to request navigation; asking the user for the destination of the requested navigation; and providing a navigation service to the user based on the destination fed back by the user.

Based on the above technical solution, when the terminal has not been woken up in advance and receives a first voice input whose intention is to request navigation, it can ask the user for the navigation destination and provide a navigation service based on the destination fed back by the user. The terminal does not need to be woken up in advance, which simplifies the interaction process and helps improve the user's interactive experience.

Combined with the fourth aspect, in some possible implementations of the fourth aspect, the above method further includes: generating a wake-up-free instruction including a destination; and prompting the user for the wake-up-free instruction.

The terminal can generate a wake-up-free command including the above destination, and prompt the user that the above wake-up-free command can be used directly next time, and the terminal can respond accordingly. The terminal may prompt the user with the above wake-up-free instruction through a prompt box and/or voice broadcast. This application does not limit the prompting method.
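The navigation flow of the fourth aspect, including the generated instruction that is suggested to the user, can be sketched as follows. Intent detection by keyword is a placeholder for whatever recognizer the terminal actually uses; `NAV_KEYWORDS`, `is_navigation_request`, `handle_without_wake_word`, the question text and the returned dictionary layout are all hypothetical.

```python
from typing import Callable, Dict, Optional

# Assumed keywords standing in for a real intent recognizer.
NAV_KEYWORDS = ("navigate", "navigation", "directions")


def is_navigation_request(utterance: str) -> bool:
    """Crude keyword check for a navigation intent."""
    text = utterance.lower()
    return any(keyword in text for keyword in NAV_KEYWORDS)


def handle_without_wake_word(utterance: str,
                             ask_destination: Callable[[str], str]
                             ) -> Optional[Dict[str, str]]:
    """Handle an input received while the terminal has not been woken up.

    `ask_destination` is a caller-supplied function that puts a question
    to the user and returns the reply."""
    if not is_navigation_request(utterance):
        return None
    destination = ask_destination("Where would you like to go?").strip()
    if not destination:
        return None
    return {
        "action": "navigate",
        "destination": destination,
        # Instruction the terminal could suggest for direct use next time.
        "suggested_instruction": f"navigate to {destination}",
    }
```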

In a fifth aspect, this application provides a human-computer interaction method. The method may be executed by a terminal, by a component (such as a chip or chip system) configured in the terminal, or by a logic module or software capable of realizing all or part of the terminal's functions. This application does not limit this.

Exemplarily, the method includes: receiving a first voice input from the user, where the first voice input does not belong to the predefined wake-up-free instructions; and, when the first voice input is semantically similar to a first wake-up-free instruction among the predefined wake-up-free instructions, guiding the user to input the first wake-up-free instruction.

Based on the above technical solution, the terminal receives a first voice input that does not belong to the predefined wake-up-free instructions but is semantically similar to a first wake-up-free instruction among them. The terminal then guides the user to input that first wake-up-free instruction, so that it can respond once the user has done so. Compared with the terminal giving no response or prompt at all, this can greatly improve the user's interactive experience.

Combined with the fifth aspect, in some possible implementations of the fifth aspect, the above-mentioned guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.

The terminal can guide the user to input the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction, through voice broadcast, or through a combination of the prompt box and voice broadcast. Providing these multiple methods greatly improves the terminal's flexibility in guiding the user to input the first wake-up-free instruction.

Combined with the fifth aspect, in some possible implementations of the fifth aspect, the above-mentioned guiding the user to input the first wake-up-free instruction through a prompt box and/or voice broadcast includes: prompting the user to input the first wake-up-free instruction through a prompt box, where the prompt box contains the first wake-up-free instruction; and, when the number of prompts given through the prompt box within a preset time period reaches a preset threshold but the user has not issued the first wake-up-free instruction, guiding the user to input the first wake-up-free instruction through voice broadcast.
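The escalation from prompt box to voice broadcast can be sketched as a counter over a time window. The threshold, the period length, the channel labels and the name `GuidanceEscalator` are assumptions for illustration only.

```python
from collections import deque
from typing import Deque


class GuidanceEscalator:
    """Chooses how to guide the user toward the first wake-up-free instruction."""

    def __init__(self, max_box_prompts: int = 3, period_seconds: float = 3600.0) -> None:
        self.max_box_prompts = max_box_prompts  # assumed preset threshold
        self.period_seconds = period_seconds    # assumed preset time period
        self._box_prompt_times: Deque[float] = deque()

    def next_channel(self, now: float) -> str:
        """Call when a similar but non-predefined input is received.
        Returns "prompt_box" or "voice_broadcast"."""
        # Forget prompt-box prompts older than the preset time period.
        while self._box_prompt_times and now - self._box_prompt_times[0] > self.period_seconds:
            self._box_prompt_times.popleft()
        if len(self._box_prompt_times) >= self.max_box_prompts:
            return "voice_broadcast"
        self._box_prompt_times.append(now)
        return "prompt_box"

    def instruction_issued(self) -> None:
        """Reset once the user actually speaks the first wake-up-free instruction."""
        self._box_prompt_times.clear()
```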

In a sixth aspect, the present application provides a computer device, including a unit for implementing the method in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects. It should be understood that each unit can implement the corresponding function by executing a computer program.

In a seventh aspect, the present application provides a computer device, including a processor configured to execute the method described in any one of the first to fifth aspects and any possible implementation manner of the first to fifth aspects.

The computer device may further include a memory for storing computer readable instructions, and the processor reads the computer readable instructions so that the computer device can implement the methods described in the above aspects. The computer device may also include a communication interface for the computer device to communicate with other devices. For example, the communication interface may be a transceiver, a circuit, a bus, a module or other types of communication interfaces.

In an eighth aspect, this application provides a vehicle, which is configured to implement the method in any one of the first to fifth aspects and any possible implementation of the first to fifth aspects, or which includes the computer device described in the sixth aspect or the seventh aspect.

In a ninth aspect, the present application provides a chip system, which includes at least one processor configured to support the implementation of the functions involved in any one of the first to fifth aspects and any possible implementation of the first to fifth aspects, for example, receiving or processing the data and/or information involved in the above methods.

In a possible design, the chip system further includes a memory, the memory is used to store program instructions and data, and the memory is located within the processor or outside the processor.

The chip system may be composed of chips, or may include chips and other discrete devices.

In a tenth aspect, the present application provides a computer-readable storage medium. Computer-readable instructions are stored in the storage medium. When the computer-readable instructions are executed by a computer, the computer implements the method in any one of the first to fifth aspects and any possible implementation of the first to fifth aspects.

In an eleventh aspect, the present application provides a computer program product. The computer program product includes computer-readable instructions. When the computer-readable instructions are run by a computer, the computer implements the method in any one of the first to fifth aspects and any possible implementation of the first to fifth aspects.

It should be understood that the sixth to eleventh aspects of the present application correspond to the technical solutions of the first to fifth aspects of the present application. The beneficial effects achieved by each aspect and its corresponding feasible implementations are similar and are not repeated here.

Description of the drawings

Figure 1 is a schematic structural diagram of a terminal provided by an embodiment of the present application;

Figure 2 is a schematic diagram of a scenario applicable to the human-computer interaction method provided by the embodiment of the present application;

Figure 3 is a schematic diagram of a known human-computer interaction method;

Figure 4 is a schematic diagram of another known human-computer interaction method;

Figure 5 is a schematic flow chart of the first human-computer interaction method provided by the embodiment of the present application;

Figure 6 is an interaction schematic diagram of the first human-computer interaction method provided by the embodiment of the present application;

Figure 7 is a schematic flowchart of learning speech input terms provided by an embodiment of the present application;

Figure 8 is another schematic flowchart of learning speech input terms provided by an embodiment of the present application;

Figure 9 is a schematic flow chart of the second human-computer interaction method provided by the embodiment of the present application;

Figure 10 is an interaction schematic diagram of the second human-computer interaction method provided by the embodiment of the present application;

Figure 11 is a schematic flowchart of learning the first object in the second voice input provided by an embodiment of the present application;

Figure 12 is a schematic flow chart of the third human-computer interaction method provided by the embodiment of the present application;

Figure 13 is a schematic flowchart of determining whether to respond to the first voice input according to the scenario provided by the embodiment of the present application;

Figure 14 is a schematic flow chart of the fourth human-computer interaction method provided by the embodiment of the present application;

Figure 15 is an interactive schematic diagram for guiding a user to issue a first wake-up-free instruction provided by an embodiment of the present application;

Figure 16 is a schematic flow chart of the fifth human-computer interaction method provided by the embodiment of the present application;

Figure 17 is an interaction schematic diagram of the fifth human-computer interaction method provided by the embodiment of the present application.

Detailed description

The technical solutions in this application will be described below with reference to the accompanying drawings.

The methods provided by the embodiments of this application can be applied to terminals such as mobile phones, tablet computers, smart watches, smart speakers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, personal computers (PCs), ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (PDAs), and distributed devices.

It should be noted that the embodiments of this application do not place any limitation on the specific type of terminal.

Exemplarily, FIG. 1 shows a schematic structural diagram of a terminal 100. As shown in Figure 1, the terminal 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.

The processor 110 may include one or more processing units. For example, the processor 110 may include one or more of an application processor (AP), a microcontroller unit (MCU), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, a neural-network processing unit (NPU), and the like. The different processing units may be independent devices or may be integrated in one or more processors.

The application processor outputs sound signals through the audio module 170 (such as the speaker 170A), or displays images or videos through the display screen 194.

The controller may be the nerve center and command center of the terminal 100. The controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.

The processor 110 may also be provided with a memory for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can retrieve them directly from this memory, which avoids repeated access, reduces the waiting time of the processor 110, and thus improves system efficiency.

The processor 110 can perform different operations to implement different functions by executing instructions. For example, an instruction may be pre-stored in the memory before the device leaves the factory, or it may be read from a new application (APP) after the user installs that APP during use. The embodiments of this application do not impose any limitation on this.

In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a secure digital input and output (SDIO) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a universal synchronous asynchronous receiver/transmitter (USART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a SIM interface, and/or a USB interface.

The USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc. The USB interface 130 can be used to connect a charger to charge the terminal 100, and can also be used to transmit data between the terminal 100 and peripheral devices. It can also be used to connect headphones to play audio through them. This interface can also be used to connect other terminals.

It can be understood that the interface connection relationships between the modules illustrated in this application are only schematic illustrations and do not constitute a structural limitation on the terminal 100. In other embodiments, the terminal 100 may also adopt interface connection methods different from those in the above embodiments, or a combination of multiple interface connection methods.

The charging management module 140 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from the wired charger through the USB interface 130 . In some wireless charging embodiments, the charging management module 140 may receive wireless charging input through the wireless charging coil of the terminal 100 . While charging the battery 142, the charging management module 140 can also provide power to the terminal through the power management module 141.

The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, etc. The power management module 141 can also be used to monitor battery capacity, battery cycle times, battery health status (leakage, impedance) and other parameters. In some other embodiments, the power management module 141 may also be provided in the processor 110 . In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.

The wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.

Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in terminal 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example: Antenna 1 can be reused as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.

The mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied to the terminal 100. The mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation. In some embodiments, at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 . In some embodiments, at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.

A modem processor may include a modulator and a demodulator. Among them, the modulator is used to modulate the low-frequency baseband signal to be sent into a medium-high frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After the low-frequency baseband signal is processed by the baseband processor, it is passed to the application processor. The application processor outputs sound signals through audio devices (not limited to speaker 170A, receiver 170B, etc.), or displays images or videos through display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be independent of the processor 110 and may be provided in the same device as the mobile communication module 150 or other functional modules.

The wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 can also receive the signal to be sent from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.

In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), a fifth generation (5G) communication system, BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS) and/or satellite based augmentation systems (SBAS).

The terminal 100 can implement display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.

The display screen 194 is used to display images, videos, etc. The display screen 194 includes a display panel. The display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini LED, a Micro LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the terminal 100 may include one or more display screens 194.

In this application, the display screen 194 can be used to display a prompt box that contains a predefined wake-up-free instruction. The prompt box prompts the user to use that wake-up-free instruction directly next time; that is, without waking up the terminal in advance, the user can interact with the terminal by voice through the wake-up-free instruction.

The terminal 100 can implement the shooting function through the ISP, camera 193, video codec, GPU, display screen 194, application processor, etc.

Camera 193 is used to capture still images or video. The object passes through the lens to produce an optical image that is projected onto the photosensitive element. The photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal. ISP outputs digital image signals to DSP for processing. In some embodiments, terminal 100 may include one or more cameras 193.

Digital signal processors are used to process digital signals. In addition to digital image signals, they can also process other digital signals. For example, when the terminal 100 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy.

Video codecs are used to compress or decompress digital video. Terminal 100 may support one or more video codecs. In this way, the terminal 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, etc.

The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function. For example, files such as music and videos can be saved on the external memory card.

The internal memory 121 may be used to store computer executable program code, which includes instructions. The processor 110 executes the instructions stored in the internal memory 121 to perform the various functional applications and data processing of the terminal 100. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and the application program required for at least one function (such as a sound playback function or an image playback function). The data storage area may store data created during use of the terminal 100 (such as audio data and the phone book). In addition, the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.

The terminal 100 can implement audio functions through the audio module 170, such as the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. For example, music playback, recording, etc.

The audio module 170 is used to convert digital audio information into analog audio signal output, and is also used to convert analog audio input into digital audio signals. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .

The speaker 170A, also called a "loudspeaker", is used to convert audio electrical signals into sound signals. Through the speaker 170A, the user can listen to music or to a hands-free call on the terminal 100.

Receiver 170B, also called "earpiece", is used to convert audio electrical signals into sound signals. When the terminal 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the human ear.

The microphone 170C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal to the microphone 170C. The terminal 100 may be provided with at least one microphone 170C. In other embodiments, the terminal 100 may be provided with two microphones 170C, which, in addition to collecting sound signals, may also implement a noise reduction function. In other embodiments, the terminal 100 can also be equipped with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording functions, etc.

In this application, the microphone 170C can be used to receive voice input from the user, that is, can be used to collect sound signals from the user.

The headphone interface 170D is used to connect wired headphones. The headphone interface 170D can be a USB interface 130, or a 3.5mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.

The buttons 190 include a power button (also called an on/off button), a volume button, etc. The button 190 may be a mechanical button or a touch button. The terminal 100 may receive button input and generate key signal input related to user settings and function control of the terminal 100.

The motor 191 can generate vibration prompts. The motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback. For example, touch operations for different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191. Different application scenarios (such as time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.

The SIM card interface 195 is used to connect a SIM card. The SIM card can be connected to or separated from the terminal 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 . The terminal 100 may support one or more SIM card interfaces. SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the plurality of cards may be the same or different. The SIM card interface 195 is also compatible with different types of SIM cards. The SIM card interface 195 is also compatible with external memory cards. The terminal 100 interacts with the network through the SIM card to implement functions such as calls and data communications. In some embodiments, the terminal 100 adopts eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the terminal 100 and cannot be separated from the terminal 100.

The structure illustrated in this application does not constitute a specific limitation on the terminal 100. In other embodiments, the terminal 100 may include more or fewer components than shown, or some components may be combined, or some components may be separated, or may be arranged differently. The components illustrated may be implemented in hardware, software, or a combination of software and hardware.

In order to facilitate understanding of the human-computer interaction method provided by the embodiment of the present application, the scenarios applicable to the human-computer interaction method provided by the embodiment of the present application will be described below. It can be understood that the application scenarios described in the embodiments of the present application are for the purpose of more clearly illustrating the technical solutions of the embodiments of the present application, and do not constitute a limitation on the technical solutions provided by the embodiments of the present application.

Figure 2 is a schematic diagram of a scenario to which the method provided by the embodiments of this application is applicable. As shown in Figure 2, the user can state by voice the operation that the terminal is to perform, thereby interacting with the terminal (a mobile phone is used as an example in Figure 2). In some scenarios, voice interaction has become one of the important and commonly used human-computer interaction methods. For example, while driving a vehicle, the user can interact by voice with the in-vehicle head unit (an example of a terminal). Currently, the user can first wake up the terminal through a preset wake-up word; in more detail, the user first wakes up the voice assistant (also called a smart assistant, intelligent assistant, etc.; this application does not limit the name) through the preset wake-up word and then carries out subsequent interactions. This method is relatively cumbersome and results in poor user experience. Some manufacturers also provide a wake-up-free function, that is, the user does not need to wake up the voice assistant in advance and can interact with the terminal directly through predefined wake-up-free instructions. However, the predefined wake-up-free instructions are fixed and limited. If the wake-up-free instruction that the user speaks is inaccurate, the terminal does not respond, and the user experience is poor.

The above two known human-computer interaction methods will be described in detail below with reference to FIGS. 3 and 4 .

Figure 3 shows a known human-computer interaction method. As shown in Figure 3, the user wakes up the terminal in advance through a preset wake-up word. More specifically, the user first wakes up the voice assistant in the terminal through the preset wake-up word. In Figure 3, the wake-up word is "Xiaoyi Xiaoyi"; in response to the user saying "Xiaoyi Xiaoyi", the voice assistant replies "I am here". Next, the user says "Navigate to location A". In response to this voice input, the voice assistant replies "Okay, let's start navigating for you" and displays the route to location A through the user interface. It can be seen that the entire interaction process is relatively cumbersome, resulting in poor user experience.

Figure 4 shows another known human-computer interaction method. As shown in Figure 4, the user can speak a predefined wake-up-free instruction directly to interact with the terminal. For example, the user says "Navigate to the company", and in response the terminal displays the route to the company through the user interface, where the location of the user's company is pre-stored on the terminal. If the user speaks other statements with similar intentions (or semantics), such as "Go to work", "I want to go to work", or "I want to go to the company", the voice assistant does not respond. In general, the predefined wake-up-free instructions are fixed and limited, which may leave the voice assistant unable to respond to the user's voice input, resulting in poor user experience.

In order to improve the user's human-computer interaction experience, this application provides a human-computer interaction method. The method includes: the terminal receives a first voice input from the user and, when the first voice input is semantically similar to a predefined first wake-up-free instruction, makes a corresponding response to the first voice input. That is, without the terminal being woken up in advance, even if the sentence the user speaks is not the predefined first wake-up-free instruction, the terminal can recognize and respond to it as long as it is semantically similar to the predefined first wake-up-free instruction. This helps alleviate the problem of the terminal not responding because the predefined first wake-up-free instructions are fixed and limited, and thereby helps improve the user's voice interaction experience.

In order to clearly describe the technical solutions of the embodiments of the present application, the following description is first made.

First, in the embodiments of the present application, words such as "first" and "second" are used to distinguish identical or similar items with basically the same functions and effects. For example, the first voice input and the second voice input are only used to distinguish different voice inputs, and their order is not limited. Those skilled in the art can understand that words such as "first" and "second" do not limit the number and position.

Second, in this application, "at least one item" refers to one item or multiple items. "And/or" describes the association of associated objects, indicating that there can be three relationships, for example, A and/or B, which can mean: A exists alone, A and B exist simultaneously, and B exists alone, where A, B can be singular or plural. The character "/" generally indicates that the related objects are in an "or" relationship, but it does not exclude the situation that the related objects are in an "and" relationship. The specific meaning can be understood based on the context.

Third, in this application, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or equipment that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to the process, method, product or equipment.

The human-computer interaction method provided by this application will be described in detail below with reference to specific embodiments.

It should be understood that the embodiments shown below can be executed by the terminal, by components configured in the terminal (such as chips, chip systems, etc.), or by logic modules or software that can realize all or part of the terminal functions, which is not limited in the embodiments of this application. The terminal may have the structure shown in FIG. 1, or may have more or fewer components than in FIG. 1, which is not limited in the embodiments of the present application.

Figure 5 is a schematic flow chart of the first human-computer interaction method provided by the embodiment of the present application. As shown in Figure 5, the method 500 may include step 501 and step 502. Each step shown in Figure 5 will be described in detail below.

Step 501: Receive first voice input from the user.

The first voice input may be a voice input received by the terminal from the user without the user waking up the terminal in advance.

Exemplarily, in response to the user's voice operation, a first voice input is received from the user. The first voice input may be, for example, "navigate to location A", "navigate to the company", "leave for work", "play song B", "I want to listen to song B", etc. The embodiments of this application do not place any restrictions on the specific content of the first voice input.

Step 502: If the first voice input is semantically similar to the predefined first wake-up-free instruction, make a corresponding response to the first voice input.

The first wake-up-free instruction is used to instruct the terminal to perform an operation corresponding to the first wake-up-free instruction without inputting a preset wake-up word.

One possible implementation method is that after the terminal receives the first voice input from the user, it performs semantic analysis on the first voice input and the predefined first wake-up-free instruction based on natural language processing (NLP). When the first voice input is semantically similar to the predefined first wake-up-free instruction, a corresponding response is made to the first voice input.
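The semantic comparison described above can be illustrated with a deliberately small sketch. The application does not specify which NLP technique is used, so everything below is an assumption made for illustration: the voice input is taken to be already transcribed to text, the `CONCEPTS` lexicon is a hand-written stand-in for a trained language model, and the names `concept_vector`, `is_semantically_similar` and the threshold value 0.8 are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical lexicon mapping surface words to coarse concepts.
# A real terminal would rely on a trained NLP model instead of this table.
CONCEPTS = {
    "navigate": "GO", "go": "GO", "drive": "GO",
    "company": "WORKPLACE", "work": "WORKPLACE", "office": "WORKPLACE",
    "play": "PLAY", "listen": "PLAY",
    "song": "MUSIC", "music": "MUSIC",
}

def concept_vector(text):
    """Turn a sentence into a bag of concepts; unknown words are ignored."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CONCEPTS[w] for w in words if w in CONCEPTS)

def cosine(a, b):
    """Cosine similarity of two concept bags; 0.0 if either bag is empty."""
    dot = sum(a[k] * b[k] for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def is_semantically_similar(voice_text, instruction, threshold=0.8):
    """Return True if the transcribed input is close enough to the instruction."""
    return cosine(concept_vector(voice_text), concept_vector(instruction)) >= threshold
```

With this toy lexicon, "Go to work" and "Navigate to the company" reduce to the same two concepts and are treated as similar, while "I want to listen to song B" shares no concept with the navigation instruction and is rejected.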

Another possible implementation is that after the terminal receives the first voice input from the user, it determines whether the first voice input belongs to the second wake-up-free instructions obtained by learning voice inputs, where a second wake-up-free instruction has similar semantics to the predefined first wake-up-free instruction but uses different wording. For example, the predefined first wake-up-free instruction is "Navigate to the company", and the second wake-up-free instruction obtained by learning voice inputs is "Go to work". The semantics of the two are similar, but the wording differs: the second wake-up-free instruction is more colloquial, whereas the first wake-up-free instruction is a standard human-computer interaction expression. In the case where the first voice input belongs to the second wake-up-free instructions obtained by learning voice inputs (that is, the first voice input is semantically similar to the predefined first wake-up-free instruction), the terminal makes a corresponding response to the first voice input.

The above two possible implementations can be used separately or in combination. When the two are used in combination, for example, after the terminal receives the first voice input, it can first determine whether the first voice input belongs to the second wake-up-free instructions obtained by learning voice inputs. If it does, the terminal responds to it. If it does not, the terminal further performs semantic analysis on the first voice input and the predefined first wake-up-free instruction based on NLP. If the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal makes a corresponding response to the first voice input; if they are not similar, the terminal does not respond to the first voice input.

It should be understood that the above-mentioned predefined first wake-up-free instruction and/or the second wake-up-free instruction obtained by learning voice inputs can be stored in an instruction library. After receiving the first voice input, the terminal determines whether to respond based on the first wake-up-free instruction and the second wake-up-free instruction stored in the instruction library. If the first voice input is semantically similar to the first wake-up-free instruction, the terminal responds accordingly to the first voice input.
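The combined use of the two implementations together with an instruction library can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the class name `InstructionLibrary`, the `normalize` helper, the exact-text lookup for learned instructions, and the pluggable `similar` callback (standing in for the NLP-based analysis) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so equal phrasings compare equal."""
    return " ".join(text.lower().split())

@dataclass
class InstructionLibrary:
    # Predefined first wake-up-free instructions.
    predefined: List[str]
    # Learned second wake-up-free instructions: normalized phrasing -> predefined instruction.
    learned: Dict[str, str] = field(default_factory=dict)

    def match(self, voice_text: str, similar: Callable[[str, str], bool]) -> Optional[str]:
        """Return the predefined instruction to act on, or None for no response."""
        key = normalize(voice_text)
        # Stage 1: is the input one of the learned second wake-up-free instructions?
        if key in self.learned:
            return self.learned[key]
        # Stage 2: fall back to semantic analysis against each predefined instruction.
        for instruction in self.predefined:
            if similar(voice_text, instruction):
                return instruction
        return None
```

Checking the learned entries first keeps the common case cheap, since a dictionary lookup avoids running the semantic analysis at all; whether a real terminal orders the stages this way for the same reason is not stated in the application.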

Optionally, before making a corresponding response to the first voice input, the above method further includes: confirming the semantics of the first voice input to the user.

For example, after the terminal receives the first voice input from the user, if the first voice input is semantically similar to the predefined first wake-up-free instruction, the terminal asks the user whether the above semantics are correct. If the user replies that the above semantics are correct, the terminal responds accordingly to the first voice input.

The terminal can ask the user whether the semantics are correct through a voice broadcast, through a prompt box (such as a toast) that contains the semantics of the first voice input, or through a prompt box (such as a toast) combined with a voice broadcast. The embodiments of this application do not limit the manner in which the terminal asks the user to confirm the semantics.

Optionally, the above method further includes: prompting the user with a first wake-up-free instruction. That is to say, in addition to performing the operation indicated by the first voice input (such as navigating to the company), the terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time to instruct the terminal to perform the corresponding operation.

Figure 6 is an interaction schematic diagram of the first human-computer interaction method provided by the embodiment of the present application. As shown in Figure 6, in response to the user's voice input of "Go to work", the terminal asks "Do you want to navigate to the company?", the user replies "Yes" by voice, and in response to the user's reply, the terminal displays the route to the company through the user interface. The voice broadcast that the terminal in Figure 6 uses to ask the user "Do you want to navigate to the company?" is only an example and should not constitute any limitation on the embodiments of the present application. In other embodiments, the terminal can also ask the user "Do you want to navigate to the company?" through a prompt box (such as a toast), or through a prompt box (such as a toast) combined with a voice broadcast.

Optionally, the terminal can also prompt the user to directly use the predefined first wake-up-free instruction next time through a prompt box and/or a voice broadcast. As shown in Figure 6, the terminal prompts the user to "try navigating to the company next time" through voice broadcast.
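The confirm, respond, and prompt sequence of Figure 6 can be sketched as a single function. This is an illustrative sketch only: the function name `confirm_then_respond` and its three callbacks are hypothetical, and the callbacks stand in for whichever channel the terminal uses (voice broadcast, prompt box, or both), which the application leaves open.

```python
from typing import Callable

def confirm_then_respond(
    instruction: str,
    ask: Callable[[str], bool],      # poses a yes/no question, returns the user's answer
    execute: Callable[[str], None],  # performs the operation for the instruction
    hint: Callable[[str], None],     # shows or speaks a tip to the user
) -> bool:
    """Confirm the inferred semantics, then act and suggest the predefined wording."""
    # Build the question from the predefined instruction, e.g.
    # "Navigate to the company" -> "Do you want to navigate to the company?"
    question = f"Do you want to {instruction[0].lower()}{instruction[1:]}?"
    if not ask(question):
        return False  # user rejected the inferred semantics: do not respond
    execute(instruction)
    hint(f'Next time, try saying "{instruction}"')
    return True
```

Passing the channels in as callbacks keeps the sequence itself independent of whether the question is spoken, shown in a toast, or both.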

The process of the terminal obtaining the second wake-up-free instruction based on learning the voice input will be described in detail below.

Optionally, before receiving the first voice input from the user, the above method further includes: receiving a second voice input from the user; if the second voice input is semantically similar to the first wake-up-free instruction, confirming the semantics of the second voice input with the user; and in response to the user's operation of confirming the semantics of the second voice input, generating a second wake-up-free instruction corresponding to the second voice input.

For example, after receiving the second voice input from the user, the terminal determines whether the predefined first wake-up-free instructions contain an instruction that is semantically similar to the second voice input; for example, semantic analysis of the two can be performed based on NLP. If it is determined that the second voice input has similar semantics to a certain predefined first wake-up-free instruction, the terminal asks the user whether the above semantics are correct. If the user replies that the semantics are correct, the terminal generates a second wake-up-free instruction corresponding to the second voice input. In addition, the terminal can also save the second voice input in the instruction library.
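The learning process just described can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `learn_second_instruction`, the `similar` and `confirm` callbacks, and the use of a plain dictionary as the instruction library are all hypothetical, and the voice input is assumed to be already transcribed to text.

```python
from typing import Callable, Dict, List, Optional

def learn_second_instruction(
    voice_text: str,
    predefined: List[str],
    similar: Callable[[str, str], bool],  # stand-in for NLP-based semantic analysis
    confirm: Callable[[str, str], bool],  # asks the user whether the semantics are right
    learned: Dict[str, str],              # instruction library entries for learned phrasings
) -> Optional[str]:
    """Learn a new phrasing; return the matched predefined instruction, or None."""
    # Look for a predefined first wake-up-free instruction with similar semantics.
    match = next((p for p in predefined if similar(voice_text, p)), None)
    if match is None:
        return None  # nothing similar: the terminal does not respond
    # Ask the user to confirm before anything is learned.
    if not confirm(voice_text, match):
        return None
    # Generate the second wake-up-free instruction and save it in the library.
    learned[" ".join(voice_text.lower().split())] = match
    return match
```

Because nothing is stored until the user confirms, a mistaken similarity judgment does not pollute the library.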

Optionally, receiving the second voice input from the user includes: receiving the second voice input multiple times in succession within a preset time range. That is, only if the terminal receives the second voice input multiple times in succession within the preset time range does it confirm the semantics of the second voice input with the user. This effectively prevents the terminal from responding when the user merely mentions the second voice input in passing during a conversation, thereby improving the user's experience.

For example, if the user says "Go to work" twice in a row within 1 minute, and "Go to work" is semantically similar to the predefined first wake-up-free instruction "Navigate to the company", then after receiving the above voice input in succession, the terminal asks the user "Do you want to navigate to the company?" through a prompt box and/or voice broadcast (for example, see Figure 6), and in response to the user's confirmation operation, displays the route to the company through the user interface. In addition, the terminal can also prompt the user to directly use the first wake-up-free instruction next time through a prompt box and/or voice broadcast. As shown in Figure 6, the terminal prompts the user to "try navigating to the company next time" through voice broadcast.
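The repeated-input condition can be sketched with a small sliding-window detector. The class name `RepeatDetector`, its parameters, and the choice to compare inputs by normalized text are assumptions for illustration; the application only states that the same input is received multiple times in succession within a preset time range. Timestamps are passed in explicitly (in seconds) so that the behavior does not depend on the system clock.

```python
from collections import deque

class RepeatDetector:
    """Report when the same input arrives several times in a row within a window."""

    def __init__(self, window_seconds=60.0, required_count=2):
        self.window = window_seconds
        self.required = required_count
        self.last_text = None
        self.times = deque()  # timestamps of consecutive occurrences of last_text

    def observe(self, text, now):
        """Record one input at time `now`; return True if the repeat condition holds."""
        key = " ".join(text.lower().split())
        # A different utterance breaks the run of consecutive repeats.
        if key != self.last_text:
            self.last_text = key
            self.times.clear()
        self.times.append(now)
        # Drop occurrences that fall outside the preset time range.
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()
        return len(self.times) >= self.required
```

With a 60-second window and a required count of two, saying "Go to work" at 0 s and again at 30 s satisfies the condition, whereas repeating it only after 90 s, or with a different sentence in between, does not.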

FIG. 7 is a schematic flowchart of learning speech input terms provided by an embodiment of the present application.

Step 701: Receive second voice input from the user.

In response to the user's voice operation, the terminal receives a second voice input from the user. For example, the second voice input includes: "Go to work", "Is there a traffic jam on the road?", "Avoid the congested road", "Choose a smooth road", etc., which will not be listed here.

Step 702: Determine whether the second voice input is semantically similar to the predefined first wake-up-free instruction.

After receiving the second voice input from the user, the terminal determines whether the predefined first wake-up-free instructions contain an instruction semantically similar to the second voice input. If the predefined first wake-up-free instructions do not contain an instruction semantically similar to the second voice input, step 703 is executed, that is, the second voice input is not responded to; if the second voice input is semantically similar to a first wake-up-free instruction, step 704 is executed, that is, the user is asked whether the second voice input has the above-mentioned semantics.

Step 703: Do not respond to the second voice input.

Step 704: Ask the user whether the second voice input has the above semantics.

If the user replies that the second voice input does not have the above semantics, the terminal does not respond to the second voice input; if the user replies that the second voice input has the above semantics, the terminal executes step 705.

The terminal can ask the user through voice broadcast, or through a prompt box (such as toast), or through a prompt box (such as toast) plus voice broadcast. This application does not place any restrictions on the terminal's query method.

Step 705: Generate a second wake-up-free instruction and respond to the second voice input.

If the user replies that the second voice input has the above semantics, the terminal determines the second voice input as the second wake-up-free command, saves it in the command library, and responds to the second voice input.

Optionally, the terminal can also prompt the user to directly use the first wake-up-free instruction next time through a prompt box and/or voice broadcast. For examples of the method shown in Figure 7, please refer to the relevant examples of step 502, which will not be listed here.
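
The flow of Figure 7 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the names `PREDEFINED`, `TOY_SEMANTICS`, `find_similar` and `learn_term` are hypothetical, the lookup table is a toy stand-in for the NLP-based semantic analysis, and the user's confirmation is supplied as a callback rather than through a prompt box or voice broadcast.

```python
from typing import Callable, Optional

# Predefined first wake-up-free instructions (examples taken from the text).
PREDEFINED = {"navigate to the company", "check whether there is congestion"}

# Toy stand-in for NLP semantic analysis (assumption): utterance -> similar instruction.
TOY_SEMANTICS = {
    "go to work": "navigate to the company",
    "is there a traffic jam on the road?": "check whether there is congestion",
}


def find_similar(utterance: str) -> Optional[str]:
    """Step 702: return the semantically similar predefined instruction, if any."""
    match = TOY_SEMANTICS.get(utterance.strip().lower())
    return match if match in PREDEFINED else None


def learn_term(utterance: str, library: dict, confirm: Callable[[str], bool]) -> str:
    """Steps 702 to 705; returns a label for the action the terminal takes."""
    target = find_similar(utterance)
    if target is None:
        return "no response"                      # step 703
    if not confirm(f"Do you want to {target}?"):  # step 704: ask the user
        return "no response"
    # Step 705: save the input as a second wake-up-free instruction and respond.
    library[utterance.strip().lower()] = target
    return f"execute: {target}"
```

For instance, `learn_term("Go to work", {}, lambda q: True)` would take the step 705 branch, while an utterance with no similar predefined instruction falls through to step 703.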

FIG. 8 is another schematic flowchart of learning speech input terms provided by an embodiment of the present application. The method shown in Figure 8 is a method in which the terminal triggers an inquiry to the user after receiving the second voice input multiple times in succession.

Step 801: Receive second voice input from the user.

Step 802: Determine whether the second voice input is received multiple times continuously.

After receiving the second voice input from the user, the terminal determines whether the second voice input is received multiple times in succession within a preset time range. If the terminal receives the second voice input multiple times in succession within the preset time range, step 804 is executed; otherwise, the terminal executes step 803, that is, it does not respond to the second voice input.

Step 803: Do not respond to the second voice input.

Step 804: Ask the user whether the second voice input has the above semantics.

If the user replies that the second voice input does not have the above semantics, the terminal does not respond to the second voice input; if the user replies that the second voice input has the above semantics, the terminal executes step 805.

Step 805: Generate a second wake-up-free instruction and respond to the second voice input.

Optionally, the terminal can also prompt the user to directly use the predefined first wake-up-free phrase next time through a prompt box and/or a voice broadcast.
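
The check in step 802 could be implemented along the following lines. This is a minimal sketch, not the method of the application: the class name `RepeatDetector`, the 60-second window and the reading of "multiple times" as two repeats are all assumptions made for illustration, and utterances are compared as normalized text rather than by semantics.

```python
from collections import deque


class RepeatDetector:
    """Detects the same utterance being received several times in a row."""

    def __init__(self, window_s: float = 60.0, min_repeats: int = 2):
        self.window_s = window_s        # preset time range (assumed value)
        self.min_repeats = min_repeats  # "multiple times" (assumed to mean 2)
        self._last = None
        self._times = deque()

    def heard(self, utterance: str, now_s: float) -> bool:
        """Step 802: True if the utterance was repeated enough within the window."""
        key = utterance.strip().lower()
        if key != self._last:
            # A different utterance breaks the run of consecutive repeats.
            self._last = key
            self._times.clear()
        self._times.append(now_s)
        # Drop timestamps that fall outside the preset time range.
        while now_s - self._times[0] > self.window_s:
            self._times.popleft()
        return len(self._times) >= self.min_repeats
```

A `True` result corresponds to proceeding to step 804, and `False` corresponds to step 803.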

Figure 9 is a schematic flow chart of the second human-computer interaction method provided by the embodiment of the present application.

As shown in Figure 9, the method 900 may include step 901 and step 902. Each step shown in Figure 9 will be described in detail below.

Step 901: Receive first voice input from the user.

Exemplarily, in response to the user's voice operation, a first voice input from the user is received. The first voice input may be, for example, "navigate to location A", "departure to location A", "I want to go to location A", "Play song B", "I want to listen to song B", etc. The embodiments of the present application do not place any limitation on the specific content of the first voice input.

Step 902: If the preset wake-up word is not received but the first voice input contains the target object, make a corresponding response to the first voice input.

Wherein, the above-mentioned target object is an object whose number of mentions reaches a preset threshold in other voice inputs received before the first voice input, and the above-mentioned preset wake-up word is used to wake up the terminal. More specifically, the preset wake-up word is used to wake up the voice assistant (or smart assistant, etc.; this application does not limit this) in the terminal.

In other words, when the terminal receives the first voice input without being awakened in advance, if the first voice input contains the target object, the terminal responds accordingly to the first voice input; if the first voice input does not contain the target object, the terminal does not respond to the first voice input.
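
The decision in step 902 amounts to a simple membership test, sketched below. The function name `should_respond` is hypothetical, and plain substring matching is used here only as a stand-in for whatever object recognition the terminal actually performs.

```python
def should_respond(utterance: str, awake: bool, target_objects: set) -> bool:
    """Step 902: respond if already awake, or if the input mentions a target object."""
    if awake:
        return True
    text = utterance.lower()
    # Substring matching is an assumption; a real system would recognise entities.
    return any(obj.lower() in text for obj in target_objects)
```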

Optionally, the above-mentioned target object may be, for example, a location, a media name (such as a song title), or an artist name, etc. This application does not limit the specific content of the target object.

The process of determining the target object, that is, the process of learning the first object in the speech input will be described in detail below.

Optionally, before receiving the first voice input from the user, the above method further includes: receiving a preset wake-up word from the user; receiving a second voice input from the user; and, when the number of times the first object included in the second voice input is mentioned in the second voice input and the voice inputs before it exceeds the preset threshold, determining the first object as the target object.

The first object may be, for example, a location, a media name (such as a song title), an artist name, etc. This application does not limit the specific content of the first object.

Exemplarily, after the terminal is awakened based on the preset wake-up word received from the user and receives the second voice input from the user, it determines whether the second voice input contains the first object. When the second voice input contains the first object, the terminal determines the number of mentions of the first object. If the number of mentions of the first object in the current second voice input and the previously received voice inputs exceeds a preset threshold, the first object is determined as the target object, so that the next time the user directly speaks a voice input containing the target object, the terminal can make a corresponding response. For example, if the terminal determines location A as the target object, there is no need to wake up the terminal in advance next time: the user directly inputs "navigate to location A" by voice, and after the terminal receives this voice input and determines that it contains location A, the user interface displays the route to location A. In this way, the user does not need to wake up the terminal in advance next time, which simplifies the interaction process and helps improve the user experience.

In addition, the terminal may record the number of times the first object is mentioned in the voice input, and each time the first object is mentioned, the corresponding number is incremented by 1.

Optionally, the terminal can also generate a wake-up-free instruction including the target object based on the target object; prompt the user with the wake-up-free instruction, so that the user can directly use the above wake-up-free instruction next time to control the terminal to perform the corresponding operation.

Among them, the terminal can prompt the user with the above-mentioned wake-up-free instruction through a prompt box and/or voice broadcast.

For example, the terminal gives a voice prompt to the user, "Next time, try direct navigation to location A", where location A is the target object.

Figure 10 is an interaction schematic diagram of the second human-computer interaction method provided by the embodiment of the present application. As shown in Figure 10, in response to the user's voice input operation of "Xiaoyi Xiaoyi", the terminal replies "I am here", that is, the terminal is awakened. Further, in response to the user's voice input operation of "navigate to location A", the terminal replies "OK, let's start navigating for you", and the terminal displays the route to location A through the user interface. If the number of times location A appears in the current voice input and the voice inputs before it exceeds the preset threshold, the terminal can prompt the user through voice broadcast to "next time, try just saying navigate to location A". In other words, next time the user does not need to wake up the terminal in advance: when the user directly inputs "navigate to location A" by voice, the terminal can display the route to location A through the user interface.

FIG. 11 is a schematic flowchart of learning the first object in the second voice input provided by an embodiment of the present application.

Step 1101: Receive a preset wake-up word from the user.

The above preset wake-up words are used to wake up the terminal, and more specifically, are used to wake up the voice assistant in the terminal.

Step 1102: Receive second voice input from the user.

After the terminal is awakened, it receives the second voice input from the user. For example, the second voice input includes: "Navigate to location A", "Leave to location A", "I want to go to location A", etc., which are not listed here one by one.

Step 1103: Determine whether the second voice input includes the first object.

The first object includes, for example, but is not limited to: location, media name (such as song title) or artist name, etc.

For example, after receiving the second voice input, the terminal determines whether the second voice input contains the first object (such as location A). If the second voice input does not contain the first object, step 1104 is executed; if the second voice input contains the first object, step 1105 is executed.

Step 1104: Respond to the second voice input.

Step 1105: Determine whether the number of times the first object is mentioned in the current voice input and its previously received voice input exceeds a preset threshold.

If the number of mentions of the first object in the current voice input and other previously received voice inputs does not exceed the preset threshold, step 1104 is executed, that is, the terminal responds to the second voice input; if the number of mentions of the first object in the current voice input and other previously received voice inputs exceeds the preset threshold, step 1106 is executed.

Step 1106: Determine the first object as the target object.

In addition, the terminal can also generate a wake-up-free instruction based on the target object, and prompt the user to directly use the above wake-up-free instruction next time.

Optionally, the terminal can prompt the user to directly use the above wake-up-free instruction next time through a prompt box and/or voice broadcast.
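
The learning flow of Figure 11 can be approximated with a mention counter. The sketch below is illustrative only: the class `ObjectLearner`, the list of known objects (a toy replacement for object recognition), the default threshold and the wording of the generated prompt are all assumptions, and the prompt text assumes a navigation destination.

```python
from collections import Counter
from typing import Optional


class ObjectLearner:
    """Counts mentions of objects in woken-up voice inputs (steps 1103 to 1106)."""

    def __init__(self, known_objects, threshold: int = 3):
        self.known_objects = list(known_objects)  # toy stand-in for object recognition
        self.threshold = threshold                # preset threshold (assumed value)
        self.mentions = Counter()
        self.targets = set()

    def extract(self, utterance: str) -> Optional[str]:
        """Step 1103: return the first known object mentioned, if any."""
        text = utterance.lower()
        for obj in self.known_objects:
            if obj.lower() in text:
                return obj
        return None

    def observe(self, utterance: str) -> Optional[str]:
        """Return a prompt once the mentioned object becomes a target object."""
        obj = self.extract(utterance)
        if obj is None:
            return None
        self.mentions[obj] += 1  # each mention increments the count by 1
        # Step 1105: the count must exceed (not merely reach) the threshold.
        if self.mentions[obj] > self.threshold and obj not in self.targets:
            self.targets.add(obj)  # step 1106
            return f"Next time, try saying 'navigate to {obj}' directly"
        return None
```

With `threshold=2`, the third input that mentions location A makes it a target object and yields the prompt; the ordinary response of step 1104 is left out of the sketch.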

Based on the above technical solution, when the terminal has not been awakened in advance and receives the first voice input from the user, if the first voice input contains an object whose number of mentions in previous voice inputs reaches the preset threshold, the terminal responds accordingly. That is, by learning from previous voice inputs and saving target objects whose number of mentions has reached a preset threshold, the terminal can respond even if it has not been woken up in advance, as long as the received voice input contains such a target object. This saves the time needed to wake up the terminal, simplifies the interaction process, and helps improve the user's interaction experience.

Figure 12 is a schematic flow chart of the third human-computer interaction method provided by the embodiment of the present application.

As shown in Figure 12, the method 1200 may include step 1201 and step 1202. Each step shown in Figure 12 will be described in detail below.

Step 1201: Receive a first voice input from a user. The first voice input belongs to a first instruction set, and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions.

The above-mentioned predefined wake-up-free command is used to instruct the terminal to perform operations corresponding to the wake-up-free command without inputting a preset wake-up word.

One possible implementation is that the instruction library pre-stores a predefined first instruction set and a predefined second instruction set. The instructions in the first instruction set are semantically similar to the predefined wake-up-free instructions, and the instructions in the second instruction set are the predefined wake-up-free instructions. The terminal receives the first voice input and determines that the first voice input belongs to the first instruction set.

Another possible implementation is that the instruction library pre-stores a predefined second instruction set and a first instruction set that is learned based on voice input and corresponds to the instructions in the second instruction set; the terminal receives the first voice input and determines that the first voice input belongs to the first instruction set. For the method by which the terminal learns, based on voice input, the first instruction set corresponding to the instructions in the second instruction set, please refer to the relevant descriptions of FIG. 5 and FIG. 9, which will not be repeated here.

Table 1 is an example of the first instruction set and the second instruction set pre-stored in the instruction library.

Table 1

As shown in Table 1, the instructions in the second instruction set are predefined wake-up-free instructions, such as "check whether there is congestion", "slide down", "reduce the page", "navigate to the company", "navigate home", etc. The instructions in the first instruction set are semantically similar to the instructions in the second instruction set, such as "Is there a traffic jam on the road?", "Scroll down", "Zoom out", "Go to work", "I want to go home", etc. It can be seen that the instructions in the first instruction set are semantically similar to the instructions in the second instruction set, but the wording is different: the instructions in the first instruction set are more colloquial, while the instructions in the second instruction set are standard human-computer interaction instructions.

It should be understood that the above division of instructions is only an example and shall not constitute any limitation on the embodiments of the present application. In other embodiments, different forms of division may also be used. For example, the first instruction set may be further divided into first instruction subset 1 and first instruction subset 2, where the instructions in first instruction subset 2 are more colloquial than those in first instruction subset 1, and the conditions for the terminal to respond to the instructions in first instruction subset 2 are stricter than the conditions for responding to the instructions in first instruction subset 1.
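
One way to picture the instruction library is as a set plus a mapping, as in the sketch below. The names `SECOND_SET`, `FIRST_SET` and `classify` are hypothetical, and the pairing of each colloquial instruction with a predefined one is assumed from the order in which the examples are listed; exact text lookup stands in for the semantic matching described above.

```python
# Second instruction set: predefined wake-up-free instructions (from the text).
SECOND_SET = {
    "check whether there is congestion",
    "slide down",
    "reduce the page",
    "navigate to the company",
    "navigate home",
}

# First instruction set: colloquial variants mapped to their assumed counterpart.
FIRST_SET = {
    "is there a traffic jam on the road?": "check whether there is congestion",
    "scroll down": "slide down",
    "zoom out": "reduce the page",
    "go to work": "navigate to the company",
    "i want to go home": "navigate home",
}


def classify(utterance: str) -> str:
    """Return which instruction set the utterance belongs to."""
    key = utterance.strip().lower()
    if key in SECOND_SET:
        return "second"
    if key in FIRST_SET:
        return "first"
    return "none"
```

Keeping the two sets separate lets the terminal always respond to the second set while applying extra preset conditions only to the first set.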

Step 1202: If the preset conditions are met, respond to the above-mentioned first voice input.

Optionally, the above preset conditions include at least one of the following: the number of users within a preset range from the terminal does not exceed a threshold; the user is in a predefined position; the user from whom the first voice input comes does not belong to the preset group; or the time when the first voice input is received falls within a preset period.

Wherein, the condition that the number of users within the preset range from the terminal does not exceed the threshold means that the terminal can respond to the first voice input when there are few users within the preset range. It should be understood that if the number of surrounding users is small, the possibility of the user mentioning the first voice input by mistake is smaller, that is, the user probably does want the terminal to perform the corresponding operation. In contrast, the larger the number of surrounding users, the greater the possibility that the user mentions the first voice input by mistake. Therefore, the above preset condition can effectively prevent the terminal from responding when the user mentions the first voice input by mistake.

The user is in a predefined position. For example, the terminal responds to the first voice input from the user closest to it; or, when the user is in a scenic spot, where the terminal is expected to be more ready to provide service, the terminal responds to the first voice input from the user.

The user from whom the first voice input comes does not belong to the preset group. The preset group is, for example, children or the elderly. It should be understood that instructions issued by the preset group may be dangerous, so the terminal may not respond to them.

The time when the first voice input is received falls within a preset period. The preset period may be, for example, working hours (or commuting hours). During these periods, the terminal can respond to the above-mentioned first voice input. In other time periods, the terminal responds only to the predefined wake-up-free instructions.
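
The four preset conditions can be modelled as independent checks over the context in which the voice input was received. The following is a hedged sketch: the `Context` fields, the check names, the user-count threshold of 1 and the 07:00 to 09:00 period are illustrative assumptions, and requiring every enabled check to pass is one possible reading of how several conditions combine.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class Context:
    nearby_users: int             # users within the preset range of the terminal
    at_predefined_position: bool
    in_preset_group: bool         # e.g. the speaker is recognised as a child
    received_at: time


# Each check mirrors one preset condition; thresholds and hours are assumed values.
CHECKS = {
    "few_users": lambda c: c.nearby_users <= 1,
    "position": lambda c: c.at_predefined_position,
    "not_preset_group": lambda c: not c.in_preset_group,
    "preset_period": lambda c: time(7, 0) <= c.received_at <= time(9, 0),
}


def conditions_met(ctx: Context, enabled: list) -> bool:
    """True if every enabled preset condition holds for this voice input."""
    return all(CHECKS[name](ctx) for name in enabled)
```

Passing `["few_users", "preset_period"]` as `enabled`, for example, would combine the user-count and time-period conditions.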

The following takes the above method applied to a car as an example (for example, the terminal uses a car machine as an example), and enumerates the response of the terminal to the first voice input in the above scenarios.

Scenario 1: When there is one passenger in the car, the car machine responds to the first voice input; or, when there are multiple passengers in the car, the car machine does not respond to the first voice input.

For example, the car machine can determine the number of people currently in the car based on the camera in the car. When there is one passenger in the car, that is, when there is only the driver in the car, the car machine responds to the first voice input. When there are multiple passengers in the car, the car machine does not respond to the first voice input. In addition, the car machine can respond to the instructions in the second instruction set regardless of whether there are one or more passengers in the car. In this way, the possibility of accidentally waking up the car machine during a chat conversation can be greatly reduced when there are multiple passengers in the car.

Scenario 2: When the voice input comes from the main driver, the car machine responds to the first voice input; or when the voice input comes from other passengers other than the main driver, the car machine does not respond to the first voice input.

For example, after the car machine receives the first voice input, it can determine, based on interaction with the seats, whether the first voice input comes from the driver or from another passenger. If the first voice input comes from the driver, the car machine responds to the first voice input; if the first voice input comes from another passenger, the car machine does not respond to the first voice input. In addition, the car machine can respond to instructions in the second instruction set whether they come from the driver or from other passengers.

Scenario 3: When the user from whom the first voice input comes does not belong to the preset group, the car machine responds to the first voice input; or, when the user from whom the first voice input comes belongs to the preset group, the car machine does not respond to the first voice input.

For example, the car machine can determine whether the first voice input comes from the preset group, taking children as an example. If the first voice input comes from a child, the car machine does not respond to the first voice input; if the first voice input does not come from a child, the car machine responds to the first voice input. In this way, it can effectively avoid the situation where a child speaks the first voice input by mistake and causes the car machine to respond.

Scenario 4: When the time when the voice input is received falls within the preset time period, the car machine responds to the first voice input; or, when the time when the voice input is received does not fall within the preset time period, the car machine does not respond to the first voice input.

For example, the preset time period is the working period. If the car machine receives the first voice input during the working period, it can respond to the first voice input; if the car machine receives the first voice input outside the working period, it may not respond to the first voice input.

It should be understood that in the several scenarios mentioned above, when the terminal determines to respond to the first voice input, it can first confirm the semantics of the voice input with the user, and respond to the first voice input in response to the user's operation confirming those semantics.

It should also be understood that the above possible scenarios can also be combined. For example, the car machine responds to the first voice input when the first voice input comes from the driver and the time when the voice input is received falls within a preset period. For another example, the car machine responds to the first voice input when there is only one passenger in the car and the time when the first voice input is received falls within a preset time period. For the sake of brevity, they are not listed here one by one.

Figure 13 is a schematic flowchart of determining whether to respond to the first voice input according to the scenario provided by the embodiment of the present application. The method described in Figure 13 is a combination of scenario two and scenario four.

Step 1301: Receive first voice input from the user.

Without the car machine being woken up in advance, in response to the user's operation of inputting the first voice input, the car machine receives the first voice input from the user. The first voice input belongs to a first instruction set, and instructions in the first instruction set are semantically similar to predefined wake-up-free instructions.

Step 1302: Determine whether the first voice input comes from the driver.

After the car machine receives the first voice input, it determines whether the first voice input comes from the driver. If the first voice input does not come from the driver, the car machine executes step 1303; if the first voice input comes from the driver, the car machine executes step 1304.

Step 1303: Do not respond to the first voice input.

If the first voice input does not come from the driver, the car machine does not respond to the first voice input. In addition, the car machine can still respond to instructions in the second instruction set from the user.

Step 1304: Determine whether the time when the first voice input is received falls within a preset period.

If the first voice input comes from the driver, the car machine continues to determine whether the time when the first voice input is received falls within the preset period. If the time when the first voice input is received falls within the preset period, the car machine may execute step 1305; if the time when the first voice input is received does not fall within the preset period, the car machine may execute step 1306.

Step 1305: Respond to the first voice input.

If the time when the first voice input is received falls within a preset period, the car machine can respond to the first voice input.

Step 1306: Respond to the first voice input after asking the user.

If the time when the first voice input is received does not fall within the preset period, the car machine can respond to the first voice input, but before responding it needs to confirm the semantics of the first voice input with the user. Once the user confirms the semantics, the car machine responds to the first voice input.
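
The branching of Figure 13 is small enough to express as a single function. In this sketch the function name `decide`, the returned labels and the default 07:00 to 09:00 period are assumptions chosen for illustration; how the car machine identifies the driver is outside the sketch and is passed in as a boolean.

```python
from datetime import time


def decide(from_driver: bool, received_at: time,
           period: tuple = (time(7, 0), time(9, 0))) -> str:
    """Return the car machine's action for an input from the first instruction set."""
    if not from_driver:
        return "no response"           # step 1303
    start, end = period
    if start <= received_at <= end:    # step 1304
        return "respond"               # step 1305
    return "confirm then respond"      # step 1306: ask the user first
```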

Figure 14 is a schematic flowchart of the fourth human-computer interaction method provided by an embodiment of the present application.

As shown in Figure 14, the method 1400 may include step 1401 and step 1402. Each step shown in Figure 14 will be described in detail below.

Step 1401: Receive the first voice input from the user. The first voice input does not belong to the predefined wake-up-free instructions.

Exemplarily, in response to the user's voice operation, the first voice input from the user is received. The first voice input may be, for example, "leave to the company", "navigate to the company", "leave to work", etc. The embodiments of this application do not limit the specific content of the voice input in any way.

Step 1402: If the first voice input is semantically similar to the first wake-up-free instruction among the predefined wake-up-free instructions, guide the user to input the first wake-up-free instruction.

The terminal determines that the first voice input and the first wake-up-free instruction have similar semantics, and then guides the user to input the first wake-up-free instruction so that the terminal responds to the first wake-up-free instruction.

The terminal may determine based on semantic analysis in natural language processing that the first voice input is semantically similar to the first wake-up-free instruction.

For example, the above voice input is "Go to work", and the first wake-up-free instruction with similar semantics is "Navigate to the company". After the terminal receives the voice input "Go to work", it determines that this voice input does not belong to the predefined wake-up-free instructions, and recognizes that its semantics are similar to "Navigate to the company". Therefore, the terminal can guide the user to say "Navigate to the company".

Optionally, the above-mentioned guiding the user to input the first wake-up-free instruction includes: guiding the user to input the first wake-up-free instruction through a prompt box and/or voice broadcast.

The terminal can guide the user to issue the first wake-up-free instruction through a prompt box. For example, after the terminal determines that the voice input is semantically similar to the first wake-up-free instruction in the command library, the terminal displays the first wake-up-free instruction on the user interface through a prompt box. The terminal can also guide the user to issue the first wake-up-free instruction through voice broadcast. For example, after the terminal determines that the voice input is semantically similar to the first wake-up-free instruction in the command library, the terminal reminds the user by voice to use the first wake-up-free instruction. The terminal can also guide the user to issue the first wake-up-free instruction through both a prompt box and voice broadcast.

For example, after the terminal determines that the voice input is semantically similar to the first wake-up-free instruction in the command library, the terminal first displays the first wake-up-free instruction on the user interface through a prompt box; if the user has not issued the first wake-up-free instruction within the preset time range, the terminal then prompts the user by voice to use the first wake-up-free instruction. Alternatively, the terminal displays the first wake-up-free instruction on the user interface through a prompt box and, at the same time, prompts the user by voice to use the first wake-up-free instruction. This application does not limit the way in which the terminal guides the user.

Optionally, guiding the user to issue the first wake-up-free instruction through a prompt box and/or voice broadcast includes: prompting the user to issue the first wake-up-free instruction through a prompt box, the prompt box containing the first wake-up-free instruction; and, when the number of prompts within a preset time range reaches the preset threshold but the user has not issued the first wake-up-free instruction, guiding the user to issue the first wake-up-free instruction through voice broadcast.

For example, the terminal prompts the user for the first time to issue the first wake-up-free instruction through a prompt box that contains the first wake-up-free instruction, and prompts the user a second time through the prompt box. When the number of prompts through the prompt box reaches two within 1 minute but the user has not issued the first wake-up-free instruction, the terminal guides the user to issue the first wake-up-free instruction through voice broadcast.
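
The escalation from prompt box to voice broadcast can be sketched as a small counter over a sliding time window. The class name `GuidancePrompter` and the returned labels are hypothetical; the 60-second window and the limit of two prompt-box prompts follow the example above, and the sketch assumes it is called each time the user repeats the non-predefined voice input.

```python
from collections import deque


class GuidancePrompter:
    """Chooses how to guide the user toward the first wake-up-free instruction."""

    def __init__(self, window_s: float = 60.0, max_box_prompts: int = 2):
        self.window_s = window_s                # preset time range (1 minute in the example)
        self.max_box_prompts = max_box_prompts  # preset threshold (2 in the example)
        self._box_times = deque()

    def next_prompt(self, now_s: float) -> str:
        """Return the prompt channel to use for this repetition of the input."""
        # Forget prompt-box prompts older than the preset time range.
        while self._box_times and now_s - self._box_times[0] > self.window_s:
            self._box_times.popleft()
        if len(self._box_times) >= self.max_box_prompts:
            return "voice broadcast"
        self._box_times.append(now_s)
        return "prompt box"
```

Under these assumptions, inputs at 0 s, 20 s and 40 s produce two prompt-box prompts followed by a voice broadcast, whereas a third input after the window has elapsed starts again with a prompt box.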

Figure 15 is an interactive schematic diagram for guiding a user to issue a first wake-up-free instruction provided by an embodiment of the present application.

In response to the user's voice input of "Go to work", the terminal determines that the voice input is semantically similar to the first wake-up-free command in the command library, "Navigate to the company". Therefore, the first time, the terminal prompts the user through the prompt box to "try using the navigation method to go to the company". The second time, the user still says "leave to work", and the terminal continues to prompt the user through the prompt box to "try using the navigation method to go to the company". The third time, the user still says "leave to work", and the terminal prompts the user through a prompt box to "try using the navigation system to go to the company" and also prompts the user by voice to "try using the navigation system to go to the company".

Based on the above technical solution, the terminal receives the first voice input. The first voice input does not belong to the predefined wake-up-free instructions, but it is semantically similar to the first wake-up-free instruction among the predefined wake-up-free instructions. The terminal then guides the user to input the corresponding first wake-up-free instruction, so that after the user inputs the first wake-up-free instruction, the terminal responds accordingly. Compared with the terminal neither responding nor prompting, this can greatly improve the user's interactive experience.

Figure 16 is a schematic flow chart of the fifth human-computer interaction method provided by the embodiment of the present application.

As shown in Figure 16, the method may include steps 1601 to 1605. Each step shown in Figure 16 will be described in detail below.

Step 1601: Receive first voice input from the user.

In response to the user's operation of inputting the first voice input, the terminal receives the first voice input from the user. The first voice input is received without a preset wake-up word having been received from the user. For example, the first voice input includes: "Navigate to location A", "I want to go to location A", "Where is location A", etc., which are not listed here one by one.

Step 1602: Determine whether the first voice input is used to request navigation.

In other words, after receiving the first voice input, the terminal determines whether the intention of the first voice input is to request navigation. If the first voice input is not used to request navigation, the terminal performs step 1603; if the first voice input is used to request navigation, the terminal performs step 1604.

Step 1603: Do not respond to the first voice input.

Step 1604: Ask the user for the destination requesting navigation.

If the first voice input is used to request navigation, the terminal asks the user for the destination of the requested navigation. For example, if the first voice input is "Navigate to location A", the terminal receives it and determines that it is used to request navigation. The terminal then asks the user for the navigation destination, for example by asking "Where do you want to go?". The user replies "Location A", and after receiving the user's reply, the terminal obtains the route to location A from the cloud.

The terminal can ask the user through voice broadcast, or through a prompt box (such as toast), or through a prompt box (such as toast) plus voice broadcast. This application does not place any restrictions on the terminal's query method. For example, the terminal asks the user through a prompt box (such as toast) for the first two times, and the third time uses a prompt box (such as toast) plus voice broadcast to ask the user.

Step 1605: Provide navigation services to the user based on the destination fed back by the user.

After obtaining the route to the above destination, the terminal provides navigation services to the user. For example, display directions to a destination through the user interface.

Optionally, the terminal can also generate a wake-up-free instruction including the above destination based on the destination. The terminal can also prompt the user through prompt boxes and/or voice broadcasts that the above-mentioned wake-up-free instruction can be used directly next time.
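Steps 1601 to 1605 can be sketched as follows. This is a simplified illustration under stated assumptions rather than the embodiment's implementation: the keyword check stands in for real intent recognition, `ask_user` and `fetch_route` are hypothetical callbacks for querying the user and obtaining a route (for example, from the cloud), and adding the voice broadcast from the third query onward is just one possible policy, since this application places no restriction on the query method.

```python
# Assumed intent cues; a real terminal would use an intent-recognition model.
NAVIGATION_KEYWORDS = ("navigate", "i want to go", "where is")


def is_navigation_request(voice_input):
    """Step 1602: crude keyword check standing in for intent recognition."""
    text = voice_input.lower()
    return any(keyword in text for keyword in NAVIGATION_KEYWORDS)


def query_channels(query_count):
    """Prompt box for the first two queries, prompt box plus voice afterwards."""
    if query_count < 3:
        return ["prompt_box"]
    return ["prompt_box", "voice"]


def handle_voice_input(voice_input, ask_user, fetch_route, query_count=1):
    """Handle one voice input received without a preset wake-up word.

    ask_user(question, channels) returns the destination the user replies with;
    fetch_route(destination) returns a route. Both are hypothetical callbacks.
    """
    if not is_navigation_request(voice_input):
        return None  # step 1603: do not respond
    # Step 1604: ask the user for the destination.
    destination = ask_user("Where do you want to go?", query_channels(query_count))
    # Step 1605: obtain the route and provide the navigation service.
    route = fetch_route(destination)
    return {
        "destination": destination,
        "route": route,
        # Optional: a wake-up-free instruction the user can be prompted to reuse.
        "suggested_command": f"Navigate to {destination}",
    }
```

A caller would pass its own dialogue and routing functions; a return value of `None` corresponds to the terminal staying silent.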

Figure 17 is an interaction schematic diagram of the fifth human-computer interaction method provided by the embodiment of the present application. As shown in Figure 17, in response to the user's voice input operation of "navigate to location A", the terminal asks the user "where do you want to go" through voice broadcast. The user replies "Place A", and in response to the user's reply, the terminal displays the route to the location A to the user through the user interface.

Based on the above technical solution, after the terminal receives the first voice input from the user and finds that its intention is to request navigation, it can ask the user for the navigation destination and determine the destination according to the user's feedback, so as to provide navigation services to the user. There is no need to wake up the terminal in advance, which simplifies the interaction process and helps improve the user's interaction experience.

An embodiment of the present application also provides a terminal, which includes corresponding modules for performing the steps performed by the terminal in any one of the embodiments described in Figures 5 to 17. The terminal can be used to implement the method described in any one of the embodiments described in Figures 5 to 17. The modules included in the terminal can be implemented by software and/or hardware.

An embodiment of the present application also provides a terminal, which includes a memory and a processor, wherein the memory is used to store a computer program, and the processor is used to call and execute the computer program, so that the terminal implements the method described in any one of the embodiments described in Figures 5 to 17.

An embodiment of the present application also provides a vehicle, on which a terminal as described above is deployed. The terminal may be, for example, an in-vehicle head unit.

This application also provides a chip system, which includes at least one processor and is used to implement the method described in any one of the embodiments described in Figures 5 to 17.

The chip system can be composed of chips or include chips and other discrete devices.

This application also provides a computer program product. The computer program product includes computer-readable instructions. When the computer-readable instructions are run by a computer, the method described in any one of the embodiments described in Figures 5 to 17 is implemented.

This application also provides a computer-readable storage medium that stores computer-readable instructions. When the computer readable instructions are executed by the computer, the method described in any one of the embodiments described in FIGS. 5 to 17 is implemented.

It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capabilities. During implementation, each step of the above method embodiments can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The above-mentioned processor can be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor can implement or execute each method, step and logical block diagram disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, etc. The steps of the method disclosed in conjunction with the embodiments of the present application can be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module can be located in a storage medium that is mature in this field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.

It should also be understood that the memory in the embodiments of the present application may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory. Non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory. Volatile memory can be random access memory (RAM), which is used as an external cache. By way of illustration, but not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM) and direct rambus random access memory (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.

The terms "unit", "module", etc. used in this specification may be used to refer to computer-related entities, hardware, firmware, a combination of hardware and software, software, or software in execution.

Those of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application. In the several embodiments provided in this application, it should be understood that the disclosed devices, equipment and methods can be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and in actual implementation there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be implemented through some interfaces, and the indirect coupling or communication connection between devices or units may be in electrical, mechanical or other forms.

The units described as discrete components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.

In the above embodiments, the functions of each functional unit may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented using software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). When the computer program instructions (programs) are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server or data center through wired (such as coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (such as infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., digital video discs (DVD)), or semiconductor media (e.g., solid state disks (SSD)), etc.

If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, or a part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application. The aforementioned storage media include: a USB flash drive, a removable hard disk, ROM, RAM, a magnetic disk, an optical disk, and other media that can store program code.

Claims

A human-computer interaction method, characterized by including:

receiving first voice input from the user;

When it is determined that the first voice input is semantically similar to a predefined first wake-up-free instruction, a corresponding response is made to the first voice input, wherein the first wake-up-free instruction is used to instruct the terminal, without a wake-up word being input, to perform the operation corresponding to the first wake-up-free instruction.
The method of claim 1, wherein before receiving the first voice input from the user, the method further includes:

receiving second voice input from the user;

When the second voice input is semantically similar to the first wake-up-free instruction, confirm the semantics of the second voice input to the user;

In response to the user's operation of confirming the semantics of the second voice input, a second wake-up-free instruction corresponding to the second voice input is generated.
The method of claim 2, wherein the first voice input is semantically similar to a predefined first wake-up-free instruction, including:

The first voice input is the same as the second wake-up-free instruction.
The method of claim 2 or 3, wherein receiving the second voice input from the user includes:

The second voice input is received multiple times continuously within a preset time range.
The method according to any one of claims 1 to 4, characterized in that, before making a corresponding response to the first voice input, the method further includes:

Confirming the semantics of the first speech input to the user.
The method of claim 5, wherein confirming the semantics of the first voice input to the user includes:

Confirm the semantics of the first voice input to the user through prompt boxes and/or voice broadcasts.
The method according to any one of claims 1 to 6, characterized in that the method further includes:

Prompt the first wake-up-free instruction to the user.
A human-computer interaction method, characterized by including:

receiving first voice input from the user;

When the preset wake-up word is not received but the first voice input contains a target object, make a corresponding response to the first voice input, wherein the target object is an object whose number of mentions in other voice inputs received before the first voice input reaches a preset threshold, and the preset wake-up word is used to wake up the terminal.
The method of claim 8, wherein before receiving the first voice input from the user, the method further includes:

receiving a preset wake word from the user;

receiving second voice input from the user;

When the number of times the first object contained in the second voice input is mentioned in the second voice input and its previous voice inputs exceeds the preset threshold, the first object is determined to be the target object.
The method of claim 9, further comprising:

Based on the target object, generate a wake-up-free instruction including the target object;

Prompt the user for the wake-up-free instruction.
A human-computer interaction method, characterized by including:

Receive a first voice input from the user, the first voice input belongs to a first instruction set, and the instructions in the first instruction set are semantically similar to predefined wake-up-free instructions;

When the preset conditions are met, respond to the first voice input.
The method of claim 11, wherein the preset conditions include at least one of the following:

The number of users within a preset distance from the terminal does not exceed the threshold;

The user is in a predefined position;

The user from whom the first voice input comes does not belong to the preset group; or,

The time when the first voice input is received falls within a preset period.
The method of claim 12, wherein the method is applied to a car, and the number of users within a preset range from the terminal does not exceed a threshold, including: there is one passenger in the car; or,

The user is in a predefined position, including: the user is in a driving position.
A human-computer interaction method, characterized by including:

In the case where the preset wake-up word from the user is not received, determine based on the first voice input from the user that the first voice input is used to request navigation;

Asking the user for a destination for which navigation is requested;

Based on the destination fed back by the user, a navigation service is provided for the user.
The method of claim 14, further comprising:

Generate wake-up-free instructions containing the destination;

Prompt the user for the wake-up-free instruction.
A human-computer interaction method, characterized by including:

Receive the first voice input from the user, the first voice input does not belong to the predefined wake-up-free instructions;

When it is determined that the first voice input is semantically similar to the first wake-up-free instruction among the predefined wake-up-free instructions, the user is guided to input the first wake-up-free instruction.
The method of claim 16, wherein said guiding the user to input the first wake-up-free instruction includes:

The user is guided to input the first wake-up-free instruction through a prompt box and/or a voice broadcast.
The method according to claim 17, wherein the guiding the user to input the first wake-up-free instruction through a prompt box and/or a voice broadcast includes:

The user is prompted to input the first wake-up-free instruction through the prompt box, and the prompt box contains the first wake-up-free instruction;

When the number of prompts through the prompt box within the preset time range reaches the preset threshold, but the user does not issue the first wake-up-free instruction, the user is guided to input the first wake-up-free instruction through the voice broadcast.
A computer device, characterized by comprising a unit for executing the method according to any one of claims 1 to 18.
A computer device, characterized by including a processor and a memory, wherein,

The memory is used to store computer readable instructions;

The processor is configured to read the computer readable instructions, so that the computer device implements the method according to any one of claims 1 to 18.
A vehicle, characterized in that it is used to implement the method according to any one of claims 1 to 18; or includes the computer device according to claim 19 or 20.
A computer-readable storage medium, characterized in that computer-readable instructions are stored in the storage medium, and when the computer-readable instructions are executed by a computer, the method according to any one of claims 1 to 18 is implemented.
A computer program product, characterized in that the computer program product includes computer-readable instructions, and when the computer-readable instructions are run by a computer, the method according to any one of claims 1 to 18 is implemented.