WO2024055831A1 - Voice interaction method and apparatus, and terminal - Google Patents

Voice interaction method and apparatus, and terminal

Info

Publication number
WO2024055831A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
terminal
mouth
breath
voice
Application number
PCT/CN2023/114613
Other languages
French (fr)
Chinese (zh)
Inventor
王石磊
Original Assignee
荣耀终端有限公司 (Honor Device Co., Ltd.)
Application filed by 荣耀终端有限公司 (Honor Device Co., Ltd.)
Publication of WO2024055831A1


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01D MEASURING NOT SPECIALLY ADAPTED FOR A SPECIFIC VARIABLE; ARRANGEMENTS FOR MEASURING TWO OR MORE VARIABLES NOT COVERED IN A SINGLE OTHER SUBCLASS; TARIFF METERING APPARATUS; MEASURING OR TESTING NOT OTHERWISE PROVIDED FOR
    • G01D21/00 Measuring or testing not otherwise provided for
    • G01D21/02 Measuring two or more variables by means not covered by a single other subclass
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present application belongs to the field of human-computer interaction technology and in particular relates to a voice interaction method, apparatus, and terminal.
  • Voice interaction is a new generation of interaction mode based on voice input: based on the voice information the user inputs to the terminal, a feedback result corresponding to that voice information can be obtained.
  • the voice interaction system (such as a voice assistant) on the terminal must first be awakened.
  • the voice assistant can be awakened through a specific wake-up word.
  • the user can conduct voice interaction with the terminal.
  • the terminal outputs the feedback result corresponding to that voice. The user can then speak the next utterance, thus realizing a continuous dialogue with the terminal.
  • the current continuous dialogue function of the terminal is achieved by extending the terminal's sound-collection time. For example, after the terminal outputs the feedback result corresponding to the first voice message, it continues to listen for a period of time, such as 10 seconds. If no voice signal is received within those 10 seconds, the terminal stops collecting; if a voice signal is received within 10 seconds, the terminal outputs a feedback result for the received voice information. In this way, during the extended listening period, if the user does not make any sound but other people are talking nearby, the terminal will keep responding to what those other people say, which annoys the user and degrades the user experience.
  • This application provides a voice interaction method, apparatus, and terminal, which can solve the problem of the terminal responding incorrectly to other people or surrounding noise during the extended sound-collection period.
  • the present application provides a voice interaction method, which includes: detecting a wake-up instruction that initiates voice interaction; entering the working state of voice interaction in response to the wake-up instruction; detecting first voice information; outputting the feedback result for the first voice information; if second voice information is detected within a preset time period, detecting the user's breath; and if the user's breath is detected, outputting the feedback result for the second voice information.
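To make the claimed control flow concrete, the following is a minimal Python sketch of this first aspect. All helper names (`wait_for_wake`, `listen`, `detect_breath`, `respond`) and the 10-second window are hypothetical stand-ins, not taken from the application.

```python
def voice_interaction_first_aspect(wait_for_wake, listen, detect_breath, respond,
                                   preset_s: float = 10.0) -> None:
    """Sketch of the first aspect: wake-up -> first voice -> feedback ->
    breath-gated second voice -> feedback (or end of the working state)."""
    wait_for_wake()                          # detect the wake-up instruction
    first_voice = listen()                   # detect the first voice information
    respond(first_voice)                     # output feedback for the first voice information

    second_voice = listen(timeout=preset_s)  # preset time period of extended collection
    if second_voice is not None and detect_breath(second_voice):
        respond(second_voice)                # user's breath detected: output feedback
    # otherwise the voice interaction working state ends
```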
  • the method further includes: determining whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extending the working state of voice interaction by the preset duration; if it is determined that the terminal is not close to the user's mouth, ending the working state of voice interaction.
  • before performing user breath detection, the terminal first determines whether it is close to the user's mouth. If it is determined that the terminal is close to the user's mouth, the sound-collection time is extended; if it is determined that the terminal is not close to the user's mouth, sound collection is ended directly. This can greatly reduce the power consumption caused by prolonged sound collection.
  • the method further includes: determining whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, detecting the user's breath; if it is determined that the terminal is not close to the user's mouth, ending the working state of voice interaction.
  • when the second voice information is detected, it is first determined whether the terminal is close to the user's mouth, and only then whether to detect the user's breath. If the terminal is not close to the user's mouth, the second voice information is considered not to be a sound made by the user, and there is no need to detect the user's breath.
  • determining whether the terminal is close to the user's mouth includes: identifying the user's gesture in the working state of voice interaction; if the user's gesture is a first gesture, determining that the terminal is close to the user's mouth, where the first gesture represents that the user is holding the terminal in a stationary state; if the user's gesture is a second gesture, determining that the terminal is not close to the user's mouth, where the second gesture represents that the user is holding the terminal and moving it away from the user's mouth.
  • in other words, whether the terminal 100 is still near the user's mouth is determined by determining whether the user has moved the handheld terminal 100 away from the user's mouth.
  • the method includes: determining, before outputting the feedback result for the first voice information, whether a third gesture is recognized, where the third gesture represents that the user holds the terminal and moves it toward the user's mouth; if the third gesture is recognized, determining, after the feedback result for the first voice information is output, whether the terminal is still close to the user's mouth; if the third gesture is not recognized, ending the working state of voice interaction.
  • that is, the present application may first determine whether the user moved the handheld terminal toward the user's mouth before the feedback result for the first voice information was output. If so, it is then determined whether the terminal is still near the user's mouth after the feedback result for the first voice information is output.
  • identifying the user's gesture in the working state of voice interaction includes: obtaining the angular velocity and acceleration at different times in the working state of voice interaction; and determining the user's gesture using the angular velocity and acceleration at different times together with a gesture recognition module, where the gesture recognition module is used to recognize that the user is moving the handheld terminal toward the user's mouth, moving the handheld terminal away from the user's mouth, or holding the terminal in a stationary state.
  • the gesture recognition module can be used to determine the user's gesture based on angular velocity and acceleration data at different times.
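As an illustration of this proximity decision, here is a short sketch. The gesture labels mirror the first, second, and third gestures described above, while the model call and the IMU sample format are assumptions.

```python
from enum import Enum, auto

class Gesture(Enum):
    STATIONARY = auto()   # first gesture: holding the terminal still
    MOVING_AWAY = auto()  # second gesture: moving the terminal away from the mouth
    APPROACHING = auto()  # third gesture: moving the terminal toward the mouth

def terminal_close_to_mouth(imu_samples, gesture_model) -> bool:
    """Map the recognized hold gesture to a mouth-proximity decision.

    imu_samples: (angular velocity, acceleration) pairs at different times.
    gesture_model: hypothetical trained recognizer returning a Gesture."""
    gesture = gesture_model(imu_samples)
    return gesture is not Gesture.MOVING_AWAY  # only moving away means "not close"
```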
  • detecting the user's breath includes: inputting the second voice information into a breath recognition module, where the breath recognition module is used to identify whether the second voice information is a sound emitted by the user's mouth within a preset distance from the terminal; if the breath recognition module recognizes that the second voice information is a sound emitted by the user's mouth within the preset distance from the terminal, determining that the user's breath is detected; if the breath recognition module recognizes that the second voice information is not such a sound, determining that the user's breath is not detected.
  • the breath recognition module can be used to perform feature recognition on the second voice information to determine whether the second voice information is the sound produced by the user's mouth close to the terminal.
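A sketch of how such a breath recognition module might be invoked; the model is assumed to be a trained binary classifier returning a close-talk probability, which is not specified in the application.

```python
def detect_breath_from_audio(second_voice_pcm, breath_model,
                             threshold: float = 0.5) -> bool:
    """Pass the second voice information through the breath recognition module.

    breath_model is a hypothetical trained network scoring how likely the audio
    was produced by the user's mouth within the preset distance of the terminal
    (e.g. from the plosive 'pop' signature of close-talking speech)."""
    score = breath_model(second_voice_pcm)
    return score >= threshold  # True -> user's breath detected
```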
  • the terminal includes a pressure sensor
  • detecting the user's breath includes: obtaining the pressure value corresponding to the pressure sensor when the second voice information is collected; if the pressure value is greater than a preset pressure threshold, determining that the user's breath is detected; if the pressure value is less than or equal to the preset pressure threshold, determining that the user's breath is not detected.
  • the terminal includes a temperature sensor
  • detecting the user's breath includes: obtaining a first temperature and a second temperature, where the first temperature is the temperature corresponding to the temperature sensor before the second voice information is collected, and the second temperature is the temperature corresponding to the temperature sensor when the second voice information is collected; if the second temperature is greater than the first temperature, determining that the user's breath is detected; if the second temperature is less than or equal to the first temperature, determining that the user's breath is not detected.
  • the terminal includes a humidity sensor
  • detecting the user's breath includes: obtaining the humidity corresponding to the humidity sensor when the second voice information is collected; if the humidity is greater than a preset humidity threshold, determining that the user's breath is detected; if the humidity is less than or equal to the preset humidity threshold, determining that the user's breath is not detected.
  • the terminal includes a carbon dioxide sensor
  • detecting the user's breath includes: obtaining the carbon dioxide concentration corresponding to the carbon dioxide sensor when the second voice information is collected; if the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, determining that the user's breath is detected; if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, determining that the user's breath is not detected.
  • in other words, this application can use a pressure sensor, temperature sensor, humidity sensor, or carbon dioxide sensor to detect the user's breath.
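These four sensor variants reduce to simple threshold tests. A sketch follows; the pressure figure echoes the 0.07 MPa example given later in the description, while the humidity and CO2 thresholds are illustrative assumptions, and the temperature rule simply compares the two claimed samples.

```python
def breath_detected_by_sensor(kind: str, reading: float, baseline: float = 0.0) -> bool:
    """Threshold checks mirroring the four claimed sensor variants."""
    if kind == "pressure":
        return reading > 0.07      # MPa; preset pressure threshold (example value)
    if kind == "temperature":
        return reading > baseline  # second temperature vs. first temperature
    if kind == "humidity":
        return reading > 60.0      # %RH; preset humidity threshold (assumed)
    if kind == "co2":
        return reading > 800.0     # ppm; preset CO2 threshold (assumed)
    raise ValueError(f"unknown sensor kind: {kind}")
```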
  • the present application provides a voice interaction method, which includes: detecting a wake-up instruction that initiates voice interaction; entering the working state of voice interaction in response to the wake-up instruction; detecting first voice information; outputting the feedback result for the first voice information; determining whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extending the working state of voice interaction for a preset time; and if second voice information is detected within the preset time, outputting the feedback result for the second voice information.
  • the present application provides a voice interaction method, which includes: detecting a wake-up instruction that initiates voice interaction; entering the working state of voice interaction in response to the wake-up instruction; detecting first voice information; outputting the feedback result for the first voice information; if second voice information is detected within a preset time period, determining whether the terminal is close to the user's mouth; and if it is determined that the terminal is close to the user's mouth, outputting the feedback result for the second voice information.
  • the present application provides a voice interaction apparatus, which includes a processor; the processor is configured to: detect a wake-up instruction that initiates voice interaction; enter the working state of voice interaction in response to the wake-up instruction; detect first voice information; output the feedback result for the first voice information; if second voice information is detected within a preset time period, detect the user's breath; and if the user's breath is detected, output the feedback result for the second voice information.
  • the processor is further configured to: after outputting the feedback result for the first voice information, determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the working state of voice interaction by the preset duration; if it is determined that the terminal is not close to the user's mouth, end the working state of voice interaction.
  • the processor is further configured to: determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, detect the user's breath; if it is determined that the terminal is not close to the user's mouth, end the working state of voice interaction.
  • the processor is further configured to: identify the user's gesture in the working state of voice interaction; if the user's gesture is a first gesture, determine that the terminal is close to the user's mouth, where the first gesture represents that the user is holding the terminal in a stationary state; if the user's gesture is a second gesture, determine that the terminal is not close to the user's mouth, where the second gesture represents that the user is holding the terminal and moving it away from the user's mouth.
  • the processor is further configured to: determine, before outputting the feedback result for the first voice information, whether a third gesture is recognized, where the third gesture represents that the user holds the terminal and moves it toward the user's mouth; if the third gesture is recognized, determine whether the terminal is still close to the user's mouth after the feedback result for the first voice information is output; if the third gesture is not recognized, end the working state of voice interaction.
  • the processor is further configured to: obtain the angular velocity and acceleration at different times in the working state of voice interaction; and determine the user's gesture using the angular velocity and acceleration at different times together with a gesture recognition module, where the gesture recognition module is used to recognize that the user is moving the handheld terminal toward the user's mouth, moving it away from the user's mouth, or holding the terminal in a stationary state.
  • the processor is further configured to: input the second voice information into a breath recognition module, where the breath recognition module is used to identify whether the second voice information is a sound emitted by the user's mouth within a preset distance from the terminal; if the breath recognition module recognizes that the second voice information is such a sound, determine that the user's breath is detected; if the breath recognition module recognizes that the second voice information is not such a sound, determine that the user's breath is not detected.
  • the terminal includes a pressure sensor
  • the processor is further configured to: obtain the pressure value corresponding to the pressure sensor when the second voice information is collected; if the pressure value is greater than a preset pressure threshold, determine that the user's breath is detected; if the pressure value is less than or equal to the preset pressure threshold, determine that the user's breath is not detected.
  • the terminal includes a temperature sensor
  • the processor is further configured to: obtain a first temperature and a second temperature, where the first temperature is the temperature corresponding to the temperature sensor before the second voice information is collected, and the second temperature is the temperature corresponding to the temperature sensor when the second voice information is collected; if the second temperature is greater than the first temperature, determine that the user's breath is detected; if the second temperature is less than or equal to the first temperature, determine that the user's breath is not detected.
  • the terminal includes a humidity sensor
  • the processor is further configured to: obtain the humidity corresponding to the humidity sensor when the second voice information is collected; if the humidity is greater than a preset humidity threshold, determine that the user's breath is detected; if the humidity is less than or equal to the preset humidity threshold, determine that the user's breath is not detected.
  • the terminal includes a carbon dioxide sensor
  • the processor is further configured to: obtain the carbon dioxide concentration corresponding to the carbon dioxide sensor when the second voice information is collected; if the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, determine that the user's breath is detected; if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, determine that the user's breath is not detected.
  • the present application provides a voice interaction apparatus, which includes a processor; the processor is configured to: detect a wake-up instruction that initiates voice interaction; enter the working state of voice interaction in response to the wake-up instruction; detect first voice information; output the feedback result for the first voice information; determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the working state of voice interaction for a preset time; if it is determined that the terminal is not close to the user's mouth, end the working state of voice interaction.
  • the present application provides a voice interaction apparatus, which includes a processor; the processor is configured to: detect a wake-up instruction that initiates voice interaction; enter the working state of voice interaction in response to the wake-up instruction; detect first voice information; output the feedback result for the first voice information; if second voice information is detected within a preset time period, determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, output the feedback result for the second voice information.
  • the present application provides a terminal.
  • the terminal includes a memory and a processor; the memory is coupled to the processor; the memory is used to store computer program code, and the computer program code includes computer instructions.
  • when the processor executes the computer instructions, the terminal is caused to execute the method described in any one of the first to third aspects.
  • the present application provides a computer-readable storage medium in which computer programs or instructions are stored.
  • when the computer programs or instructions are executed, the method described in any one of the first to third aspects is executed.
  • the present application provides a computer program product.
  • the computer program product includes a computer program or instructions.
  • when the computer program or instructions are run on a computer, the computer is caused to perform the method described in any one of the first to third aspects.
  • the voice interaction method, apparatus, and terminal provided by this application can detect the user's breath and/or determine whether the terminal is close to the user's mouth, thereby recognizing with high probability whether the user himself intends to continue the voice interaction. This effectively reduces the terminal's erroneous responses to other people or other surrounding noises and improves the accuracy and user experience of voice interaction.
  • Figure 1 is an application scenario diagram of voice interaction provided by an embodiment of the present application
  • FIG. 2 is a hardware structure block diagram of the terminal 100 provided by the embodiment of the present application.
  • Figure 3 is a flow chart of a voice interaction method provided by an embodiment of the present application.
  • Figure 4 is a flow chart of a first implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application;
  • Figure 5 is a flow chart of a second implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application;
  • Figure 6 is a flow chart of a third implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application;
  • Figure 7 is a flow chart of a fourth implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application;
  • Figure 8 is a flow chart of a fifth implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application;
  • Figure 9 is a schematic structural diagram of a voice interaction device provided by an embodiment of the present application.
  • FIG. 1 is an application scenario diagram of voice interaction provided by an embodiment of the present application.
  • the application scenario diagram includes a terminal 100 and a user 200.
  • the terminal 100 has a voice interaction function, and the user 200 can perform voice interaction with the terminal 100 .
  • a specific event needs to trigger the voice interaction function of the terminal so that the terminal 100 can enter the voice interaction working state.
  • triggering the voice interaction function of the terminal is referred to as waking up voice interaction.
  • voice interaction can be awakened by a wake-up word, by long-pressing the power button, by tapping the voice assistant application on the desktop, and so on; this application does not limit this.
  • the user 200 can perform voice interaction with the terminal 100 .
  • the terminal 100 outputs a feedback result corresponding to the voice. For example, after the voice interaction function is awakened, user 200 says "How is the weather today?". After receiving the voice message "How is the weather today?" from user 200, the terminal 100 recognizes the voice information and outputs the feedback corresponding to that piece of voice information; for example, the terminal 100 outputs "the weather is sunny today" through the speaker.
  • the user 200 can directly speak the next voice message after the terminal 100 has fed back the previous voice message, thus realizing a continuous conversation with the terminal 100.
  • the terminal 100 implements the above continuous dialogue function by extending the sound-collection time after each round of voice interaction with the user 200 is completed. For example, after the terminal 100 outputs the feedback result corresponding to the first voice message, the terminal 100 does not stop sound collection, but continues to monitor for a period of time, such as 10 seconds. If no voice signal is received within those 10 seconds, the terminal 100 stops collecting sound; if a voice signal is received within 10 seconds, the terminal 100 continues to output feedback for the received voice information.
  • during the period when the terminal 100 extends the sound collection, if the user 200 does not make any sound, that is, the user 200 has no intention to continue the conversation, but there are other people talking or other noises around, the terminal 100 will keep providing feedback on what other people say or on the surrounding noise, which annoys the user 200 and affects the user experience.
  • this application provides a voice interaction method, which can effectively reduce the terminal 100's erroneous response to other people or other surrounding noises and improve the accuracy of voice interaction.
  • the voice interaction method provided by this application can be applied to the terminal 100.
  • the terminal 100 may be a mobile phone, a remote control, or a smart wearable device such as a watch or bracelet.
  • the hardware structure of the terminal 100 is introduced below.
  • FIG. 2 is a hardware structure block diagram of the terminal 100 provided by the embodiment of the present application.
  • the terminal 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, and a battery 142 , Antenna 1, Antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone interface 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193 , display screen 194, and subscriber identification module (subscriber identification module, SIM) card interface 195, etc.
  • the above-mentioned sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, a humidity sensor 180N, a carbon dioxide sensor 180P, and other sensors.
  • the structure illustrated in this embodiment does not constitute a specific limitation on the terminal 100.
  • the terminal 100 may include more or fewer components than shown, or some components may be combined, or some components may be separated, or may be arranged differently.
  • the components illustrated may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units.
  • the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • different processing units can be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the terminal 100.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • the memory in the processor 110 is a cache memory. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory, which avoids repeated access and reduces the waiting time of the processor 110, thus improving the efficiency of the system.
  • processor 110 may include one or more interfaces.
  • the interface connection relationships between the modules illustrated in this embodiment are only schematic illustrations and do not constitute a structural limitation on the terminal 100 .
  • in other embodiments, the terminal 100 may also adopt an interface connection method different from that in the above embodiment, or a combination of multiple interface connection methods.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger. While charging the battery 142, the charging management module 140 can also provide power to the terminal through the power management module 141.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, etc.
  • the wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the terminal 100 may be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, antenna 1 can be multiplexed as a diversity antenna for a wireless LAN. In other embodiments, antennas may be used in conjunction with tuning switches.
  • the mobile communication module 150 can provide wireless communication solutions including 2G/3G/4G/5G applied to the terminal 100.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • the wireless communication module 160 can provide wireless communication solutions applied to the terminal 100, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like.
  • the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with the network and other devices through wireless communication technology.
  • the terminal 100 implements the display function through the GPU, the display screen 194, and the application processor.
  • the GPU is an image processing microprocessor and is connected to the display screen 194 and the application processor. GPUs are used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • the display screen 194 is used to display images, videos, etc.
  • the display screen 194 includes a display panel.
  • display screen 194 may be a touch screen.
  • the terminal 100 can implement the shooting function through the ISP, camera 193, video codec, GPU, display screen 194, application processor, etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example, saving files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the terminal 100 .
  • the processor 110 can execute instructions stored in the internal memory 121, and the internal memory 121 can include a program storage area and a data storage area.
  • the program storage area can store an operating system and at least one application program required for a function (such as a sound playback function or an image playback function).
  • the data storage area can store data created during use of the terminal 100 (such as audio data and a phone book).
  • the terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor.
  • the user's voice information can be collected through the microphone 170C, and the feedback result for the user's voice information can be played through the speaker 170A.
  • The touch sensor is also called a "touch panel".
  • the touch sensor can be disposed on the display screen 194, and the touch sensor and the display screen 194 form a touch screen, which is also called a "touch screen”. Touch sensors are used to detect touches on or near them.
  • the touch sensor can pass the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation may be provided through display screen 194 .
  • the touch sensor may also be disposed on the surface of the terminal 100 in a position different from that of the display screen 194 .
  • the terminal 100 can detect the touch operation input by the user on the touch screen through the touch sensor, and collect one or more of the touch position of the touch operation on the touch screen, the touch time, and the like. In some embodiments, the terminal 100 can determine the touch location of the touch operation on the touch screen through a combination of the touch sensor 180K and the pressure sensor 180A.
  • the buttons 190 include a power button, a volume button, etc.
  • Key 190 may be a mechanical key or a touch key.
  • the terminal 100 may receive key input and generate key signal input related to user settings and function control of the terminal 100. For example, the voice interaction function can be awakened by long pressing the power button.
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations for different applications can correspond to different vibration feedback effects.
  • the motor 191 can also respond to different vibration feedback effects for touch operations in different areas of the display screen 194 .
  • Different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card.
  • the SIM card can be connected to or separated from the terminal 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • the terminal 100 can support 1 or N SIM card interfaces, where N is a positive integer greater than 1.
  • SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card, etc.
  • the gyro sensor 180B may be a three-axis gyroscope, used to track state changes of the terminal 100 in six directions.
  • the acceleration sensor 180E is used to detect the movement speed, direction and displacement of the terminal 100 .
  • the terminal 100 can detect the status and position of the terminal 100 through the gyroscope sensor 180B and the acceleration sensor 180E, and can determine the gesture of the user holding the terminal 100 based on the status and position of the terminal 100 at different times. For example, the user holds the terminal 100 toward the user's mouth, or the user holds the terminal 100 away from the user's mouth.
  • Figure 3 is a flow chart of a voice interaction method provided by an embodiment of the present application. As shown in Figure 3, the method may include the following steps:
  • Step S1 A wake-up instruction to initiate voice interaction is detected.
  • the wake-up instruction is used to wake up the terminal 100 to enter the working state of voice interaction.
  • the wake-up instruction may be a specific wake-up word input by the user to the terminal 100, the user may press and hold the power button, or the user may click on the desktop voice assistant application, etc.
  • the embodiment of the present application also provides a breath wake-up method.
  • Breath awakening means that the user awakens the terminal 100 to enter the voice interaction working state by pointing the mouth towards the terminal 100 and generating breath (such as speaking or blowing) within a preset distance range from the terminal 100 .
  • the user can put the terminal 100 to his mouth and speak or blow directly into the terminal 100 to wake up the terminal 100 and enter the voice interaction working state without using a specific wake-up word or pressing a button.
  • when the terminal 100 detects the user's breath, it enters the voice interaction working state.
  • the user's breath detection can be implemented in the following way: the microphone 170C is used to collect voice information. If voice information is collected, the breath recognition module can be used to determine whether the collected voice information is speech or blown air produced by the user's mouth facing the terminal 100 within a preset distance range from the terminal 100.
  • the breath recognition module may be a neural network trained for identifying breath.
  • the breath recognition module can be obtained by training on the characteristics of the plosive (popping) sound produced when the user speaks close to the microphone 170C.
  • the breath recognition module is a trained neural network that can identify whether the input voice information is a sound input close to the microphone 170C.
  • a well-trained neural network can accurately detect human voices within 5 centimeters of the microphone. In this way, when the breath recognition module recognizes that the input voice information is a human voice within 5 cm of the microphone, it is determined that the user's breath is detected, thereby waking up the terminal 100 to enter the voice interaction working state.
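Putting the wake-up path together, here is a sketch under the same assumptions: a stream of microphone clips and a trained close-talk detector. The helper names and the 0.5 score threshold are hypothetical.

```python
def breath_wakeup(mic_clips, breath_model, enter_voice_interaction) -> None:
    """Breath wake-up: wake the terminal when collected voice information is
    classified as a human voice within ~5 cm of the microphone 170C."""
    for clip in mic_clips:             # voice information collected by the microphone
        if breath_model(clip) >= 0.5:  # close-talk voice detected (assumed threshold)
            enter_voice_interaction()  # enter the voice interaction working state
            return
```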
  • when the user blows toward the terminal 100, the microphone 170C can also collect the resulting sound; in the embodiments of this application, the sound generated by blowing is also called voice information.
  • the user's breath detection method can also adopt the following implementation: if the microphone 170C collects voice information, obtain the pressure value collected by the pressure sensor 180A when the voice information is collected. If the pressure value is greater than the preset pressure threshold, it is determined that the user's breath is detected.
  • the airflow generated when the user's mouth is facing the terminal and speaks or blows within a preset distance range from the terminal 100 will produce a certain pressure on the terminal 100 .
  • the embodiment of the present application can use the pressure sensor 180A to detect the pressure generated on the terminal 100 when the user speaks. If the pressure value is greater than the preset pressure threshold, it means that the user's mouth is facing the terminal 100 and the user is speaking or blowing within the preset distance range from the terminal 100, so it can be determined that the user's breath is detected. Conversely, if the pressure value is less than or equal to the preset pressure threshold, it means that the user is not speaking or blowing within the preset distance range from the terminal 100, so it can be determined that the user's breath is not detected.
  • the parameters of the pressure sensor 180A must be able to meet the accuracy requirements of breath detection.
  • for example, suppose the airflow generated when the user's mouth faces the terminal 100 and the user speaks or blows within the preset distance range produces a pressure of 0.07 MPa on the terminal 100; in this case, the measurement range of the pressure sensor 180A may be 0 to 0.3 MPa, with an accuracy of 0.001 MPa.
  • the pressure sensor 180A can be disposed near the microphone 170C. In this way, when the user speaks close to the microphone 170C, the pressure sensor 180A near the microphone 170C can detect the pressure exerted on the pressure sensor 180A by the airflow generated by speaking.
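With those example figures, the pressure variant is a one-line threshold test. A sketch, where `read_pressure_mpa` is a hypothetical driver call for the pressure sensor 180A:

```python
PRESSURE_THRESHOLD_MPA = 0.07  # example threshold from the paragraph above

def breath_detected_by_pressure(read_pressure_mpa) -> bool:
    """Sample the pressure sensor 180A (example range 0-0.3 MPa, accuracy
    0.001 MPa) at the moment the voice information is collected."""
    return read_pressure_mpa() > PRESSURE_THRESHOLD_MPA
```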
  • the user's breath detection method can also adopt the following implementation: if the microphone 170C collects voice information, the first temperature and the second temperature are obtained.
  • the first temperature is the temperature collected by the temperature sensor 180J before the microphone 170C collects the voice information;
  • the second temperature is the temperature collected by the temperature sensor 180J when the microphone 170C collects the voice information. If the second temperature is greater than the first temperature, it is determined that the user's breath is detected; if the second temperature is less than or equal to the first temperature, it is determined that the user's breath is not detected.
  • the user's breath detection method can also be implemented in the following implementation: if the microphone 170C collects voice information, the humidity collected by the humidity sensor 180N when the voice information is collected is obtained. If the humidity is greater than the preset humidity threshold, it is determined that the user's breath is detected; if the humidity is less than or equal to the preset humidity threshold, it is determined that the user's breath is not detected.
  • the user's breath detection method can also adopt the following implementation: if the microphone 170C collects voice information, obtain the carbon dioxide concentration collected by the carbon dioxide sensor 180P when the voice information is collected. If the carbon dioxide concentration is greater than the preset carbon dioxide concentration threshold, it is determined that the user's breath is detected; if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, it is determined that the user's breath is not detected.
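The temperature variant differs from the single-threshold variants in that it needs two samples, one taken before collection and one taken while the voice information is being collected. A sketch of that timing, with all helper names assumed:

```python
def breath_detected_by_temperature(read_temperature, collect_voice) -> bool:
    """Temperature variant: compare the temperature before voice collection
    (first temperature) with the temperature while the voice information is
    being collected (second temperature)."""
    first_temperature = read_temperature()   # sampled before collection starts
    voice = collect_voice()                  # microphone 170C collects voice information
    second_temperature = read_temperature()  # sampled during/just after collection
    return voice is not None and second_temperature > first_temperature
```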
  • in other words, the embodiments of this application can determine whether the user's breath is detected based on the data collected by the temperature sensor 180J, the humidity sensor 180N, or the carbon dioxide sensor 180P.
  • the above embodiments only illustrate the implementation of detecting the user's breath and do not limit the specific implementation of detecting the user's breath.
  • the various implementation methods listed in the above embodiments can also be used in combination.
  • for example, the "breath recognition module" solution can be used in combination with the "pressure sensor" solution, the "temperature sensor" solution, the "humidity sensor" solution, or the "carbon dioxide sensor" solution.
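One way such a combination could be wired up, shown for the "breath recognition module" plus "pressure sensor" pair. Requiring both cues to agree (AND) favors fewer false responses, while OR would favor recall; the fusion policy, helper names, and thresholds are all assumptions, as the application does not specify them.

```python
def breath_detected_combined(audio_clip, breath_model, read_pressure_mpa,
                             score_threshold: float = 0.5,
                             pressure_threshold_mpa: float = 0.07) -> bool:
    """Fuse the acoustic breath recognition module with the pressure sensor:
    declare breath only when both cues agree."""
    acoustic_hit = breath_model(audio_clip) >= score_threshold
    pressure_hit = read_pressure_mpa() > pressure_threshold_mpa
    return acoustic_hit and pressure_hit
```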
  • it should be noted that in some scenarios the breath wake-up provided by the embodiment of the present application is disabled. For example, when the user makes a call using the terminal 100, even if the user's mouth is facing the terminal 100 and breath is generated within the preset distance range from the terminal 100, the terminal 100 will not be awakened to enter the voice interaction working state.
  • Step S2 In response to the wake-up instruction, enter the voice interaction working state.
  • after entering the voice interaction working state, the terminal 100 continues to collect sound to obtain the user's voice information.
  • Step S3 The first voice information is detected.
  • Step S4 Output the feedback result for the first voice information.
  • the feedback result for the first voice information may be voice, text, image, or entering an application program, etc. This application does not limit this.
  • for example, after the terminal 100 enters the voice interaction working state, the user says a sentence, such as "How is the weather today?", and "How is the weather today" is detected by the terminal 100 as the first voice information. The terminal 100 then outputs the feedback result for the first voice information: for example, the terminal 100 outputs the voice "Today's weather is sunny" through the speaker 170A, or displays the text "Today's weather is sunny" on the display screen 194.
  • as another example, after the terminal 100 enters the voice interaction working state, the user says a sentence such as "Dial Zhang San", and "Dial Zhang San" is detected by the terminal 100 as the first voice information. The terminal 100 then outputs the feedback result for the first voice information: for example, the terminal 100 enters the voice call application and dials Zhang San's phone number.
  • Step S5 Determine whether the user has the intention to continue voice interaction.
  • if the user has the intention to continue voice interaction, the voice interaction working state is maintained; if the user has no intention to continue voice interaction, the voice interaction working state is ended.
  • while the voice interaction working state is maintained, the terminal 100 can continue to collect sounds; after the voice interaction working state ends, the terminal stops collecting sounds.
  • the embodiments of this application provide the following implementation methods for determining whether the user has the intention to continue voice interaction.
  • Figure 4 is a flowchart of a first implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application.
  • the first implementation method of determining whether the user has the intention to continue voice interaction may include the following steps:
  • Step S51 Determine whether the terminal 100 is close to the user's mouth.
  • determining whether the terminal 100 is close to the user's mouth means determining whether the terminal 100 is near the user's mouth.
  • if the working state of voice interaction was awakened by breath, the terminal 100 was near the user's mouth when it was awakened. Therefore, after the feedback result for the first voice information is output, whether the terminal 100 is still at the user's mouth can be determined by determining whether the user moved the handheld terminal 100 away from the user's mouth during the period from step S1 to step S4. If yes, the terminal 100 is considered no longer near the user's mouth; in this case, it can be considered that the user has no intention to continue voice interaction, and the voice interaction working state can be ended. If no, the terminal 100 is considered still near the user's mouth; in this case, the user may have the intention to continue voice interaction, and subsequent steps can be performed.
  • alternatively, the present application may first determine whether the user moved the handheld terminal 100 toward the user's mouth before the feedback result for the first voice information was output. If it is determined that the user did move the handheld terminal 100 toward the user's mouth before the feedback result was output, it is then determined whether the terminal 100 is still at the user's mouth after the feedback result for the first voice information is output (specifically, whether the user moved the handheld terminal 100 away from the user's mouth after the feedback result was output). If it is determined that the user did not move the handheld terminal 100 toward the user's mouth before the feedback result for the first voice information was output, it can be considered that the user has no intention to continue voice interaction, and the working state of voice interaction can be ended.
  • the gyro sensor 180B and the acceleration sensor 180E on the terminal 100 can be used to collect the angular velocity and acceleration of the terminal 100; the collected angular velocity and acceleration are then used to determine the user's gesture.
  • the user's gestures may include a first gesture, a second gesture and a third gesture.
  • the first gesture is used to indicate that the user's handheld terminal 100 is in a stationary state
  • the second gesture is used to indicate that the user's handheld terminal 100 is moving away from the user's mouth.
  • the third gesture indicates that the user holds the terminal 100 closer to the user's mouth.
  • the gyro sensor 180B and the acceleration sensor 180E can be used to collect the angular velocity and acceleration during the period from step S1 to step S4. Then, the collected angular velocity and acceleration are input into the gesture recognition module.
  • the gesture recognition module can be a neural network trained for gesture recognition. After processing by the gesture recognition module, the user's gesture is output. The gesture recognition module can determine the user's handheld gesture based on the angular velocity and acceleration of the terminal 100 at different times.
  • small movements may still be regarded as stationary. For example, if the user first holds the terminal 100 at a distance of 5 cm from the mouth and then holds it at a distance of 4 cm from the mouth, the user's gesture is still considered to be the first gesture.
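A sketch of this sampling-plus-classification step follows; the sampling rate, window length, and model call are illustrative assumptions, not values from the application.

```python
import time

def recognize_gesture(read_gyro_180b, read_accel_180e, gesture_model,
                      window_s: float = 1.0):
    """Collect angular velocity (gyro sensor 180B) and acceleration
    (acceleration sensor 180E) over a window, then classify the hold gesture
    (first/second/third) with the trained gesture recognition module."""
    samples = []
    t_end = time.monotonic() + window_s
    while time.monotonic() < t_end:
        samples.append((read_gyro_180b(), read_accel_180e()))  # (rad/s, m/s^2)
        time.sleep(0.01)                                       # ~100 Hz (assumed)
    return gesture_model(samples)
```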
  • Step S52 If it is determined that the terminal 100 is close to the user's mouth, the voice interaction working state is extended for a preset time.
  • this application extends the voice interaction working state for a preset time. Within the extended preset time period, the terminal 100 continues to receive audio.
  • This application does not limit the preset time, for example, it can be 5s, 10s, 20s, etc.
  • Step S53 Determine whether the second voice message is detected within a preset time period.
  • Step S54 If the second voice information is detected within the preset time period, determine whether the user's breath is detected.
  • the present application further detects the user's breath to determine whether the second voice information is what the user said to the terminal 100 .
  • in the working state of voice interaction, the user's mouth needs to be close to the terminal 100 to perform voice interaction with it. Therefore, if the second voice message was spoken by the user, the breath produced by the user while speaking can be detected by the terminal. In other words, this application can determine whether the second voice information is what the user said or what other people nearby said based on whether the user's breath can be detected.
  • the breath recognition module can be used to detect the user's breath.
  • Step S55 If the user's breath is detected, the feedback result for the second voice information is output.
  • if the user's breath is detected, it means that the second voice information was spoken by the user with the mouth toward the terminal 100 and within the preset distance range from the terminal 100. In this case, it is considered that the user has the intention to continue voice interaction, and the feedback result for the second voice information is output. If the user's breath is not detected, it means that the second voice information is what other people nearby said, not what the user said. In this case, it is considered that the user has no intention of continuing the voice interaction, and the working state of voice interaction can be ended.
  • to summarize the first implementation: if it is determined that the terminal 100 is not at the user's mouth, the voice interaction working state is ended; if it is determined that the terminal 100 is near the user's mouth, the voice interaction working state is extended for a preset time. Then, if the second voice information is not detected within the preset time period, the voice interaction working state is ended; if the second voice information is detected within the preset time period, the user's breath is detected.
  • if the user's breath is not detected, the voice interaction working state is ended; if the user's breath is detected, the feedback result for the second voice information is output. That is to say, in the first implementation, if the terminal 100 is near the user's mouth and the user's breath can be detected, it is determined that the user has the intention to continue voice interaction. In this way, the voice interaction method provided by the embodiments of the present application can identify with high probability that the user himself has the intention to continue voice interaction, effectively reducing the terminal 100's erroneous responses to other people or other surrounding noises, and improving the accuracy and user experience of voice interaction.
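Steps S51-S55 can be summarized in one routine; a sketch with the same hypothetical helpers as before:

```python
def first_implementation(near_mouth, listen, detect_breath, respond,
                         preset_s: float = 10.0) -> None:
    """Figure 4 flow: S51 proximity check -> S52 extend the working state ->
    S53 wait for second voice -> S54 breath check -> S55 output feedback."""
    if not near_mouth():                     # S51: terminal at the user's mouth?
        return                               # no: end the voice interaction working state
    second_voice = listen(timeout=preset_s)  # S52+S53: extended sound collection
    if second_voice is None:
        return                               # nothing detected: end the working state
    if detect_breath(second_voice):          # S54: user's breath detected?
        respond(second_voice)                # S55: feedback for the second voice information
    # breath not detected: treated as bystander speech or noise; working state ends
```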
  • Figure 5 is a flowchart of a second implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application.
  • the second implementation method of determining whether the user has the intention to continue voice interaction may include the following steps:
  • Step S61 After outputting the feedback result for the first voice information, the voice interaction working state is extended for a preset time period.
  • Step S62 Determine whether the second voice message is detected within a preset time period.
  • Step S63 If the second voice information is detected within the preset time period, determine whether the terminal 100 is close to the user's mouth.
  • Step S64 If it is determined that the terminal 100 is close to the user's mouth, it is determined whether the user's breath is detected.
  • Step S65 If the user's breath is detected, the feedback result for the second voice information is output.
  • that is, in the second implementation, after the feedback result for the first voice information is output, the voice interaction working state is directly extended for a preset time. If the second voice information is not detected within the preset time period, the voice interaction working state is ended. If the second voice information is detected within the preset time period, it is first determined whether the terminal 100 is near the user's mouth. If the terminal 100 is not near the user's mouth, the voice interaction working state is ended. If it is determined that the terminal 100 is near the user's mouth, the user's breath is then detected. If the user's breath is not detected, the voice interaction working state is ended; if the user's breath is detected, the feedback result for the second voice information is output.
•   For the specific implementation of determining whether the terminal 100 is near the user's mouth in step S63, refer to the description of step S51; for the specific implementation of detecting the user's breath in step S64, refer to the description of step S54.
•   For the specific implementation of step S65, refer to the description of step S55, which will not be repeated here.
•   Figure 6 is a flow chart of a third implementation manner for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application.
•   As shown in Figure 6, the third implementation manner of determining whether the user has the intention to continue voice interaction may include the following steps:
•   Step S71: After outputting the feedback result for the first voice information, the voice interaction working state is extended for a preset time period.
•   Step S72: Determine whether the second voice information is detected within the preset time period.
•   Step S73: If the second voice information is detected within the preset time period, determine whether the user's breath is detected.
•   Step S74: If the user's breath is detected, output a feedback result for the second voice information.
•   In the third implementation manner, the voice interaction working state is directly extended for a preset time period. If the second voice information is not detected within the preset time period, the voice interaction working state is ended. If the second voice information is detected within the preset time period, the user's breath is detected. If the user's breath is not detected, the voice interaction working state is ended. If the user's breath is detected, a feedback result for the second voice information is output.
•   For the specific implementation of detecting the user's breath in step S73, refer to the description of step S54; for the specific implementation of step S74, refer to the description of step S55, which will not be repeated here.
•   Figure 7 is a flow chart of a fourth implementation manner for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application.
•   As shown in Figure 7, the fourth implementation manner of determining whether the user has the intention to continue voice interaction may include the following steps:
•   Step S81: After outputting the feedback result for the first voice information, the voice interaction working state is extended for a preset time period.
•   Step S82: Determine whether the second voice information is detected within the preset time period.
•   Step S83: If the second voice information is detected within the preset time period, determine whether the terminal 100 is close to the user's mouth.
•   Step S84: If it is determined that the terminal 100 is close to the user's mouth, output a feedback result for the second voice information.
•   In the fourth implementation manner, the voice interaction working state is directly extended for a preset time period. If the second voice information is not detected within the preset time period, the voice interaction working state is ended. If the second voice information is detected within the preset time period, it is determined whether the terminal 100 is near the user's mouth. If the terminal 100 is not near the user's mouth, the voice interaction working state is ended. If it is determined that the terminal 100 is near the user's mouth, the feedback result for the second voice information is output.
•   For the specific implementation of determining whether the terminal 100 is near the user's mouth in step S83, refer to the description of step S51; for the specific implementation of step S84, refer to the description of step S55, which will not be repeated here.
  • Figure 8 is a flowchart of a fifth implementation method for determining whether the user has the intention to continue voice interaction provided by the embodiment of the present application.
•   As shown in Figure 8, the fifth implementation manner of determining whether the user has the intention to continue voice interaction may include the following steps:
•   Step S91: Determine whether the terminal 100 is close to the user's mouth.
•   Step S92: If it is determined that the terminal 100 is close to the user's mouth, the voice interaction working state is extended for a preset time period.
•   Step S93: Determine whether the second voice information is detected within the preset time period.
•   Step S94: If the second voice information is detected within the preset time period, output a feedback result for the second voice information.
•   In the fifth implementation manner, it is first determined whether the terminal 100 is near the user's mouth. If it is determined that the terminal 100 is near the user's mouth, the voice interaction working state is extended for a preset time period. If it is determined that the terminal 100 is not near the user's mouth, the voice interaction working state is ended. In this way, the power consumption of the terminal 100 can be reduced. Further, if the second voice information is detected within the preset time period, the feedback result for the second voice information is output. If the second voice information is not detected within the preset time period, the voice interaction working state is ended. In the fifth implementation manner, after the feedback result for the first voice information is output, if the terminal 100 is still near the user's mouth, it is considered that the user has the intention to continue voice interaction, so the listening time can be extended.
•   Optionally, before outputting the feedback result for the second voice information, the user's breath can be detected first. If the user's breath is detected, the feedback result for the second voice information is output. For the specific implementation, refer to the first implementation manner described above, which will not be repeated here.
•   In this way, the voice interaction method provided by the embodiments of the present application can identify with a high probability that the user himself has the intention to continue voice interaction, effectively reduce the terminal 100's incorrect responses to other people or other surrounding noises, and improve the accuracy and user experience of voice interaction.
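Viewed side by side, the five implementation manners differ only in whether the proximity check and the breath check are applied, and in whether the proximity check gates the extension of the listening window itself. The sketch below, reusing the hypothetical helper names from the earlier example, summarizes the second through fifth manners; it is an illustrative outline under those assumptions, not the terminal's actual logic.

```python
def continue_dialog(terminal, preset_duration,
                    check_mouth=False, check_breath=False,
                    gate_extension_on_mouth=False):
    """Generic sketch covering the second through fifth implementation
    manners (Figures 5-8); the flags select which checks are applied."""
    # Fifth manner only: the proximity check gates the extension itself, so
    # the listening window is never opened when the terminal is away from
    # the mouth, which saves power.
    if gate_extension_on_mouth and not terminal.is_near_mouth():
        terminal.end_voice_interaction()
        return

    second_voice = terminal.listen_for_voice(timeout=preset_duration)
    if second_voice is None:
        terminal.end_voice_interaction()
        return

    # Second and fourth manners: proximity check after the second voice.
    if check_mouth and not terminal.is_near_mouth():
        terminal.end_voice_interaction()
        return
    # Second and third manners: breath check after the second voice.
    if check_breath and not terminal.breath_detected(second_voice):
        terminal.end_voice_interaction()
        return

    terminal.output_feedback(second_voice)

# Second manner (Figure 5): continue_dialog(t, 10, check_mouth=True, check_breath=True)
# Third manner (Figure 6):  continue_dialog(t, 10, check_breath=True)
# Fourth manner (Figure 7): continue_dialog(t, 10, check_mouth=True)
# Fifth manner (Figure 8):  continue_dialog(t, 10, gate_extension_on_mouth=True)
```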
  • the methods and operations implemented by the electronic device can also be implemented by components (such as chips or circuits) that can be used in the electronic device.
•   In order to implement the above functions, the terminal includes hardware structures and/or software modules corresponding to each function.
  • Persons skilled in the art should easily realize that, with the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • Figure 9 is a schematic structural diagram of a voice interaction device provided by an embodiment of the present application.
  • the terminal can implement corresponding functions through the hardware device shown in Figure 9.
  • the device 1000 may include: a processor 1001 and a memory 1002.
  • the processor 1001 may include one or more processing units.
  • the processor 1001 may include an application processor, a modem processor, a graphics processor, an image signal processor, a controller, a video codec, a digital signal processor, baseband processor, and/or neural network processor, etc.
  • different processing units can be independent devices or integrated in one or more processors.
  • Memory 1002 is coupled to processor 1001 for storing various software programs and/or sets of instructions.
•   Memory 1002 may include volatile memory and/or non-volatile memory.
  • the device 1000 can perform the operations performed in the above method embodiments.
•   The processor 1001 may be configured to: detect a wake-up instruction initiating voice interaction; in response to the wake-up instruction, enter the working state of voice interaction; detect the first voice information; output a feedback result for the first voice information; if the second voice information is detected within a preset time period, detect the user's breath; and if the user's breath is detected, output a feedback result for the second voice information.
•   The processor is further configured to: after outputting the feedback result for the first voice information, determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the voice interaction working state for the preset time period; and if it is determined that the terminal is not close to the user's mouth, end the working state of voice interaction.
•   The processor is further configured to: determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, detect the user's breath; and if it is determined that the terminal is not close to the user's mouth, end the working state of voice interaction.
•   The processor is further configured to: recognize the user's gesture in the working state of the voice interaction; if the user's gesture is a first gesture, determine that the terminal is close to the user's mouth, where the first gesture is used to represent that the user is holding the terminal in a stationary state; and if the user's gesture is a second gesture, determine that the terminal is not close to the user's mouth, where the second gesture is used to represent that the user is holding the terminal and moving it away from the user's mouth.
•   The processor is further configured to: determine whether a third gesture was recognized before the feedback result for the first voice information was output, where the third gesture is used to represent that the user is holding the terminal and moving it toward the user's mouth; if the third gesture was recognized, determine whether the terminal is still close to the user's mouth after the feedback result for the first voice information is output; and if the third gesture was not recognized, end the working state of voice interaction.
•   The processor is further configured to: obtain the angular velocity and acceleration at different times in the working state of the voice interaction; and determine the user's gesture using the angular velocity and acceleration at different times together with a gesture recognition module, where the gesture recognition module is used to recognize that the user is holding the terminal and moving it toward the user's mouth, that the user is holding the terminal and moving it away from the user's mouth, or that the user is holding the terminal in a stationary state.
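As an illustration of how angular velocity and acceleration samples might feed such a gesture recognition module, the sketch below classifies the three gestures with simple thresholds and a crude vertical displacement estimate. The embodiments do not specify the module's internals (a trained model is equally plausible), so the thresholds, the assumed 100 Hz sampling rate, and the neglect of device orientation are all illustration-only assumptions.

```python
import numpy as np

def classify_gesture(gyro_samples, accel_samples, dt=0.01, motion_threshold=0.5):
    """Hypothetical gesture classifier over angular velocity (rad/s) and
    acceleration (m/s^2) samples collected at different times.

    Returns "stationary", "toward_mouth", or "away_from_mouth".
    """
    gyro = np.asarray(gyro_samples, dtype=float)    # shape (n, 3)
    accel = np.asarray(accel_samples, dtype=float)  # shape (n, 3)

    # Little rotation and little variation in acceleration: terminal held still.
    if np.abs(gyro).max() < motion_threshold and accel.std() < motion_threshold:
        return "stationary"

    # Crude vertical displacement: integrate (a_z - g) twice. A positive net
    # displacement suggests the terminal was raised toward the mouth.
    vertical_accel = accel[:, 2] - 9.81
    vertical_velocity = np.cumsum(vertical_accel) * dt
    displacement = np.sum(vertical_velocity) * dt
    return "toward_mouth" if displacement > 0 else "away_from_mouth"
```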
•   The processor is further configured to: input the second voice information into a breath recognition module, where the breath recognition module is used to identify whether the second voice information is a sound emitted by the user's mouth within a preset distance from the terminal; if the breath recognition module recognizes that the second voice information is a sound emitted by the user's mouth within a preset distance from the terminal, determine that the user's breath is detected; and if the breath recognition module recognizes that the second voice information is not such a sound, determine that the user's breath is not detected.
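One plausible way such a breath recognition module could distinguish close-range speech is by the strong low-frequency "pop" energy that the breath stream produces when the mouth is within a short distance of the microphone. The sketch below illustrates this idea with a simple spectral-energy ratio; the actual module's features and classifier are not specified by the embodiments, so the cutoff frequency and threshold here are assumed values.

```python
import numpy as np

def breath_detected(audio, sample_rate=16000, low_cut_hz=100.0, ratio_threshold=0.2):
    """Hypothetical breath check on the captured second voice information.

    Compares the spectral energy below low_cut_hz with the total energy;
    speech spoken with the mouth very close to the microphone tends to carry
    a disproportionate share of such low-frequency breath-pop energy.
    """
    audio = np.asarray(audio, dtype=float)
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(audio.size, d=1.0 / sample_rate)

    low_energy = spectrum[freqs < low_cut_hz].sum()
    total_energy = spectrum.sum() + 1e-12  # guard against silence
    return (low_energy / total_energy) > ratio_threshold
```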
•   In an implementable manner, the terminal includes a pressure sensor, and the processor is further configured to: obtain the pressure value corresponding to the pressure sensor when the second voice information is collected; if the pressure value is greater than a preset pressure threshold, determine that the user's breath is detected; and if the pressure value is less than or equal to the preset pressure threshold, determine that the user's breath is not detected.
•   In an implementable manner, the terminal includes a temperature sensor, and the processor is further configured to: obtain a first temperature and a second temperature, where the first temperature is the temperature corresponding to the temperature sensor before the second voice information is collected, and the second temperature is the temperature corresponding to the temperature sensor when the second voice information is collected; if the second temperature is greater than the first temperature, determine that the user's breath is detected; and if the second temperature is less than or equal to the first temperature, determine that the user's breath is not detected.
•   In an implementable manner, the terminal includes a humidity sensor, and the processor is further configured to: obtain the humidity corresponding to the humidity sensor when the second voice information is collected; if the humidity is greater than a preset humidity threshold, determine that the user's breath is detected; and if the humidity is less than or equal to the preset humidity threshold, determine that the user's breath is not detected.
•   In an implementable manner, the terminal includes a carbon dioxide sensor, and the processor is further configured to: obtain the carbon dioxide concentration corresponding to the carbon dioxide sensor when the second voice information is collected; if the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, determine that the user's breath is detected; and if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, determine that the user's breath is not detected.
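The four sensor-based checks above are all simple threshold comparisons performed at the moment the second voice information is collected. The following sketch gathers them in one place; the sensor-reading helpers and all threshold values are assumptions for illustration and would need per-device calibration.

```python
def breath_from_sensors(sensors,
                        pressure_threshold=5.0,   # Pa, placeholder value
                        humidity_threshold=60.0,  # %RH, placeholder value
                        co2_threshold=800.0):     # ppm, placeholder value
    """Hypothetical sensor-based breath detection at the moment the second
    voice information is collected. Any single modality may be used alone;
    the sensors object and all thresholds are assumed for illustration.
    """
    # Pressure: the airflow of close-range speech presses on the sensor.
    if sensors.pressure() > pressure_threshold:
        return True
    # Temperature: exhaled air warms the sensor relative to the reading
    # taken before the second voice information was collected.
    if sensors.temperature_now() > sensors.temperature_before():
        return True
    # Humidity: exhaled air is more humid than the ambient air.
    if sensors.humidity() > humidity_threshold:
        return True
    # Carbon dioxide: exhaled air raises the CO2 concentration at the mic.
    if sensors.co2_concentration() > co2_threshold:
        return True
    return False
```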
•   The processor 1001 may also be configured to: detect a wake-up instruction initiating voice interaction; in response to the wake-up instruction, enter the working state of voice interaction; detect the first voice information; output the feedback result for the first voice information; determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the working state of the voice interaction for a preset time period; and if it is determined that the terminal is not close to the user's mouth, end the voice interaction working state.
•   The processor 1001 may also be configured to: detect a wake-up instruction initiating voice interaction; in response to the wake-up instruction, enter the working state of voice interaction; detect the first voice information; output the feedback result for the first voice information; if the second voice information is detected within a preset time period, determine whether the terminal is close to the user's mouth; and if it is determined that the terminal is close to the user's mouth, output the feedback result for the second voice information.
•   Each step of the above method can be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software.
  • the steps of the methods disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware processor for execution, or can be executed by a combination of hardware and software modules in the processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, it will not be described in detail here.
  • the processor in the embodiment of the present application may be an integrated circuit chip with signal processing capabilities.
  • each step of the above method embodiment can be completed through an integrated logic circuit of hardware in the processor or instructions in the form of software.
  • the above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • a general-purpose processor may be a microprocessor or the processor may be any conventional processor, etc.
  • the steps of the method disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, registers and other mature storage media in this field.
  • the storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
  • the memory in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories.
•   The non-volatile memory can be a read-only memory (ROM), a programmable ROM (PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory may be random access memory (RAM), which is used as an external cache.
•   By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchronous link dynamic random access memory (SLDRAM), and direct rambus random access memory (DR RAM).
  • the embodiment of the present application also provides a computer program product.
•   The computer program product includes a computer program or instructions. When the computer program or instructions are run on a computer, the computer is caused to execute the method of any one of the method embodiments.
  • the embodiment of the present application also provides a computer-readable storage medium.
•   The computer-readable storage medium stores a computer program or instructions. When the computer program or instructions are run on a computer, the computer is caused to execute the method of any one of the method embodiments.
•   The embodiment of the present application also provides a terminal, including a memory and a processor; the memory is coupled to the processor; the memory is used to store computer program code, and the computer program code includes computer instructions; when the processor executes the computer instructions, the electronic device is caused to execute the method of any one of the method embodiments.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of modules is only a logical function division. In actual implementation, there may be other division methods.
  • multiple modules or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional module in each embodiment of the present application can be integrated into a processing unit, or each module can exist physically alone, or two or more modules can be integrated into one unit.
•   If the functions described are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
•   Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
•   The computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
•   The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media that can store program code.
•   The voice interaction devices, chips, computer storage media, computer program products, and terminals provided by the above embodiments of the present application are all used to execute the methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which will not be described again here.
•   It should be understood that the sequence numbers of the steps do not imply an order of execution; the execution order of the steps should be determined by their functions and internal logic, and does not constitute any limitation on the implementation process of the embodiments.

Abstract

A voice interaction method and apparatus, and a terminal. The method comprises: detecting a wake-up instruction for initiating voice interaction (S1); in response to the wake-up instruction, entering an operation state of voice interaction (S2); detecting first voice information (S3); outputting a feedback result for the first voice information (S4); determining whether a terminal (100) is close to the mouth of a user (S51); if it is determined that the terminal (100) is close to the mouth of the user, prolonging the operation state of voice interaction for a preset duration (S52); if second voice information is detected within the preset duration, detecting the breath of the user (S54); and if the breath of the user is detected, outputting a feedback result for the second voice information (S55). By means of such a voice interaction method, a user himself/herself having an intent to continue voice interaction can be identified with high probability, thereby effectively reducing erroneous responses of a terminal to other people or other surrounding noise, and thus improving the accuracy and user experience of voice interaction.

Description

A voice interaction method, device and terminal
This application claims priority to the Chinese patent application submitted to the State Intellectual Property Office on September 14, 2022, with application number 202211113419.9 and invention title "A voice interaction method, device and terminal", the entire content of which is incorporated by reference in this application.
Technical field
The present application belongs to the field of human-computer interaction technology, and in particular relates to a voice interaction method, device and terminal.
Background
Voice interaction is a new generation of interaction mode based on voice input. Based on the voice information input by the user to the terminal, feedback results corresponding to the input voice information can be obtained.
Before performing voice interaction with the terminal, the voice interaction system (such as a voice assistant) on the terminal must first be awakened; for example, the voice assistant can be awakened by a specific wake-up word. After the voice assistant is awakened, the user can conduct voice interaction with the terminal. During the voice interaction between the user and the terminal, generally after the user finishes speaking a voice, the terminal outputs the feedback result corresponding to that voice; then, the user can speak the next voice, thus realizing a continuous dialogue with the terminal.
However, the current continuous dialogue function of the terminal is achieved by extending the terminal's sound collection time. For example, after the terminal outputs the feedback result corresponding to the first voice message, the terminal continues to listen for a period of time, such as 10 seconds. If no voice signal is received within 10 seconds, the terminal stops collecting; if a voice signal is received within 10 seconds, the terminal continues to output feedback results for the received voice information. In this way, during the period in which the terminal extends its listening, if the user does not make any sound but other people are talking nearby, the terminal will continue to provide feedback on what those other people say, which causes trouble and annoyance to the user and affects the user experience.
Contents of the invention
This application provides a voice interaction method, device and terminal, which can solve the problem of erroneous responses to other people or other surrounding noises during the period in which the terminal extends its sound collection.
In a first aspect, the present application provides a voice interaction method. The method includes: detecting a wake-up instruction initiating voice interaction; in response to the wake-up instruction, entering the working state of voice interaction; detecting first voice information; outputting a feedback result for the first voice information; if second voice information is detected within a preset time period, detecting the user's breath; and if the user's breath is detected, outputting a feedback result for the second voice information.
In this way, through user breath detection, it can be identified with a high probability that the user himself has the intention to continue voice interaction, effectively reducing the terminal's erroneous responses to other people or other surrounding noises, and improving the accuracy of voice interaction and the user experience.
In an implementable manner, after outputting the feedback result for the first voice information, the method further includes: determining whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extending the working state of the voice interaction by the preset duration; and if it is determined that the terminal is not close to the user's mouth, ending the working state of the voice interaction.
In this way, before performing user breath detection, it is first determined whether the terminal is close to the user's mouth. If it is determined that the terminal is close to the user's mouth, the sound collection time is extended; if it is determined that the terminal is not close to the user's mouth, the sound collection is ended directly. This can greatly reduce the power consumption caused by sound collection.
In an implementable manner, if the second voice information is detected within the preset time period, the method further includes: determining whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, detecting the user's breath; and if it is determined that the terminal is not close to the user's mouth, ending the working state of the voice interaction.
In this way, when the second voice information is detected, it is first determined whether the terminal is close to the user's mouth, and then it is decided whether to detect the user's breath. If the terminal is not close to the user's mouth, the second voice information is considered not to be a sound made by the user, and there is no need to detect the user's breath.
In an implementable manner, if the wake-up instruction is the user's breath, determining whether the terminal is close to the user's mouth includes: recognizing the user's gesture in the working state of the voice interaction; if the user's gesture is a first gesture, determining that the terminal is close to the user's mouth, where the first gesture is used to represent that the user is holding the terminal in a stationary state; and if the user's gesture is a second gesture, determining that the terminal is not close to the user's mouth, where the second gesture is used to represent that the user is holding the terminal and moving it away from the user's mouth.
In this way, if the working state of the voice interaction was awakened by the breath awakening manner, it means that the terminal was at the user's mouth when the terminal was awakened. Therefore, after outputting the feedback result for the first voice information, whether the terminal 100 is still at the user's mouth can be determined by judging whether the user is holding the terminal 100 and moving it away from the user's mouth.
In an implementable manner, if the wake-up instruction is in a manner other than the user's breath, before determining whether the terminal is close to the user's mouth, the method includes: determining whether a third gesture was recognized before the feedback result for the first voice information was output, where the third gesture is used to represent that the user is holding the terminal and moving it toward the user's mouth; if the third gesture was recognized, determining whether the terminal is still close to the user's mouth after the feedback result for the first voice information is output; and if the third gesture was not recognized, ending the working state of the voice interaction.
In this way, if the working state of the voice interaction was not awakened by the breath awakening manner, it means that the terminal was not at the user's mouth when the terminal was awakened. In this case, after entering the working state of the voice interaction, it can first be determined whether the user held the terminal and moved it toward the user's mouth before the feedback result for the first voice information was output. If it is determined that the user did so, it is then determined whether the terminal is still at the user's mouth after the feedback result for the first voice information is output.
In an implementable manner, recognizing the user's gesture in the working state of the voice interaction includes: obtaining the angular velocity and acceleration at different times in the working state of the voice interaction; and determining the user's gesture using the angular velocity and acceleration at different times together with a gesture recognition module, where the gesture recognition module is used to recognize that the user is holding the terminal and moving it toward the user's mouth, that the user is holding the terminal and moving it away from the user's mouth, or that the user is holding the terminal in a stationary state.
In this way, the gesture recognition module can be used to determine the user's gesture based on the angular velocity and acceleration data at different times.
In an implementable manner, detecting the user's breath includes: inputting the second voice information into a breath recognition module, where the breath recognition module is used to identify whether the second voice information is a sound emitted by the user's mouth within a preset distance from the terminal; if the breath recognition module recognizes that the second voice information is a sound emitted by the user's mouth within a preset distance from the terminal, determining that the user's breath is detected; and if the breath recognition module recognizes that the second voice information is not a sound emitted by the user's mouth within a preset distance from the terminal, determining that the user's breath is not detected.
In this way, the breath recognition module can be used to perform feature recognition on the second voice information to determine whether the second voice information is a sound emitted by the user's mouth close to the terminal.
In an implementable manner, the terminal includes a pressure sensor, and detecting the user's breath includes: obtaining the pressure value corresponding to the pressure sensor when the second voice information is collected; if the pressure value is greater than a preset pressure threshold, determining that the user's breath is detected; and if the pressure value is less than or equal to the preset pressure threshold, determining that the user's breath is not detected.
In an implementable manner, the terminal includes a temperature sensor, and detecting the user's breath includes: obtaining a first temperature and a second temperature, where the first temperature is the temperature corresponding to the temperature sensor before the second voice information is collected, and the second temperature is the temperature corresponding to the temperature sensor when the second voice information is collected; if the second temperature is greater than the first temperature, determining that the user's breath is detected; and if the second temperature is less than or equal to the first temperature, determining that the user's breath is not detected.
In an implementable manner, the terminal includes a humidity sensor, and detecting the user's breath includes: obtaining the humidity corresponding to the humidity sensor when the second voice information is collected; if the humidity is greater than a preset humidity threshold, determining that the user's breath is detected; and if the humidity is less than or equal to the preset humidity threshold, determining that the user's breath is not detected.
In an implementable manner, the terminal includes a carbon dioxide sensor, and detecting the user's breath includes: obtaining the carbon dioxide concentration corresponding to the carbon dioxide sensor when the second voice information is collected; if the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, determining that the user's breath is detected; and if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, determining that the user's breath is not detected.
In this way, if the user speaks with the mouth close to the terminal, the airflow generated by speaking exerts a certain pressure on the terminal, and the temperature, humidity, and carbon dioxide concentration near the terminal also change to a certain extent. The present application can therefore use a pressure sensor, a temperature sensor, a humidity sensor, or a carbon dioxide sensor to detect the user's breath.
In a second aspect, the present application provides a voice interaction method. The method includes: detecting a wake-up instruction initiating voice interaction; in response to the wake-up instruction, entering the working state of voice interaction; detecting first voice information; outputting a feedback result for the first voice information; determining whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extending the working state of the voice interaction for a preset time period; and if second voice information is detected within the preset time period, outputting a feedback result for the second voice information.
In a third aspect, the present application provides a voice interaction method. The method includes: detecting a wake-up instruction initiating voice interaction; in response to the wake-up instruction, entering the working state of voice interaction; detecting first voice information; outputting a feedback result for the first voice information; if second voice information is detected within a preset time period, determining whether the terminal is close to the user's mouth; and if it is determined that the terminal is close to the user's mouth, outputting a feedback result for the second voice information.
In a fourth aspect, the present application provides a voice interaction device. The device includes a processor. The processor is configured to: detect a wake-up instruction initiating voice interaction; in response to the wake-up instruction, enter the working state of voice interaction; detect first voice information; output a feedback result for the first voice information; if second voice information is detected within a preset time period, detect the user's breath; and if the user's breath is detected, output a feedback result for the second voice information.
In an implementable manner, the processor is further configured to: after outputting the feedback result for the first voice information, determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the working state of the voice interaction by the preset duration; and if it is determined that the terminal is not close to the user's mouth, end the working state of the voice interaction.
In an implementable manner, the processor is further configured to: determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, detect the user's breath; and if it is determined that the terminal is not close to the user's mouth, end the working state of the voice interaction.
In an implementable manner, the processor is further configured to: recognize the user's gesture in the working state of the voice interaction; if the user's gesture is a first gesture, determine that the terminal is close to the user's mouth, where the first gesture is used to represent that the user is holding the terminal in a stationary state; and if the user's gesture is a second gesture, determine that the terminal is not close to the user's mouth, where the second gesture is used to represent that the user is holding the terminal and moving it away from the user's mouth.
In an implementable manner, if the wake-up instruction is in a manner other than the user's breath, before determining whether the terminal is close to the user's mouth, the processor is further configured to: determine whether a third gesture was recognized before the feedback result for the first voice information was output, where the third gesture is used to represent that the user is holding the terminal and moving it toward the user's mouth; if the third gesture was recognized, determine whether the terminal is still close to the user's mouth after the feedback result for the first voice information is output; and if the third gesture was not recognized, end the working state of the voice interaction.
In an implementable manner, the processor is further configured to: obtain the angular velocity and acceleration at different times in the working state of the voice interaction; and determine the user's gesture using the angular velocity and acceleration at different times together with a gesture recognition module, where the gesture recognition module is used to recognize that the user is holding the terminal and moving it toward the user's mouth, that the user is holding the terminal and moving it away from the user's mouth, or that the user is holding the terminal in a stationary state.
In an implementable manner, the processor is further configured to: input the second voice information into a breath recognition module, where the breath recognition module is used to identify whether the second voice information is a sound emitted by the user's mouth within a preset distance from the terminal; if the breath recognition module recognizes that the second voice information is such a sound, determine that the user's breath is detected; and if the breath recognition module recognizes that the second voice information is not such a sound, determine that the user's breath is not detected.
In an implementable manner, the terminal includes a pressure sensor, and the processor is further configured to: obtain the pressure value corresponding to the pressure sensor when the second voice information is collected; if the pressure value is greater than a preset pressure threshold, determine that the user's breath is detected; and if the pressure value is less than or equal to the preset pressure threshold, determine that the user's breath is not detected.
In an implementable manner, the terminal includes a temperature sensor, and the processor is further configured to: obtain a first temperature and a second temperature, where the first temperature is the temperature corresponding to the temperature sensor before the second voice information is collected, and the second temperature is the temperature corresponding to the temperature sensor when the second voice information is collected; if the second temperature is greater than the first temperature, determine that the user's breath is detected; and if the second temperature is less than or equal to the first temperature, determine that the user's breath is not detected.
In an implementable manner, the terminal includes a humidity sensor, and the processor is further configured to: obtain the humidity corresponding to the humidity sensor when the second voice information is collected; if the humidity is greater than a preset humidity threshold, determine that the user's breath is detected; and if the humidity is less than or equal to the preset humidity threshold, determine that the user's breath is not detected.
In an implementable manner, the terminal includes a carbon dioxide sensor, and the processor is further configured to: obtain the carbon dioxide concentration corresponding to the carbon dioxide sensor when the second voice information is collected; if the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, determine that the user's breath is detected; and if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, determine that the user's breath is not detected.
In a fifth aspect, the present application provides a voice interaction device. The device includes a processor. The processor is configured to: detect a wake-up instruction initiating voice interaction; in response to the wake-up instruction, enter the working state of voice interaction; detect first voice information; output a feedback result for the first voice information; determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the working state of the voice interaction for a preset time period; and if it is determined that the terminal is not close to the user's mouth, end the working state of the voice interaction.
In a sixth aspect, the present application provides a voice interaction device. The device includes a processor. The processor is configured to: detect a wake-up instruction initiating voice interaction; in response to the wake-up instruction, enter the working state of voice interaction; detect first voice information; output a feedback result for the first voice information; if second voice information is detected within a preset time period, determine whether the terminal is close to the user's mouth; and if it is determined that the terminal is close to the user's mouth, output a feedback result for the second voice information.
In a seventh aspect, the present application provides a terminal. The terminal includes a memory and a processor, the memory being coupled to the processor. The memory is used to store computer program code, the computer program code includes computer instructions, and when the processor executes the computer instructions, the electronic device is caused to execute the method described in any one of the first to third aspects.
In an eighth aspect, the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program or instructions; when the computer program or instructions are executed, the method described in any one of the first to third aspects is executed.
In a ninth aspect, the present application provides a computer program product. The computer program product includes a computer program or instructions; when the computer program or instructions are run on a computer, the computer is caused to execute the method described in any one of the first to third aspects.
In summary, the voice interaction method, device, and terminal provided by this application can, by detecting the user's breath and/or determining whether the terminal is close to the user's mouth, identify with a high probability that the user himself has the intention to continue voice interaction, effectively reducing the terminal's erroneous responses to other people or other surrounding noises, and improving the accuracy of voice interaction and the user experience.
Description of drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative effort.
Figure 1 is an application scenario diagram of voice interaction provided by an embodiment of the present application;
Figure 2 is a hardware structure block diagram of the terminal 100 provided by an embodiment of the present application;
Figure 3 is a flow chart of a voice interaction method provided by an embodiment of the present application;
Figure 4 is a flow chart of a first implementation manner, provided by an embodiment of the present application, of determining whether the user has the intention to continue voice interaction;
Figure 5 is a flow chart of a second implementation manner, provided by an embodiment of the present application, of determining whether the user has the intention to continue voice interaction;
Figure 6 is a flow chart of a third implementation manner, provided by an embodiment of the present application, of determining whether the user has the intention to continue voice interaction;
Figure 7 is a flow chart of a fourth implementation manner, provided by an embodiment of the present application, of determining whether the user has the intention to continue voice interaction;
Figure 8 is a flow chart of a fifth implementation manner, provided by an embodiment of the present application, of determining whether the user has the intention to continue voice interaction;
Figure 9 is a schematic structural diagram of a voice interaction device provided by an embodiment of the present application.
具体实施方式Detailed ways
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, rather than all of the embodiments. Based on the embodiments in this application, all other embodiments obtained by those skilled in the art without making creative efforts fall within the scope of protection of this application.
在对本申请的技术方案进行说明之前,先对本申请的应用场景进行说明。Before describing the technical solution of the present application, the application scenarios of the present application will be described first.
Figure 1 is a diagram of an application scenario of voice interaction according to an embodiment of the present application. As shown in Figure 1, the application scenario includes a terminal 100 and a user 200. The terminal 100 has a voice interaction function, and the user 200 can perform voice interaction with the terminal 100. At present, a specific event is needed to trigger the voice interaction function before the terminal 100 enters the voice interaction working state. Triggering the voice interaction function of the terminal is usually referred to as waking up voice interaction. Voice interaction may be woken up by a wake-up word, by a long press on the power button, by tapping the voice assistant application on the home screen, or in other manners, which are not limited in the present application.
After the voice interaction function is woken up, the user 200 can perform voice interaction with the terminal 100. During the voice interaction between the user 200 and the terminal 100, generally, after the user 200 finishes speaking a piece of voice, the terminal 100 outputs a feedback result corresponding to that voice. For example, after the voice interaction function is woken up, the user 200 says "How is the weather today?". After receiving this voice, the terminal 100 recognizes the voice information and outputs the corresponding feedback; for example, the terminal 100 outputs "The weather is sunny today" through the speaker.
Then, if the user 200 wants to continue the voice interaction with the terminal 100, the user 200 can directly speak the next piece of voice after the terminal 100 has finished feeding back the previous one, thereby realizing a continuous dialogue with the terminal 100.
In one implementation, the terminal 100 realizes the above continuous dialogue function by extending the sound pickup time after each round of voice interaction with the user 200. For example, after the terminal 100 outputs the feedback result corresponding to the first piece of voice, the terminal 100 does not stop picking up sound, but continues to listen for a period of time, for example 10 s. If no voice signal is received within the 10 s, the terminal 100 then stops picking up sound; if a voice signal is received within the 10 s, the terminal 100 continues to output feedback for the received voice information.
However, during the period in which the terminal 100 extends the sound pickup, if the user 200 does not make any sound, that is, the user 200 has no intention to continue the dialogue, while other people are talking nearby or there is other ambient noise, the terminal 100 will still respond to what those people say or to the ambient noise. This troubles and annoys the user 200 and degrades the user experience.
To solve the above technical problem, the present application provides a voice interaction method that can effectively reduce erroneous responses of the terminal 100 to other people or to ambient noise and improve the accuracy of voice interaction. The voice interaction method provided by the present application can be applied to the terminal 100. In the embodiments of the present application, the terminal 100 may be a mobile phone, a remote control, or a smart wearable device such as a watch or a wristband.
The hardware structure of the terminal 100 is introduced below, taking a mobile phone as an example.
Figure 2 is a block diagram of the hardware structure of the terminal 100 according to an embodiment of the present application. As shown in Figure 2, the terminal 100 may include: a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
The sensor module 180 may include sensors such as a pressure sensor 180A, a gyroscope sensor 180B, a barometric pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, a humidity sensor 180N, and a carbon dioxide sensor 180P.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the terminal 100. In other embodiments, the terminal 100 may include more or fewer components than shown, some components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like. Different processing units may be independent devices or may be integrated in one or more processors.
The controller may be the nerve center and command center of the terminal 100. The controller can generate operation control signals based on instruction operation codes and timing signals, to control instruction fetching and instruction execution.
A memory may further be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache. This memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, they can be called directly from this memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving system efficiency. In some embodiments, the processor 110 may include one or more interfaces.
It can be understood that the interface connection relationships between the modules illustrated in this embodiment are merely schematic and do not constitute a structural limitation on the terminal 100. In other embodiments, the terminal 100 may use interface connection manners different from those in the above embodiment, or a combination of multiple interface connection manners.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. While charging the battery 142, the charging management module 140 can also supply power to the terminal through the power management module 141.
The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the terminal 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the terminal 100 can be used to cover one or more communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, the antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antennas may be used in combination with tuning switches.
The mobile communication module 150 can provide solutions for wireless communication including 2G/3G/4G/5G applied to the terminal 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 can receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 can also amplify a signal modulated by the modem processor and convert it into electromagnetic waves for radiation through the antenna 1.
The wireless communication module 160 can provide solutions for wireless communication applied to the terminal 100, including wireless local area networks (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
In some embodiments, the antenna 1 of the terminal 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal 100 can communicate with networks and other devices through wireless communication technologies.
The terminal 100 implements the display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen 194 and the application processor. The GPU is configured to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is configured to display images, videos, and the like. The display screen 194 includes a display panel. For example, the display screen 194 may be a touchscreen.
The terminal 100 can implement the photographing function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos in the external memory card.
The internal memory 121 can be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes the instructions stored in the internal memory 121 to perform the various functional applications and data processing of the terminal 100. For example, in the embodiments of the present application, the processor 110 may perform the voice interaction method by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store the operating system and the applications required by at least one function (such as a sound playback function and an image playback function). The data storage area can store data created during the use of the terminal 100 (such as audio data and a phone book).
The terminal 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like. For example, the user's voice information can be collected through the microphone 170C, and the feedback result for the user's voice information can be played through the speaker 170A.
The touch sensor is also called a "touch panel". The touch sensor may be disposed on the display screen 194; the touch sensor and the display screen 194 form a touchscreen, also called a "touch screen". The touch sensor is configured to detect touch operations performed on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor may also be disposed on the surface of the terminal 100 at a position different from that of the display screen 194.
In the embodiments of the present application, the terminal 100 can detect, through the touch sensor, a touch operation input by the user on the touchscreen, and collect one or more of the touch position of the touch operation on the touchscreen, the touch time, and the like. In some embodiments, the terminal 100 can determine the touch position of the touch operation on the touchscreen by combining the touch sensor 180K and the pressure sensor 180A.
The button 190 includes a power button, a volume button, and the like. The button 190 may be a mechanical button or a touch button. The terminal 100 can receive button input and generate button signal input related to user settings and function control of the terminal 100. For example, the voice interaction function can be woken up by a long press on the power button.
The motor 191 can generate vibration prompts. The motor 191 can be used for incoming-call vibration prompts and for touch vibration feedback. For example, touch operations applied to different applications (such as photographing and audio playback) can correspond to different vibration feedback effects. The motor 191 can also provide different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can further be customized.
The indicator 192 may be an indicator light, which can be used to indicate the charging status and battery level changes, and can also be used to indicate messages, missed calls, notifications, and the like. The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into contact with or separated from the terminal 100 by being inserted into or pulled out of the SIM card interface 195. The terminal 100 can support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support a Nano SIM card, a Micro SIM card, a SIM card, and the like.
The gyroscope sensor 180B may be a three-axis gyroscope, used to track state changes of the terminal 100 in six directions. The acceleration sensor 180E is used to detect the movement speed, direction, and displacement of the terminal 100. In the embodiments of the present application, the terminal 100 can detect its own state and position through the gyroscope sensor 180B and the acceleration sensor 180E, and can determine, based on the state and position of the terminal 100 at different moments, the gesture with which the user holds the terminal 100, for example, the user moving the hand-held terminal 100 toward the user's mouth, or the user moving the hand-held terminal 100 away from the user's mouth.
The methods in the following embodiments can all be implemented in the terminal 100 having the above hardware structure.
The voice interaction method provided by the embodiments of the present application is described below by way of example.
Figure 3 is a flowchart of a voice interaction method according to an embodiment of the present application. As shown in Figure 3, the method may include the following steps:
Step S1: detect a wake-up indication for initiating voice interaction.
The wake-up indication is used to wake the terminal 100 into the voice interaction working state. The wake-up indication may be a specific wake-up word input by the user to the terminal 100, an operation in which the user presses and holds the power button, an operation in which the user taps the voice assistant application on the home screen, or the like.
An embodiment of the present application further provides a breath wake-up manner. Breath wake-up means that the user wakes the terminal 100 into the voice interaction working state by generating breath (for example, by speaking or blowing) with the mouth facing the terminal 100 and within a preset distance range from the terminal 100. In this way, the user can put the terminal 100 to the mouth and speak or blow directly at it to wake the terminal 100 into the voice interaction working state, without using a specific wake-up word or pressing a button. Correspondingly, when the terminal 100 detects the user's breath, it enters the voice interaction working state.
In one implementation, the user's breath can be detected as follows: the microphone 170C collects voice information, and if voice information is collected, a breath recognition module can be used to determine whether the collected voice information is words spoken or air blown by the user with the mouth facing the terminal 100 and within the preset distance range from the terminal 100. The breath recognition module may be a neural network trained to recognize breath.
For example, when the user speaks at different distances from the microphone 170C, different airflows are formed at the microphone 170C. When the user speaks close to the microphone 170C, consonants in the speech such as "b, c, d, f, j, k, l, p, q, r, s, t, v, w, x, y, z" cause plosive pops at the microphone 170C. Thus, the breath recognition module can be obtained by training on and learning the characteristics of the pops produced when the user speaks into the microphone 170C. The breath recognition module is a trained neural network that can determine, for input voice information, whether that voice information is sound input close to the microphone 170C. For example, a well-trained neural network can accurately detect a human voice within 5 cm of the microphone. Thus, when the breath recognition module recognizes that the input voice information is a human voice within 5 cm of the microphone, it is determined that the user's breath is detected, and the terminal 100 is woken into the voice interaction working state.
It should be noted that when the user blows at the terminal 100, the microphone 170C can also collect the sound; in the present application, the sound generated by blowing is also referred to as voice information.
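For illustration only, the following is a minimal sketch of such a breath-recognition check. It is not the patented implementation: `model` stands in for the trained neural network described above, `model.predict` is an assumed inference interface, and the frame layout and thresholds are assumptions.

```python
import numpy as np

FRAME_LEN = 400            # assumed front end: 25 ms frames at 16 kHz
NEAR_MOUTH_PROB = 0.8      # assumed per-frame decision threshold

def is_user_breath(frames: np.ndarray, model) -> bool:
    """Return True if the captured audio looks like close-range speech/blowing.

    frames: (num_frames, FRAME_LEN) float32 PCM frames from the microphone.
    model:  trained classifier mapping frames to per-frame probabilities that
            the sound source is within the preset distance of the microphone
            (e.g. learned from plosive "pop" artifacts).
    """
    probs = model.predict(frames)  # hypothetical inference call
    # Declare "user breath" if enough frames score high; averaging makes the
    # decision robust to a few silent or noisy frames in the utterance.
    return float(np.mean(probs > NEAR_MOUTH_PROB)) > 0.5
```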
In one implementation, the user's breath can also be detected as follows: if the microphone 170C collects voice information, the pressure value collected by the pressure sensor 180A at the time the voice information is collected is obtained. If the pressure value is greater than a preset pressure threshold, it is determined that the user's breath is detected.
The airflow generated when the user speaks or blows with the mouth facing the terminal and within the preset distance range from the terminal 100 exerts a certain pressure on the terminal 100. Thus, in the embodiments of the present application, the pressure sensor 180A can be used to detect the pressure exerted on the terminal 100 when the user speaks. If the pressure value is greater than the preset pressure threshold, it indicates that the user is speaking or blowing with the mouth facing the terminal 100 and within the preset distance range from the terminal 100, so it can be determined that the user's breath is detected. Conversely, if the pressure value is less than or equal to the preset pressure threshold, it indicates that the user is not speaking or blowing within the preset distance range from the terminal 100, so it can be determined that the user's breath is not detected.
It should be noted that in the embodiments of the present application, the parameters of the pressure sensor 180A must meet the accuracy requirements of breath detection. For example, the airflow generated when the user speaks or blows with the mouth facing the terminal 100 and within the preset distance range exerts a pressure of 0.07 MPa on the terminal 100, and the pressure sensor 180A has a measurement range of 0 to 0.3 MPa and a measurement accuracy of 0.001 MPa.
It should also be noted that, to improve the detection accuracy of the pressure sensor 180A, the pressure sensor 180A may be disposed near the microphone 170C. In this way, when the user speaks close to the microphone 170C, the pressure sensor 180A near the microphone 170C can detect the pressure exerted on it by the airflow generated by speaking.
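As an illustrative sketch of this variant, the check reduces to a single threshold comparison. The text gives 0.07 MPa as the pressure that close-range breath exerts but does not fix the threshold itself; the value below is therefore an assumption chosen under that figure.

```python
# Assumed threshold, picked below the ~0.07 MPa example figure from the text.
PRESET_PRESSURE_THRESHOLD_MPA = 0.05

def breath_detected_by_pressure(pressure_sample_mpa: float) -> bool:
    """Pressure sampled (by the sensor next to the microphone) while the
    voice information was captured; values at or below the threshold are
    treated as ambient sound rather than user breath."""
    return pressure_sample_mpa > PRESET_PRESSURE_THRESHOLD_MPA
```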
In one implementation, the user's breath can also be detected as follows: if the microphone 170C collects voice information, a first temperature and a second temperature are obtained, where the first temperature is the temperature collected by the temperature sensor 180J before the microphone 170C collects the voice information, and the second temperature is the temperature collected by the temperature sensor 180J while the microphone 170C collects the voice information. If the second temperature is greater than the first temperature, it is determined that the user's breath is detected; if the second temperature is less than or equal to the first temperature, it is determined that the user's breath is not detected.
In one implementation, the user's breath can also be detected as follows: if the microphone 170C collects voice information, the humidity collected by the humidity sensor 180N at the time the voice information is collected is obtained. If the humidity is greater than a preset humidity threshold, it is determined that the user's breath is detected; if the humidity is less than or equal to the preset humidity threshold, it is determined that the user's breath is not detected.
In one implementation, the user's breath can also be detected as follows: if the microphone 170C collects voice information, the carbon dioxide concentration collected by the carbon dioxide sensor 180P at the time the voice information is collected is obtained. If the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, it is determined that the user's breath is detected; if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, it is determined that the user's breath is not detected.
When the user speaks or blows with the mouth facing the terminal 100 and within the preset distance range from the terminal 100, the temperature, humidity, and carbon dioxide concentration near the terminal 100 change to a certain extent. Therefore, the embodiments of the present application can determine whether the user's breath is detected based on the data collected by the temperature sensor 180J, the humidity sensor 180N, or the carbon dioxide sensor 180P.
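The three environmental checks can be sketched directly from the rules above. The temperature rule is fully specified by the text (second reading must exceed the first); the humidity and carbon dioxide threshold values shown here are assumptions, since the text only says "preset threshold".

```python
def breath_by_temperature(t_before_c: float, t_during_c: float) -> bool:
    # Exhaled air is warmer than ambient air, so the temperature sampled
    # while the voice information is captured must exceed the earlier sample.
    return t_during_c > t_before_c

def breath_by_humidity(humidity_pct: float, threshold_pct: float = 70.0) -> bool:
    # Threshold value is an assumption; exhaled air is more humid than ambient.
    return humidity_pct > threshold_pct

def breath_by_co2(co2_ppm: float, threshold_ppm: float = 1500.0) -> bool:
    # Threshold value is an assumption; exhaled air is CO2-rich.
    return co2_ppm > threshold_ppm
```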
It should be noted that the above embodiments merely give examples of implementations of detecting the user's breath and do not limit the specific implementation. For example, the implementations listed above may also be used in combination: the "breath recognition module" may be combined with the "pressure sensor", the "breath recognition module" with the "temperature sensor", the "breath recognition module" with the "humidity sensor", or the "breath recognition module" with the "carbon dioxide sensor".
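One possible combination policy, reusing the helpers sketched above, is to require both signals to agree before declaring user breath. Combining by logical AND is an assumption; the text leaves the fusion policy open.

```python
def breath_detected(frames, model, pressure_sample_mpa: float) -> bool:
    # Example fusion of "breath recognition module" + "pressure sensor":
    # both the acoustic classifier and the pressure check must fire.
    return (is_user_breath(frames, model)
            and breath_detected_by_pressure(pressure_sample_mpa))
```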
It should also be noted that when another application occupies the microphone, the voice interaction method provided by the embodiments of the present application is unavailable. For example, when the user is making a call with the terminal 100, even if the user generates breath with the mouth facing the terminal 100 and within the preset distance range from the terminal 100, the terminal 100 is not woken into the voice interaction working state.
Step S2: enter the voice interaction working state in response to the wake-up indication.
After entering the voice interaction working state, the terminal 100 continues to pick up sound so as to obtain the user's voice information.
Step S3: detect first voice information.
Step S4: output a feedback result for the first voice information.
In the embodiments of the present application, the feedback result for the first voice information may be voice, text, an image, entering an application, or the like, which is not limited in the present application.
For example, after the voice interaction working state is entered, the user says a sentence such as "How is the weather today?", which is then detected by the terminal 100 as the first voice information. The terminal 100 then outputs the feedback result for the first voice information: for example, the terminal 100 outputs the voice "The weather is sunny today" through the speaker 170A, or the terminal 100 displays the text "The weather is sunny today" on the display screen 194.
For another example, after the voice interaction working state is entered, the user says a sentence such as "Call Zhang San", which is then detected by the terminal 100 as the first voice information. The terminal 100 then outputs the feedback result for the first voice information: for example, the terminal 100 enters the voice call application and dials Zhang San's phone number.
Step S5: determine whether the user has the intention to continue the voice interaction.
If the user has the intention to continue the voice interaction, the voice interaction working state is maintained; if the user has no intention to continue the voice interaction, the voice interaction working state is ended.
It should be noted that in the voice interaction working state, the terminal 100 can pick up sound continuously; after the voice interaction working state ends, the terminal stops picking up sound.
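Purely as an illustration, steps S1 to S5 can be read as the following loop. The callables are placeholders for the mechanisms described in this document (wake-up detection, sound pickup, feedback output, and the step-S5 intention check), not a definitive implementation.

```python
from typing import Callable, Optional

def voice_interaction_session(
    detect_wakeup: Callable[[], None],
    listen: Callable[[], Optional[str]],
    respond: Callable[[str], None],
    continue_intended: Callable[[], bool],
) -> None:
    detect_wakeup()                        # S1: wake-up indication detected
    interacting = True                     # S2: enter the voice interaction state
    while interacting:
        utterance = listen()               # S3: detect voice information
        if utterance is not None:
            respond(utterance)             # S4: output the feedback result
        interacting = continue_intended()  # S5: keep or end the working state
    # Leaving the loop corresponds to ending the state; sound pickup stops.
```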
The embodiments of the present application provide the following implementations of determining whether the user has the intention to continue the voice interaction.
Figure 4 is a flowchart of a first implementation of determining whether the user has the intention to continue voice interaction according to an embodiment of the present application.
As shown in Figure 4, the first implementation of determining whether the user has the intention to continue the voice interaction may include the following steps:
Step S51: determine whether the terminal 100 is close to the user's mouth.
In the embodiments of the present application, determining whether the terminal 100 is close to the user's mouth means determining whether the terminal 100 is at the user's mouth.
If the voice interaction working state was woken up by breath, the terminal 100 was at the user's mouth when it was woken up. Therefore, after the feedback result for the first voice information is output, whether the terminal 100 is still at the user's mouth can be determined by judging whether, during the period from step S1 to step S4, the user moved the hand-held terminal 100 away from the user's mouth. If so, the terminal 100 is considered no longer at the user's mouth; in this case it can be considered that the user has no intention to continue the voice interaction, and the voice interaction working state can be ended. If not, the terminal 100 is considered still at the user's mouth; in this case it is considered that the user may have the intention to continue the voice interaction, and the subsequent steps can continue to be performed.
If the voice interaction working state was not woken up by breath, the terminal 100 was not at the user's mouth when it was woken up. In this case, after the voice interaction working state is entered, the present application can first determine whether, before the feedback result for the first voice information is output, the user moved the hand-held terminal 100 toward the user's mouth. If it is determined that the user did move the hand-held terminal 100 toward the user's mouth before the feedback result for the first voice information was output, it is then determined whether the terminal 100 is still at the user's mouth after the feedback result for the first voice information is output (specifically, whether the user moved the hand-held terminal 100 away from the user's mouth after the feedback result for the first voice information was output). If it is determined that, before the feedback result for the first voice information was output, the user did not move the hand-held terminal 100 toward the user's mouth, it can be considered that the user has no intention to continue the voice interaction, and the voice interaction working state can be ended.
In one implementation, the gyroscope sensor 180B and the acceleration sensor 180E on the terminal 100 can be used to collect the angular velocity and acceleration of the terminal 100; the collected angular velocity and acceleration are then used to determine the user's gesture. The user's gesture may include a first gesture, a second gesture, and a third gesture, where the first gesture indicates that the user is holding the terminal 100 still, the second gesture indicates that the user is moving the hand-held terminal 100 away from the user's mouth, and the third gesture indicates that the user is moving the hand-held terminal 100 toward the user's mouth.
For example, the gyroscope sensor 180B and the acceleration sensor 180E can be used to collect the angular velocity and acceleration during the period from step S1 to step S4. The collected angular velocity and acceleration are then input to a gesture recognition module, which may be a neural network trained for gesture recognition. After processing by the gesture recognition module, the user's gesture is output. The gesture recognition module can determine the user's hand-held gesture based on the angular velocity and acceleration of the terminal 100 at different moments.
It should be noted that in the embodiments of the present application, if the change in the user's gesture is small and within a preset change range, the user's gesture is considered still. For example, if the user's hand-held terminal 100 changes from 5 cm away from the mouth to 4 cm away from the mouth, the user's gesture is considered the first gesture.
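A hedged sketch of the gesture check follows. The real module is a trained network; this stub only fixes an interface and the three gesture labels named above, and `model.infer` is an assumed inference call.

```python
from enum import Enum

class Gesture(Enum):
    HOLDING_STILL = 1   # first gesture: terminal held steady (small excursions
                        # within the preset change range still count as still)
    MOVING_AWAY = 2     # second gesture: terminal moved away from the mouth
    MOVING_CLOSER = 3   # third gesture: terminal moved toward the mouth

def classify_gesture(angular_velocity_trace, acceleration_trace, model) -> Gesture:
    """Feed gyroscope + accelerometer traces collected between steps S1 and S4
    to the trained gesture-recognition network and return its label."""
    return model.infer(angular_velocity_trace, acceleration_trace)
```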
Step S52: if it is determined that the terminal 100 is close to the user's mouth, extend the voice interaction working state by a preset duration.
If the terminal 100 is still at the user's mouth after the feedback result for the first voice information is output, it is considered that the user may have the intention to continue the voice interaction. In this case, the present application extends the voice interaction working state by the preset duration. During the extended preset duration, the terminal 100 keeps picking up sound.
The present application does not limit the preset duration; for example, it may be 5 s, 10 s, or 20 s.
Step S53: determine whether second voice information is detected within the preset duration.
Step S54: if second voice information is detected within the preset duration, determine whether the user's breath is detected.
If no second voice information is detected within the preset duration, the voice interaction working state is ended. If second voice information is detected within the preset duration, the second voice information detected by the terminal 100 may be what the user said, or it may be other people talking nearby. Therefore, the present application further detects the user's breath to determine whether the second voice information is words the user spoke to the terminal 100.
It should be noted here that in the embodiments of the present application, in the voice interaction working state, the user's mouth needs to be close to the terminal 100 for voice interaction with the terminal 100. Therefore, if the second voice information is words spoken by the user, the breath produced by the user while speaking can be detected by the terminal. That is, the present application can determine, based on whether the user's breath can be detected, whether the second voice information is words spoken by the user or words spoken by other people nearby.
For the manner of detecting the user's breath, reference may be made to the description of step S1 above, which is not repeated here. For example, the breath recognition module, the pressure sensor 180A, the temperature sensor 180J, the humidity sensor 180N, or the carbon dioxide sensor 180P can be used to detect the user's breath.
Step S55: if the user's breath is detected, output a feedback result for the second voice information.
If the user's breath is detected, the second voice information is words spoken by the user with the mouth facing the terminal 100 and within the preset distance range from the terminal 100. In this case, it is considered that the user has the intention to continue the voice interaction, and the feedback result for the second voice information is output. If the user's breath is not detected, the second voice information is words spoken by other people nearby, not words spoken by the user. In this case, it is considered that the user has no intention to continue the voice interaction, and the voice interaction working state can be ended.
To sum up, in the first implementation of determining whether the user has the intention to continue the voice interaction provided by the embodiments of the present application, after the feedback result for the first voice information is output, it is first determined whether the terminal 100 is at the user's mouth. If the terminal 100 is not at the user's mouth, the voice interaction working state is ended; if it is determined that the terminal 100 is at the user's mouth, the voice interaction working state is extended by the preset duration. Then, if no second voice information is detected within the preset duration, the voice interaction working state is ended; if second voice information is detected within the preset duration, the user's breath is detected. If the user's breath is not detected, the voice interaction working state is ended; if the user's breath is detected, the feedback result for the second voice information is output. That is, in the first implementation, if the terminal 100 is at the user's mouth and the user's breath can be detected, it is determined that the user has the intention to continue the voice interaction. In this way, the voice interaction method provided by the embodiments of the present application can recognize with high probability that it is the user who intends to continue the voice interaction, effectively reducing erroneous responses of the terminal 100 to other people or to ambient noise and improving the accuracy of voice interaction and the user experience.
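For illustration, the first implementation condenses to the following decision chain. The second to fifth implementations described below permute or drop the same predicates (terminal near mouth, second voice detected, breath detected), so only this ordering is sketched; the callables and the window value are placeholders.

```python
from typing import Callable, Optional

PRESET_WINDOW_S = 10.0   # example value; the text allows 5 s, 10 s, 20 s, etc.

def continue_intent_first_impl(
    terminal_near_mouth: Callable[[], bool],
    listen_for: Callable[[float], Optional[str]],
    breath_detected: Callable[[], bool],
    respond: Callable[[str], None],
) -> bool:
    """Return True to keep the voice interaction state, False to end it."""
    if not terminal_near_mouth():         # S51: still at the user's mouth?
        return False
    second = listen_for(PRESET_WINDOW_S)  # S52/S53: extended sound pickup
    if second is None:                    # no second voice information
        return False
    if not breath_detected():             # S54: filter out bystander speech
        return False
    respond(second)                       # S55: feedback for the second voice
    return True
```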
Figure 5 is a flowchart of a second implementation of determining whether the user has the intention to continue voice interaction according to an embodiment of the present application.
As shown in Figure 5, the second implementation of determining whether the user has the intention to continue the voice interaction may include the following steps:
Step S61: after the feedback result for the first voice information is output, extend the voice interaction working state by a preset duration.
Step S62: determine whether second voice information is detected within the preset duration.
Step S63: if second voice information is detected within the preset duration, determine whether the terminal 100 is close to the user's mouth.
Step S64: if it is determined that the terminal 100 is close to the user's mouth, determine whether the user's breath is detected.
Step S65: if the user's breath is detected, output a feedback result for the second voice information.
To sum up, in the second implementation, after the feedback result for the first voice information is output, the voice interaction working state is directly extended by the preset duration. If no second voice information is detected within the preset duration, the voice interaction working state is ended. If second voice information is detected within the preset duration, it is first determined whether the terminal 100 is at the user's mouth; if the terminal 100 is not at the user's mouth, the voice interaction working state is ended. If it is determined that the terminal 100 is at the user's mouth, the user's breath is then detected. If the user's breath is not detected, the voice interaction working state is ended; if the user's breath is detected, the feedback result for the second voice information is output.
It should be noted that for the specific implementation of determining whether the terminal 100 is at the user's mouth in step S63, reference may be made to the description of step S51; for the specific implementation of detecting the user's breath in step S64, reference may be made to the description of step S54; and for the specific implementation of step S65, reference may be made to the description of step S55, which are not repeated here.
Figure 6 is a flowchart of a third implementation of determining whether the user has the intention to continue voice interaction according to an embodiment of the present application.
As shown in Figure 6, the third implementation of determining whether the user has the intention to continue the voice interaction may include the following steps:
Step S71: after the feedback result for the first voice information is output, extend the voice interaction working state by a preset duration.
Step S72: determine whether second voice information is detected within the preset duration.
Step S73: if second voice information is detected within the preset duration, determine whether the user's breath is detected.
Step S74: if the user's breath is detected, output a feedback result for the second voice information.
To sum up, in the third implementation, after the feedback result for the first voice information is output, the voice interaction working state is directly extended by the preset duration. If no second voice information is detected within the preset duration, the voice interaction working state is ended. If second voice information is detected within the preset duration, the user's breath is detected. If the user's breath is not detected, the voice interaction working state is ended. If the user's breath is detected, the feedback result for the second voice information is output.
It should be noted that for the specific implementation of detecting the user's breath in step S73, reference may be made to the description of step S54, and for the specific implementation of step S74, reference may be made to the description of step S55, which are not repeated here.
Figure 7 is a flowchart of a fourth implementation of determining whether the user has the intention to continue voice interaction according to an embodiment of the present application.
As shown in Figure 7, the fourth implementation of determining whether the user has the intention to continue the voice interaction may include the following steps:
Step S81: after the feedback result for the first voice information is output, extend the voice interaction working state by a preset duration.
Step S82: determine whether second voice information is detected within the preset duration.
Step S83: if second voice information is detected within the preset duration, determine whether the terminal 100 is close to the user's mouth.
Step S84: if it is determined that the terminal 100 is close to the user's mouth, output a feedback result for the second voice information.
To sum up, in the fourth implementation, after the feedback result for the first voice information is output, the voice interaction working state is directly extended by the preset duration. If no second voice information is detected within the preset duration, the voice interaction working state is ended. If second voice information is detected within the preset duration, it is determined whether the terminal 100 is at the user's mouth. If the terminal 100 is not at the user's mouth, the voice interaction working state is ended. If it is determined that the terminal 100 is at the user's mouth, the feedback result for the second voice information is output.
It should be noted that for the specific implementation of determining whether the terminal 100 is at the user's mouth in step S83, reference may be made to the description of step S51, and for the specific implementation of step S84, reference may be made to the description of step S55, which are not repeated here.
Figure 8 is a flowchart of a fifth implementation of determining whether the user has the intention to continue voice interaction according to an embodiment of the present application.
As shown in Figure 8, the fifth implementation of determining whether the user has the intention to continue the voice interaction may include the following steps:
Step S91: determine whether the terminal 100 is close to the user's mouth.
Step S92: if it is determined that the terminal 100 is close to the user's mouth, extend the voice interaction working state by a preset duration.
Step S93: determine whether second voice information is detected within the preset duration.
Step S94: if second voice information is detected within the preset duration, output a feedback result for the second voice information.
To sum up, in the fifth implementation, it is first determined whether the terminal 100 is at the user's mouth. If it is determined that the terminal 100 is at the user's mouth, the voice interaction working state is extended by the preset duration; if it is determined that the terminal 100 is not at the user's mouth, the voice interaction working state is ended. In this way, the power consumption of the terminal 100 can be reduced. Further, if second voice information is detected within the preset duration, the feedback result for the second voice information is output; if no second voice information is detected within the preset duration, the voice interaction working state is ended. In the fifth implementation, after the feedback result for the first voice information is output, if the terminal 100 is still at the user's mouth, it is considered that the user has the intention to continue the voice interaction, so the sound pickup time can be extended.
Further, to improve the recognition of the user's intention to continue the voice interaction, after second voice information is detected within the preset duration, the user's breath may first be detected, and the feedback result for the second voice information is output only if the user's breath is detected. For details, reference may be made to the first implementation above, which is not repeated here.
To sum up, the voice interaction method provided by the embodiments of the present application can recognize with high probability that it is the user who intends to continue the voice interaction, effectively reducing erroneous responses of the terminal 100 to other people or to ambient noise and improving the accuracy of voice interaction and the user experience.
The method embodiments described herein may each be an independent solution or may be combined according to their internal logic; all such solutions fall within the protection scope of the present application.
It can be understood that in the above method embodiments, the methods and operations implemented by the electronic device may also be implemented by a component (such as a chip or a circuit) usable in the electronic device.
The above embodiments introduce the voice interaction method provided by the present application. It can be understood that, to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present application.
The method provided by the embodiments of the present application has been described in detail above. The apparatus provided by the embodiments of the present application is described in detail below with reference to Figure 9. It should be understood that the descriptions of the apparatus embodiments correspond to the descriptions of the method embodiments; therefore, for content not described in detail, reference may be made to the method embodiments above, and for brevity, details are not repeated here.
FIG. 9 is a schematic structural diagram of a voice interaction apparatus provided by an embodiment of this application. In one embodiment, the terminal can implement the corresponding functions through the hardware apparatus shown in FIG. 9. As shown in FIG. 9, the apparatus 1000 may include a processor 1001 and a memory 1002. The processor 1001 may include one or more processing units; for example, it may include an application processor, a modem processor, a graphics processor, an image signal processor, a controller, a video codec, a digital signal processor, a baseband processor, and/or a neural-network processor. Different processing units may be independent devices or may be integrated into one or more processors. The memory 1002 is coupled to the processor 1001 and is used to store various software programs and/or sets of instructions; it may include volatile and/or non-volatile memory.
The apparatus 1000 can perform the operations performed in the above method embodiments.
For example, in an optional embodiment of this application, the processor 1001 may be configured to: detect a wake-up indication that initiates voice interaction; enter the voice-interaction working state in response to the wake-up indication; detect first voice information; output a feedback result for the first voice information; if second voice information is detected within a preset duration, detect the user's breath; and if the user's breath is detected, output a feedback result for the second voice information.
In one implementable manner, the processor is further configured to: after outputting the feedback result for the first voice information, determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the voice-interaction working state by the preset duration; and if it is determined that the terminal is not close to the user's mouth, end the voice-interaction working state.
In one implementable manner, the processor is further configured to: determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, detect the user's breath; and if it is determined that the terminal is not close to the user's mouth, end the voice-interaction working state.
In one implementable manner, the processor is further configured to recognize the user's gesture in the voice-interaction working state. If the user's gesture is a first gesture, it is determined that the terminal is close to the user's mouth, the first gesture indicating that the user is holding the terminal in a stationary state; if the user's gesture is a second gesture, it is determined that the terminal is not close to the user's mouth, the second gesture indicating that the user is holding the terminal and moving it away from the user's mouth.
In one implementable manner, if the wake-up indication is given in a manner other than the user's breath, then before determining whether the terminal is close to the user's mouth, the processor is further configured to determine whether a third gesture was recognized before the feedback result for the first voice information was output, the third gesture indicating that the user is holding the terminal and moving it toward the user's mouth. If the third gesture was recognized, it is determined whether the terminal is still close to the user's mouth after the feedback result for the first voice information is output; if the third gesture was not recognized, the voice-interaction working state is ended.
In one implementable manner, the processor is further configured to: obtain angular velocities and accelerations at different moments in the voice-interaction working state; and determine the user's gesture using the angular velocities and accelerations at the different moments together with a gesture recognition module, where the gesture recognition module is used to recognize whether the user is holding the terminal and moving it toward the user's mouth, holding the terminal and moving it away from the user's mouth, or holding the terminal in a stationary state.
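As a rough, assumed illustration of how a gesture recognition module might consume those angular-velocity and acceleration samples, the threshold heuristic below classifies the three hand states named above. The threshold value and the vertical-axis heuristic are invented for this sketch; a real module might instead be a trained classifier and would typically fuse the gyroscope data as well.

```python
# Hypothetical labels for the three hand states described above.
TOWARD_MOUTH, AWAY_FROM_MOUTH, STATIONARY = "toward", "away", "stationary"

MOTION_THRESHOLD = 0.5  # assumed mean-|acceleration| threshold (m/s^2), illustrative

def classify_gesture(samples):
    """samples: (gyro_xyz, accel_xyz) tuples captured at different moments in the
    voice-interaction working state; accel_xyz is assumed to be gravity-compensated
    linear acceleration, with index 2 along the raise/lower axis."""
    vertical = [accel[2] for _gyro, accel in samples]
    motion_energy = sum(abs(v) for v in vertical) / len(vertical)
    if motion_energy < MOTION_THRESHOLD:
        return STATIONARY        # first gesture: terminal held still near the mouth
    # Net velocity change along the vertical axis as a crude direction cue.
    if sum(vertical) > 0:
        return TOWARD_MOUTH      # third gesture: raising the terminal to the mouth
    return AWAY_FROM_MOUTH       # second gesture: moving the terminal away
```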
In one implementable manner, the processor is further configured to input the second voice information into a breath recognition module, which identifies whether the second voice information is a sound emitted with the user's mouth within a preset distance of the terminal. If the breath recognition module identifies the second voice information as such a sound, it is determined that the user's breath is detected; if not, it is determined that the user's breath is not detected.
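The document does not disclose the internals of the breath recognition module, but sound produced with the mouth within a short distance of the microphone typically carries a strong low-frequency airflow component that distant speech lacks. One plausible realization, sketched here with NumPy under that assumption, is a low-band energy-ratio test; BREATH_BAND_HZ and BREATH_RATIO_THRESHOLD are illustrative constants.

```python
import numpy as np

BREATH_BAND_HZ = 100.0        # assumed upper edge of the breath/airflow band
BREATH_RATIO_THRESHOLD = 0.3  # assumed minimum share of energy below that band

def breath_detected(audio: np.ndarray, sample_rate: int) -> bool:
    """Return True if the second voice information looks like sound emitted with
    the user's mouth within the preset distance of the terminal."""
    spectrum = np.abs(np.fft.rfft(audio)) ** 2
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sample_rate)
    low_band_energy = spectrum[freqs < BREATH_BAND_HZ].sum()
    total_energy = spectrum.sum() + 1e-12  # guard against division by zero
    return (low_band_energy / total_energy) > BREATH_RATIO_THRESHOLD
```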
In one implementable manner, the terminal includes a pressure sensor, and the processor is further configured to obtain the pressure value of the pressure sensor when the second voice information is collected. If the pressure value is greater than a preset pressure threshold, it is determined that the user's breath is detected; if the pressure value is less than or equal to the preset pressure threshold, it is determined that the user's breath is not detected.
In one implementable manner, the terminal includes a temperature sensor, and the processor is further configured to obtain a first temperature and a second temperature, where the first temperature is the temperature of the temperature sensor before the second voice information is collected and the second temperature is the temperature of the temperature sensor when the second voice information is collected. If the second temperature is greater than the first temperature, it is determined that the user's breath is detected; if the second temperature is less than or equal to the first temperature, it is determined that the user's breath is not detected.
In one implementable manner, the terminal includes a humidity sensor, and the processor is further configured to obtain the humidity of the humidity sensor when the second voice information is collected. If the humidity is greater than a preset humidity threshold, it is determined that the user's breath is detected; if the humidity is less than or equal to the preset humidity threshold, it is determined that the user's breath is not detected.
In one implementable manner, the terminal includes a carbon dioxide sensor, and the processor is further configured to obtain the carbon dioxide concentration of the carbon dioxide sensor when the second voice information is collected. If the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, it is determined that the user's breath is detected; if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, it is determined that the user's breath is not detected.
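The four sensor-based variants above (pressure, temperature, humidity, carbon dioxide) all follow one pattern: read the sensor when the second voice information is collected and compare the reading against a baseline or a preset threshold. The sketch below OR-combines them purely for illustration; every threshold value is an assumption, since the document leaves the concrete values unspecified.

```python
from dataclasses import dataclass

# All threshold values below are assumptions for illustration only.
PRESSURE_THRESHOLD_PA = 5.0    # pressure rise from exhaled airflow on the sensor
HUMIDITY_THRESHOLD_PCT = 60.0  # exhaled air is more humid than ambient air
CO2_THRESHOLD_PPM = 1500.0     # exhaled air is rich in carbon dioxide

@dataclass
class SensorReadings:
    pressure_pa: float    # pressure value when the second voice information is collected
    temp_before_c: float  # first temperature: before the second voice information
    temp_during_c: float  # second temperature: while it is being collected
    humidity_pct: float
    co2_ppm: float

def breath_detected_by_sensors(r: SensorReadings) -> bool:
    """Each comparison corresponds to one of the variants above; any single one
    suffices on its own in the document."""
    return (
        r.pressure_pa > PRESSURE_THRESHOLD_PA
        or r.temp_during_c > r.temp_before_c  # breath warms the temperature sensor
        or r.humidity_pct > HUMIDITY_THRESHOLD_PCT
        or r.co2_ppm > CO2_THRESHOLD_PPM
    )
```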
For another example, in an optional embodiment of this application, the processor 1001 may be configured to: detect a wake-up indication that initiates voice interaction; enter the voice-interaction working state in response to the wake-up indication; detect first voice information; output a feedback result for the first voice information; determine whether the terminal is close to the user's mouth; if it is determined that the terminal is close to the user's mouth, extend the voice-interaction working state by a preset duration; and if it is determined that the terminal is not close to the user's mouth, end the voice-interaction working state.
For yet another example, in an optional embodiment of this application, the processor 1001 may be configured to: detect a wake-up indication that initiates voice interaction; enter the voice-interaction working state in response to the wake-up indication; detect first voice information; output a feedback result for the first voice information; if second voice information is detected within a preset duration, determine whether the terminal is close to the user's mouth; and if it is determined that the terminal is close to the user's mouth, output a feedback result for the second voice information.
During implementation, each step of the above method can be completed by an integrated hardware logic circuit in the processor or by instructions in software form. The steps of the methods disclosed in connection with the embodiments of this application can be embodied directly as being executed by a hardware processor, or by a combination of hardware and software modules in the processor. The software module may reside in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware. To avoid repetition, it is not described in detail here.
It should be noted that the processor in the embodiments of this application may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method embodiments can be completed by an integrated hardware logic circuit in the processor or by instructions in software form. The above processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component; it can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in connection with the embodiments of this application can be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may reside in a storage medium mature in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above method in combination with its hardware.
It can be understood that the memory in the embodiments of this application may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), or flash memory. The volatile memory may be random access memory (RAM), which is used as an external cache. By way of illustration rather than limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, without limitation, these and any other suitable types of memory.
According to the method provided by the embodiments of this application, an embodiment of this application further provides a computer program product, which includes a computer program or instructions that, when run on a computer, cause the computer to execute the method of any one of the method embodiments.
According to the method provided by the embodiments of this application, an embodiment of this application further provides a computer-readable storage medium storing a computer program or instructions that, when run on a computer, cause the computer to execute the method of any one of the method embodiments.
According to the method provided by the embodiments of this application, an embodiment of this application further provides a terminal, including a memory and a processor coupled to each other; the memory is used to store computer program code that includes computer instructions, and when the processor executes the computer instructions, the terminal is caused to execute the method of any one of the method embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks and steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled artisans may implement the described functionality differently for each particular application, but such implementations should not be considered beyond the scope of this application.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and modules described above can be found in the corresponding processes in the foregoing method embodiments and are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division of logical functions, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional modules in the embodiments of this application may be integrated into one processing unit, or each module may exist physically alone, or two or more modules may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The voice interaction apparatus, chip, computer storage medium, computer program product, and terminal provided by the above embodiments of this application are all used to execute the methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects corresponding to the methods provided above, which are not repeated here.
It should be understood that, in the embodiments of this application, the execution order of the steps should be determined by their functions and internal logic; the magnitude of the step numbers does not imply an execution order and does not limit the implementation process of the embodiments.
Each part of this specification is described in a progressive manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, the embodiments of the voice interaction apparatus, chip, computer storage medium, computer program product, and terminal are described relatively simply because they are substantially similar to the method embodiments; for relevant details, refer to the descriptions in the method embodiments.
Although preferred embodiments of this application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of this application.
The embodiments of this application described above do not limit the protection scope of this application.

Claims (19)

  1. A voice interaction method, characterized in that the method comprises:
    detecting a wake-up indication that initiates voice interaction;
    entering a voice-interaction working state in response to the wake-up indication;
    detecting first voice information;
    outputting a feedback result for the first voice information;
    if second voice information is detected within a preset duration, detecting a user's breath; and
    if the user's breath is detected, outputting a feedback result for the second voice information.
  2. The method according to claim 1, characterized in that, after the feedback result for the first voice information is output, the method further comprises:
    determining whether a terminal is close to the user's mouth;
    if it is determined that the terminal is close to the user's mouth, extending the voice-interaction working state by the preset duration; and
    if it is determined that the terminal is not close to the user's mouth, ending the voice-interaction working state.
  3. The method according to claim 1, characterized in that, if the second voice information is detected within the preset duration, the method further comprises:
    determining whether a terminal is close to the user's mouth;
    if it is determined that the terminal is close to the user's mouth, detecting the user's breath; and
    if it is determined that the terminal is not close to the user's mouth, ending the voice-interaction working state.
  4. The method according to claim 2 or 3, characterized in that, if the wake-up indication is the user's breath, determining whether the terminal is close to the user's mouth comprises:
    recognizing the user's gesture in the voice-interaction working state;
    if the user's gesture is a first gesture, determining that the terminal is close to the user's mouth, wherein the first gesture indicates that the user is holding the terminal in a stationary state; and
    if the user's gesture is a second gesture, determining that the terminal is not close to the user's mouth, wherein the second gesture indicates that the user is holding the terminal and moving it away from the user's mouth.
  5. The method according to claim 2 or 3, characterized in that, if the wake-up indication is given in a manner other than the user's breath, before determining whether the terminal is close to the user's mouth, the method comprises:
    determining whether a third gesture was recognized before the feedback result for the first voice information was output, wherein the third gesture indicates that the user is holding the terminal and moving it toward the user's mouth;
    if the third gesture was recognized, determining whether the terminal is still close to the user's mouth after the feedback result for the first voice information is output; and
    if the third gesture was not recognized, ending the voice-interaction working state.
  6. The method according to claim 4, characterized in that recognizing the user's gesture in the voice-interaction working state comprises:
    obtaining angular velocities and accelerations at different moments in the voice-interaction working state; and
    determining the user's gesture using the angular velocities and accelerations at the different moments together with a gesture recognition module, wherein the gesture recognition module is used to recognize whether the user is holding the terminal and moving it toward the user's mouth, holding the terminal and moving it away from the user's mouth, or holding the terminal in a stationary state.
  7. The method according to claim 1, characterized in that detecting the user's breath comprises:
    inputting the second voice information into a breath recognition module, wherein the breath recognition module is used to identify whether the second voice information is a sound emitted with the user's mouth within a preset distance of a terminal;
    if the breath recognition module identifies the second voice information as a sound emitted with the user's mouth within the preset distance of the terminal, determining that the user's breath is detected; and
    if the breath recognition module identifies the second voice information as not being a sound emitted with the user's mouth within the preset distance of the terminal, determining that the user's breath is not detected.
  8. The method according to claim 1, characterized in that a terminal comprises a pressure sensor and detecting the user's breath comprises:
    obtaining a pressure value of the pressure sensor when the second voice information is collected;
    if the pressure value is greater than a preset pressure threshold, determining that the user's breath is detected; and
    if the pressure value is less than or equal to the preset pressure threshold, determining that the user's breath is not detected.
  9. The method according to claim 1, characterized in that a terminal comprises a temperature sensor and detecting the user's breath comprises:
    obtaining a first temperature and a second temperature, wherein the first temperature is a temperature of the temperature sensor before the second voice information is collected and the second temperature is a temperature of the temperature sensor when the second voice information is collected;
    if the second temperature is greater than the first temperature, determining that the user's breath is detected; and
    if the second temperature is less than or equal to the first temperature, determining that the user's breath is not detected.
  10. The method according to claim 1, characterized in that a terminal comprises a humidity sensor and detecting the user's breath comprises:
    obtaining a humidity of the humidity sensor when the second voice information is collected;
    if the humidity is greater than a preset humidity threshold, determining that the user's breath is detected; and
    if the humidity is less than or equal to the preset humidity threshold, determining that the user's breath is not detected.
  11. The method according to claim 1, characterized in that a terminal comprises a carbon dioxide sensor and detecting the user's breath comprises:
    obtaining a carbon dioxide concentration of the carbon dioxide sensor when the second voice information is collected;
    if the carbon dioxide concentration is greater than a preset carbon dioxide concentration threshold, determining that the user's breath is detected; and
    if the carbon dioxide concentration is less than or equal to the preset carbon dioxide concentration threshold, determining that the user's breath is not detected.
  12. A voice interaction method, characterized in that the method comprises:
    detecting a wake-up indication that initiates voice interaction;
    entering a voice-interaction working state in response to the wake-up indication;
    detecting first voice information;
    outputting a feedback result for the first voice information;
    determining whether a terminal is close to a user's mouth;
    if it is determined that the terminal is close to the user's mouth, extending the voice-interaction working state by a preset duration; and
    if second voice information is detected within the preset duration, outputting a feedback result for the second voice information.
  13. A voice interaction method, characterized in that the method comprises:
    detecting a wake-up indication that initiates voice interaction;
    entering a voice-interaction working state in response to the wake-up indication;
    detecting first voice information;
    outputting a feedback result for the first voice information;
    if second voice information is detected within a preset duration, determining whether a terminal is close to a user's mouth; and
    if it is determined that the terminal is close to the user's mouth, outputting a feedback result for the second voice information.
  14. A voice interaction apparatus, characterized in that the apparatus comprises a processor;
    the processor is configured to: detect a wake-up indication that initiates voice interaction; enter a voice-interaction working state in response to the wake-up indication; detect first voice information; output a feedback result for the first voice information; if second voice information is detected within a preset duration, detect a user's breath; and if the user's breath is detected, output a feedback result for the second voice information.
  15. A voice interaction apparatus, characterized in that the apparatus comprises a processor;
    the processor is configured to: detect a wake-up indication that initiates voice interaction; enter a voice-interaction working state in response to the wake-up indication; detect first voice information; output a feedback result for the first voice information; determine whether a terminal is close to a user's mouth; if it is determined that the terminal is close to the user's mouth, extend the voice-interaction working state by a preset duration; and if it is determined that the terminal is not close to the user's mouth, end the voice-interaction working state.
  16. A voice interaction apparatus, characterized in that the apparatus comprises a processor;
    the processor is configured to: detect a wake-up indication that initiates voice interaction; enter a voice-interaction working state in response to the wake-up indication; detect first voice information; output a feedback result for the first voice information; if second voice information is detected within a preset duration, determine whether a terminal is close to a user's mouth; and if it is determined that the terminal is close to the user's mouth, output a feedback result for the second voice information.
  17. A terminal, characterized in that the terminal comprises a memory and a processor coupled to each other; the memory is configured to store computer program code comprising computer instructions, and when the processor executes the computer instructions, the terminal is caused to perform the method according to any one of claims 1-13.
  18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program or instructions which, when executed, cause the method according to any one of claims 1-13 to be performed.
  19. A computer program product, characterized in that the computer program product comprises a computer program or instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1-13.
PCT/CN2023/114613 2022-09-14 2023-08-24 Voice interaction method and apparatus, and terminal WO2024055831A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211113419.9 2022-09-14
CN202211113419.9A CN117746849A (en) 2022-09-14 2022-09-14 Voice interaction method, device and terminal

Publications (1)

Publication Number Publication Date
WO2024055831A1 true WO2024055831A1 (en) 2024-03-21

Family

ID=90274207

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/114613 WO2024055831A1 (en) 2022-09-14 2023-08-24 Voice interaction method and apparatus, and terminal

Country Status (2)

Country Link
CN (1) CN117746849A (en)
WO (1) WO2024055831A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180211665A1 (en) * 2017-01-20 2018-07-26 Samsung Electronics Co., Ltd. Voice input processing method and electronic device for supporting the same
CN109712621A * 2018-12-27 2019-05-03 Vivo Mobile Communication Co., Ltd. Voice interaction control method and terminal
CN110097875A * 2019-06-03 2019-08-06 Tsinghua University Electronic device, method and medium for voice-interaction wake-up based on microphone signals
CN110262767A * 2019-06-03 2019-09-20 Tsinghua University Voice-input wake-up apparatus, method and medium based on mouth-proximity detection
CN111402900A * 2018-12-29 2020-07-10 Huawei Technologies Co., Ltd. Voice interaction method, device and system

Also Published As

Publication number Publication date
CN117746849A (en) 2024-03-22

Similar Documents

Publication Publication Date Title
CN108711430B (en) Speech recognition method, intelligent device and storage medium
CN108710615B (en) Translation method and related equipment
WO2014008843A1 (en) Method for updating voiceprint feature model and terminal
US9570076B2 (en) Method and system for voice recognition employing multiple voice-recognition techniques
CN107919138B (en) Emotion processing method in voice and mobile terminal
US9818404B2 (en) Environmental noise detection for dialog systems
EP4191579A1 (en) Electronic device and speech recognition method therefor, and medium
CN110364156A Voice interaction method, system, terminal and readable storage medium
US11348584B2 (en) Method for voice recognition via earphone and earphone
US20220239269A1 (en) Electronic device controlled based on sound data and method for controlling electronic device based on sound data
WO2021103449A1 (en) Interaction method, mobile terminal and readable storage medium
WO2021212388A1 (en) Interactive communication implementation method and device, and storage medium
US20200125603A1 (en) Electronic device and system which provides service based on voice recognition
CN111681655A (en) Voice control method and device, electronic equipment and storage medium
US20230239800A1 (en) Voice Wake-Up Method, Electronic Device, Wearable Device, and System
CN112256135A (en) Equipment control method and device, equipment and storage medium
WO2024055831A1 (en) Voice interaction method and apparatus, and terminal
WO2023006033A1 (en) Speech interaction method, electronic device, and medium
CN114999496A (en) Audio transmission method, control equipment and terminal equipment
CN115841814A (en) Voice interaction method and electronic equipment
CN114765026A (en) Voice control method, device and system
CN111681654A (en) Voice control method and device, electronic equipment and storage medium
CN115331672B (en) Device control method, device, electronic device and storage medium
US20220189477A1 (en) Method for controlling ambient sound and electronic device therefor
CN114093357A (en) Control method, intelligent terminal and readable storage medium