CN117292687B - Voice interaction method, device, chip, electronic equipment and medium

Voice interaction method, device, chip, electronic equipment and medium

Info

Publication number: CN117292687B
Authority: CN (China)
Prior art keywords: page, animation, voice, state, displaying
Legal status: Active
Application number: CN202311575750.7A
Other languages: Chinese (zh)
Other versions: CN117292687A
Inventor: 王升升
Current Assignee: Honor Device Co Ltd
Original Assignee: Honor Device Co Ltd
Application filed by Honor Device Co Ltd
Priority to: CN202311575750.7A
Application granted; publication of CN117292687B


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command
    • G10L 2015/225: Feedback of the input speech
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/95: Retrieval from the web
    • G06F 16/953: Querying, e.g. by the use of web search engines
    • G06F 16/9532: Query formulation
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/167: Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/451: Execution arrangements for user interfaces
    • G06F 9/453: Help systems

Abstract

The application provides a voice interaction method, a device, a chip, electronic equipment and a medium, wherein the method comprises the following steps: initializing a voice assistant in response to an activation request for the voice assistant of the electronic device; after initializing the voice assistant, displaying a first page of the voice assistant in a first state, wherein the first page comprises a first animation corresponding to the first state; the first state is a voice listening state, a voice recognition result display state, a voice interaction result display state or a target state between the voice recognition result display state and the voice interaction result display state. The method and the device can support voice interaction between the user and the electronic equipment, and voice interaction experience of the user can be improved through displaying the page animation.

Description

Voice interaction method, device, chip, electronic equipment and medium
Technical Field
The present disclosure relates to the field of electronic devices, and in particular, to a method, an apparatus, a chip, an electronic device, and a medium for voice interaction.
Background
The electronic device may be installed with a voice assistant application and voice interaction between the user and the electronic device is accomplished through the voice assistant.
In the related art, after a user speaks, the electronic device may analyze and process the user's voice to obtain voice interaction result information, and then broadcast the voice interaction result information to feed the voice interaction processing result back to the user.
However, with this implementation the user has no perception of the state of the voice assistant, which degrades the user's voice interaction experience.
Disclosure of Invention
The embodiments of the present application provide a voice interaction method, an apparatus, a chip, an electronic device and a medium, which can support voice interaction between a user and the electronic device and can improve the user's voice interaction experience by displaying page animations.
In a first aspect, an embodiment of the present application provides a voice interaction method, including: initializing a voice assistant in response to an activation request for the voice assistant of the electronic device; after initializing the voice assistant, displaying a first page of the voice assistant in a first state, wherein the first page comprises a first animation corresponding to the first state; the first state is a voice listening state, a voice recognition result display state, a voice interaction result display state or a target state between the voice recognition result display state and the voice interaction result display state.
In the embodiments of the present application, the states of the voice assistant during voice interaction may be a listening state, a recognition state, a result state, and a thinking state (i.e., the target state). After the voice assistant is initialized, the electronic device displays the page corresponding to the current state of the voice assistant, and that page includes the animation corresponding to the state, so that the user can intuitively know what state the voice assistant is in, which provides support for voice interaction between the user and the electronic device.
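As a concrete illustration only, the following Kotlin sketch models the four states and the page animation each state uses (the line, dot and ball animations are described later in the detailed description); the type and function names are assumptions, not part of the claimed method:

```kotlin
// A minimal sketch (names are illustrative, not taken from the patent): the four
// voice assistant states and the page animation each state maps to.
enum class AssistantState { LISTENING, RECOGNIZING, THINKING, RESULT }

data class AssistantPage(val state: AssistantState, val animationFrames: List<String>)

// Hypothetical mapping from state to the frame set used by its page animation.
fun pageFor(state: AssistantState): AssistantPage = when (state) {
    AssistantState.LISTENING   -> AssistantPage(state, framesNamed("line"))  // listening page
    AssistantState.RECOGNIZING -> AssistantPage(state, framesNamed("line"))  // recognition page (same line animation)
    AssistantState.THINKING    -> AssistantPage(state, framesNamed("dot"))   // thinking page
    AssistantState.RESULT      -> AssistantPage(state, framesNamed("ball"))  // result page
}

// Placeholder: in a real application these would be drawable resource names or ids.
fun framesNamed(prefix: String): List<String> = (0 until 30).map { "${prefix}_$it" }
```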
Optionally, the voice interaction method further comprises: in the process of initializing the voice assistant, loading a picture set of the first animation into a memory of the electronic equipment; the step of displaying the first page when the voice assistant is in the first state includes: and displaying the picture set of the first animation loaded in the memory.
By preloading the page animation into the memory while the voice assistant is being initialized, the page animation already loaded in the memory can be displayed quickly when it needs to be shown, which improves picture loading efficiency and supports smooth display of the page animation during voice interaction on the electronic device.
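A minimal sketch of this preloading idea on an Android-like device is shown below; the cache structure, resource ids and method names are assumptions rather than the patent's implementation:

```kotlin
import android.content.Context
import android.graphics.Bitmap
import android.graphics.BitmapFactory

object AnimationCache {
    private val cache = mutableMapOf<String, List<Bitmap>>()

    // Decode the picture set of an animation into memory during voice assistant
    // initialization, so the frames can be drawn immediately when the page is shown.
    fun preload(context: Context, animationName: String, frameResIds: List<Int>) {
        if (cache.containsKey(animationName)) return
        cache[animationName] = frameResIds.map { resId ->
            BitmapFactory.decodeResource(context.resources, resId)
        }
    }

    // Returns the preloaded frames, or null if preload() was never called for this animation.
    fun frames(animationName: String): List<Bitmap>? = cache[animationName]
}
```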
Optionally, the number of pictures in the first animation is less than or equal to the maximum number of pictures that the electronic device allows to be displayed in a single play of an animation; the step of displaying the first page when the voice assistant is in the first state includes: cyclically playing the first animation.
By limiting the number of the pictures of the page animation, the problem of discontinuous animation effect in the process of circularly playing the animation can be avoided, the display integrity of the animation pictures is ensured, and the display effect of the page animation is improved.
Optionally, the step of cyclically playing the first animation includes: cyclically playing the first animation at a first frame rate, wherein the duration of playing the first animation once at the first frame rate is within the duration range that the electronic device allows for a single play of an animation.
By limiting the playing frame rate of the page animation, the problem of discontinuous animation effect in the process of circularly playing the animation can be avoided, the display integrity of the animation pictures is ensured, and the display effect of the page animation is improved.
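The two constraints above (picture count, and single-play duration at the chosen frame rate) can be expressed as a simple check; the following sketch uses illustrative parameter names, since the patent does not fix concrete limit values:

```kotlin
// Illustrative parameters; the actual limits are device properties and are not
// specified numerically in the patent.
data class PlaybackLimits(
    val maxFramesPerLoop: Int,   // allowed number of pictures for a single play of an animation
    val minLoopMillis: Long,     // lower bound of the allowed single-play duration
    val maxLoopMillis: Long      // upper bound of the allowed single-play duration
)

fun satisfiesPlaybackLimits(frameCount: Int, frameRateFps: Int, limits: PlaybackLimits): Boolean {
    // Constraint 1: the picture count must not exceed what one play is allowed to show.
    if (frameCount > limits.maxFramesPerLoop) return false
    // Constraint 2: one full play at the chosen frame rate must fit the allowed duration range.
    val loopMillis = frameCount * 1000L / frameRateFps
    return loopMillis in limits.minLoopMillis..limits.maxLoopMillis
}
```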
Optionally, in the case that the first state is the voice recognition result display state, the step of displaying the first page when the voice assistant is in the first state includes: if, when a voice recognition result is obtained, the first identification information corresponding to the first page is the first information, cyclically playing the first animation and displaying the voice recognition result, so as to display the first page; the voice interaction method further comprises: after starting to cyclically play the first animation, setting the first identification information to other information different from the first information; and after finishing displaying the first page, setting the first identification information back to the first information.
By updating the first identification information as needed, a newly generated voice recognition result will not repeatedly restart the recognition animation while the recognition page (i.e., the page displayed when the voice assistant is in the voice recognition result display state) is being displayed, which keeps the cyclic playback of the recognition animation continuous and improves the fluency of its display.
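A hedged sketch of this flag mechanism is shown below: the first recognition result starts the looping animation, later results only refresh the displayed text, and the flag is reset when the page finishes. All names and callbacks are illustrative:

```kotlin
// Sketch only: the "first identification information" is modelled as a boolean flag.
class RecognitionPageController(
    private val startLoopingAnimation: () -> Unit,
    private val stopLoopingAnimation: () -> Unit,
    private val showRecognizedText: (String) -> Unit
) {
    // true means the recognition animation has not been started yet ("first information").
    private var animationNotStarted = true

    fun onRecognitionResult(text: String) {
        if (animationNotStarted) {
            startLoopingAnimation()
            animationNotStarted = false   // set to "other information" after the loop starts
        }
        showRecognizedText(text)          // new results refresh the text only, no restart
    }

    fun onPageFinished() {
        stopLoopingAnimation()
        animationNotStarted = true        // reset to the "first information" for the next session
    }
}
```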
Optionally, the second page is a page displayed when the voice assistant is in a second state, the second page includes a second animation corresponding to the second state, and the first state and the second state are two of the voice listening state, the voice recognition result display state, the voice interaction result display state and the target state; the voice interaction method further comprises: displaying a first transition animation that transitions from the first animation to the second animation; and, after the first transition animation is displayed, setting the first transition animation to an invisible state and displaying the second animation.
Because different pages have different page animations, the smoothness of page animation switching can be improved by playing a corresponding transition animation when pages are switched. Setting the transition animation to invisible after it has been played avoids the transition animation being displayed on top of the page animation, which would otherwise affect the animation display effect, and thus improves the page animation display effect.
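The following sketch illustrates one possible way to sequence this on Android: the transition view is shown, played once, then made invisible before the target page animation starts, so the two are never stacked. The view references, duration and playback helpers are assumptions:

```kotlin
import android.view.View
import android.widget.ImageView

// Illustrative sketch: play the transition animation once, then hide its view and
// start the target page animation.
fun switchPageAnimation(
    transitionView: ImageView,
    targetView: ImageView,
    transitionDurationMs: Long,
    playTransitionOnce: (ImageView) -> Unit,
    loopTargetAnimation: (ImageView) -> Unit
) {
    transitionView.visibility = View.VISIBLE
    playTransitionOnce(transitionView)
    // When one play of the transition is over, make it invisible and show the target animation.
    transitionView.postDelayed({
        transitionView.visibility = View.INVISIBLE
        targetView.visibility = View.VISIBLE
        loopTargetAnimation(targetView)
    }, transitionDurationMs)
}
```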
Optionally, in the case that the first state is the voice recognition result display state and the first page includes a first voice recognition result, the method further includes: acquiring first processing information, wherein the first processing information is obtained by processing the first voice recognition result; if the first processing information comprises a first jump instruction, displaying a third page of the voice assistant in the target state, wherein the third page comprises an animation corresponding to the target state; if the first processing information comprises voice interaction result information, displaying a fourth page of the voice assistant in the voice interaction result display state, wherein the fourth page comprises the acquired voice interaction result information and an animation corresponding to the voice interaction result display state; and if the first processing information comprises an application display instruction, displaying a page of the application program corresponding to the acquired application display instruction.
The type of voice interaction result differs with the user's voice interaction request; for example, information may be displayed, or an application program may be opened. Depending on how quickly the voice interaction result can be obtained, its acquisition may be delayed. Therefore, based on the type of the voice interaction result and how quickly it is obtained, after the recognition page is displayed the electronic device can jump to a result page, an application page or a thinking page as needed, so that the user knows the state of the voice interaction.
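As an illustration of this branching, the sketch below models the three kinds of first processing information with a sealed class and dispatches to the corresponding page; the class and interface names are assumptions, not taken from the patent:

```kotlin
// Sketch of the branching described above; all names are illustrative.
sealed class ProcessingInfo {
    object JumpToThinking : ProcessingInfo()                          // first jump instruction
    data class InteractionResult(val text: String) : ProcessingInfo() // voice interaction result information
    data class ShowApplication(val appId: String) : ProcessingInfo()  // application display instruction
}

fun onProcessingInfo(info: ProcessingInfo, ui: AssistantUi) = when (info) {
    is ProcessingInfo.JumpToThinking    -> ui.showThinkingPage()
    is ProcessingInfo.InteractionResult -> ui.showResultPage(info.text)
    is ProcessingInfo.ShowApplication   -> ui.openApplication(info.appId)
}

// Hypothetical UI surface used by the dispatcher above.
interface AssistantUi {
    fun showThinkingPage()
    fun showResultPage(resultText: String)
    fun openApplication(appId: String)
}
```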
Optionally, after displaying the third page when the voice assistant is in the target state, the method further comprises: acquiring a second voice recognition result or second processing information, wherein the second processing information is obtained by processing the first voice recognition result; if the second voice recognition result is obtained, displaying a fifth page of the voice assistant in the voice recognition result display state, wherein the fifth page comprises the second voice recognition result and the first animation; if the second processing information is acquired and comprises voice interaction result information, displaying a sixth page of the voice assistant in the voice interaction result display state, wherein the sixth page comprises the acquired voice interaction result information and an animation corresponding to the voice interaction result display state; and if the second processing information is acquired and comprises an application display instruction, displaying a page of the application program corresponding to the acquired application display instruction.
If the user continues to speak while the thinking page is displayed, the electronic device can jump back to the recognition page; otherwise, it can jump to the result page or the application page as needed based on the type of the voice interaction result. In this way, the voice interaction pages can be switched as needed under different conditions, meeting the user's voice interaction requirements.
In a second aspect, an embodiment of the present application provides a voice interaction device, including: an initialization module for initializing a voice assistant in response to an activation request for the voice assistant of the electronic device; the display module is used for displaying a first page when the voice assistant is in a first state after initializing the voice assistant, wherein the first page comprises a first animation corresponding to the first state; the first state is a voice listening state, a voice recognition result display state, a voice interaction result display state or a target state between the voice recognition result display state and the voice interaction result display state.
In a third aspect, an embodiment of the present application provides an electronic chip, including: a processor for executing computer program instructions stored on a memory, wherein the computer program instructions, when executed by the processor, trigger the electronic chip to perform the method according to any of the first aspects.
In a fourth aspect, embodiments of the present application provide an electronic device comprising one or more memories for storing computer program instructions, and one or more processors, wherein the computer program instructions, when executed by the one or more processors, trigger the electronic device to perform a method as in any of the first aspects.
In a fifth aspect, embodiments of the present application provide a computer-readable storage medium, in which a computer program is stored which, when run on a computer, causes the computer to perform the method as in any one of the first aspects.
In a sixth aspect, embodiments of the present application provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the method as in any of the first aspects.
For the technical effects of the foregoing aspects, reference may be made to one another; details are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below.
Fig. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 2 is a schematic diagram of a wristwatch according to an embodiment of the present application when displaying an initialization page;
fig. 3a to 3b are schematic diagrams of a wristwatch according to an embodiment of the present disclosure when displaying a listening page;
fig. 4a to fig. 4c are schematic diagrams of a wristwatch according to an embodiment of the present application when displaying an identification page;
fig. 5a to 5c are schematic diagrams of a wristwatch according to an embodiment of the present application when displaying a result page;
Fig. 6 is a schematic diagram of a watch according to an embodiment of the present application when displaying an alarm clock page;
fig. 7a to fig. 7b are schematic diagrams of a watch according to an embodiment of the present application when displaying a thinking page;
fig. 8 is a schematic diagram of a voice interaction method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of another watch according to an embodiment of the present application when a listening page is displayed;
fig. 10 is a schematic view of a software framework of a wristwatch according to an embodiment of the present application;
fig. 11 is a flow chart of a method according to an embodiment of the present application.
Detailed Description
For a better understanding of the technical solutions of the present application, embodiments of the present application are described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are merely some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "at least one" as used herein means one or more, and "a plurality" means two or more. The term "and/or" as used herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone, where A and B may be singular or plural. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship. "At least one of the following" and similar expressions refer to any combination of these items, including any combination of single items or plural items. For example, at least one of a, b and c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b and c may each be single or multiple.
It should be understood that although the terms first, second, etc. may be used in embodiments of the present application to describe the set threshold values, these set threshold values should not be limited to these terms. These terms are only used to distinguish the set thresholds from each other. For example, a first set threshold may also be referred to as a second set threshold, and similarly, a second set threshold may also be referred to as a first set threshold, without departing from the scope of embodiments of the present application.
The voice interaction method provided in any embodiment of the present application may be applied to the electronic device 100 shown in fig. 1. Fig. 1 shows a schematic configuration of an electronic device 100.
In one embodiment, the electronic device shown in fig. 1 may be a terminal device such as a mobile phone or a tablet computer. In another embodiment, the electronic device shown in fig. 1 may be a wearable device, such as a wearable athletic watch, or the like.
The electronic device 100 may include a processor 110, an internal memory 121, an antenna 2, a wireless communication module 160, an audio module 170, a speaker 170A, a microphone 170C, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and the like. Wherein the sensor module 180 may include a pressure sensor, a touch sensor, etc.
It is to be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors. The controller can generate operation control signals according to the instruction operation codes and the time sequence signals to finish the control of instruction fetching and instruction execution.
In some embodiments, the processor 110 may be a System On Chip (SOC), and the processor 110 may include a central processing unit (Central Processing Unit, CPU) and may further include other types of processors. In some embodiments, the processor 110 may be a PWM control chip.
The processor 110 may also include the necessary hardware accelerators or logic processing hardware circuitry, such as an ASIC, or one or more integrated circuits for controlling the execution of a technical program, etc. Further, the processor 110 may have a function of operating one or more software programs, which may be stored in a storage medium.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use those instructions or data again, they can be called directly from the memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the memory of electronic device 100 may be read-only memory (ROM), other types of static storage devices that can store static information and instructions, random access memory (random access memory, RAM), or other types of dynamic storage devices that can store information and instructions, electrically erasable programmable read-only memory (electrically erasable programmable read-only memory, EEPROM), or any computer-readable medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
In some embodiments, the processor 110 and the memory may be combined into a single processing device, or may be separate components, and the processor 110 may be configured to execute program code stored in the memory. In particular implementations, the memory may also be integrated into the processor 110 or may be separate from the processor 110.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and does not limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use different interfacing manners, or a combination of multiple interfacing manners in the foregoing embodiments.
The wireless communication function of the electronic device 100 can be implemented by the antenna 2, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antenna 2 is used for transmitting and receiving electromagnetic wave signals. Each antenna in the electronic device 100 may be used to cover a single or multiple communication bands. Different antennas may also be multiplexed to improve the utilization of the antennas.
The wireless communication module 160 may provide solutions for wireless communication including wireless local area network (wireless local area networks, WLAN) (e.g., wireless fidelity (wireless fidelity, wi-Fi) network), bluetooth (BT), global navigation satellite system (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field wireless communication technology (near field communication, NFC), infrared technology (IR), etc., as applied to the electronic device 100. The wireless communication module 160 may be one or more devices that integrate at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, modulates the electromagnetic wave signals, filters the electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, frequency modulate it, amplify it, and convert it to electromagnetic waves for radiation via the antenna 2.
In some embodiments, the antenna 2 and the wireless communication module 160 of the electronic device 100 are coupled such that the electronic device 100 may communicate with a network and other devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (global system for mobile communications, GSM), general packet radio service (general packet radio service, GPRS), code division multiple access (code division multiple access, CDMA), wideband code division multiple access (wideband code division multiple access, WCDMA), time division code division multiple access (time-division code division multiple access, TD-SCDMA), long term evolution (long term evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others.
In one embodiment, the electronic device 100 may be a wearable watch of a user, and the wearable watch may communicate with the user's mobile phone or tablet computer through Bluetooth wireless communication technology. For example, the wearable watch may send the user's voice to the mobile phone and receive the voice recognition result returned by the mobile phone, then display the received voice recognition result on the watch display screen, receive an application display instruction generated and returned by the mobile phone based on the voice recognition result, and execute the received application display instruction (for example, open an application program on the watch).
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display 194 includes a display panel. The display panel may employ a liquid crystal display (liquid crystal display, LCD), an organic light-emitting diode (organic light-emitting diode, OLED), an active-matrix organic light-emitting diode (active-matrix organic light emitting diode, AMOLED), a flexible light-emitting diode (flexible light-emitting diode, FLED), a Mini LED, a Micro LED, a Micro-OLED, a quantum dot light-emitting diode (quantum dot light emitting diodes, QLED), or the like. In some embodiments, the electronic device 100 may include 1 or N display screens 194, N being a positive integer greater than 1.
The internal memory 121 may be used to store computer executable program code including instructions. The internal memory 121 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data created during use of the electronic device 100 (e.g., audio data, phonebook, etc.), and so on. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (universal flash storage, UFS), and the like. The processor 110 performs various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
In one embodiment, a set of pictures of the page animation displayed during the voice interaction may be stored in the internal memory 121.
The electronic device 100 may implement audio functions through a speaker 170A, a microphone 170C, an application processor, and the like. Such as music playing, recording, etc.
In one embodiment, the electronic device 100 may collect, through the microphone 170C, the voice signal of the voice uttered by the user during voice interaction, broadcast, through the speaker 170A, the voice interaction result information obtained by processing the voice signal, and display the voice interaction result information through the display screen 194, so as to feed the voice interaction result back to the user through both voice broadcasting and information display.
The pressure sensor is used for sensing a pressure signal and can convert the pressure signal into an electric signal. In some embodiments, the pressure sensor may be provided on the display screen 194. Pressure sensors are of many kinds, such as resistive pressure sensors, inductive pressure sensors, capacitive pressure sensors, etc. When a touch operation is applied to the display screen 194, the electronic apparatus 100 detects the intensity of the touch operation according to the pressure sensor. The electronic device 100 may also calculate the location of the touch based on the detection signal of the pressure sensor.
Touch sensors, also known as "touch devices". The touch sensor may be disposed on the display screen 194, and the touch sensor and the display screen 194 form a touch screen, which is also referred to as a "touch screen". The touch sensor is used to detect a touch operation acting on or near it. The touch sensor may communicate the detected touch operation to the application processor to determine the touch event type. Visual output related to touch operations may be provided through the display 194. In other embodiments, the touch sensor may also be disposed on a surface of the electronic device 100 at a different location than the display 194.
The keys 190 include a power-on key, a volume key, a shortcut key for activating a voice assistant, and the like. The keys 190 may be mechanical keys or touch keys.
In one embodiment, the user may press the shortcut key to cause the electronic device 100 to initialize the voice assistant.
In one embodiment, during display of the results page by the electronic device 100 through the display 194, the user may trigger the electronic device 100 to jump to the listening page by touching a designated area of the results page.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration alerting as well as for touch vibration feedback. The indicator 192 may be an indicator light, and may be used to indicate a charging state and a change in battery level, or to indicate a message, a missed call, a notification, and the like.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture.
The electronic device may be installed with a voice assistant application (or simply a voice assistant) and voice interaction between the user and the electronic device is accomplished through the voice assistant. For example, during the operation of the voice assistant by the electronic device, the user may send a voice for inquiring about weather conditions to the electronic device, and the electronic device may acquire weather information through voice interaction processing on the voice sent by the user, and feedback the acquired weather information to the user.
In one embodiment, the electronic device may be a wearable device, such as a user's wearable athletic watch. Possibly, the wearable device may be matched with the terminal device to realize a voice interaction function. For example, the sports watch of the user can send the voice of the user to the mobile phone of the user to perform voice recognition processing, and display the voice recognition result fed back by the mobile phone.
In another embodiment, the electronic device may be a user terminal device such as a mobile phone, a tablet computer, or the like. The terminal equipment can realize the voice interaction function by itself.
In the related art, after a user speaks, the electronic device may analyze and process the user's voice to obtain voice interaction result information, and then broadcast the voice interaction result information to feed the voice interaction processing result back to the user. However, with this implementation the user has no perception of the state of the voice assistant, which degrades the user's voice interaction experience.
In order to enhance the voice interaction experience of the user, in the embodiment of the present application, after initializing the voice assistant, the electronic device may display a voice assistant page including a status-related animation, so that the user knows the status of the voice assistant. In the following, some embodiments of the present application are described separately, and implementation manners and beneficial effects between different embodiments may be referred to each other. Although described separately, various embodiments may be implemented simultaneously on the same electronic device, which is not limited in this application.
In one embodiment of the present application, the electronic device may initialize the voice assistant in response to a request from a user to activate the voice assistant, and display an initialization page during the initialization of the voice assistant, the initialization page may include information indicating that the electronic device is in an initialized voice assistant state.
For example, a schematic diagram of a watch displaying an initialization page may be as shown in fig. 2. Referring to fig. 2, "Connecting …" in the initialization page indicates that the watch is in the state of initializing the voice assistant.
In one possible implementation, a user may send a request to activate a voice assistant to an electronic device by triggering a key for activating the voice assistant. Referring to fig. 2, the key of the watch for activating the voice assistant may be a key indicated by reference numeral 201.
In another possible implementation, the user may also send a request to activate the voice assistant to the electronic device by speaking a set activation word.
In one embodiment of the present application, after initializing the voice assistant, the voice assistant of the electronic device may have a voice listening state (or listening state), a voice recognition result display state (or recognition state), a voice interaction result display state (or result state), and a target state (or thinking state) between the voice recognition result display state and the voice interaction result display state. Next, the pages and animations of the voice assistant in the different states are described.
(1) Listening state
In one embodiment of the present application, the electronic device may display a page (or listening page) of the voice assistant in a listening state, and the listening page includes an animation (or listening animation) corresponding to the listening state. During the display of the listening page by the electronic device, the user may speak, for example, a voice asking for weather conditions.
The electronic device may display the listening animation by sequentially displaying a plurality of pictures constituting the listening animation. The display of the listening animation has a consistent dynamic effect during the display of the listening page by the electronic device.
For example, a schematic diagram of a watch displaying a listening page at one time may be shown in fig. 3a, and a schematic diagram of a watch displaying a listening page at another time may be shown in fig. 3 b. Referring to fig. 3a and 3b, during the time that the watch displays a listening page, there is a dynamic effect in the display of the listening animation, wherein the listening animation is displayed at one time as an image indicated by reference numeral 301a and at another time as an image indicated by reference numeral 301 b.
Optionally, in addition to including a listening animation, the listening page may include information indicating that the electronic device is in a listening state. Referring to fig. 3a and 3b, listening to the text "hi" in the page and the logo 302 may indicate that the watch is in a listening state.
(2) Identifying a state
In one embodiment of the present application, the electronic device may process the user's voice (recognizing the voice itself, or sending it to another device for recognition) to obtain a voice recognition result, and may then display the page (or recognition page) shown when the voice assistant is in the recognition state. At this time, the user can view the voice recognition result of the uttered voice.
The electronic device may display the recognition animation by sequentially displaying a plurality of pictures constituting the recognition animation. The display of the recognition animation has a consistent dynamic effect during the display of the recognition page by the electronic device.
To ensure the timeliness of voice recognition, while the user is speaking, the electronic device may sequentially acquire (e.g., word by word) the voice signals uttered by the user, so as to sequentially display the corresponding voice recognition results in the recognition page. In this way, the user can continue speaking while the electronic device displays the listening page and the recognition page, the electronic device can process the user's voice, and the obtained voice recognition results can be displayed in the recognition page in real time.
For example, if the user utters the voice "weather forecast", the electronic device may first display a recognition page including the voice recognition result "weather", and then display a recognition page including the voice recognition result "weather forecast".
For example, a schematic diagram of a wristwatch displaying a recognition page including the voice recognition result "weather forecast" may be shown in fig. 4a. During the display of the recognition page by the watch, the display of the recognition animation has a dynamic effect; with reference to fig. 4a, the recognition animation is displayed at a certain moment as the image indicated by reference numeral 401.
For another example, if the user utters the voice "tomorrow weather", the electronic device may first display a recognition page including the voice recognition result "tomorrow", and then display a recognition page including the voice recognition result "tomorrow weather". For example, a schematic diagram of a wristwatch displaying a recognition page including the voice recognition result "tomorrow weather" may be shown in fig. 4b.
For another example, if the user utters the voice "alarm clock", the electronic device may display a recognition page including the voice recognition result "alarm clock". For example, a schematic diagram of a watch displaying such a recognition page may be shown in fig. 4c.
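As a minimal illustration of this word-by-word behavior (names are illustrative, not from the patent), each partial result simply replaces the text currently shown on the recognition page:

```kotlin
// Sketch only: a newer, longer partial result replaces the previously shown one,
// so "weather" appears first and is then replaced by "weather forecast".
class RecognitionTextView {
    var shownText: String = ""
        private set

    fun onPartialResult(partial: String) {
        shownText = partial
        println("recognition page now shows: $shownText")
    }
}

fun main() {
    val view = RecognitionTextView()
    listOf("weather", "weather forecast").forEach(view::onPartialResult)
}
```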
Considering that the user can continue to speak during the display of the listening page and the recognition page by the electronic device, the listening animation and the recognition animation can be the same animation with reference to fig. 3a and 3b and with reference to fig. 4 a-4 c.
(3) Status of results
In one embodiment of the present application, after the electronic device obtains the voice interaction result, if the voice interaction result is voice interaction result information (such as weather information), a page (or called a result page) of the voice assistant in a result state may be displayed, and the result page includes an animation (or called a result animation) corresponding to the result state, and includes the voice interaction result information. Thus, the user can view the voice interaction result information.
The electronic device may display the result animation by sequentially displaying a plurality of pictures constituting the result animation. The display of the results animation has a consistent dynamic effect during the display of the results page by the electronic device.
If there is no information matching the user's voice recognition result (for example, the user utters a voice such as "but"), the voice interaction result information may be preset information.
For example, a schematic diagram of a watch displaying a results page including preset information may be shown in fig. 5a. Referring to fig. 5a, the preset information may be "I do not quite understand what you mean; give me a little more time to learn." Referring to fig. 5a, the result animation is displayed at a certain moment as the image indicated by reference numeral 501a.
The voice interaction result information may be information matching with the user voice recognition result, as applicable.
In one example, where the speech recognition result is "weather forecast", a schematic diagram of the watch display results page may be as shown in fig. 5 b. Referring to fig. 5b, the results page includes results animation, and voice interaction result information matched with the voice recognition result of "weather forecast". Referring to fig. 5b, the resulting animation is shown at some point as an image indicated by reference numeral 501 b.
In another example, if the speech recognition result is "tomorrow's weather", a schematic diagram of the watch display result page may be shown in fig. 5 c. Referring to fig. 5c, the results page includes a results animation, and voice interaction result information matched with the voice recognition result of "tomorrow's weather".
As can be seen by way of example with reference to fig. 5 a-5 c, during the display of the results page by the watch, there is a dynamic effect in the display of the results animation, which is displayed at one instant as an image indicated by reference numeral 501a and at another instant as an image indicated by reference numeral 501 b.
For example, referring to fig. 3a and 3b, and fig. 5 a-5 c, the resulting animation and the listening animation may be different animations.
For example, referring to fig. 4 a-4 c and fig. 5 a-5 c, the resulting animation and the recognition animation may be different animations.
In one embodiment of the present application, after the electronic device obtains the voice interaction result, if the voice interaction result is an application display instruction (such as an instruction for opening a certain application program), the application program may be opened and a page of the application program may be displayed. In this way, the user can use the application.
The application may be, for example, an alarm clock, sports, camera, etc. application on the electronic device. Illustratively, if the voice recognition result is "alarm clock", a schematic diagram of the watch displaying an alarm clock page may be shown in fig. 6.
(4) Thinking state
Based on the timeliness of obtaining the voice interaction result, the acquisition of the voice interaction result may be delayed. Therefore, in one embodiment of the present application, during the display of the recognition page, the electronic device displays the thinking page if it obtains an instruction to jump to the page displayed when the voice assistant is in the thinking state (or thinking page).
The electronic device may display the thinking animation by sequentially displaying a plurality of pictures constituting the thinking animation. The display of the thinking animation has a coherent dynamic effect during the display of the thinking page by the electronic device.
In one example, a schematic diagram of the watch at a certain moment while it displays the thinking page may be as shown in fig. 7a. In another example, a schematic diagram of the watch at another moment while it displays the thinking page may be as shown in fig. 7b.
Referring to fig. 7a and 7b, it can be seen that during the time when the watch displays a thinking page, the display of a thinking animation has a dynamic effect, wherein the thinking animation is displayed as an image shown by reference numeral 701a at one time and as an image shown by reference numeral 701b at another time.
Possibly, the thinking page may include other information in addition to the thinking animation. In one embodiment, the other information may be the voice recognition result displayed in the recognition page (such as "tomorrow's weather" shown in fig. 7a). In another embodiment, the other information may be preset information indicating that the user may speak (such as "You say, I am listening …" shown in fig. 7b).
Illustratively, referring to fig. 3a and 3b, and to fig. 4 a-4 c, 5 a-5 c, 7a and 7b, the thinking animation is different from the listening animation, the recognition animation and the resulting animation, respectively.
If the voice interaction result is obtained during the display of the thinking page by the electronic equipment and the voice interaction result is an application display instruction, the application program can be opened and the page of the application program can be displayed.
If the voice interaction result is obtained and the voice interaction result is voice interaction result information during the display of the thinking page by the electronic equipment, a result page can be displayed and comprises result animation and voice interaction result information.
If the voice recognition result is obtained during the display of the thinking page by the electronic equipment, the recognition page can be displayed, and the recognition page comprises a recognition animation and the voice recognition result.
Therefore, after the voice assistant is initialized, the embodiments of the present application display the page corresponding to the state of the voice assistant, and that page includes the animation corresponding to the state, so that the user can intuitively know what state the voice assistant is in; this provides support for voice interaction between the user and the electronic device and can improve the user's voice interaction experience.
Referring to fig. 8, taking a watch for implementing a voice interaction function as an example, a voice interaction method provided in the embodiments of the present application may include the following steps 1 to 8.
Step 1, when the user long-presses the shortcut key of the watch for activating the voice assistant, the watch initializes the voice assistant and displays the connecting page 31 during the initialization of the voice assistant.
Illustratively, the shortcut key may be the key indicated by reference numeral 201 in FIG. 2. The user may issue a request to activate the voice assistant to the watch by pressing the shortcut key long to cause the watch to initialize the voice assistant.
Illustratively, the connecting page 31 may be as shown in FIG. 2.
Step 2, after the voice assistant is initialized successfully, the watch displays a listening page 32, and the listening page 32 comprises a listening animation.
The animation may include a plurality of pictures and the watch may display the animation by sequentially displaying the plurality of pictures.
Illustratively, the listening page 32 may be as shown in fig. 3a, 3b, and the listening animation may be a line animation that presents an animation effect by dynamic change of a line.
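One possible way to play such a picture-set animation frame by frame is sketched below; the use of a Handler, the 30 fps default and the class name are assumptions, not requirements from the patent:

```kotlin
import android.graphics.Bitmap
import android.os.Handler
import android.os.Looper
import android.widget.ImageView

// Illustrative frame-sequence player: shows the pictures of an animation one after
// another at a fixed frame rate and loops while the page is visible.
class FrameLoopPlayer(private val view: ImageView, private val frames: List<Bitmap>, fps: Int = 30) {
    private val frameIntervalMs = 1000L / fps
    private val handler = Handler(Looper.getMainLooper())
    private var index = 0
    private var running = false

    private val step = object : Runnable {
        override fun run() {
            if (!running || frames.isEmpty()) return
            view.setImageBitmap(frames[index])
            index = (index + 1) % frames.size   // wrap around to keep looping
            handler.postDelayed(this, frameIntervalMs)
        }
    }

    fun start() { running = true; handler.post(step) }
    fun stop() { running = false; handler.removeCallbacks(step) }
}
```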
While the watch displays the listening page 32, the watch may send the voice signal of the user's voice to the mobile phone via Bluetooth. The mobile phone can recognize the voice signal sent by the watch and feed the recognized voice recognition result back to the watch.
Step 3, after receiving the voice recognition result sent by the mobile phone, the watch displays a recognition page 33, wherein the recognition page 33 comprises a recognition animation and the received voice recognition result.
For example, the recognition page 33 may be as shown in fig. 4a to 4c, and the recognition animation may be a line animation that presents an animation effect through dynamic changes of a line.
Referring to fig. 8, in the case where the listening animation and the recognition animation are the same animation, the watch may not display the transition animation between displaying the listening animation and displaying the recognition animation.
In one embodiment, the mobile phone may determine whether the user has finished speaking based on the time intervals in the user's speech. If the mobile phone determines that the user has finished speaking, it can start to generate a voice interaction result after feeding back the corresponding voice recognition result, and then feed the voice interaction result back to the watch.
If the user speaks to query information, the voice interaction result generated by the mobile phone may be voice interaction result information, and a message state of 2 is generated; if the user speaks to request opening an application program, the voice interaction result generated by the mobile phone may be an application display instruction for that application program, and a message state of 3 is generated; if the mobile phone does not generate the voice interaction result within a set period, that is, the mobile phone delays generating the voice interaction result, a message state of 0 or 1 may be generated.
The mobile phone generates message states with different values so that the watch can execute corresponding processing flows according to the message state, thereby matching the user's voice interaction requirements with the timeliness of the mobile phone in generating the voice interaction result. In other embodiments, the message states may be set to other values, which is not limited in this embodiment.
Step 4, the watch receives the message state sent by the mobile phone; if the message state is 0 or 1, step 5 is executed; if the message state is 2 and the voice interaction result information sent by the mobile phone is received, step 6 is executed; and if the message state is 3 and the alarm clock display instruction sent by the mobile phone is received, the alarm clock page 36 is displayed.
Illustratively, the alarm clock page 36 may be as shown in FIG. 6. In other embodiments, if the mobile phone sends a display instruction from another application (such as an exercise application, etc.), the watch displays the page of the other application.
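A sketch of the dispatch in step 4 is given below; the numeric message-state values follow the description above, while the constant names and the WatchUi interface are illustrative assumptions:

```kotlin
// Sketch of step 4: the watch dispatches on the message state received from the phone.
const val STATE_DELAYED_A = 0      // result not ready yet -> show thinking page
const val STATE_DELAYED_B = 1      // result not ready yet -> show thinking page
const val STATE_RESULT_INFO = 2    // voice interaction result information follows
const val STATE_APP_DISPLAY = 3    // application display instruction follows

fun handleMessageState(state: Int, payload: String?, ui: WatchUi) = when (state) {
    STATE_DELAYED_A, STATE_DELAYED_B -> ui.showThinkingPage()
    STATE_RESULT_INFO                -> ui.showResultPage(payload ?: "")
    STATE_APP_DISPLAY                -> ui.showApplicationPage(payload ?: "")
    else                             -> Unit   // unknown states are ignored in this sketch
}

// Hypothetical watch-side UI surface.
interface WatchUi {
    fun showThinkingPage()
    fun showResultPage(resultInfo: String)
    fun showApplicationPage(appInstruction: String)
}
```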
Step 5, the watch displays a thinking page 35, the thinking page 35 comprises a thinking animation, and step 7 or step 8 is executed.
For example, the thinking page 35 may be as shown in fig. 7a to 7b, and the thinking animation may be a dot animation that presents an animation effect through dynamic changes of dots.
Referring to fig. 8, in the case where the recognition animation and the thinking animation are different animations, the wristwatch may display a transition animation (i.e., a line-to-point transition animation) that transitions from the recognition animation to the thinking animation between displaying the recognition animation and displaying the thinking animation, so that the animation display effect is smooth and unobtrusive.
In one embodiment, the watch may display a line-to-point transition animation prior to displaying the mind animation during the display of the mind page 35 to achieve a smooth transition from the recognition animation to the mind animation.
When the mobile phone delays generating the voice interaction result, the watch may display the thinking page 35 first, and display the corresponding page according to the voice interaction result after the mobile phone generates and feeds back the voice interaction result.
Step 6, the watch displays a result page 34, and the result page 34 comprises result animation and voice interaction result information.
For example, the results page 34 may be as shown in fig. 5 a-5 c, and the results animation may be a ball animation that presents an animation effect by dynamic variation of irregular spheres.
Referring to fig. 8, in the case where the recognition animation and the resultant animation are different animations, the wristwatch may display a transition animation (i.e., a line-to-ball transition animation) that transitions from the recognition animation to the resultant animation between displaying the recognition animation and displaying the resultant animation, so that the animation display effect is smooth and unobtrusive.
In one embodiment, the watch may display a line-to-ball transition animation prior to the result animation during the display of the result page 34 to achieve a smooth transition from the recognition animation to the result animation.
In one embodiment of the present application, after the watch displays the results page 34, if the user needs to perform another voice interaction, the user may touch a designated area of the watch display screen (e.g., a display area that may be animated for the results in the results page 34). In response to a touch operation of the user on the designated area, the wristwatch may display the listening page 32 again, i.e., jump from the results page 34 to the listening page 32.
Referring to fig. 8, in the case where the result animation and the listening animation are different animations, the watch may display a transition animation (i.e., a ball-to-line transition animation) that transitions from the result animation to the listening animation between displaying the result animation and displaying the listening animation, so that the animation display effect is smooth and unobtrusive.
In one embodiment, the watch may display a ball-to-wire transition animation prior to displaying the listening animation during display of the listening page 32 to achieve a smooth transition from the resulting animation to the listening animation.
Step 7, when the watch receives the message state sent by the mobile phone, if the message state is 2 and the voice interaction result information sent by the mobile phone is received, step 6 is executed, and if the message state is 3 and the alarm clock display instruction sent by the mobile phone is received, an alarm clock page 36 is displayed.
If the mobile phone sends the voice interaction result while the watch is displaying the thinking page 35, the watch jumps from the thinking page 35 to the result page 34 or the alarm clock page 36 as required by the voice interaction result.
Referring to fig. 8, in the case where the thought animation and the result animation are different animations, the wristwatch may display a transition animation (i.e., a point-to-ball transition animation) that transitions from the thought animation to the result animation between displaying the thought animation and displaying the result animation, so that the animation display effect is smooth and unobtrusive.
In one embodiment, the watch may display a point-to-ball transition animation prior to the result animation during the display of the result page 34 to achieve a smooth transition from the thought animation to the result animation.
Step 8, when the watch receives the voice recognition result sent by the mobile phone, a recognition page 33 is displayed, wherein the recognition page 33 comprises a recognition animation and the acquired voice recognition result.
If the mobile phone's judgment of whether the user has finished speaking does not match the user's actual intention, the mobile phone may still receive a voice signal from the watch after it has fed back the message state. In that case the mobile phone may feed back a newly recognized voice recognition result to the watch while the watch is displaying the thinking page 35, so that the watch jumps back from the thinking page 35 to the recognition page 33 and displays the newly recognized voice recognition result.
Thus, unlike step 3, in which the mobile phone sends the voice recognition result before or during the display of the recognition page 33, the voice recognition result in step 8 is sent by the mobile phone while the watch is displaying the thinking page 35, and the watch display jumps back from the thinking page 35 to the recognition page 33.
In one embodiment, the watch may display a transitional animation (i.e., a point-to-line transitional animation) that transitions from a thinking animation to a recognition animation, such that the animation display is smooth and unobtrusive. In another embodiment, considering that the display duration of the thinking page is shorter than that of other pages (e.g. recognition page and result page), the transition animation from the thinking animation to the recognition animation may not be displayed.
Based on the voice interaction processing logic of the embodiment shown in fig. 8, if the voice interaction function is implemented by using a terminal device such as a mobile phone as an execution subject, the terminal device can acquire a voice signal of a voice sent by a user, identify the voice signal, display a voice identification result through an identification page, generate a voice interaction result according to the voice identification result, and display an application page, a result page or a thinking page as required after the identification page is displayed based on the content of the voice interaction result and the timeliness of generating the voice interaction result. The information which needs to be transmitted between the mobile phone and the watch can be transmitted between the internal modules of the mobile phone. The specific logic for implementing the voice interaction processing for the terminal device may be described with reference to the related art of the embodiment shown in fig. 8, which is not described herein.
For a terminal device with weaker data processing capability and smaller memory, there may be a problem of unsmooth animation playing caused by time-consuming picture loading when the device implements the voice interaction function. Next, technical contents that can solve this problem will be described.
In the embodiment of fig. 8, the watch needs to obtain a picture set of the result animation before displaying the result page 34, so as to support displaying the result animation by sequentially displaying pictures in the picture set during displaying the result page 34.
In one possible implementation, the result animation may be obtained after the recognition page 33 or the thinking page 35 is displayed and before the result page 34 is displayed. The watch may use a result page switching function (e.g., a switchToPage(RESULT) function) to implement the page switching process to the result page. During the page switching process, the watch may parse an xml (eXtensible Markup Language) file to load the picture set of the result animation from the GUI (Graphical User Interface) file into the memory.
However, because the animation effect of the result animation is complex and its data volume is large (for example, about 70 pictures), the result page switching function may take a long time (for example, 1080 ms). The UI (User Interface) task is therefore busy loading pictures throughout the page switching process, so the animation plays unsmoothly during the switch from the recognition page (or the thinking page) to the result page.
To solve this problem, in one embodiment of the present application, the process of preloading the picture may be performed during the process of initializing the voice assistant by the watch, that is, during the process of displaying the connecting page 31 by the watch, so as to load the picture set of the resulting animation from the GUI file into the memory, thereby improving the picture loading efficiency.
Taking the switch from the recognition page to the result page as an example: because the result animation resource has already been loaded into the memory before the result page needs to be displayed, the watch can obtain the picture set of the result animation from the memory after displaying the recognition page, without reading it from the GUI file. The watch therefore sequentially displays the recognition animation, the transition animation from the recognition animation to the result animation, and the result animation during the page switch; the animation plays smoothly without interruption, and the user's voice interaction experience is better.
In one embodiment, the watch may load a first frame of the resulting animation into the page animation display location so that, when the resulting animation needs to be displayed, the pictures of the resulting animation are sequentially displayed starting from the first frame of the resulting animation.
If the watch preloads a plurality of page animations in the initialization process, the watch can place the first frame of picture of each preloaded page animation (the first frame of picture is used for identifying the picture when the page animation starts to be displayed) in different layers at the same display position (such as the position of the image shown by the reference numeral 401 in fig. 4 a), and display the corresponding page animation in different page display processes by controlling the display time period (or called hiding time period) of each layer.
In one embodiment, the watch may also perform a pre-load process for transitional animations during initialization of the voice assistant. The watch may load the first frame of the transition animation to the page animation display position, so that when the transition animation needs to be displayed, the pictures of the transition animation are sequentially displayed from the first frame of the transition animation.
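The preloading idea described above can be sketched roughly as follows; the class, function, and member names are illustrative assumptions, since the actual GUI framework API of the watch is not specified here. Picture sets are parsed into memory while the connecting page 31 is shown, the first frames of the preloaded animations are stacked in overlapping layers at one display position, and later only layer visibility is toggled:

```cpp
#include <map>
#include <string>
#include <vector>

// Illustrative types; the real watch GUI framework types are not named in this application.
struct Picture {};
struct Layer { bool visible = false; Picture first_frame; };

class AnimationCache {
public:
    // Called while the connecting page 31 is displayed (voice assistant initialization):
    // the animation's picture set, parsed out of the GUI resource file, is kept in memory.
    void Preload(const std::string& name, const std::vector<Picture>& frames) {
        frames_[name] = frames;
        if (!frames.empty()) layers_[name].first_frame = frames.front();  // placed at the shared display position
    }

    // When a page needs its animation, the pictures come from memory instead of the GUI file.
    const std::vector<Picture>& Frames(const std::string& name) { return frames_[name]; }

    // Show exactly one animation's layer at the common display position, hide the others.
    void ShowOnly(const std::string& name) {
        for (auto& [key, layer] : layers_) layer.visible = (key == name);
    }

private:
    std::map<std::string, std::vector<Picture>> frames_;
    std::map<std::string, Layer> layers_;
};
```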
Unlike the watch, a terminal device (such as a mobile phone) with better performance in reading and drawing picture files can acquire the result animation from the GUI file during the switch from the recognition page to the result page, without preloading the result animation during initialization of the voice assistant.
For a terminal device with weaker data processing capability and smaller memory, there may be a problem of discontinuous animation playing caused by incomplete animation playing when the device implements the voice interaction function. Next, technical contents that can solve this problem will be described.
In the embodiment shown in fig. 8, due to the data processing capability of the watch chip and the power consumption characteristics of the watch, the number of pictures the watch is allowed to display in a single play of an animation may be limited; for example, the animation component may limit a single play to 50 pictures, that is, the maximum number of pictures the watch can display when playing an animation once is 50. If the number of pictures in an animation's picture set exceeds this maximum, the animation plays discontinuously when the watch plays it in a loop.
For example, because the animation effect of the result animation is complex, its number of pictures may exceed 50. In that case the watch can only display the first 50 pictures each time it plays the result animation, and the remaining pictures are never shown, so the result animation does not play continuously and the user's voice interaction experience is affected.
To support continuity of the resulting animation play, in one embodiment of the present application, the number of pictures of the resulting animation may be defined such that the number of pictures of the resulting animation is less than or equal to the allowed number of pictures of the watch single play animation, e.g., the number of pictures of the resulting animation does not exceed 50. Based on the above, the watch can circularly play the result animation during the display of the result page, and each time the result animation is played, each picture forming the result animation is sequentially played, so that the result animation can be circularly and continuously played, and the voice interaction experience of the user is better.
Besides limiting the number of pictures of an animation, the playing speed of the pictures can also be limited to match the watch performance and improve the animation display effect. Illustratively, the number of pictures of the result animation may be 49, and the watch may play the result animation at a frame rate of 20 frames per second.
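A minimal sketch of looping playback under these limits is shown below; the constants follow the example numbers above, while the draw call is a placeholder rather than an actual watch API:

```cpp
#include <chrono>
#include <thread>
#include <vector>

struct Picture {};

// Example limits from this embodiment: at most 50 pictures per play, 49 frames at 20 fps.
constexpr size_t kMaxFramesPerPlay = 50;
constexpr int kFrameRate = 20;  // frames per second

// Loop the result animation; each pass plays every frame in order, so playback stays continuous.
void PlayResultAnimationLoop(const std::vector<Picture>& frames, const bool& keepPlaying) {
    const auto frameInterval = std::chrono::milliseconds(1000 / kFrameRate);
    while (keepPlaying) {
        for (size_t i = 0; i < frames.size() && i < kMaxFramesPerPlay; ++i) {
            // DisplayFrame(frames[i]);  // placeholder for the watch's actual draw call
            std::this_thread::sleep_for(frameInterval);
        }
    }
}
```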
Unlike the watch, a terminal device (such as a mobile phone) with larger memory and stronger data processing capability may not need to limit the number of pictures of an animation when implementing the voice interaction function, or the maximum number of pictures it can display in a single play of an animation may be larger.
When a terminal device (such as a mobile phone or a watch) implements the voice interaction function, there may be a problem that the recognition animation repeatedly restarts playing as the user's successively uttered voice is recognized. Next, technical contents that can solve this problem will be described.
In the embodiment shown in fig. 8, based on the voice uttered by the user over time, the watch may successively display the corresponding voice recognition results in the recognition page 33. For example, if the user's voice is "tomorrow weather", the recognition page 33 may first display "tomorrow" and then "tomorrow weather"; that is, the watch successively displays several incremental voice recognition results in the recognition page 33.
In one possible implementation, each time the watch receives a voice recognition result from the mobile phone, it may call a function for displaying the recognition page (e.g., a ShowRecognitionPage function) to refresh the displayed recognition result. Because the ShowRecognitionPage function contains a function that starts playing the recognition animation (such as recognitionAnimated->Start()), repeatedly calling ShowRecognitionPage causes recognitionAnimated->Start() to be called repeatedly, so the recognition animation is repeatedly restarted while the watch displays the recognition page 33 and its playback is not continuous.
In order to solve the problem that the watch restarts the recognition animation each time it receives a voice recognition result, making playback of the recognition animation discontinuous during the display of the recognition page, in one embodiment of the present application the watch may check the flag bit of the recognition page after receiving each voice recognition result. If the flag bit is true, the voice recognition result is not the first one received during the display of the recognition page, and the watch may skip calling recognitionAnimated->Start(), so the recognition animation continues playing from its current progress.
Otherwise, if the flag bit of the recognition page is false, the voice recognition result is the first one received during the display of the recognition page (i.e., a voice recognition result received before the recognition page was displayed), and the watch may call recognitionAnimated->Start() to start playing the recognition animation cyclically, and then set the flag bit of the recognition page to true.
In one embodiment, the watch may set the flag bit of the identification page to false after finishing displaying the identification page, so as to support the watch to start to play the identification animation again when the identification page is displayed next time.
By setting the flag bit of the recognition page as required, the watch continuously plays the recognition animation during the display of the recognition page without being affected by the user's subsequent new voice.
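The flag-bit handling described above might look roughly like the following; ShowRecognitionPage and recognitionAnimated->Start() follow the function names given in this application, while the flag variable, the Animation type, and the text-update placeholder are illustrative assumptions:

```cpp
#include <string>

// Illustrative animation handle; only the Start() call is taken from this application.
struct Animation { void Start() { /* begin cyclic playback of the recognition animation */ } };

static Animation recognitionAnimationInstance;
static Animation* recognitionAnimated = &recognitionAnimationInstance;
static bool recognitionPageFlag = false;  // the "flag bit" of the recognition page

// Called each time a (possibly partial) voice recognition result arrives from the mobile phone.
void ShowRecognitionPage(const std::string& recognitionResult) {
    if (!recognitionPageFlag) {
        recognitionAnimated->Start();   // first result during this page display: start cyclic play
        recognitionPageFlag = true;     // set the flag bit to true
    }
    // Later results skip Start(), so the animation keeps its current playing progress.
    (void)recognitionResult;            // placeholder: refresh the displayed recognition text here
}

// Called after the recognition page 33 finishes displaying, so the next display restarts the animation.
void OnRecognitionPageFinished() { recognitionPageFlag = false; }
```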
When a terminal device (such as a mobile phone, a watch and the like) realizes a voice interaction function, the problem that the display effect of animation is affected due to the superposition display of transitional animation and page animation may exist. Next, technical contents that can solve this problem will be described.
In the embodiment shown in fig. 8, the watch may display the transition animation first and then the page animation in the process of displaying the result page, the thinking page, and the listening page, so as to support continuous playing of the animation during page switching.
In a possible implementation, taking the display of the thinking page as an example, an identifier (such as an Id of an image) of the first frame image of the line-to-point transition animation may be set in the code program of the thinking page display flow, so that each image of the transition animation can be displayed sequentially from its first frame according to the identifier, realizing the display of the transition animation. The identifier of the first frame image of the thinking animation may likewise be set in the code program of the thinking page display flow, so that after the transition animation finishes, each image of the thinking animation is displayed sequentially from its first frame according to that identifier, realizing the display of the thinking animation. The first frame image of the transition animation and the first frame image of the thinking animation may be located in different layers at the same display position. Although the transition animation is not being played while the thinking animation plays, its first frame image is still at the display position of the thinking animation, so the first frame image of the transition animation and the thinking animation may be displayed superimposed when the thinking animation starts playing, which affects the user's voice interaction experience.
In the case where the watch switches from displaying the result page to displaying the listening page, a schematic diagram of the moment the watch starts displaying the listening page may be as shown in fig. 9. Referring to fig. 9, when the watch starts displaying the listening animation of the listening page, the listening animation and the ball-to-line transition animation are displayed superimposed, and the superimposed animation image is the image indicated by reference numeral 901.
To solve the problem of the superimposed display of the transitional animation and the page animation, in one embodiment of the present application, the state of the transitional animation may be set to an invisible state after the transitional animation is played and before the page animation is played. For example, the state of the transitional animation may be set to false.
By setting the state of the line-to-point transition animation to false after the line-to-point transition animation is displayed, the first frame image of the line-to-point transition animation is not displayed when the watch displays the thinking animation, and thus the situation that the first frame image of the transition animation and the thinking animation are overlapped and displayed when the thinking animation is displayed can be avoided.
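The ordering described here can be sketched as follows; the layer type and its methods are illustrative assumptions, with SetVisible(false) standing in for setting the transition animation's state to false after it has been displayed:

```cpp
// Illustrative layer handle; SetVisible models setting the animation state to true/false.
struct AnimationLayer {
    bool visible = true;
    void Play() { /* sequentially display this animation's pictures */ }
    void SetVisible(bool v) { visible = v; }
};

// Example: entering the thinking page 35.
void EnterThinkingPage(AnimationLayer& lineToPointTransition, AnimationLayer& thinkingAnimation) {
    lineToPointTransition.Play();             // play the line-to-point transition animation first
    lineToPointTransition.SetVisible(false);  // then set its state to false (invisible)
    thinkingAnimation.Play();                 // the thinking animation is no longer overlaid by the transition's first frame
}
```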
Referring to fig. 10, one embodiment of the present application provides a software framework for a watch, including an application layer 21, a framework layer 22, a kernel layer 23, a hardware abstraction layer 24, and a hardware driver layer 25.
The application layer 21 may include a variety of applications such as an interconnection 211 (e.g., a voice assistant application), a health application 212 (e.g., a heart rate application), a sports application 213 (e.g., a professional sports application), a system application 214 (e.g., an alarm application), and the like.
The framework layer 22 includes an application framework 221, a base library 222, an algorithm library 223 (including sports algorithms), a legacy bluetooth protocol stack 224, and a bluetooth low energy protocol stack 225. The application framework 221 includes, among other things, a user interface framework 2211, system base capabilities 2212 (including interworking and voice services), underlying software service capabilities 2213, sports health service capabilities 2214 (including sports services), hardware service capabilities 2215.
The watch may rely on UIKit in the user interface framework 2211 to implement the animation display and text display of the voice assistant when the voice assistant is used. The functional implementation of UIKit may rely on library functions of open source libraries such as JS (JavaScript) and C++ in the user interface framework 2211.
The kernel layer 23 includes an operating system 231.
The hardware abstraction layer 24 includes keys 241, a touch screen 242, a Flash memory 243, a display 244, bluetooth 245.
The hardware driver layer 25 includes a touch screen driver 251, a Flash driver 252, a display screen driver 253, and a bluetooth driver 254.
Referring to fig. 10, in response to a user's operation of the voice assistant shortcut key 241, the watch may initialize the voice assistant application and display the initialization page on the display screen through an internal display-related component.
The watch can acquire a voice signal of voice uttered by the user through an internal voice service.
Based on the interconnection and intercommunication related components in the watch, the watch can communicate with external equipment such as a mobile phone through a Bluetooth communication technology, for example, the watch sends collected voice signals to the mobile phone and receives voice recognition results, voice interaction results and the like fed back by the mobile phone.
If the voice interaction result comprises voice interaction result information, the watch can display a result page comprising the voice interaction result information through the display screen. During the time that the watch displays the results page, the user may touch a designated area of the display screen as needed to request the next voice interaction flow to begin.
If the voice interaction result comprises an application display instruction, the watch can display a corresponding application page, such as an alarm clock page, a heart rate page, a professional sports page and the like, through the display screen.
The Flash memory of the watch can be used for storing code programs of the watch for realizing voice interaction functions and can be used for storing a picture set of page animations.
The software framework of the watch shown in fig. 10 may further include other components. For example, the application layer 21 may further include communication applications for implementing communication functions, such as contacts, calls, and call records, and the hardware abstraction layer 24 may further include sensors for implementing sensing functions, such as acceleration sensors, gyroscopes, and positioning modules, which are not listed here one by one.
Referring to fig. 11, an embodiment of the present application provides a voice interaction method, which may include the following steps 1101 to 1102. The method can be applied to electronic equipment such as wearable equipment and terminal equipment.
Step 1101, initializing a voice assistant in response to an activation request for the voice assistant of the electronic device.
Referring to fig. 2, a user may press a key 201 of a watch for a long time to issue an activation request, and the watch may initialize a voice assistant after receiving the activation request, so that the voice assistant is in an operation state.
In one embodiment, the user may speak a voice corresponding to the activation word to the terminal device to request the terminal device to activate the voice assistant.
Step 1102, after initializing the voice assistant, displaying a first page of the voice assistant in a first state, the first page including a first animation corresponding to the first state. The first state is a speech listening state, a speech recognition result display state, a speech interaction result display state, or a target state (i.e., a thinking state described in other embodiments of the present application) between the speech recognition result display state and the speech interaction result display state.
In one embodiment, after initializing the voice assistant, the terminal device may sequentially display a listening page, a recognition page, and a result page, and may display a thinking page as needed and implement a page skip display based on the timeliness of the generation of the voice interaction result. The related art implementation of displaying the page when the voice assistant is in the listening state, the recognition state, the thinking state, and the result state respectively by the terminal device may refer to the description of other embodiments of the present application, and will not be described herein.
In one embodiment of the voice interaction method shown in fig. 11, the voice interaction method may further include: in the process of initializing the voice assistant, the picture set of the first animation is loaded into the memory of the electronic device. As such, the step of the electronic device displaying the first page when the voice assistant is in the first state may include: and displaying the picture set of the first animation loaded in the memory.
In one embodiment, a set of pictures of the page animation may be obtained from the GUI file and loaded into memory.
The time consumed for picture loading may differ between the page animations of different states, so the page animations whose picture loading is time-consuming can be preloaded during initialization of the voice assistant. The page animations preloaded during initialization of the voice assistant may include some or all of the listening animation, the recognition animation, the thinking animation, and the result animation.
In one embodiment, the electronic device may place the first frame of the preloaded page animation in different layers at the same display position, and display the corresponding page animation in different page display processes by controlling the display period of each layer.
In one embodiment, the electronic device may also preload the transitional animations during initialization of the voice assistant and place the first frame of picture of each preloaded transitional animation (which is used to identify the picture when the transitional animation begins to be displayed) in a different layer at the display location described above.
By preloading the page animation into the memory during initializing the voice assistant, the problem that the animation playing is not smooth due to picture loading time consumption can be avoided.
In one embodiment of the voice interaction method shown in fig. 11, the number of pictures of the first animation is less than or equal to the number of allowed display pictures of the single play animation of the electronic device. As such, the step of the electronic device displaying a first page of the voice assistant in the first state may include: and circularly playing the first animation.
During the page display, page animation can be circularly played so as to promote the voice interaction experience of the user.
In one embodiment, the number of pictures for each page animation displayed by the electronic device may be limited.
The number of the pictures of the page animation is limited according to the equipment performance of the electronic equipment, so that the page animation can be circularly and continuously played, and the problem that the animation playing is discontinuous due to incomplete animation playing caused by excessive number of the pictures is avoided.
In one embodiment of the voice interaction method shown in fig. 11, the step of the electronic device to cyclically play the first animation may include: performing a loop play of the first animation using the first frame rate; the time of playing the first animation once by using the first frame rate is within the allowable time range of playing the animation once by the electronic device. For example, the time taken to play the first animation once using the first frame rate is less than or equal to the allowable time for the electronic device to play the animation once.
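For example, with the 49-picture result animation played at 20 frames per second mentioned in the embodiment above, one pass of the animation takes 49 / 20 = 2.45 seconds, and this duration would need to fall within the electronic device's allowable time for playing an animation once.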
In one embodiment, the frame rate of play of each page animation displayed by the electronic device may be limited.
Through reasonably limiting the playing frame rate of the page animation, the situation that the page animation is played too fast or too slowly can be avoided, the page animation is displayed at a proper playing speed of the page animation, and voice interaction experience of a user is improved.
In one embodiment of the voice interaction method shown in fig. 11, in the case where the first state is the voice recognition result display state, the step of displaying, by the electronic device, the first page when the voice assistant is in the first state may include: when a voice recognition result is obtained, if the first identification information corresponding to the first page is the first information, displaying the first page by starting to cyclically play the first animation and displaying the voice recognition result; and if the first identification information is the second information, displaying the first page by continuing to cyclically play the first animation and displaying the voice recognition result.
If the first identification information is the first information, a related function may be called to start cyclically playing the recognition animation, and the currently received voice recognition result is displayed. If it is the second information, cyclic playing of the recognition animation has already started, so the related function is not called again; the recognition animation continues to play cyclically, and the currently received voice recognition result is displayed.
In one embodiment, the first information may be true and the second information may be false.
Possibly, in the case that the first state is a speech recognition result display state, the speech interaction method may further include: after starting the loop play of the first animation, the first identification information is set to other information (such as second information) different from the first information.
After the cyclic playing of the recognition animation is started, the identification information of the recognition page is modified accordingly, so that the recognition animation starts cyclic playing when a voice recognition result is obtained for the first time during the display of the recognition page, and continues cyclic playing when a voice recognition result is obtained not for the first time, thereby avoiding discontinuous playing of the recognition animation caused by repeatedly restarting it during the display of the recognition page.
Possibly, in the case that the first state is a speech recognition result display state, the speech interaction method may further include: after finishing displaying the first page, the first identification information is set as the first information.
After the identification page is displayed, the identification information of the identification page is modified, so that the cyclic playing of the identification animation can be started when the identification page is displayed again, and the situation that the identification animation is not displayed during the period of displaying the identification page again is avoided.
After the electronic equipment newly obtains the voice recognition result, the electronic equipment executes the operation of starting to circularly play the recognition animation according to the requirement based on the different identification information of the recognition page so as to enable the recognition animation to start to circularly play or continue to circularly play. Therefore, the problem that the recognition animation triggered by the voice sent by the user repeatedly restarts playing can be avoided, and the electronic equipment is supported to continuously play the recognition animation during the display of the recognition page.
In one embodiment of the voice interaction method shown in fig. 11, the second page is a page when the voice assistant is in the second state, the second page includes a second animation corresponding to the second state, and the first state and the second state are two states of a voice listening state, a voice recognition result display state, a voice interaction result display state and a target state. Based on this, the voice interaction method may further include: a first transition animation is displayed that transitions from a first animation to a second animation.
The page animation of the first page and the page animation of the second page are different or have larger difference, so that when the first page is switched to the second page, transition animation for transitioning animation effects between the two page animations can be displayed, and the continuity switching effects of the page animations can be presented. Referring to fig. 8, based on the page animation of the page in the embodiment shown in fig. 8, the transition animation may be a wire-to-ball transition animation, a wire-to-point transition animation, or the like.
Optionally, the voice interaction method may further include: after the first transitional animation is displayed, the state of the first transitional animation is set to the invisible state, and the second animation is displayed.
In one embodiment, by setting the transitional animation to the invisible state, the transitional animation can be displayed without being superimposed when the page animation is displayed. By modifying the state of the transition animation to the invisible state after the transition animation is displayed, the situation that the transition animation and the page animation are displayed in a superimposed manner can be avoided.
In one embodiment of the voice interaction method shown in fig. 11, in a case where the first state is a voice recognition result display state and the first page includes a first voice recognition result, the voice interaction method may further include: and acquiring first processing information, wherein the first processing information is information obtained by processing the first voice recognition result.
By processing the recognized voice recognition result, the mobile phone can generate first processing information and send the first processing information to the watch, and the watch can display a corresponding page according to the content of the first processing information.
And if the first processing information comprises a first jump instruction, displaying a third page of the voice assistant in a target state, wherein the third page comprises an animation corresponding to the target state. For example, if the mobile phone delays generating the voice interaction result, the first jump instruction may be fed back to jump the watch from the recognition page to the thinking page.
And if the first processing information comprises the voice interaction result information, displaying a fourth page of the voice assistant in the voice interaction result display state, wherein the fourth page comprises the acquired voice interaction result information and the animation corresponding to the voice interaction result display state. For example, if the user sends out the information inquiry voice, the voice interaction result generated by the mobile phone may include voice interaction result information, so that the watch jumps from the identification page to the result page.
And if the first processing information comprises an application display instruction, displaying a page of the application program corresponding to the acquired application display instruction. For example, if the user makes a voice requesting to open the application, the voice interaction result generated by the mobile phone may include an application display instruction, so that the watch jumps from the identification page to the page of the corresponding application.
In one embodiment, referring to fig. 8, the case where the first jump instruction is acquired may be the case where the watch receives a message status of 0 or 1; the condition of acquiring the voice interaction result information can be that the watch receives a message state of 2 and receives the voice interaction result information; the case where the application display instruction is acquired may be a case where the wristwatch receives a message state of 3.
Based on different voice interaction demands of users and uncertainty of timeliness of generation of voice interaction results, the electronic equipment can jump to a result page, a thinking page or an application page as required after displaying the identification page so as to match the demands of the users and the timeliness of generation of the voice interaction results.
In one embodiment of the voice interaction method shown in fig. 11, after displaying the third page in the target state of the voice assistant, the voice interaction method may further include: and acquiring a second voice recognition result or second processing information, wherein the second processing information is information obtained by processing the first voice recognition result.
For example, if the user utters a voice during the time when the watch displays the thinking page, the watch may acquire a voice recognition result of the voice uttered by the mobile phone to the user, otherwise, may acquire the second processing information generated by delaying the mobile phone.
If the second speech recognition result is obtained, a fifth page of the speech assistant in the speech recognition result display state is displayed, the fifth page including the second speech recognition result and the first animation (here, expressed as a recognition animation). For example, if the user utters speech during the time the watch displays the thought page, the cell phone may generate and feed back corresponding speech recognition results to cause the watch to jump from the thought page back to the recognition page.
And if the second processing information is acquired and the second processing information comprises the voice interaction result information, displaying a sixth page of the voice assistant in the voice interaction result display state, wherein the sixth page comprises the acquired voice interaction result information and the animation corresponding to the voice interaction result display state. For example, if the user makes an information inquiry voice, the voice interaction result generated by the mobile phone delay can include voice interaction result information, so that the watch jumps from the thinking page to the result page.
And if the second processing information is acquired and the second processing information comprises an application display instruction, displaying a page of the application program corresponding to the acquired application display instruction. For example, if the user makes a voice requesting to open the application, the voice interaction result generated by the mobile phone delay may include an application display instruction to cause the watch to jump from the thinking page to the page of the corresponding application.
In one embodiment, referring to fig. 8, the case where the second speech recognition result is obtained may be the case where the watch receives the speech recognition result; the condition of acquiring the voice interaction result information can be that the watch receives a message state of 2 and receives the voice interaction result information; the case where the application display instruction is acquired may be a case where the wristwatch receives a message state of 3.
Under the condition that the electronic equipment delays to obtain the voice interaction result, the electronic equipment displays a corresponding thinking page, and after the thinking page is displayed, the electronic equipment jumps to a result page, an application page or jumps back to an identification page as required to match the user requirement.
The specific technical implementation of the voice interaction method shown in fig. 11 may refer to the related technical descriptions of other embodiments of the present application, and will not be described herein.
The embodiment of the application also provides a voice interaction device, which comprises: an initialization module for initializing a voice assistant in response to an activation request for the voice assistant of the electronic device; the display module is used for displaying a first page when the voice assistant is in a first state after initializing the voice assistant, wherein the first page comprises a first animation corresponding to the first state; the first state is a voice listening state, a voice recognition result display state, a voice interaction result display state or a target state between the voice recognition result display state and the voice interaction result display state.
The embodiment of the application also provides an electronic chip, which is installed in an electronic device (UE), and the electronic chip comprises: a processor for executing computer program instructions stored on a memory, wherein the computer program instructions, when executed by the processor, trigger the electronic chip to perform the method steps provided by any of the method embodiments of the present application.
The embodiment of the application also provides a terminal device, which comprises a communication module, a memory for storing computer program instructions and a processor for executing the program instructions, wherein when the computer program instructions are executed by the processor, the terminal device is triggered to execute the method steps provided by any method embodiment of the application.
The embodiment of the application also provides a server device, which comprises a communication module, a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the server device to execute the method steps provided by any of the method embodiments of the application.
The embodiment of the application further provides an electronic device, where the electronic device includes a plurality of antennas, a memory for storing computer program instructions, a processor for executing the computer program instructions, and a communication apparatus (such as a communication module capable of implementing 5G communication based on NR protocol), where the computer program instructions, when executed by the processor, trigger the electronic device to execute the method steps provided by any of the method embodiments of the application.
In particular, in an embodiment of the present application, one or more computer programs are stored in the memory, which include instructions that, when executed by the apparatus, cause the apparatus to perform the method steps described in the embodiments of the present application.
Further, the devices, apparatuses, modules illustrated in the embodiments of the present application may be implemented by a computer chip or entity, or by a product having a certain function.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein.
In several embodiments provided herein, any of the functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application.
In particular, a computer readable storage medium is provided in an embodiment of the present application, where a computer program is stored, when the computer program is run on a computer, to make the computer execute the method steps provided in the embodiment of the present application.
The present embodiments also provide a computer program product comprising a computer program which, when run on a computer, causes the computer to perform the method steps provided by the embodiments of the present application.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the elements is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or units, which may be in electrical, mechanical, or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in hardware plus software functional units.
The integrated units, implemented in the form of software functional units, may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium, and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or a Processor (Processor) to perform part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a U disk, a mobile hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.
In the present embodiments, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Those of ordinary skill in the art will appreciate that the various elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as a combination of electronic hardware, computer software, and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be apparent to those skilled in the art that the same and similar parts of the various embodiments in the present application are referred to each other for convenience and brevity of description. For example, specific working processes of the system, the device and the unit described in the embodiments of the present application may refer to corresponding processes in the embodiments of the method of the present application, which are not described herein again.
The foregoing description is only illustrative of the present application and is not intended to limit the scope of the present application, which is defined by the claims.

Claims (11)

1. A method of voice interaction, comprising:
initializing a voice assistant of the electronic device in response to an activation request for the voice assistant;
After initializing the voice assistant, displaying a first page of the voice assistant in a first state, the first page including a first animation corresponding to the first state;
wherein the first state is a speech listening state, a speech recognition result display state, a speech interaction result display state, or a target state between the speech recognition result display state and the speech interaction result display state;
and displaying a first page of the voice assistant in the first state under the condition that the first state is the voice recognition result display state, wherein the first page comprises the following components:
if the first identification information corresponding to the first page is first information under the condition that the voice recognition result is obtained, the first animation is played in a circulating mode, and the voice recognition result is displayed so as to display the first page;
the method further comprises the steps of:
after starting to circularly play the first animation, setting the first identification information as other information different from the first information;
after finishing displaying the first page, the first identification information is set as the first information.
2. The method according to claim 1, wherein the method further comprises:
in the process of initializing the voice assistant, loading the picture set of the first animation into the memory of the electronic equipment;
the displaying the first page when the voice assistant is in the first state includes:
and displaying the picture set of the first animation loaded in the memory.
3. The method of claim 1 or 2, wherein the number of pictures of the first animation is less than or equal to the allowed number of pictures of the single play animation of the electronic device;
the displaying the first page when the voice assistant is in the first state includes:
and circularly playing the first animation.
4. A method according to claim 3, wherein the cyclically playing the first animation comprises:
performing a loop play of the first animation using a first frame rate;
the time of playing the first animation once by using the first frame rate is within the allowable time range of playing the animation once by the electronic equipment.
5. The method according to claim 1 or 2, wherein a second page is a page when the voice assistant is in a second state, the second page including a second animation corresponding to the second state, the first state and the second state being two states of the voice listening state, the voice recognition result display state, the voice interaction result display state, and the target state;
The method further comprises the steps of:
displaying a first transition animation that transitions from the first animation to the second animation;
after the first transitional animation is displayed, setting the state of the first transitional animation to an invisible state, and displaying the second animation.
6. The method according to claim 1, wherein in the case where the first state is the speech recognition result display state and the first page includes a first speech recognition result, the method further comprises:
acquiring first processing information, wherein the first processing information is information obtained by processing the first voice recognition result;
if the first processing information comprises a first jump instruction, displaying a third page of the voice assistant in the target state, wherein the third page comprises an animation corresponding to the target state;
if the first processing information comprises voice interaction result information, displaying a fourth page of the voice assistant in the voice interaction result display state, wherein the fourth page comprises the acquired voice interaction result information and animation corresponding to the voice interaction result display state;
and if the first processing information comprises an application display instruction, displaying a page of an application program corresponding to the acquired application display instruction.
7. The method of claim 6, wherein after the displaying the third page with the voice assistant in the target state, the method further comprises:
acquiring a second voice recognition result or second processing information, wherein the second processing information is information obtained by processing the first voice recognition result;
if the second voice recognition result is obtained, displaying a fifth page of the voice assistant in the voice recognition result display state, wherein the fifth page comprises the second voice recognition result and the first animation;
if the second processing information is acquired and the second processing information comprises voice interaction result information, displaying a sixth page of the voice assistant in the voice interaction result display state, wherein the sixth page comprises the acquired voice interaction result information and animation corresponding to the voice interaction result display state;
and if the second processing information is acquired and the second processing information comprises an application display instruction, displaying a page of an application program corresponding to the acquired application display instruction.
8. A voice interaction device, comprising:
An initialization module for initializing a voice assistant of the electronic device in response to an activation request of the voice assistant;
the display module is used for displaying a first page when the voice assistant is in a first state after initializing the voice assistant, wherein the first page comprises a first animation corresponding to the first state;
wherein the first state is a speech listening state, a speech recognition result display state, a speech interaction result display state, or a target state between the speech recognition result display state and the speech interaction result display state;
wherein, in the case where the first state is the voice recognition result display state, displaying the first page of the voice assistant in the first state comprises:
if a voice recognition result is obtained and first identification information corresponding to the first page is first information, playing the first animation in a loop and displaying the voice recognition result, so as to display the first page;
wherein the voice interaction device is further configured to:
after starting to play the first animation in a loop, set the first identification information to other information different from the first information; and
after finishing displaying the first page, set the first identification information back to the first information.
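By way of illustration only, a small Kotlin sketch of the identification-flag logic recited in claim 8: the first animation is started in a loop only when the flag still holds the first information, and the flag is restored once the page has finished displaying. The field values ("idle", "looping") and method names are assumptions used to show the ordering.

// Hypothetical recognition-result page with the first-identification-information flag.
class RecognitionResultPage {
    private val firstInformation = "idle"
    private var identification = firstInformation      // first identification information

    fun onVoiceRecognitionResult(result: String) {
        if (identification == firstInformation) {
            startLoopingFirstAnimation()
            identification = "looping"                  // other information, so the loop is not restarted
        }
        showResult(result)
    }

    fun onPageDisplayFinished() {
        identification = firstInformation               // restore the first information for the next page
    }

    private fun startLoopingFirstAnimation() = println("start looping first animation")
    private fun showResult(result: String) = println("display voice recognition result: $result")
}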
9. An electronic chip, comprising:
a processor for executing computer program instructions stored on a memory, wherein the computer program instructions, when executed by the processor, trigger the electronic chip to perform the method of any of claims 1-7.
10. An electronic device comprising one or more memories for storing computer program instructions, and one or more processors, wherein the computer program instructions, when executed by the one or more processors, trigger the electronic device to perform the method of any of claims 1-7.
11. A computer readable storage medium, characterized in that the computer readable storage medium has stored therein a computer program which, when run on a computer, causes the computer to perform the method according to any of claims 1-7.
CN202311575750.7A 2023-11-24 2023-11-24 Voice interaction method, device, chip, electronic equipment and medium Active CN117292687B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311575750.7A CN117292687B (en) 2023-11-24 2023-11-24 Voice interaction method, device, chip, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN117292687A 2023-12-26
CN117292687B (en) 2024-04-05

Family

ID=89258895

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311575750.7A Active CN117292687B (en) 2023-11-24 2023-11-24 Voice interaction method, device, chip, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN117292687B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567099A (en) * 2011-12-30 2012-07-11 百度在线网络技术(北京)有限公司 Method, device and equipment used for controlling operation object
CN109545206A (en) * 2018-10-29 2019-03-29 百度在线网络技术(北京)有限公司 Voice interaction processing method, device and the smart machine of smart machine
CN110534108A (en) * 2019-09-25 2019-12-03 北京猎户星空科技有限公司 A kind of voice interactive method and device
CN111124198A (en) * 2018-11-01 2020-05-08 广州汽车集团股份有限公司 Animation playing and interaction method, device, system and computer equipment
CN111736799A (en) * 2020-06-18 2020-10-02 百度在线网络技术(北京)有限公司 Voice interaction method, device, equipment and medium based on man-machine interaction
CN115469949A (en) * 2022-09-02 2022-12-13 深圳传音控股股份有限公司 Information display method, intelligent terminal and storage medium
CN116129897A (en) * 2023-02-03 2023-05-16 重庆赛力斯新能源汽车设计院有限公司 Voice assistant standby method, device, equipment and storage medium
CN116204253A (en) * 2021-11-30 2023-06-02 华为技术有限公司 Voice assistant display method and related device
CN117083581A (en) * 2022-02-28 2023-11-17 华为技术有限公司 Man-machine interaction method and device and terminal equipment

Also Published As

Publication number Publication date
CN117292687A (en) 2023-12-26

Similar Documents

Publication Publication Date Title
CN109814766B (en) Application display method and electronic equipment
CN113726950B (en) Image processing method and electronic equipment
WO2021115194A1 (en) Application icon display method and electronic device
CN114579075B (en) Data processing method and related device
CN115631258B (en) Image processing method and electronic equipment
CN114518817B (en) Display method, electronic device and storage medium
CN114579076B (en) Data processing method and related device
WO2021052139A1 (en) Gesture input method and electronic device
CN115048012A (en) Data processing method and related device
CN117711355A (en) Screen refresh rate switching method and electronic equipment
CN115016706A (en) Thread scheduling method and electronic equipment
CN114666433B (en) Howling processing method and device in terminal equipment and terminal
CN111312207B (en) Text-to-audio method, text-to-audio device, computer equipment and storage medium
CN114531519A (en) Control method based on vertical synchronization signal and electronic equipment
CN117292687B (en) Voice interaction method, device, chip, electronic equipment and medium
CN116048243B (en) Display method and electronic equipment
CN116027887B (en) Display method and electronic equipment
WO2023005711A1 (en) Service recommendation method and electronic device
US20230251714A1 (en) Vibration method and apparatus, electronic device, and readable storage medium
CN113220258B (en) Voice message previewing method and electronic equipment
CN112151017A (en) Voice processing method, device, system, equipment and storage medium
CN114764300B (en) Window page interaction method and device, electronic equipment and readable storage medium
WO2022206709A1 (en) Component loading method for application and related apparatus
CN115904184B (en) Data processing method and related device
CN117666810A (en) Input method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant