CN111462744B - Voice interaction method and device, electronic equipment and storage medium

Info

Publication number: CN111462744B
Application number: CN202010256089.3A
Authority: CN
Inventors: 何亚欣
Original assignee: Shenzhen Skyworth RGB Electronics Co Ltd
Current assignee: Shenzhen Skyworth RGB Electronics Co Ltd
Priority date: 2020-04-02
Filing date: 2020-04-02
Publication date: 2024-01-30
Anticipated expiration: 2040-04-02
Also published as: CN111462744A; WO2021196617A1

Abstract

The application provides a voice interaction method, a device, an electronic device and a storage medium. The voice interaction method comprises the following steps: after a voice wake-up instruction is received, enabling a first audio channel for transmitting interactive audio information, and setting a currently enabled second audio channel for transmitting on-demand audio information to a target state, wherein the target state is a closed state or a bass state; after an interactive audio instruction is received, searching for the interactive audio information matched with the interactive audio instruction, and transmitting the interactive audio information through the first audio channel to a playing end for playing. With this method, the volume of the interactive audio information and the volume of the on-demand audio information can be controlled separately through different audio channels, which improves the efficiency with which the interactive audio information is recognized and thereby the efficiency of man-machine interaction.

Description

Voice interaction method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of speech recognition technologies, and in particular, to a speech interaction method, a device, an electronic apparatus, and a storage medium.

Background

In recent years, as voice recognition technology has matured, it has often been applied to smart televisions to implement voice interaction between the smart television and a user, for example voice-based channel switching, volume adjustment, and switching the smart television on or off.

In practice, a user can interact with the smart television by voice while watching a television program, in order to obtain the voice interaction content fed back by the smart television. Because the television program is still playing, the user has difficulty distinguishing the voice interaction content from the television program, which reduces the efficiency with which the user recognizes the voice interaction content and, in turn, the efficiency of interaction between the user and the smart television.

Disclosure of Invention

Accordingly, an object of the embodiments of the present application is to provide a voice interaction method, apparatus, electronic device, and storage medium, which can control volumes of interactive audio information and on-demand audio information based on different audio channels, so as to improve recognition efficiency of the interactive audio information, and further improve efficiency of man-machine interaction.

In a first aspect, an embodiment of the present application provides a voice interaction method, where the method includes:

after a voice wake-up instruction is received, enabling a first audio channel for transmitting interactive audio information, and setting a currently enabled second audio channel for transmitting on-demand audio information to a target state; wherein the target state is a closed state or a bass state;

after receiving the interactive audio instruction, searching the interactive audio information matched with the interactive audio instruction, and transmitting the interactive audio information to a playing end for playing through the first audio channel.

In one possible implementation manner, the voice interaction method further comprises:

after receiving a voice closing instruction, closing the first audio channel, and switching the second audio channel from the target state to the working state.

In one possible implementation manner, after the voice wake-up instruction is received, the method further comprises:

searching for interactive audio information matched with the voice wake-up instruction, and transmitting the interactive audio information to a playing end through the first audio channel for playing.

In a possible implementation manner, the first audio channel is further used for transmitting prompt audio information, and after the first audio channel is enabled, the method further comprises:

if the prompt audio information to be played is detected, determining the transmission sequence of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel;

and based on the transmission sequence, the prompt audio information and the interactive audio information are sequentially transmitted to a playing end through the first audio channel for playing.

In a possible implementation manner, the first audio channel is further used for transmitting prompt audio information, and after the first audio channel is closed, the method further includes:

if the prompt audio information to be played is detected, enabling the first audio channel, and setting the second audio channel which is currently enabled to be in a target state;

and transmitting the prompt audio information to a playing end for playing through the first audio channel, closing the first audio channel after the prompt audio information is played, and switching the second audio channel from a target state to a working state.

In one possible implementation manner, the switching the second audio channel from the target state to the working state includes:

re-enabling the second audio channel in the off state;

or,

switching the second audio channel from a bass state to a preset volume state.

In a second aspect, an embodiment of the present application provides a voice interaction device, where the device includes:

the first setting module is used for starting a first audio channel for transmitting interactive audio information after receiving a voice awakening instruction and setting a second audio channel currently started for transmitting the on-demand audio information into a target state; wherein the target state is a closed state or a bass state;

the searching module is used for searching the interactive audio information matched with the interactive audio instruction after receiving the interactive audio instruction;

the first transmission module is used for transmitting the interactive audio information to a playing end for playing through the first audio channel.

In a possible implementation manner, the voice interaction device further comprises:

and the second setting module is used for closing the first audio channel after receiving the voice closing instruction and switching the second audio channel from the target state to the working state.

In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the voice interaction method of any of the first aspects.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the voice interaction method of any of the first aspects.

According to the voice interaction method, the voice interaction device, the electronic device and the storage medium provided by the embodiments of the present application, after a voice wake-up instruction is received, a first audio channel used for transmitting interactive audio information is enabled, and a currently enabled second audio channel used for transmitting on-demand audio information is set to a target state, wherein the target state is a closed state or a bass state; after an interactive audio instruction is received, interactive audio information matched with the interactive audio instruction is searched for and transmitted to a playing end for playing through the first audio channel.

Further, according to the voice interaction method, the voice interaction device, the electronic device and the storage medium, after prompt audio information to be played is detected, the transmission sequence of the prompt audio information and the interactive audio information can be determined based on the audio information transmission priority corresponding to the first audio channel, and based on the transmission sequence, the prompt audio information and the interactive audio information are sequentially transmitted to the playing end for playing through the first audio channel. Because the first audio channel transmits both the prompt audio information and the interactive audio information, the number of occupied audio channels can be reduced and the utilization rate of the first audio channel is improved. Because the transmission sequence is determined based on the audio information transmission priority corresponding to the first audio channel, the transmission quality of the audio information on the first audio channel can also be improved.

In order to make the above objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered limiting the scope, and that other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 shows a flowchart of a voice interaction method provided in an embodiment of the present application;

Fig. 2 shows a flowchart of another voice interaction method provided in an embodiment of the present application;

Fig. 3 shows a flowchart of another voice interaction method provided in an embodiment of the present application;

Fig. 4 shows a schematic structural diagram of a voice interaction device according to an embodiment of the present application;

Fig. 5 shows a schematic diagram of an electronic device according to an embodiment of the present application.

Detailed Description

For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, but not all embodiments. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.

At present, while using a smart television, a user can interact with it by voice while watching a television program, in order to obtain the voice interaction content fed back by the smart television. Because the television program is still playing, the user has difficulty distinguishing the voice interaction content from the television program, which reduces the efficiency with which the user recognizes the voice interaction content and, in turn, the efficiency of interaction between the user and the smart television.

Based on the above problems, the embodiments of the present application provide a voice interaction method, apparatus, electronic device, and storage medium, which enable a first audio channel for transmitting interactive audio information after receiving a voice wake-up instruction, and set a currently enabled second audio channel for transmitting on-demand audio information to a target state, wherein the target state is a closed state or a bass state; and which, after receiving an interactive audio instruction, search for interactive audio information matched with the interactive audio instruction and transmit the interactive audio information to a playing end for playing through the first audio channel.

In order to enable those skilled in the art to use the present disclosure, the following embodiments are presented in connection with a specific application scenario "smart tv field". It will be apparent to those having ordinary skill in the art that the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present application. Although the present application is described primarily in the context of "smart tv technology," it should be understood that this is but one exemplary embodiment.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

For the sake of understanding the present embodiment, a detailed description is first provided of a voice interaction method disclosed in the embodiments of the present application.

As shown in fig. 1, a flowchart of a voice interaction method provided in an embodiment of the present application includes the following steps:

S101, after a voice wake-up instruction is received, enabling a first audio channel for transmitting interactive audio information, and setting a currently enabled second audio channel for transmitting on-demand audio information to a target state; wherein the target state is a closed state or a bass state.

In this embodiment of the present application, the smart television includes at least two audio channels. The first audio channel is used to transmit interactive audio information, and the second audio channel is used to transmit on-demand audio information, for example, a television series requested by the user. The first audio channel corresponds to a first volume and the second audio channel corresponds to a second volume, and the first volume and the second volume can be adjusted independently of each other.

When the smart television is playing on-demand audio information and the voice interaction function between the user and the smart television has not been started, the first audio channel is in a closed state and the second audio channel is in an open state. After a voice wake-up instruction is received, the voice interaction function is started: the first audio channel is switched from the closed state to the open state, and the second audio channel is switched from the open state to the target state, so that voice interaction between the user and the smart television can take place.

When the first audio channel is switched from the closed state to the open state, the first volume corresponding to the first audio channel is set to be the first preset volume, wherein the first preset volume can be the volume locally pre-stored or the volume selected by the user according to the self requirement.

Switching the second audio channel from the open state to the target state specifically includes: switching the second audio channel from the open state to the closed state, or switching the second audio channel from the open state to the bass state. The bass state corresponds to a second preset volume; that is, the second volume corresponding to the second audio channel is set to the second preset volume, and the on-demand audio information continues to be transmitted to the playing end through the second audio channel and played at the second volume (the second preset volume). The second preset volume is smaller than the first preset volume.
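The state changes of S101 can be sketched as follows. This is a minimal illustration rather than the implementation of the application: the names `ChannelState`, `AudioChannel` and `on_wake_up`, the integer volume scale and the `mute_second` flag are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class ChannelState(Enum):
    CLOSED = "closed"
    OPEN = "open"
    BASS = "bass"  # reduced-volume state of the second channel


@dataclass
class AudioChannel:
    name: str
    state: ChannelState
    volume: int  # hypothetical integer volume scale


def on_wake_up(first: AudioChannel, second: AudioChannel,
               first_preset: int, second_preset: int,
               mute_second: bool = False) -> None:
    """Open the first channel and move the second channel to the target state."""
    # The text requires the second preset volume to be below the first.
    if second_preset >= first_preset:
        raise ValueError("second preset volume must be smaller than the first")
    first.state = ChannelState.OPEN
    first.volume = first_preset
    if mute_second:
        second.state = ChannelState.CLOSED   # target state: closed
    else:
        second.state = ChannelState.BASS     # target state: bass
        second.volume = second_preset


# Before wake-up: first channel closed, second channel playing on-demand audio.
interactive = AudioChannel("first", ChannelState.CLOSED, 0)
on_demand = AudioChannel("second", ChannelState.OPEN, 40)
on_wake_up(interactive, on_demand, first_preset=50, second_preset=10)
```

Keeping the two volumes on separate channel objects mirrors the point made above that the first volume and the second volume are adjusted independently.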

In the embodiment of the application, the voice wake-up instruction is received by one of the following modes:

1. Specific voice information sent by the user is received, for example, "turn on the voice interaction function" or "let's chat".

2. It is detected that the user clicks (or long-presses) a voice interaction control key on the smart television.

3. It is detected that the user clicks (or long-presses or slides) a voice interaction control on the display screen of the smart television.

S102, after receiving the interactive audio instruction, searching interactive audio information matched with the interactive audio instruction, and transmitting the interactive audio information to a playing end for playing through the first audio channel.

In this embodiment of the present application, a correspondence between an interactive audio instruction and interactive audio information is locally pre-stored, after the interactive audio instruction is received, based on the correspondence, the interactive audio information corresponding to the interactive audio instruction is searched, and the searched interactive audio information is transmitted to a playing end through the first audio channel to be played, where the playing end includes a display screen and a sound box.

The interactive audio information comprises interactive voice information and interactive video information, the interactive voice information is transmitted to a sound box of the intelligent television through a first audio channel to be played at the first volume (first preset volume), and the interactive video information is transmitted to a display screen of the intelligent television through the first audio channel to be played.

In this embodiment of the present application, the interactive audio instruction may correspond to fixed interactive audio information or to dynamic interactive audio information. For example, after the interactive audio instruction "how large is your display screen" is received, the fixed interactive audio information matched with it ("my screen is 55 inches") is transmitted to the playing end through the first audio channel for playing; after the interactive audio instruction "what time is it now" is received, the dynamic interactive audio information matched with it (a reply announcing the current time, which depends on when the question is asked) is transmitted to the playing end through the first audio channel for playing.

In practice, after receiving the interactive audio instruction, the processor in the smart television both feeds back to the user the interactive audio information matched with the interactive audio instruction and carries out the instruction. For example, after receiving the interactive audio instruction "lower the brightness of the display screen", the processor transmits the matching interactive audio information (for example, a confirmation that the brightness of the display screen is being lowered) to the playing end through the first audio channel for playing, and responds to the instruction by lowering the brightness of the display screen.
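A locally stored correspondence that covers both fixed and dynamic interactive audio information might look like the sketch below. The table contents, the name `find_interactive_audio` and the use of text strings in place of audio data are assumptions made for illustration only.

```python
from datetime import datetime
from typing import Callable, Dict, Optional, Union

# A fixed entry is a plain string; a dynamic entry is computed when requested.
Response = Union[str, Callable[[datetime], str]]

# Hypothetical locally pre-stored correspondence between instructions and replies.
CORRESPONDENCE: Dict[str, Response] = {
    "how large is your display screen": "my screen is 55 inches",
    "what time is it now": lambda now: f"the current time is {now:%H:%M}",
}


def find_interactive_audio(instruction: str, now: datetime) -> Optional[str]:
    """Look up the reply matched with an instruction, or None if there is none."""
    entry = CORRESPONDENCE.get(instruction.strip().lower())
    if entry is None:
        return None
    # Dynamic entries are evaluated at lookup time, fixed ones returned as stored.
    return entry(now) if callable(entry) else entry


reply = find_interactive_audio("What time is it now", datetime(2020, 3, 31, 15, 0))
```

Passing `now` in as an argument, instead of reading the clock inside the function, keeps the dynamic reply reproducible when the lookup is tested.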

According to the voice interaction method, the volume of the interactive audio information and the volume of the on-demand audio information can be controlled separately through different audio channels, which improves the efficiency with which the interactive audio information is recognized and thereby the efficiency of man-machine interaction.

Further, the voice interaction method further comprises the following steps:

after receiving a voice closing instruction, closing the first audio channel, and switching the second audio channel from the target state to the working state.

In this embodiment of the present application, after the voice closing instruction is received, the first audio channel is switched from the open state to the closed state, and the second audio channel is switched from the target state to the working state.

The switching of the second audio channel from the target state to the working state includes: re-enabling the second audio channel in the off state; or switching the second audio channel from the bass state to a preset volume state.

Specifically, when the target state is the closed state, the second audio channel is switched from the closed state to the open state, and the second volume corresponding to the second audio channel is recovered; and when the target state is the bass state, recovering the second volume corresponding to the second audio channel, or setting the second volume corresponding to the second audio channel as a third preset volume, wherein the third preset volume is a locally pre-stored volume.
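The two recovery cases can be sketched as below. The `SecondChannel` record, its `saved_volume` field and the string state names are hypothetical and stand in for whatever the device actually stores.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SecondChannel:
    state: str         # "open", "closed" or "bass"
    volume: int
    saved_volume: int  # second volume in use before the target state was entered


def restore_second_channel(channel: SecondChannel,
                           third_preset: Optional[int] = None) -> None:
    """Switch the second channel from the target state back to the working state."""
    if channel.state == "closed":
        # Closed target state: re-enable the channel and recover the second volume.
        channel.volume = channel.saved_volume
    elif channel.state == "bass":
        # Bass target state: recover the second volume, or apply the third preset.
        channel.volume = channel.saved_volume if third_preset is None else third_preset
    else:
        return  # already in the working state
    channel.state = "open"


closed_case = SecondChannel(state="closed", volume=0, saved_volume=40)
restore_second_channel(closed_case)

bass_case = SecondChannel(state="bass", volume=10, saved_volume=40)
restore_second_channel(bass_case, third_preset=30)
```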

In the embodiment of the application, the voice closing instruction is received by one of the following modes:

1. Specific voice information sent by the user is received, for example, "turn off the voice interaction function" or "let's end the chat".

4. After an interactive audio instruction is received, no further interactive audio instruction is received within a preset time range.
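The timeout condition in the last item can be checked with a small helper such as the following sketch. The class name `IdleCloseDetector`, the use of seconds as the unit and the caller-supplied timestamps are assumptions, not details given in the application.

```python
from typing import Optional


class IdleCloseDetector:
    """Reports the close condition when no new instruction arrives in time."""

    def __init__(self, preset_seconds: float) -> None:
        self.preset_seconds = preset_seconds
        self.last_instruction_at: Optional[float] = None

    def on_interactive_instruction(self, now: float) -> None:
        # Each received instruction restarts the preset time range.
        self.last_instruction_at = now

    def should_close(self, now: float) -> bool:
        # Without a previous instruction there is nothing to time out.
        if self.last_instruction_at is None:
            return False
        return now - self.last_instruction_at >= self.preset_seconds


detector = IdleCloseDetector(preset_seconds=10.0)
detector.on_interactive_instruction(now=100.0)
```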

Further, after receiving the voice wake-up instruction, the method further includes:

In the embodiment of the application, after receiving the voice wake-up instruction, the interactive audio information matched with the voice wake-up instruction is transmitted to the playing end through the first audio channel for playing.

As a possible implementation manner, the interactive audio information corresponding to the voice wake-up instruction is locally pre-stored, and after the voice wake-up instruction is received, the interactive audio information is transmitted to the playing end through the first audio channel for playing.

For example, the interactive audio information corresponding to the voice wake-up instruction ("very happy to chat with you") is pre-stored locally, and after the voice wake-up instruction is received, this interactive audio information ("very happy to chat with you") is played.

Further, as shown in fig. 2, the first audio channel is further used for transmitting prompt audio information, and after the first audio channel is enabled, the method further includes:

S201, if the prompt audio information to be played is detected, determining the transmission sequence of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel.

S202, based on the transmission sequence, the prompt audio information and the interactive audio information are sequentially transmitted to a playing end through the first audio channel to be played.

Combining steps S201 and S202, the first audio channel is used for transmitting both the prompt audio information and the interactive audio information. If prompt audio information to be played is detected while the smart television and the user are interacting by voice, a first transmission time range corresponding to the prompt audio information to be played and a second transmission time range corresponding to the interactive audio information to be played are obtained. If the first transmission time range intersects the second transmission time range, the transmission order of the prompt audio information to be played and the interactive audio information to be played is determined based on the audio information transmission priority corresponding to the first audio channel, and the two are sequentially transmitted to the playing end for playing through the first audio channel according to that order. If the first transmission time range does not intersect the second transmission time range, the prompt audio information to be played is transmitted in the first transmission time range, and the interactive audio information to be played is transmitted in the second transmission time range.

For example, the first transmission time range corresponding to the prompt audio information to be played is from 11:30:00 to 11:30:05 on March 31, 2020, and the second transmission time range corresponding to the interactive audio information to be played is from 11:30:03 to 11:30:10 on March 31, 2020. Because the two ranges intersect, the prompt audio information to be played and the interactive audio information to be played are transmitted sequentially through the first audio channel, in the order given by the audio information transmission priority corresponding to the first audio channel.
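The ordering rule can be sketched with the time ranges from the example above. The application does not say which kind of audio information has the higher priority, so the `PRIORITY` table below (prompt first) is purely an assumption, as are the names `PendingAudio`, `ranges_intersect` and `transmission_order`.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class PendingAudio:
    kind: str        # "prompt" or "interactive"
    start: datetime  # start of the transmission time range
    end: datetime    # end of the transmission time range


# Hypothetical priority table for the first audio channel: lower goes first.
PRIORITY = {"prompt": 0, "interactive": 1}


def ranges_intersect(a: PendingAudio, b: PendingAudio) -> bool:
    return a.start <= b.end and b.start <= a.end


def transmission_order(prompt: PendingAudio,
                       interactive: PendingAudio) -> List[PendingAudio]:
    """Order two pending items for transmission on the first audio channel."""
    items = [prompt, interactive]
    if ranges_intersect(prompt, interactive):
        # Overlapping ranges: the channel priority decides the order.
        return sorted(items, key=lambda item: PRIORITY[item.kind])
    # Disjoint ranges: each item keeps its own transmission time range.
    return sorted(items, key=lambda item: item.start)


prompt = PendingAudio("prompt", datetime(2020, 3, 31, 11, 30, 0),
                      datetime(2020, 3, 31, 11, 30, 5))
interactive = PendingAudio("interactive", datetime(2020, 3, 31, 11, 30, 3),
                           datetime(2020, 3, 31, 11, 30, 10))
order = [item.kind for item in transmission_order(prompt, interactive)]
```

With the assumed table, the overlapping example yields the prompt first; swapping the two values in `PRIORITY` would yield the opposite order without touching the rest of the logic.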

Further, as shown in fig. 3, the first audio channel is further configured to transmit a prompt audio message, and after the first audio channel is closed, the method further includes:

S301, if prompt audio information to be played is detected, enabling the first audio channel, and setting the currently enabled second audio channel to a target state.

In the embodiment of the application, the first audio channel is used for transmitting prompt audio information and interaction audio information, when the voice interaction function between the user and the smart television is not started, the first audio channel is in a closed state, the second audio channel is in an open state, after the prompt audio information to be played is detected, the first audio channel is switched from the closed state to the open state, and the second audio channel is switched from the open state to the target state.

S302, transmitting the prompt audio information to a playing end for playing through the first audio channel, closing the first audio channel after the prompt audio information is played, and switching the second audio channel from a target state to a working state.

In this embodiment, the prompt audio information is transmitted to the playing end through the first audio channel for playing, each prompt audio information corresponds to a playing time, after the playing time, the first audio channel is switched from an on state to an off state, and the second audio channel is switched from a target state to a working state.

The prompting audio information comprises prompting voice information and prompting video information, the prompting voice information is transmitted to a sound box of the intelligent television through a first audio channel to be played at the first volume (first preset volume), and the prompting video information is transmitted to a display screen of the intelligent television through the first audio channel to be played.
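Steps S301 and S302 can be sketched as one function that walks through the channel states. The dictionary of state names, the `wait` callback that stands in for the playing time and the returned list of played items are all assumptions made for illustration.

```python
from typing import Callable, Dict, List


def play_prompt(channels: Dict[str, str], prompt: str, playing_time_s: float,
                target_state: str = "bass",
                wait: Callable[[float], None] = lambda seconds: None) -> List[str]:
    """Play one prompt while the voice interaction function is not started."""
    if target_state not in ("closed", "bass"):
        raise ValueError("target state must be 'closed' or 'bass'")
    played: List[str] = []
    channels["first"] = "open"          # S301: enable the first audio channel
    channels["second"] = target_state   # S301: second channel to the target state
    played.append(prompt)               # S302: transmit the prompt on the first channel
    wait(playing_time_s)                # S302: let the playing time elapse
    channels["first"] = "closed"        # S302: close the first audio channel
    channels["second"] = "working"      # S302: second channel back to the working state
    return played


channels = {"first": "closed", "second": "open"}
waited: List[float] = []
played = play_prompt(channels, "software update available", 2.0, wait=waited.append)
```

Injecting `wait` keeps the sketch free of real delays; a caller on a device would pass something that blocks for the playing time or schedules the second half as a callback.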

Based on the same inventive concept, the embodiment of the present application further provides a voice interaction device corresponding to the voice interaction method, and since the principle of solving the problem by the device in the embodiment of the present application is similar to that of the voice interaction method described in the embodiment of the present application, the implementation of the device may refer to the implementation of the method, and the repetition is omitted.

Referring to fig. 4, fig. 4 is a schematic structural diagram of a voice interaction device according to an embodiment of the present application, where the voice interaction device includes:

a first setting module 401, configured to enable a first audio channel for transmitting interactive audio information after receiving a voice wake-up instruction, and set a second audio channel currently enabled for transmitting on-demand audio information to a target state; wherein the target state is a closed state or a bass state;

the searching module 402 is configured to search, after receiving an interactive audio instruction, interactive audio information that matches the interactive audio instruction;

the first transmission module 403 is configured to transmit the interactive audio information to a playing end for playing through the first audio channel.

the second transmission module is used for searching the interactive audio information matched with the voice awakening instruction and transmitting the interactive audio information to a playing end for playing through the first audio channel.

In a possible implementation manner, the first audio channel is further used for transmitting prompt audio information, and the voice interaction device further comprises:

the determining module is used for determining the transmission sequence of the prompt audio information and the interactive audio information based on the audio information transmission priority corresponding to the first audio channel if the prompt audio information to be played is detected;

and the third transmission module is used for sequentially transmitting the prompt audio information and the interactive audio information to a playing end for playing through the first audio channel based on the transmission sequence.

the third setting module is used for starting the first audio channel if the prompt audio information to be played is detected, and setting the second audio channel which is started currently as a target state;

the fourth transmission module is used for transmitting the prompt audio information to a playing end for playing through the first audio channel;

and the fourth setting module is used for closing the first audio channel after the prompt audio information is played, and switching the second audio channel from the target state to the working state.

In one possible implementation manner, the second setting module switches the second audio channel from the target state to the working state, or the fourth setting module switches the second audio channel from the target state to the working state, including:

re-enabling the second audio channel in the off state;

or,

According to the voice interaction device provided by the embodiment of the application, the volume of the interactive audio information and the volume of the spot broadcast audio information can be controlled separately through different audio channels, which improves the recognition efficiency of the interactive audio information and, in turn, the efficiency of human-computer interaction.

Referring to FIG. 5, an embodiment of the present application provides an electronic device 500, which includes a processor 501, a memory 502 and a bus. The memory 502 stores machine-readable instructions executable by the processor 501. When the electronic device 500 is running, the processor 501 communicates with the memory 502 through the bus and executes the machine-readable instructions to perform the steps of the voice interaction method described above.

Specifically, the memory 502 and the processor 501 may be a general-purpose memory and a general-purpose processor, which are not specifically limited here. When the processor 501 runs a computer program stored in the memory 502, the above-described voice interaction method can be performed.

Corresponding to the above-mentioned voice interaction method, the embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above-mentioned voice interaction method are performed.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working procedures of the above-described system and apparatus may refer to the corresponding procedures in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. The above-described apparatus embodiments are merely illustrative. For example, the division of the modules is merely a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some communication interfaces, devices, or modules, and may be electrical, mechanical, or in another form.

The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.

The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a processor-executable, non-volatile computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.

Finally, it should be noted that the foregoing examples are merely specific embodiments of the present application, intended to illustrate its technical solutions rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing examples, those skilled in the art will appreciate that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of voice interaction, the method comprising:

after receiving an interactive audio instruction, searching interactive audio information matched with the interactive audio instruction, wherein the interactive audio information comprises interactive voice information and interactive video information; transmitting the interactive voice information in the interactive audio information to the sound equipment of the smart television through the first audio channel, and playing the interactive voice information to a user at a first volume; transmitting the interactive video information to a display screen of the smart television through the first audio channel to play the interactive video information to a user; the interactive audio information corresponding to the interactive audio instruction is fixed interactive audio information or dynamic interactive audio information;

if the prompt audio information to be played is detected, a first transmission time range corresponding to the prompt audio information to be played and a second transmission time range corresponding to the interactive audio information to be played are obtained, and if the first transmission time range intersects the second transmission time range, the transmission sequence of the prompt audio information to be played and the interactive audio information to be played is determined based on the audio information transmission priority corresponding to the first audio channel.

2. The voice interaction method of claim 1, wherein the method further comprises:

3. The voice interaction method according to claim 1, wherein after receiving a voice wake-up instruction, the method further comprises:

4. The voice interaction method of claim 1, wherein the first audio channel is further used for transmitting prompt audio information, and wherein after the first audio channel is enabled, the method further comprises:

5. The voice interaction method of claim 1, wherein the first audio channel is further used for transmitting prompt audio information, and wherein after closing the first audio channel, the method further comprises:

6. The voice interaction method according to claim 2 or 5, wherein the switching the second audio channel from the target state to the working state comprises:

re-enabling the second audio channel in the closed state;

or,

7. A voice interaction device, the device comprising:

the searching module is used for searching the interactive audio information matched with the interactive audio instruction after receiving the interactive audio instruction, wherein the interactive audio information comprises interactive voice information and interactive video information;

the first transmission module is used for transmitting the interactive voice information in the interactive audio information to the sound equipment of the smart television through the first audio channel to play at a first volume; transmitting the interactive video information to a display screen of the smart television through the first audio channel for playing; the interactive audio information corresponding to the interactive audio instruction is fixed interactive audio information or dynamic interactive audio information;

8. The voice interaction device of claim 7, wherein the device further comprises:

9. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine readable instructions executable by the processor, the processor and the memory in communication over the bus when the electronic device is running, the processor executing the machine readable instructions to perform the steps of the voice interaction method of any of claims 1 to 6.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of the voice interaction method according to any of claims 1 to 6.