WO2020062862A1

WO2020062862A1 - Voice interactive control method and device for speaker

Info

Publication number: WO2020062862A1
Application number: PCT/CN2019/084834
Authority: WO
Inventors: 祁学文; 吴海全; 迟欣; 张恩勤; 曹磊; 师瑞文
Original assignee: 深圳市冠旭电子股份有限公司
Priority date: 2018-09-28
Filing date: 2019-04-28
Publication date: 2020-04-02
Also published as: CN110970032A

Abstract

The present invention relates to the technical field of speaker control and provides a voice interactive control method and device for a speaker, the method comprising: presetting an application scenario and uploading the same to a server (S201); receiving and caching voice data returned by the server that corresponds to the application scenario (S202); receiving a control command and matching the control command against the voice data (S203); and if a match is found, playing the voice data matching the control command (S204). The present invention reduces the waiting time for smart voice interaction and thus realizes a quick response of voice interaction.

Description

Method and device for interactive control of speaker voice

Technical field

The invention belongs to the technical field of speaker control, and particularly relates to a method and device for interactive control of speaker voice.

Background technique

At present, smart terminal devices (such as smart speakers, mobile phones, and Bluetooth speakers used with mobile phones) are increasingly connected to cloud servers, and users can interact with the cloud server by voice over the network. However, when the network is poor, uploading the voice to the cloud server and returning the recognition result takes a long time. Because of this network transmission delay, users often have to wait a long time after speaking before the voice response arrives from the cloud, which makes for a poor experience. Some devices currently implement offline voice recognition locally, but this is basically limited to offline command parsing: the application scenarios are limited, and the voice playback effect the user wants is still poorly achieved.

Technical problem

In view of this, the embodiments of the present invention provide a method and a device for voice interaction control of a speaker, to solve the prior-art problems of network transmission delay during voice interaction and of the limited application scenarios supported by offline command parsing.

Technical solutions

A first aspect of the embodiments of the present invention provides a method for interactively controlling a voice of a speaker, including:

Preset application scenarios and upload the application scenarios to the server;

Receiving and buffering voice data corresponding to the application scenario returned by the server;

Receiving a control instruction and matching the control instruction with the voice data;

If the matching is successful, the voice data matching the control instruction is played.

A second aspect of the embodiments of the present invention provides a method for interactive voice control of a speaker, including:

Receiving a control instruction sent by a speaker, and matching the control instruction with the voice data;

If the matching is successful, the voice data matching the control instruction is sent to a speaker for voice data playback.

A third aspect of the embodiments of the present invention provides a method for interactive voice control of a speaker, including:

Wi-Fi speakers upload application scenarios to the server in advance;

The server generates voice data corresponding to the application scenario according to the application scenario, and sends the voice data to the Wi-Fi speaker;

Wi-Fi speakers receive and buffer the voice data;

Wi-Fi speakers receive control instructions and match the control instructions with buffered voice data;

If the matching is successful, the Wi-Fi speaker plays voice data matching the control instruction.

A fourth aspect of the embodiments of the present invention provides a method for interactive voice control of a speaker, including:

The application scenario is uploaded to the server in advance by the mobile terminal;

The server generates voice data corresponding to the application scenario according to the application scenario and sends it to the mobile terminal;

The mobile terminal receives and buffers the voice data;

The Bluetooth speaker receives the control instruction and sends the control instruction to the mobile terminal;

The mobile terminal matches the buffered voice data according to the control instruction;

If the matching is successful, the mobile terminal sends the voice data matching the control instruction to the Bluetooth speaker;

Bluetooth speakers play voice data.

A fifth aspect of the embodiments of the present invention provides a device for interactively controlling a voice of a speaker, including:

A first identification module, configured to preset an application scenario and upload the application scenario to a server;

A first database for receiving and buffering voice data corresponding to the application scenario returned by the server;

A second identification module, configured to receive a control instruction and match the control instruction with the voice data;

The playing module is configured to play voice data matching the control instruction if the matching is successful.

A sixth aspect of the embodiments of the present invention provides a mobile terminal, including:

A third identification module, configured to preset an application scenario and upload the application scenario to a server;

A second database for receiving and buffering voice data corresponding to the application scenario returned by the server;

A fourth identification module, configured to receive a control instruction sent by the speaker, and match the control instruction with the voice data;

The sending module is configured to: if the matching is successful, send the voice data matching the control instruction to a speaker for voice data playback.

A seventh aspect of the embodiments of the present invention provides a speaker voice interactive control system. The system includes: a Wi-Fi speaker and a server;

The Wi-Fi speaker is used to upload an application scenario to a server in advance, receive and buffer voice data corresponding to the application scenario, receive a control instruction, match the control instruction with the buffered voice data, and, if the matching is successful, play the voice data matching the control instruction;

The server is configured to analyze the application scenario and generate corresponding voice data.

According to an eighth aspect of the embodiments of the present invention, a computer-readable storage medium is provided. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements steps of a speaker voice interactive control method.

Beneficial effect

Compared with the prior art, the embodiments of the present invention have the following beneficial effect: when voice interaction is performed through a speaker, an application scenario is uploaded to a server in advance and the voice data corresponding to the application scenario is buffered; upon receiving a control instruction, the control instruction is matched with the buffered voice data, and the voice data is played directly once the matching succeeds. This reduces the network delay during voice interaction and improves the response speed of the voice interaction.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic diagram of an applicable system scenario of a method for interactively controlling a speaker voice provided by an embodiment of the present invention;

FIG. 2 is a schematic diagram of a speaker implementation process of a voice interactive control method according to an embodiment of the present invention;

FIG. 3 is a schematic flowchart of a mobile terminal implementing a voice interaction control method according to an embodiment of the present invention;

FIG. 4 is a diagram illustrating an example of an interaction flow of a Wi-Fi speaker system of a voice interaction control method according to an embodiment of the present invention;

FIG. 5 is a diagram illustrating an example of an interaction flow of a Bluetooth speaker system of a voice interaction control method according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a device for interactively controlling a voice of a speaker provided by an embodiment of the present invention.

Embodiments of the invention

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are provided in order to thoroughly understand the embodiments of the present invention. However, it should be clear to a person skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary details.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.

It should also be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" used in the present description and the appended claims refers to any combination of one or more of the listed items and all possible combinations, and includes these combinations.

In order to explain the technical solution of the present invention, the following description is made through specific embodiments.

FIG. 1 is a schematic diagram of a system scenario to which a method for interactively controlling a speaker voice provided by an embodiment of the present invention is applied. For convenience of explanation, only a part related to this embodiment is shown.

Referring to FIG. 1, the system may include a Wi-Fi speaker 11 and a server 12. The Wi-Fi speaker 11 may upload an application scenario to the server 12 and buffer the voice data corresponding to the application scenario returned by the server 12. The Wi-Fi speaker 11 receives a control instruction, matches the control instruction with the buffered voice data, and, if the matching is successful, plays the voice data that matches the control instruction.

The system may further include a Bluetooth speaker 21, a mobile terminal 22, and the server 12. The mobile terminal 22 uploads an application scenario to the server 12, and receives and caches the voice data corresponding to the application scenario returned by the server 12. The Bluetooth speaker 21 receives a control instruction and sends the control instruction to the mobile terminal 22, where it is matched against the voice data buffered by the mobile terminal 22. If the matching is successful, the mobile terminal 22 sends the voice data matching the control instruction to the Bluetooth speaker 21 for voice data playback.

The speaker voice interaction method in the system scenario shown in FIG. 1 is described in detail below:

FIG. 2 shows a schematic diagram of a speaker implementation process of the voice interactive control method according to an embodiment of the present invention. In this embodiment, the execution subject of the process is the Wi-Fi speaker 11 shown in FIG. 1, which is detailed as follows:

In step S201, an application scenario is set in advance and the application scenario is uploaded to the server.

In the embodiment of the present invention, the application scenario may be a scenario the user commonly uses, such as weather or time, or a scenario the user needs for a particular application, such as song search or schedule. In the case of a Wi-Fi speaker, the common or individual application scenarios can be tallied directly on the speaker and uploaded to the server over the network in advance.

In addition, the application scenario may be one of a set of default commonly used scenarios; it may be derived from statistics of the user's usage habits, with frequently used scenarios treated as common scenarios; it may be set by the user; or it may be a new application scenario learned from continued input.
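As a rough illustration of how default, user-set, and frequently used scenarios might be combined before upload, consider the following minimal sketch. The function name `select_scenarios`, the `min_uses` threshold, and the contents of the usage log are assumptions made for illustration; the embodiment does not specify how usage statistics are computed.

```python
from collections import Counter

# Hypothetical defaults; the embodiment names weather and time as examples.
DEFAULT_SCENARIOS = ["weather", "time"]


def select_scenarios(usage_log, user_defined=(), min_uses=3):
    """Merge default, user-defined, and frequently used scenarios.

    usage_log is a list of scenario names, one entry per past request.
    A scenario counts as "frequently used" once it appears at least
    min_uses times (an assumed threshold).
    """
    counts = Counter(usage_log)
    frequent = [name for name, n in counts.most_common() if n >= min_uses]
    merged = []
    for name in [*DEFAULT_SCENARIOS, *user_defined, *frequent]:
        if name not in merged:  # keep the first occurrence, preserve order
            merged.append(name)
    return merged


# Illustrative usage history: "song search" is requested often enough
# to be treated as a common scenario; "schedule" is set by the user.
log = ["song search"] * 4 + ["schedule"] * 2 + ["weather"]
scenarios = select_scenarios(log, user_defined=["schedule"])
```

The resulting list is what the speaker or mobile terminal would upload to the server in step S201.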

Further, the step of setting an application scenario in advance and uploading the application scenario to a server includes:

Receiving an application scenario entered by touch input and uploading it to the server; and/or receiving an application scenario entered by voice input and uploading it to the server; and/or receiving an application scenario entered by key input and uploading it to the server.

In the embodiment of the present invention, the Wi-Fi speaker may be provided with a touch display screen, a microphone array, and buttons, so the application scenario may be set by touch input, by voice input, or by key input.

In addition, in the case of a Bluetooth speaker, the scenario can be input through the Bluetooth speaker and sent to the mobile terminal over the Bluetooth protocol; the mobile terminal tallies the scenarios and uploads the application scenarios to the server over the network.

Step S202: Receive and cache the voice data corresponding to the application scenario returned by the server.

In the embodiment of the present invention, the voice data corresponding to the application scenario is data that the server generates by parsing one or more application scenarios, with different data for different application scenarios; it may include single-character or multi-character voice data. In the case of a Wi-Fi speaker, the speaker receives the returned voice data and caches it locally; in the case of a Bluetooth speaker, the mobile terminal receives the voice data and caches it locally on the mobile terminal.
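The local cache described here can be pictured as a mapping from application scenario to voice data. The sketch below is only an assumption about one possible layout; the class name `VoiceCache`, its methods, and the placeholder byte strings are hypothetical and do not come from the embodiment.

```python
from dataclasses import dataclass, field


@dataclass
class VoiceCache:
    """In-memory cache of server-generated voice data, keyed by scenario."""
    entries: dict = field(default_factory=dict)

    def store(self, server_response):
        """Cache every scenario -> audio pair returned by the server."""
        for scenario, audio in server_response.items():
            self.entries[scenario] = audio

    def get(self, scenario):
        """Return cached audio bytes, or None when nothing is cached."""
        return self.entries.get(scenario)


# Stand-in for the server's reply; real audio would be encoded speech.
response = {"weather": b"<audio: sunny, 25 C>", "time": b"<audio: 9 am>"}
cache = VoiceCache()
cache.store(response)
```

The same structure could sit on the Wi-Fi speaker or on the mobile terminal, depending on which device holds the cache.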

In addition, the buffered voice data corresponding to the application scenario may be re-acquired and re-buffered periodically, at a set time interval, to keep the voice data current. For an application scenario such as weather, which changes over time, the cache is updated at a fixed time; alternatively, songs of a similar style may be cached in advance according to the user's listening habits.
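The periodic update could be implemented as a cache that re-fetches a scenario once its interval has elapsed. The following is a minimal sketch under stated assumptions: the class `RefreshingCache`, the per-scenario intervals, and the `fetch` callable standing in for the server request are all hypothetical.

```python
import time


class RefreshingCache:
    """Cache whose entries are re-fetched after a per-scenario interval."""

    def __init__(self, fetch, intervals, clock=time.monotonic):
        self._fetch = fetch          # callable: scenario -> voice data
        self._intervals = intervals  # scenario -> seconds between refreshes
        self._clock = clock
        self._data = {}              # scenario -> (voice data, fetched_at)

    def refresh_due(self):
        """Re-fetch every scenario whose interval has elapsed; return names."""
        now = self._clock()
        refreshed = []
        for scenario, interval in self._intervals.items():
            entry = self._data.get(scenario)
            if entry is None or now - entry[1] >= interval:
                self._data[scenario] = (self._fetch(scenario), now)
                refreshed.append(scenario)
        return refreshed

    def get(self, scenario):
        entry = self._data.get(scenario)
        return entry[0] if entry else None


# Demo with a fake clock so the behaviour is deterministic.
fake_now = [0.0]
calls = []


def fake_fetch(scenario):
    calls.append(scenario)
    return f"{scenario} report #{calls.count(scenario)}"


cache = RefreshingCache(fake_fetch, {"weather": 3600, "time": 60},
                        clock=lambda: fake_now[0])
first = cache.refresh_due()   # nothing cached yet: both are fetched
fake_now[0] = 120.0
second = cache.refresh_due()  # only "time" has passed its interval
```

In a real device `refresh_due` would be driven by a timer; the injected clock is used here only to keep the example reproducible.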

Step S203: Receive a control instruction, and match the control instruction with the voice data.

In the embodiment of the present invention, the control instruction may be a voice control instruction or a signal control instruction input through a remote control or other equipment. The voice data corresponds to a variety of application scenarios, and the control instruction can be matched against the buffered voice data by keyword extraction or string matching to obtain the voice data that matches the input control instruction.

In addition, control instructions are not distinguished by complexity: whatever the control instruction, it is matched against the locally buffered voice data. A Wi-Fi speaker, after receiving the voice control instruction, matches it directly against its locally buffered voice data; a Bluetooth speaker receives the control instruction and sends it to the mobile terminal, where it is matched against the voice data buffered by the mobile terminal.
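Keyword-based matching of a control instruction against the cache might look like the sketch below. It assumes the voice instruction has already been recognized as text; the keyword table `SCENARIO_KEYWORDS` and the function `match_instruction` are hypothetical names, and the embodiment does not prescribe a particular matching rule.

```python
import re

# Hypothetical keyword table: scenario -> words that signal it.
SCENARIO_KEYWORDS = {
    "weather": {"weather", "rain", "temperature"},
    "time": {"time", "clock"},
}


def match_instruction(text, cache, keywords=SCENARIO_KEYWORDS):
    """Return (scenario, voice data) for the best keyword match, else None."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    best, best_hits = None, 0
    for scenario, vocab in keywords.items():
        hits = len(words & vocab)  # number of keywords found in the text
        if hits > best_hits and scenario in cache:
            best, best_hits = scenario, hits
    return (best, cache[best]) if best else None


cache = {"weather": "Sunny, 25 degrees.", "time": "It is nine o'clock."}
hit = match_instruction("What's the weather like today?", cache)
miss = match_instruction("Play some jazz", cache)
```

A `None` result corresponds to an unsuccessful match, in which case the device has no cached voice data to play for that instruction.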

In step S204, if the matching is successful, the voice data matching the control instruction is played.

In the embodiment of the present invention, if the locally cached database contains voice data matching the control instruction, that voice data is played through the loudspeaker of the speaker. Because the speaker supports different playback sound effect modes, the mode can be set according to the environment to achieve a better playback effect.

In addition, for a Bluetooth speaker, voice data matching the control instruction can be sent to the Bluetooth speaker through a mobile terminal to play the voice data.

Further, the step of receiving the control instruction and matching the control instruction with the voice data includes:

Uploading the control instruction to the server while matching the control instruction with the voice data;

If the matching is unsuccessful, receiving the voice data fed back by the server and playing the fed-back voice data.

In the embodiment of the present invention, after a control instruction is received and recognized, it is uploaded to the server at the same time as it is matched against the local voice data. If the control instruction matches the local voice data, the voice data generated by the server is no longer received, and the matched voice data is played directly through the loudspeaker of the speaker. If the local cache holds no voice data matching the control instruction, the voice data that the server generates by parsing the control instruction is received and played.
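The "upload while matching" behaviour can be sketched with a background thread: the server request starts immediately, and its reply is used only when the local cache misses. This is an assumption-laden illustration; `handle_instruction`, the exact-string cache lookup, and the `ask_server` callable are hypothetical simplifications.

```python
from concurrent.futures import ThreadPoolExecutor


def handle_instruction(instruction, local_cache, ask_server):
    """Match locally while the same instruction is uploaded in parallel.

    local_cache maps an instruction string to cached voice data.
    ask_server is a callable standing in for the network round trip.
    Returns (source, voice data).
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        pending = pool.submit(ask_server, instruction)  # upload starts now
        local = local_cache.get(instruction)            # match meanwhile
        if local is not None:
            # Local hit: the server reply is not needed. cancel() only
            # succeeds if the call has not started yet; otherwise the
            # reply is simply ignored.
            pending.cancel()
            return "cache", local
        return "server", pending.result()               # miss: wait for reply
    finally:
        # Do not block on an unfinished upload after a local hit.
        pool.shutdown(wait=False)


def fake_server(instruction):
    return f"server answer for {instruction!r}"


cache = {"what time is it": "It is nine o'clock."}
fast = handle_instruction("what time is it", cache, fake_server)
slow = handle_instruction("tell me a joke", cache, fake_server)
```

Starting the upload before the local lookup means a cache miss costs no more than a plain server request, while a hit returns without waiting for the network.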

In addition, the Wi-Fi speaker or mobile terminal saves the application scenario information corresponding to each newly input control instruction, and continuously tallies and learns new application scenarios so that its set of application scenarios becomes more comprehensive.
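One simple way to picture this learning step is to count instructions that missed the cache and add a scenario to the pre-cached set once it recurs. The class `ScenarioLearner` and the `promote_after` threshold below are assumptions for illustration; setting `promote_after=1` would save every new scenario immediately.

```python
from collections import Counter


class ScenarioLearner:
    """Track instructions that missed the cache and promote frequent ones."""

    def __init__(self, known, promote_after=2):
        self.known = set(known)           # scenarios already pre-cached
        self.promote_after = promote_after
        self._misses = Counter()

    def record_miss(self, scenario):
        """Count a cache miss; return True once the scenario is promoted."""
        if scenario in self.known:
            return False
        self._misses[scenario] += 1
        if self._misses[scenario] >= self.promote_after:
            self.known.add(scenario)      # upload with the next preset batch
            del self._misses[scenario]
            return True
        return False


learner = ScenarioLearner(known={"weather", "time"})
results = [learner.record_miss("song search") for _ in range(3)]
```

After promotion, the scenario would be included in the next upload to the server so that its voice data is cached ahead of time.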

FIG. 3 is a schematic diagram of a mobile terminal implementation process of the voice interaction control method according to an embodiment of the present invention. In this embodiment, the execution body of the process is the mobile terminal 22 shown in FIG. 1, which is detailed as follows:

In step S301, an application scenario is set in advance and the application scenario is uploaded to the server.

In the embodiment of the present invention, the voice interaction performed by the Bluetooth speaker is realized through the connection with the mobile terminal; the Bluetooth speaker can perform recording and playback, and the voice interaction and voice feedback are completed through the application of the mobile terminal.

The mobile terminal can set application scenarios. Application scenarios are not distinguished by complexity; only the scenarios that are commonly used or newly input by the user are tallied. Common application scenarios include weather, time, and the like. The application scenarios may be default common scenarios; they may be derived from statistics of the user's usage habits, with frequently used scenarios treated as common scenarios; they may be set by the user; or they may be new application scenarios learned from continued input.

In addition, the application scenario may be set by one or more of touch input, voice input, and key input, and the set application scenario is uploaded to the server over the network. The server may be an independent server or the cloud service corresponding to the application on the mobile terminal.

Step S302: Receive and cache the voice data corresponding to the application scenario returned by the server.

In the embodiment of the present invention, the voice data corresponding to the application scenario is voice data that the server generates by parsing one or more application scenarios, with different data for different application scenarios; it may include single-character or multi-character voice data. The mobile terminal receives the voice data and caches it locally.

Step S303: Receive a control instruction sent by the speaker, and match the control instruction with the voice data.

In the embodiment of the present invention, control instructions are not distinguished by complexity: whatever the control instruction, it is matched against the locally buffered voice data.

After receiving the control instruction sent by the Bluetooth speaker and performing voice recognition on it, the mobile terminal matches it against the voice data stored locally on the mobile terminal to obtain the voice data that matches the control instruction.

In step S304, if the matching is successful, the voice data matching the control instruction is sent to a speaker for voice data playback.

In the embodiment of the present invention, if the database cached locally on the mobile terminal contains voice data matching the control instruction, that voice data is sent to the Bluetooth speaker and played through the loudspeaker of the Bluetooth speaker. Different playback sound effect modes can be set, and the mode can be chosen according to the environment to achieve a better playback effect.

In addition, while the control instruction is matched with the voice data, the control instruction is also uploaded to the server. If the matching is unsuccessful, the mobile terminal receives the voice data fed back by the server, sends it to the Bluetooth speaker for playback, and saves the application scenario corresponding to the current control instruction.

Optionally, after receiving and buffering the voice data corresponding to the application scenario returned by the server, the method further includes:

Receiving a control instruction sent by a speaker, and sending the control instruction to a server;

If the voice data corresponding to the application scenario does not match the control instruction, receiving the voice data corresponding to the control instruction fed back by the server;

Sending voice data corresponding to the control instruction to a speaker for voice data playback.

In the embodiment of the present invention, after the control instruction sent by the Bluetooth speaker is received and recognized, it is uploaded to the server at the same time as it is matched against the local voice data. If the control instruction matches the voice data cached locally on the mobile terminal, the voice data generated by the server is no longer received, and the matched voice data is sent directly to the Bluetooth speaker for playback. If the local cache of the mobile terminal holds no voice data matching the control instruction, the voice data generated by the server is received and sent to the Bluetooth speaker for playback.

In addition, the mobile terminal saves the application scenario information corresponding to each newly input control instruction, and continuously tallies and learns new application scenarios so that its set of application scenarios becomes more comprehensive.

FIG. 4 is a diagram illustrating an example of an interactive flow of a Wi-Fi speaker system of a voice interaction control method according to an embodiment of the present invention. The execution subject participating in the interactive flow includes the Wi-Fi speaker 11 and the server 12 in FIG. 1. The implementation principle is consistent with the implementation principle of each execution subject side described in FIG. 2 and FIG. 3, so this interaction process is only briefly described, and is not described in detail:

1. Wi-Fi speakers upload application scenarios to the server in advance;

2. The server generates voice data corresponding to the application scenario according to the application scenario, and sends the voice data to the Wi-Fi speaker;

3. The Wi-Fi speaker receives and buffers the voice data;

4. The Wi-Fi speaker receives the control instructions and matches the control instructions with the buffered voice data;

5. If the matching is successful, the Wi-Fi speaker plays voice data matching the control instruction.

Optionally, the Wi-Fi speaker can receive an application scenario of voice input, key input, or touch input.
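The five-step Wi-Fi flow above can be traced end to end with two small stand-in classes. This is a sketch under stated assumptions: `FakeServer`, `WifiSpeaker`, the substring matching rule, and the text placeholders for voice data are all hypothetical, and network transport is replaced by direct method calls.

```python
class FakeServer:
    """Stand-in for server 12: turns scenario names into voice data."""

    def generate(self, scenarios):
        return {name: f"[voice] {name} report" for name in scenarios}


class WifiSpeaker:
    """Stand-in for Wi-Fi speaker 11, following steps 1 to 5 above."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.played = []

    def preload(self, scenarios):
        # Steps 1-3: upload scenarios, receive and buffer the voice data.
        self.cache.update(self.server.generate(scenarios))

    def on_instruction(self, text):
        # Step 4: match the instruction against the buffered voice data.
        for scenario, voice in self.cache.items():
            if scenario in text.lower():
                # Step 5: play the matching voice data.
                self.played.append(voice)
                return True
        return False


speaker = WifiSpeaker(FakeServer())
speaker.preload(["weather", "time"])
matched = speaker.on_instruction("How is the weather?")
unmatched = speaker.on_instruction("Tell me a story")
```

Only the preload step touches the server; a matched instruction is answered entirely from the speaker's own cache.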

FIG. 5 shows an example of an interaction flow of a Bluetooth speaker system of a voice interaction control method provided by an embodiment of the present invention; the execution subjects participating in the interaction flow include the Bluetooth speaker 21, the mobile terminal 22, and the server 12 in FIG. 1. The implementation principle is consistent with the implementation principle of each execution subject side described in FIG. 2 and FIG. 3, so this interaction process is only briefly described, and is not described in detail:

1. The application scenario is uploaded to the server in advance by the mobile terminal;

2. The server generates voice data corresponding to the application scenario according to the application scenario, and sends the voice data to the mobile terminal;

3. The mobile terminal receives and buffers the voice data;

4. The Bluetooth speaker receives the control instruction and sends the control instruction to the mobile terminal;

5. The mobile terminal matches the buffered voice data according to the control instruction;

6. If the matching is successful, the mobile terminal sends the voice data matching the control instruction to the Bluetooth speaker;

7. The Bluetooth speaker plays the voice data.

Optionally, the method for controlling voice interaction further includes:

The mobile terminal sends the control instruction to the server;

The server generates corresponding voice data according to the control instruction, and sends the voice data to the mobile terminal;

The mobile terminal receives the voice data fed back by the server and sends the voice data to the Bluetooth speaker.
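The three-party Bluetooth flow, including the optional server fallback, can be sketched as follows. The classes `Server`, `MobileTerminal`, and `BluetoothSpeaker`, the substring matching rule, and the `log` list are hypothetical; the Bluetooth link and the network are modelled as plain method calls.

```python
class Server:
    """Stand-in for server 12."""

    def generate(self, request):
        return f"[voice] answer to {request!r}"


class MobileTerminal:
    """Stand-in for mobile terminal 22: holds the cache, does the matching."""

    def __init__(self, server):
        self.server = server
        self.cache = {}
        self.log = []

    def preload(self, scenarios):
        # Steps 1-3: upload scenarios, receive and buffer voice data.
        for scenario in scenarios:
            self.cache[scenario] = self.server.generate(scenario)

    def handle(self, instruction):
        # Steps 5-6: match against the cache and return the voice data.
        for scenario, voice in self.cache.items():
            if scenario in instruction.lower():
                self.log.append("cache hit")
                return voice
        # Optional path: no match, so use the server's feedback instead.
        self.log.append("server fallback")
        return self.server.generate(instruction)


class BluetoothSpeaker:
    """Stand-in for Bluetooth speaker 21: forwards instructions, plays audio."""

    def __init__(self, terminal):
        self.terminal = terminal
        self.played = []

    def on_instruction(self, instruction):
        voice = self.terminal.handle(instruction)  # step 4: send to terminal
        self.played.append(voice)                  # step 7: play voice data
        return voice


terminal = MobileTerminal(Server())
terminal.preload(["weather"])
speaker = BluetoothSpeaker(terminal)
speaker.on_instruction("weather tomorrow")
speaker.on_instruction("read my schedule")
```

In this arrangement the Bluetooth speaker never talks to the server itself; the mobile terminal is the only component that needs network access.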

It should be noted that other sorting schemes that can be easily conceived by those skilled in the art within the technical scope disclosed in the present invention should also fall within the protection scope of the present invention, and are not described in detail here.

According to the embodiment of the present invention, when voice interaction is performed through a speaker, the application scenario is uploaded to the server in advance and the voice data corresponding to the application scenario is buffered. When a control instruction is received, the control instruction is matched with the buffered voice data, and if the matching is successful the voice data is played directly, which reduces the network delay during the voice interaction and improves the response speed of the voice interaction.

It should be understood that the size of the sequence numbers of the steps in the above embodiments does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present invention.

FIG. 6 is a schematic diagram of a device for voice interactive control of a speaker provided in an embodiment of the present invention. In a Wi-Fi speaker, the device includes:

A first identification module 61, configured to preset an application scenario and upload the application scenario to a server;

A first database 62, configured to receive and cache voice data corresponding to the application scenario returned by the server;

A second identification module 63, configured to receive a control instruction and match the control instruction with the voice data;

The playing module 64 is configured to play voice data matching the control instruction if the matching is successful.
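The division into modules 61 to 64 could map onto software roughly as in the sketch below. The class `SpeakerControlDevice`, its method names, and the injected `upload` and `play` callables are assumptions for illustration; the embodiment leaves the concrete division of functional units open.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SpeakerControlDevice:
    upload: Callable[[list], dict]   # server round trip (assumed interface)
    play: Callable[[str], None]      # audio output (assumed interface)
    database: dict = field(default_factory=dict)  # first database 62

    def preset_scenarios(self, scenarios):
        """Role of first identification module 61, storing into database 62."""
        self.database.update(self.upload(scenarios))

    def receive_instruction(self, instruction):
        """Role of second identification module 63 and playing module 64."""
        voice = self.database.get(instruction)
        if voice is None:
            return False          # matching failed: nothing is played
        self.play(voice)
        return True


played = []
device = SpeakerControlDevice(
    upload=lambda names: {n: f"[voice] {n}" for n in names},
    play=played.append,
)
device.preset_scenarios(["weather"])
ok = device.receive_instruction("weather")
```

Passing the server and audio interfaces in as callables keeps the matching logic independent of any particular network or playback hardware.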

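Further, an embodiment of the present invention provides a mobile terminal, where the mobile terminal includes the third identification module, the second database, the fourth identification module, and the sending module described in the sixth aspect above.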

Further, an embodiment of the present invention provides a speaker voice interactive control system, including a Wi-Fi speaker and a server; wherein the Wi-Fi speaker is used to upload an application scenario to the server in advance, receive and cache the voice data corresponding to the application scenario, receive a control instruction, match the control instruction with the buffered voice data, and, if the matching is successful, play the voice data matching the control instruction; and the server is configured to analyze the application scenario and generate the corresponding voice data.

Optionally, an embodiment of the present invention further provides a speaker voice interactive control system, including a Bluetooth speaker, a mobile terminal, and a server; wherein the Bluetooth speaker is used to receive a control instruction, send the control instruction to the mobile terminal, receive the voice data sent by the mobile terminal, and play the voice data;

The mobile terminal is used for uploading an application scenario to a server, receiving and buffering voice data corresponding to the application scenario, receiving a control instruction sent by the Bluetooth speaker, matching the control instruction with the buffered voice data, and, if the matching is successful, sending the voice data matching the control instruction to the Bluetooth speaker;

The server is configured to analyze an application scenario, generate corresponding voice data, and send the voice data to a mobile terminal.

An embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program, and the computer program implements steps of a speaker voice interactive control method when the computer program is executed by a processor.

Those skilled in the art can clearly understand that, for convenience and brevity of description, the above division of functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional units and modules as required; that is, the internal structure of the mobile terminal is divided into different functional units or modules to complete all or part of the functions described above. Each functional module in the embodiment may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit, and the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional modules are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application. For the specific working process of the modules in the foregoing mobile terminal, reference may be made to the corresponding process in the foregoing method embodiment, and details are not described herein again.

In the above embodiments, the description of each embodiment has its own emphasis. For a part that is not detailed or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed apparatus / terminal device and method may be implemented in other ways. For example, the device / terminal device embodiments described above are only schematic. For example, the division of the modules or units is only a logical function division. In actual implementation, there may be another division manner; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.

In addition, the functional units in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

When the integrated module / unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium. When the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, and the computer program code may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdictions. For example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium excludes electrical carrier signals and telecommunication signals.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present invention, and are not intended to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or equivalently replace some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and they should all be included within the protection scope of the present invention.

Claims

1. A method for interactive voice control of a speaker, comprising:

presetting an application scenario and uploading the application scenario to a server;

receiving and buffering voice data corresponding to the application scenario returned by the server;

receiving a control instruction and matching the control instruction with the voice data;

if the matching is successful, playing the voice data matching the control instruction.
2. The method for interactive voice control of a speaker according to claim 1, wherein presetting the application scenario and uploading the application scenario to the server comprises:

receiving an application scenario entered by touch input and uploading it to the server;

and / or

receiving an application scenario entered by voice input and uploading it to the server;

and / or

receiving an application scenario entered by key input and uploading it to the server.
3. The method for interactive voice control of a speaker according to claim 1, wherein receiving the control instruction and matching the control instruction with the voice data comprises:

uploading the control instruction to the server while matching the control instruction with the voice data;

if the matching is unsuccessful, receiving the voice data fed back by the server and playing the fed-back voice data.
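The speaker-side flow recited in the three claims above can be illustrated with a minimal sketch. It is not the applicant's implementation: the names `FakeServer`, `Speaker`, `preset_scenario`, and `handle` are hypothetical, matching is assumed to be an exact string lookup, and the server is simulated in-process rather than reached over a network.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeServer:
    """Hypothetical stand-in for the cloud server; not an API from the source."""

    def voice_data_for_scenario(self, scenario):
        # Assumed: the server pre-generates replies for a preset scenario.
        replies = {"kitchen": {"set timer": b"<timer audio>"}}
        return replies.get(scenario, {})

    def voice_data_for_instruction(self, instruction):
        # Slow path: in practice this would be a full network round trip.
        return b"<server audio for " + instruction.encode() + b">"


class Speaker:
    def __init__(self, server):
        self.server = server
        self.cache = {}    # buffered voice data, keyed by control instruction
        self.played = []   # record of what was played, for illustration only
        self._pool = ThreadPoolExecutor(max_workers=1)

    def preset_scenario(self, scenario):
        # Upload the scenario, then receive and buffer the returned voice data.
        self.cache = dict(self.server.voice_data_for_scenario(scenario))

    def handle(self, instruction):
        # Upload the instruction to the server while matching locally.
        pending = self._pool.submit(
            self.server.voice_data_for_instruction, instruction
        )
        cached = self.cache.get(instruction)  # assumed exact-match lookup
        if cached is not None:
            pending.cancel()  # best effort; the server reply is simply unused
            audio, source = cached, "cache"
        else:
            audio, source = pending.result(), "server"
        self.played.append(audio)
        return source
```

Under these assumptions, an instruction that is present in the buffer is answered without waiting for the server, while any other instruction falls back to the server's reply, which is the latency saving the method aims at.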
4. A method for interactive voice control of a speaker, comprising:

presetting an application scenario and uploading the application scenario to a server;

receiving and buffering voice data corresponding to the application scenario returned by the server;

receiving a control instruction sent by a speaker, and matching the control instruction with the voice data;

if the matching is successful, sending the voice data matching the control instruction to the speaker for voice data playback.
5. The method for interactive voice control of a speaker according to claim 4, wherein, after receiving and buffering the voice data corresponding to the application scenario returned by the server, the method further comprises:

receiving a control instruction sent by the speaker, and sending the control instruction to the server;

if the voice data corresponding to the application scenario does not match the control instruction, receiving the voice data corresponding to the control instruction fed back by the server;

sending the voice data corresponding to the control instruction to the speaker for voice data playback.
6. A method for interactive voice control of a speaker, comprising:

a Wi-Fi speaker uploading an application scenario to a server in advance;

the server generating voice data corresponding to the application scenario according to the application scenario, and sending the voice data to the Wi-Fi speaker;

the Wi-Fi speaker receiving and buffering the voice data;

the Wi-Fi speaker receiving a control instruction and matching the control instruction with the buffered voice data;

if the matching is successful, the Wi-Fi speaker playing the voice data matching the control instruction.
7. The method for interactive voice control of a speaker according to claim 6, further comprising:

receiving the application scenario through voice input, key input, or touch input.
8. A method for interactive voice control of a speaker, comprising:

a mobile terminal uploading an application scenario to a server in advance;

the server generating voice data corresponding to the application scenario according to the application scenario, and sending the voice data to the mobile terminal;

the mobile terminal receiving and buffering the voice data;

a Bluetooth speaker receiving a control instruction and sending the control instruction to the mobile terminal;

the mobile terminal matching the control instruction with the buffered voice data;

if the matching is successful, the mobile terminal sending the voice data matching the control instruction to the Bluetooth speaker;

the Bluetooth speaker playing the voice data.
9. The method for interactive voice control of a speaker according to claim 8, further comprising:

the mobile terminal sending the control instruction to the server;

the server generating corresponding voice data according to the control instruction, and sending the voice data to the mobile terminal;

the mobile terminal receiving the voice data fed back by the server and sending the voice data to the Bluetooth speaker.
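The division of work among the Bluetooth speaker, the mobile terminal, and the server in the two claims above can be sketched as follows. This is an illustrative assumption, not the claimed implementation: the class and method names are hypothetical, voice data is represented as plain text for readability, the scenario table is made-up sample data, and the server is consulted only after a local miss, which simplifies the order of operations.

```python
class Server:
    # Made-up sample data: replies the server prepares for a scenario.
    SCENARIO_REPLIES = {"morning": {"weather": "Sunny today."}}

    def generate_for_scenario(self, scenario):
        return dict(self.SCENARIO_REPLIES.get(scenario, {}))

    def generate_for_instruction(self, instruction):
        return f"Server reply to '{instruction}'."


class MobileTerminal:
    """Holds the buffer; the Bluetooth speaker itself stores nothing."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def upload_scenario(self, scenario):
        # Upload in advance, then receive and buffer the voice data.
        self.cache = self.server.generate_for_scenario(scenario)

    def on_instruction(self, instruction):
        # Match the instruction against the buffer (assumed exact match).
        hit = self.cache.get(instruction)
        if hit is not None:
            return hit
        # No match: relay the instruction to the server and use its reply.
        return self.server.generate_for_instruction(instruction)


class BluetoothSpeaker:
    def __init__(self, terminal):
        self.terminal = terminal
        self.played = []

    def hear(self, instruction):
        # Forward the instruction to the terminal, then play what comes back.
        voice_data = self.terminal.on_instruction(instruction)
        self.played.append(voice_data)
        return voice_data
```

In this arrangement the speaker only captures instructions and plays audio, so the buffering and matching logic can live entirely on the mobile terminal.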
10. A device for voice interactive control of a speaker, characterized by comprising:

a first identification module, configured to preset an application scenario and upload the application scenario to a server;

a first database, configured to receive and buffer voice data corresponding to the application scenario returned by the server;

a second identification module, configured to receive a control instruction and match the control instruction with the voice data;

a playing module, configured to play the voice data matching the control instruction if the matching is successful.
11. A mobile terminal, comprising:

a third identification module, configured to preset an application scenario and upload the application scenario to a server;

a second database, configured to receive and buffer voice data corresponding to the application scenario returned by the server;

a fourth identification module, configured to receive a control instruction sent by a speaker, and match the control instruction with the voice data;

a sending module, configured to send the voice data matching the control instruction to the speaker for voice data playback if the matching is successful.
12. A speaker voice interactive control system, characterized in that the system includes: a Wi-Fi speaker and a server;

the Wi-Fi speaker is configured to upload an application scenario to the server in advance, receive and buffer voice data corresponding to the application scenario, receive a control instruction, match the control instruction with the buffered voice data, and play the voice data matching the control instruction;

the server is configured to analyze the application scenario and generate corresponding voice data.
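The server role in the system claim above, analyzing a scenario and generating the corresponding voice data, might look like the following sketch. Everything here is an assumption made for illustration: the source does not specify how a scenario is analyzed, so a fixed lookup table `SCENARIO_PROMPTS` stands in for that analysis, and `fake_tts` is a placeholder that encodes text instead of synthesizing audio.

```python
from typing import Callable, Dict

# Hypothetical table of instructions a user is likely to give in each
# scenario, with the reply text the server would turn into voice data.
SCENARIO_PROMPTS: Dict[str, Dict[str, str]] = {
    "bedtime": {
        "good night": "Good night, sleep well.",
        "set alarm": "Alarm set for seven o'clock.",
    },
}


def fake_tts(text: str) -> bytes:
    """Placeholder synthesizer: encodes text rather than producing audio."""
    return text.encode("utf-8")


def generate_voice_data(
    scenario: str,
    synthesize: Callable[[str], bytes] = fake_tts,
) -> Dict[str, bytes]:
    """Return voice data for every instruction expected in the scenario."""
    prompts = SCENARIO_PROMPTS.get(scenario, {})
    return {
        instruction: synthesize(reply)
        for instruction, reply in prompts.items()
    }
```

Passing the synthesizer as a parameter keeps the scenario analysis separate from audio generation, so a real text-to-speech engine could be substituted without changing the lookup.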
13. A computer-readable storage medium storing a computer program, wherein, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 9 are implemented.