CN113112984A

CN113112984A - Control method, device and equipment of intelligent sound box and storage medium

Info

Publication number: CN113112984A
Application number: CN202010031260.0A
Authority: CN
Inventors: 程高飞
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd; Shanghai Xiaodu Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2020-01-13
Filing date: 2020-01-13
Publication date: 2021-07-13
Anticipated expiration: 2040-01-13
Also published as: CN113112984B

Abstract

The application discloses a control method, control device, control equipment and storage medium for a smart speaker, and relates to the technical field of voice. The specific implementation is as follows: the terminal device segments a text to be played according to a preset text length and sequentially sends the segmented texts to a server; the server converts each received segmented text into audio and sends the audio to the smart speaker for playing. Because the text to be played is segmented, audio can be played as soon as each segment has been converted, and the smart speaker does not have to wait for the whole text to be converted before playing. This improves the response speed and makes it easy to stop the text-to-audio conversion and the playback at any time. Because the text-to-audio conversion is performed by the server, the system resource consumption and power consumption of the terminal device are reduced. Because the smart speaker plays the audio of the segmented text, the audio channel of the terminal device is not occupied, and the terminal device remains free to play other audio.

Description

Control method, device and equipment of intelligent sound box and storage medium

Technical Field

The application relates to the technical field of computers, in particular to the technical field of voice.

Background

With the rapid development of internet technology, terminal devices such as mobile phones, tablet computers, notebook computers, desktop computers and intelligent wearable devices have become an indispensable part of people's lives. People can read text content on a terminal device, or convert the text content into audio for playing.

In the prior art, converting text content into audio for playing generally requires the text-to-audio conversion to be performed on the terminal device; after the conversion is completed, the resulting audio is played by a speaker of the terminal device or by a sound box connected to the terminal device.

This approach has two drawbacks: the text-to-audio conversion consumes the system resources and battery power of the terminal device, and the terminal device cannot play other audio because its audio channel is occupied while the converted audio is being played.

Disclosure of Invention

The application provides a control method, control device, control equipment and storage medium for a smart speaker, so that, when text content is converted into audio for playing, the system resource consumption and power consumption of the terminal device are reduced, the audio channel of the terminal device is not occupied, and the terminal device remains free to play other audio.

A first aspect of the present application provides a control method for a smart speaker, applied to a terminal device, the method including:

segmenting a text to be played according to a preset text length;

and sequentially sending the segmented texts to a server, so that the server converts the received segmented texts into audio and sends the audio to an intelligent sound box for playing.

In one possible design, the sequentially sending the segmented texts to the server includes:

and sequentially sending the segmented texts to the server according to the playing progress of the intelligent sound box.

In a possible design, the sequentially sending the segmented texts to the server according to the playing progress of the smart sound box includes:

after the current segmented text is sent to the server, the playing progress of the intelligent sound box on the current segmented text is obtained from the server;

and sending the next segment of segmented text to the server according to the playing progress.

In one possible design, the method further includes:

generating a first playing stopping instruction according to the playing stopping operation of the user on the terminal equipment;

and according to the first playing stopping instruction, stopping sending the segmented text to the server, and sending the first playing stopping instruction to the server, so that the server stops the process of converting the segmented text into audio and controls the intelligent sound box to stop playing.

In one possible design, the method further includes:

receiving a second playing stopping instruction sent by the server, wherein the second playing stopping instruction is generated by the intelligent sound box according to the playing stopping operation of the user on the intelligent sound box and is sent to the server;

and stopping sending the segmented text to the server according to the second playing stopping instruction.

A second aspect of the present application provides a method for controlling a smart speaker, which is applied to a server, the method including:

receiving segmented texts sequentially sent by terminal equipment, wherein the segmented texts are obtained by segmenting texts to be played by the terminal equipment according to preset text lengths;

and converting the received segmented text into audio, and sequentially sending the audio to the intelligent sound box for playing.

In one possible design, the receiving the segmented text sequentially transmitted by the terminal device includes:

receiving a current segmented text sent by the terminal equipment;

after converting the current segmented text into audio and sending the audio to the intelligent sound box for playing, acquiring the playing progress of the intelligent sound box on the current segmented text from the intelligent sound box;

and sending the playing progress to the terminal equipment so that the terminal equipment sends the next segment of segmented text to the server according to the playing progress.

In one possible design, the method further includes:

receiving a first playing stopping instruction sent by the terminal equipment, wherein the first playing stopping instruction is generated by the terminal equipment according to the playing stopping operation of a user on the terminal equipment;

and stopping the process of converting the segmented text into audio according to the first playing stopping instruction, and sending the first playing stopping instruction to the intelligent sound box so that the intelligent sound box stops playing according to the first playing stopping instruction.

In one possible design, the method further includes:

receiving a second playing stopping instruction sent by the intelligent sound box, wherein the second playing stopping instruction is generated by the intelligent sound box according to the playing stopping operation of a user on the intelligent sound box;

and stopping the process of converting the segmented text into audio according to the second playing stopping instruction, and sending the second playing stopping instruction to the terminal equipment so that the terminal equipment stops sending the segmented text to the server according to the second playing stopping instruction.

The third aspect of the present application provides a control method for a smart speaker, which is applied to the smart speaker, the method including:

receiving audio of segmented texts sequentially sent by a server, wherein the segmented texts are obtained by a terminal device segmenting a text to be played according to a preset text length and are sequentially sent by the terminal device to the server, and the audio is obtained by the server converting the received segmented texts;

and sequentially playing the audio of the segmented text.

In one possible design, the method further includes:

and sending the playing progress of the audio of the current segmented text to the server.

In one possible design, the method further includes:

receiving a first playing stopping instruction sent by the server, wherein the first playing stopping instruction is generated by the terminal equipment according to the playing stopping operation of a user on the terminal equipment and is sent to the server;

and stopping playing according to the first playing stopping instruction.

In one possible design, the method further includes:

generating a second playing stopping instruction according to the playing stopping operation of the user on the intelligent sound box;

and stopping playing the audio according to the second playing stopping instruction, and sending the second playing stopping instruction to the server, so that the server stops the process of converting the segmented text into the audio, and controls the terminal equipment to stop sending the segmented text to the server.

A fourth aspect of the present application provides a control device for a smart speaker, applied to a terminal device, the device including:

the processing module is used for segmenting the text to be played according to the preset text length;

and the sending module is used for sequentially sending the segmented texts to the server, so that the server converts the received segmented texts into audio and sends the audio to the intelligent sound box for playing.

In one possible design, the sending module is to:

In one possible design, the apparatus further includes a control module to:

The fifth aspect of the present application provides a control device for a smart speaker, which is applied to a server, the device includes:

the receiving module is used for receiving segmented texts sequentially sent by the terminal equipment, wherein the segmented texts are obtained by segmenting texts to be played by the terminal equipment according to preset text lengths;

the processing module is used for converting the received segmented text into audio;

and the sending module is used for sequentially sending the audio of the segmented text to the intelligent sound box for playing.

In one possible design, the receiving module is configured to receive a current segmented text sent by the terminal device; after converting the current segmented text into audio and sending the audio to the intelligent sound box for playing, acquiring the playing progress of the intelligent sound box on the current segmented text from the intelligent sound box;

the sending module is further configured to send the playing progress to the terminal device, so that the terminal device sends a next segment of segmented text to the server according to the playing progress.

In a possible design, the receiving module is further configured to receive a first play stopping instruction sent by the terminal device, where the first play stopping instruction is generated by the terminal device according to a play stopping operation of a user on the terminal device;

the processing module is further used for stopping the process of converting the segmented text into the audio according to the first playing stopping instruction;

the sending module is further configured to send the first playing stopping instruction to the smart sound box, so that the smart sound box stops playing according to the first playing stopping instruction.

In a possible design, the receiving module is further configured to receive a second play stop instruction sent by the smart sound box, where the second play stop instruction is generated by the smart sound box according to a play stop operation of a user on the smart sound box;

the processing module is further configured to stop a process of converting the segmented text into audio according to the second play stop instruction;

the sending module is further configured to send the second play stopping instruction to the terminal device, so that the terminal device stops sending the segmented text to the server according to the second play stopping instruction.

A sixth aspect of the present application provides a control device for a smart speaker, applied to the smart speaker, the device including:

the receiving module is used for receiving audio of segmented texts sequentially sent by the server, wherein the segmented texts are obtained by the terminal equipment segmenting a text to be played according to a preset text length and are sequentially sent by the terminal equipment to the server, and the audio is obtained by the server converting the received segmented texts;

and the playing module is used for sequentially playing the audio of the segmented text.

In one possible design, the apparatus further includes:

and the sending module is used for sending the playing progress of the audio of the current segmented text to the server.

In a possible design, the receiving module is further configured to receive a first play stopping instruction sent by the server, where the first play stopping instruction is generated by the terminal device according to a play stopping operation of a user on the terminal device and is sent to the server;

the playing module is further configured to stop playing according to the first playing stopping instruction.

In one possible design, the apparatus further includes:

the control module is used for generating a second playing stopping instruction according to the playing stopping operation of the user on the intelligent sound box;

the playing module is further used for stopping playing the audio according to the second playing stopping instruction;

and the sending module is used for sending the second playing stopping instruction to the server so as to enable the server to stop the process of converting the segmented text into the audio and control the terminal equipment to stop sending the segmented text to the server.

A seventh aspect of the present application provides an electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.

An eighth aspect of the present application provides an electronic apparatus comprising:

at least one processor; and

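a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the second aspect.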

A ninth aspect of the present application provides an electronic device, comprising:

at least one processor; and

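a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the third aspect.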

A tenth aspect of the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect.

An eleventh aspect of the present application provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the second aspect.

A twelfth aspect of the present application provides a non-transitory computer-readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the third aspect.

A thirteenth aspect of the present application provides a computer program comprising program code for performing the method according to the first aspect when the computer program is run by a computer.

A fourteenth aspect of the present application provides a computer program comprising program code for performing the method according to the second aspect when the computer program is run by a computer.

A fifteenth aspect of the present application provides a computer program comprising program code for performing the method according to the third aspect when the computer program is run by a computer.

The sixteenth aspect of the present application provides a control system for a smart speaker, comprising:

the terminal equipment is used for segmenting the text to be played according to the preset text length and sequentially sending the segmented text to the server;

the server is used for converting the received segmented text into audio and sending the audio to the intelligent sound box;

and the intelligent sound box is used for playing the audio of the received segmented text.

One embodiment of the above application has the following advantages or benefits. The terminal device segments a text to be played according to a preset text length and sequentially sends the segmented texts to a server, so that the server converts the received segmented texts into audio and sends the audio to the smart speaker for playing. Because the text to be played is segmented, audio can be played as soon as each segment has been converted, and the smart speaker does not have to wait for the whole text to be converted before playing. This improves the response speed and makes it easy to stop the text-to-audio conversion and the playback at any time. Because the text-to-audio conversion is performed by the server, the system resource consumption and power consumption of the terminal device are reduced. Because the smart speaker plays the audio of the segmented text, the audio channel of the terminal device is not occupied, and the terminal device remains free to play other audio.

Other effects of the above optional implementations will be described below with reference to specific embodiments.

Drawings

The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:

Fig. 1 is a schematic diagram of a control system of a smart sound box according to an embodiment of the present application;

Fig. 2 is a flowchart of a control method of an intelligent sound box according to an embodiment of the present application;

Fig. 3 is a flowchart of a control method for a smart sound box according to another embodiment of the present application;

Fig. 4 is a flowchart of a control method for a smart sound box according to another embodiment of the present application;

Fig. 5 is a structural diagram of a control device of a smart sound box according to an embodiment of the present application;

Fig. 6 is a structural diagram of a control device of a smart sound box according to another embodiment of the present application;

Fig. 7 is a structural diagram of a control device of a smart sound box according to another embodiment of the present application;

Fig. 8 is a block diagram of an electronic device for implementing a method for controlling a smart speaker on a terminal device side according to an embodiment of the present application;

Fig. 9 is a block diagram of an electronic device for implementing a method for controlling a smart speaker on a server side according to an embodiment of the present application;

Fig. 10 is a block diagram of an electronic device for implementing a method for controlling a smart speaker on a smart speaker side according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below with reference to the accompanying drawings. The description includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The method provided by the embodiments of the present application is applied to the control system of a smart speaker shown in Fig. 1. The control system includes a terminal device 10, a server 11 and a smart speaker 12. The terminal device 10 may be an electronic device such as a mobile phone, a tablet computer, a notebook computer, a desktop computer or an intelligent wearable device, and is used for segmenting a text to be played according to a preset text length and sequentially sending the segmented texts to the server 11. The server 11 may be a single server, a server cluster composed of a plurality of servers, or a cloud computing service center, and is configured to convert the received segmented texts into audio and send the audio to the smart speaker 12. The smart speaker 12 is configured to play the audio of the received segmented texts.

In the embodiments of the present application, because the text to be played is segmented, audio can be played as soon as each segment has been converted, and the smart speaker does not have to wait for the whole text to be converted before playing. This improves the response speed and makes it easy to stop the text-to-audio conversion and the playback at any time. Because the text-to-audio conversion is performed by the server, the system resource consumption and power consumption of the terminal device are reduced. Because the smart speaker plays the audio of the segmented text, the audio channel of the terminal device is not occupied, and the terminal device remains free to play other audio.

The following describes the control process of the smart speaker in detail with reference to specific embodiments.

An embodiment of the present application provides a method for controlling a smart speaker, and Fig. 2 is a flowchart of the method according to this embodiment. The method may be executed by a terminal device. As shown in Fig. 2, the method specifically includes the following steps:

S101, segmenting a text to be played according to a preset text length.

In this embodiment, the text to be played may be long, for example an electronic book containing tens of thousands of words or more. If such a text were not segmented, converting it into audio would take a long time, and the smart speaker would have to wait for the whole text to be converted before playing. If the conversion were interrupted, it would have to be performed again. Moreover, if playback is stopped after the whole text has been converted, the system resources spent on converting the unplayed part are wasted. By segmenting the text to be played, audio can be played as soon as each segmented text has been converted, the smart speaker does not have to wait for the whole text to be converted, the conversion can be stopped at any time, and system resources are not wasted. The preset text length may be set according to actual requirements: the terminal device may receive an operation instruction by which a user sets the preset text length, and set the preset text length accordingly. Optionally, the text may be segmented into segments of 200 words each.
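The segmentation step can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `segment_text` is hypothetical, and the sketch assumes that the preset text length is counted in characters, whereas the embodiment only speaks of a length of, for example, 200 words.

```python
def segment_text(text: str, preset_length: int = 200) -> list[str]:
    """Split text into consecutive segments of at most preset_length characters.

    Assumption: length is measured in characters; the source only gives
    "200 words" as an example of a preset text length.
    """
    if preset_length <= 0:
        raise ValueError("preset_length must be positive")
    return [text[i:i + preset_length] for i in range(0, len(text), preset_length)]


# A 450-character text yields two full segments and one shorter final segment.
segments = segment_text("a" * 450, preset_length=200)
print([len(s) for s in segments])  # [200, 200, 50]
```

A fixed-length split may cut a sentence in the middle; splitting at punctuation near the preset length would be one possible refinement, which the application does not describe.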

Further, before segmenting the text to be played, the method further includes obtaining the text to be played. The text to be played may include, but is not limited to, a text input by a user, a text received by the user (for example, a short message), or a text obtained from a text file specified by the user (for example, an electronic book file). Optionally, the user may specify a text file on the terminal device, which may include, but is not limited to, a file in epub format, pdf format and the like, and the text to be played is then obtained from that file. The user may also specify only part of the content of the text file (for example, by page number or line number) as the text to be played.
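Selecting part of a file's content by line number could look like the following sketch. The helper `select_lines` is hypothetical and operates on plain text that has already been extracted; parsing epub or pdf files would require a dedicated parser and is not shown.

```python
def select_lines(content: str, first_line: int, last_line: int) -> str:
    """Return lines first_line..last_line (1-indexed, inclusive) of content."""
    if first_line < 1 or last_line < first_line:
        raise ValueError("expected 1 <= first_line <= last_line")
    lines = content.splitlines()
    # Slicing silently stops at the end of the text if last_line is too large.
    return "\n".join(lines[first_line - 1:last_line])


book = "line one\nline two\nline three\nline four"
text_to_play = select_lines(book, 2, 3)
print(text_to_play)
```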

S102, sequentially sending the segmented texts to a server, so that the server converts the received segmented texts into audio and sends the audio to the smart speaker for playing.

In this embodiment, the terminal device segments the text to be played to obtain a plurality of segmented texts and sends them to the server in sequence. After receiving a segmented text, the server converts it into audio (an audio stream); optionally, the server may invoke an interface of a speech synthesis service to perform the conversion. The server then sends the audio of the segmented texts to the smart speaker in sequence, and the smart speaker plays the received audio in sequence.
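The server-side handling described above can be sketched as a loop that converts each segment and forwards the result. The names `convert_and_forward` and `synthesize_stub` are assumptions made for illustration; in particular, `synthesize_stub` merely encodes the text as bytes and stands in for a real speech synthesis service, whose interface the application does not specify.

```python
from typing import Callable, Iterable


def synthesize_stub(segment: str) -> bytes:
    """Hypothetical stand-in for a speech synthesis call; returns fake audio."""
    return segment.encode("utf-8")


def convert_and_forward(
    segments: Iterable[str],
    synthesize: Callable[[str], bytes],
    send_to_speaker: Callable[[bytes], None],
) -> int:
    """Convert each segment to audio and forward it in order.

    Returns the number of segments that were forwarded.
    """
    count = 0
    for segment in segments:
        audio = synthesize(segment)   # text-to-audio conversion on the server
        send_to_speaker(audio)        # audio goes to the speaker, not the terminal
        count += 1
    return count


# The "speaker" here is just a list that records what it would play.
received_audio: list[bytes] = []
convert_and_forward(["Hello, ", "world."], synthesize_stub, received_audio.append)
```

Because each segment is forwarded as soon as it is converted, playback of the first segment can begin before later segments have been processed.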

In this embodiment, the terminal device sends the segmented texts to the server in sequence. Specifically, the segmented texts may be sent at certain time intervals, so that the server has sufficient time to convert each received segmented text into audio and send the audio to the smart speaker for playing, and so that the user can stop the audio playback at any time without wasting system resources. Optionally, sequentially sending the segmented texts to the server may specifically include sending them according to the playing progress of the smart speaker: after sending each segmented text to the server, the terminal device obtains the smart speaker's playing progress for that segmented text, and when playback is finished (for example, when the playing progress reaches a preset threshold), the terminal device sends the next segmented text to the server.

Specifically, the sending the segmented text to the server in sequence according to the playing progress of the smart sound box includes:

S1021, after the current segmented text is sent to the server, obtaining from the server the playing progress of the smart speaker on the current segmented text;

S1022, sending the next segmented text to the server according to the playing progress.

In this embodiment, the terminal device may communicate with the server over a long-lived (persistent) connection, so that the terminal device can obtain the playing progress in real time.
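Steps S1021 and S1022 can be sketched as a terminal-side loop that sends one segment, waits until the reported progress reaches a threshold, and only then sends the next. Everything named here is hypothetical: `send_by_progress`, the `FakeServer` class, the progress scale of 0.0 to 1.0, and the polling style. A real implementation would receive progress over the long-lived connection rather than poll an in-memory object.

```python
from typing import Callable, Sequence


def send_by_progress(
    segments: Sequence[str],
    send_segment: Callable[[str], None],
    get_progress: Callable[[], float],
    threshold: float = 1.0,
    max_polls: int = 1000,
) -> int:
    """Send segments one at a time, gated on the speaker's playing progress."""
    sent = 0
    for segment in segments:
        send_segment(segment)
        sent += 1
        # Wait until the current segment has been played far enough.
        for _ in range(max_polls):
            if get_progress() >= threshold:
                break
        else:
            raise TimeoutError("playing progress never reached the threshold")
    return sent


class FakeServer:
    """In-memory stand-in: each progress query advances playback by 0.5."""

    def __init__(self) -> None:
        self.received: list[str] = []
        self.progress = 0.0

    def receive(self, segment: str) -> None:
        self.received.append(segment)
        self.progress = 0.0  # a new segment starts from the beginning

    def query_progress(self) -> float:
        self.progress = min(1.0, self.progress + 0.5)
        return self.progress


server = FakeServer()
send_by_progress(["part 1", "part 2", "part 3"], server.receive, server.query_progress)
```

Setting `threshold` below 1.0 would let the terminal send the next segment slightly before the current one finishes, which may hide conversion latency; the application only states that a preset threshold may be used.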

On the basis of any one of the above embodiments, the user can also stop the playback of the smart speaker from either the terminal device or the smart speaker itself.

In an alternative embodiment, the process of controlling the smart sound box to stop playing may include:

S211, generating a first playing stopping instruction according to the playing stopping operation of the user on the terminal equipment;

S212, according to the first playing stopping instruction, stopping sending the segmented text to the server, and sending the first playing stopping instruction to the server, so that the server stops the process of converting the segmented text into audio and controls the intelligent sound box to stop playing.

In this embodiment, the user may control the smart speaker to stop playing from the terminal device. Specifically, when the user performs a play-stopping operation on the terminal device, for example by clicking a stop button, a first playing stopping instruction is generated. According to this instruction, the terminal device stops sending segmented text to the server and also sends the instruction to the server, so that the server stops any text-to-audio conversion currently in progress. The server may first determine whether a conversion of segmented text into audio is currently being executed and, if so, stop it; likewise, if audio of a segmented text is currently being sent to the smart speaker, the server stops that sending as well. In addition, the server may send the first playing stopping instruction to the smart speaker, so that the smart speaker stops playing the audio according to the instruction.
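The propagation of the first playing stopping instruction, from terminal device to server to smart speaker, can be sketched with three small in-memory classes. The class names, attribute names, and the string `"FIRST_STOP"` are all illustrative assumptions; the application does not define a message format.

```python
class Speaker:
    """Smart speaker: stops playing when it receives a stop instruction."""

    def __init__(self) -> None:
        self.playing = True

    def on_stop(self, instruction: str) -> None:
        self.playing = False


class Server:
    """Server: stops conversion and audio sending, then notifies the speaker."""

    def __init__(self, speaker: Speaker) -> None:
        self.speaker = speaker
        self.converting = True     # a text-to-audio conversion is in progress
        self.sending_audio = True  # audio is being sent to the speaker

    def on_first_stop(self, instruction: str) -> None:
        if self.converting:        # only stop a conversion that actually exists
            self.converting = False
        if self.sending_audio:
            self.sending_audio = False
        self.speaker.on_stop(instruction)


class Terminal:
    """Terminal device: generates the instruction from the user's operation."""

    def __init__(self, server: Server) -> None:
        self.server = server
        self.sending_segments = True

    def user_stops_playback(self) -> None:
        instruction = "FIRST_STOP"           # hypothetical instruction payload
        self.sending_segments = False        # stop sending segmented text
        self.server.on_first_stop(instruction)


speaker = Speaker()
server_side = Server(speaker)
terminal = Terminal(server_side)
terminal.user_stops_playback()
```

After the call, no component is still doing work for the stopped playback, which is the resource-saving effect the embodiment aims for.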

In another alternative embodiment, the process of controlling the smart sound box to stop playing may include:

S221, receiving a second playing stopping instruction sent by the server, wherein the second playing stopping instruction is generated by the intelligent sound box according to the playing stopping operation of the user on the intelligent sound box and is sent to the server;

S222, stopping sending the segmented text to the server according to the second playing stopping instruction.

In this embodiment, the user may control the smart speaker to stop playing from the smart speaker itself. Specifically, when the user performs a play-stopping operation on the smart speaker, for example by clicking a stop button, a second playing stopping instruction is generated. According to this instruction, the smart speaker stops playing the audio and also sends the instruction to the server, so that the server stops any text-to-audio conversion currently in progress. The server may first determine whether a conversion of segmented text into audio is currently being executed and, if so, stop it; likewise, if audio of a segmented text is currently being sent to the smart speaker, the server stops that sending as well. In addition, the server may send the second playing stopping instruction to the terminal device, and the terminal device stops sending segmented text to the server according to the instruction.

In another embodiment, the play-stopping operation may be turning off the power supply of the smart speaker. The server may check, in real time or at certain intervals, whether the smart speaker is working normally; when it detects that the smart speaker is not working or is not working normally, it stops the conversion of the current segmented text into audio and the sending of the audio of the segmented text. Similarly, the server may first determine whether such a conversion and/or sending is currently in progress. In addition, the server may control the terminal device to stop sending segmented text to the server.
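The power-off case can be sketched as a periodic liveness check on the server. The application does not say how the server detects that the smart speaker is no longer working; the heartbeat timestamp, the 5-second timeout, and the `ServerSession` class below are assumptions chosen purely for illustration.

```python
class ServerSession:
    """Server-side state for one playback session (illustrative only)."""

    def __init__(self) -> None:
        self.converting = True        # a text-to-audio conversion is in progress
        self.sending_audio = True     # audio is being sent to the speaker
        self.terminal_sending = True  # the terminal is still sending segments

    def stop_all(self) -> None:
        """Stop conversion and audio sending, and tell the terminal to stop."""
        if self.converting:
            self.converting = False
        if self.sending_audio:
            self.sending_audio = False
        self.terminal_sending = False

    def check_speaker(self, last_heartbeat: float, now: float,
                      timeout: float = 5.0) -> bool:
        """Return True if the speaker is considered alive; otherwise stop all work.

        Assumption: the speaker reports a heartbeat time, and silence longer
        than `timeout` seconds means it is powered off or malfunctioning.
        """
        alive = (now - last_heartbeat) <= timeout
        if not alive:
            self.stop_all()
        return alive


session = ServerSession()
session.check_speaker(last_heartbeat=0.0, now=3.0)   # within the timeout
session.check_speaker(last_heartbeat=0.0, now=10.0)  # timed out: session stopped
```

Passing `now` explicitly keeps the check deterministic; a deployed server would read the clock itself, for example with `time.monotonic()`.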

According to the control method of the smart speaker provided by this embodiment, the terminal device segments a text to be played according to a preset text length and sequentially sends the segmented texts to a server, so that the server converts the received segmented texts into audio and sends the audio to the smart speaker for playing. Because the text to be played is segmented, audio can be played as soon as each segment has been converted, and the smart speaker does not have to wait for the whole text to be converted before playing. This improves the response speed and makes it easy to stop the text-to-audio conversion and the playback at any time. Because the text-to-audio conversion is performed by the server, the system resource consumption and power consumption of the terminal device are reduced. Because the smart speaker plays the audio of the segmented text, the audio channel of the terminal device is not occupied, and the terminal device remains free to play other audio.

An embodiment of the present application provides a method for controlling an intelligent sound box, and fig. 3 is a flowchart of the method for controlling the intelligent sound box according to the embodiment of the present invention. The execution main body may be a server, as shown in fig. 3, the method for controlling the smart sound box specifically includes the following steps:

S301, receiving segmented texts sequentially sent by the terminal device, wherein the segmented texts are obtained by the terminal device segmenting a text to be played according to a preset text length;

S302, converting the received segmented text into audio, and sequentially sending the audio to the smart speaker for playing.

This embodiment is a method embodiment on the server side corresponding to the control method embodiment of the smart speaker on the terminal device side, and the principle and technical effects thereof can be referred to the above embodiments, which are not described herein again.

Further, the receiving of the segmented texts sequentially sent by the terminal device in S301 may specifically include:

S3011, receiving a current segmented text sent by the terminal device;

S3012, after converting the current segmented text into audio and sending the audio to the smart speaker for playing, obtaining, from the smart speaker, the playing progress of the smart speaker on the current segmented text;

S3013, sending the playing progress to the terminal device, so that the terminal device sends the next segmented text to the server according to the playing progress.

In this embodiment, the step of obtaining the playing progress of the smart sound box on the current segmented text from the smart sound box may specifically be that the server sends a request for querying the playing progress to the smart sound box, the smart sound box returns the playing progress of the current segmented text according to the query request, or the smart sound box actively sends the playing progress of the current segmented text to the server.
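Steps S3011 to S3013 can be sketched as follows, using the query (pull) variant in which the server asks the smart speaker for its progress. `fake_tts`, `FakeSpeaker`, `handle_segment`, and `terminal_send_all` are hypothetical stand-ins; no real text-to-speech engine or network transport is involved, and playback is treated as finished as soon as the audio is handed over.

```python
from typing import Iterable, List


def fake_tts(segment: str) -> bytes:
    """Placeholder for text-to-audio conversion; not a real TTS engine."""
    return segment.encode("utf-8")


class FakeSpeaker:
    """In-memory stand-in for the smart speaker."""

    def __init__(self) -> None:
        self.played: List[bytes] = []

    def play(self, audio: bytes) -> None:
        self.played.append(audio)

    def query_progress(self) -> float:
        # Pull model: the server asks, the speaker answers.
        # Playback counts as complete once play() has returned.
        return 1.0


def handle_segment(segment: str, speaker: FakeSpeaker) -> float:
    """Server side, one segment: convert, send, then fetch the progress."""
    audio = fake_tts(segment)          # convert the current segment
    speaker.play(audio)                # send the audio to the speaker
    return speaker.query_progress()    # progress to forward to the terminal


def terminal_send_all(segments: Iterable[str], speaker: FakeSpeaker,
                      threshold: float = 1.0) -> int:
    """Terminal side: send the next segment only after enough progress."""
    sent = 0
    for segment in segments:
        progress = handle_segment(segment, speaker)
        sent += 1
        if progress < threshold:
            break  # a real terminal would wait for a later progress update
    return sent
```

The `threshold` parameter is an assumption: the description only says that the next segment is sent according to the playing progress, without fixing a specific value.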

On the basis of the above embodiment, the user can also control the smart sound box to stop playing on the terminal device or the smart sound box.

S411, receiving a first play stop instruction sent by the terminal device, wherein the first play stop instruction is generated by the terminal device according to a play stop operation performed by the user on the terminal device;

S412, stopping the process of converting the segmented text into audio according to the first play stop instruction, and sending the first play stop instruction to the smart speaker, so that the smart speaker stops playing according to the first play stop instruction.

The principles and technical effects of S411 to S412 in this embodiment can be seen in S211 to S212 described above, and are not described herein again.

S421, receiving a second play stop instruction sent by the smart speaker, wherein the second play stop instruction is generated by the smart speaker according to a play stop operation performed by the user on the smart speaker;

S422, stopping the process of converting the segmented text into audio according to the second play stop instruction, and sending the second play stop instruction to the terminal device, so that the terminal device stops sending the segmented text to the server according to the second play stop instruction.

The principle and technical effects of S421 to S422 in this embodiment can be referred to as S221 to S222, which are not described herein again.
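The two stop paths on the server (S411 to S412 and S421 to S422) differ only in where the instruction is forwarded. A minimal sketch, with the hypothetical names `StopRouter`, `FIRST_STOP`, and `SECOND_STOP`, and with message delivery reduced to appending to a list:

```python
from typing import List, Tuple

FIRST_STOP = "first_play_stop"    # originates on the terminal device
SECOND_STOP = "second_play_stop"  # originates on the smart speaker


class StopRouter:
    """Hypothetical server-side handler for both play stop instructions."""

    def __init__(self) -> None:
        self.converting = True                       # conversion in progress
        self.forwarded: List[Tuple[str, str]] = []   # (target, instruction)

    def on_instruction(self, instruction: str) -> None:
        if instruction == FIRST_STOP:
            target = "speaker"    # S412: tell the speaker to stop playing
        elif instruction == SECOND_STOP:
            target = "terminal"   # S422: tell the terminal to stop sending
        else:
            raise ValueError(f"unknown instruction: {instruction}")
        # In both cases the text-to-audio conversion is stopped first.
        self.converting = False
        self.forwarded.append((target, instruction))
```

Each instruction is forwarded only to the party that did not originate it, since the originator has already acted on the user's operation.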

An embodiment of the present application provides a method for controlling an intelligent sound box, and fig. 4 is a flowchart of the method for controlling the intelligent sound box according to the embodiment of the present invention. The execution main body can be an intelligent sound box, as shown in fig. 4, the control method of the intelligent sound box specifically comprises the following steps:

S501, receiving audio of segmented texts sequentially sent by the server, wherein the terminal device segments a text to be played according to a preset text length and sequentially sends the segmented texts to the server, and the server converts the received segmented texts into the audio;

S502, sequentially playing the audio of the segmented texts.

This embodiment is a method embodiment of the intelligent sound box side corresponding to the control method embodiment of the intelligent sound box of the terminal device side, and the principle and technical effects thereof can be referred to the above embodiment, which is not described herein again.

On the basis of the above embodiment, the method further includes sending the playing progress of the audio of the current segmented text to the server.

In this embodiment, the smart sound box may actively send the playing progress of the current segmented text to the server, or return the playing progress of the current segmented text after receiving a playing progress query request sent by the server.

S611, receiving a first play stop instruction sent by the server, wherein the first play stop instruction is generated by the terminal device according to a play stop operation performed by the user on the terminal device and is sent to the server;

S612, stopping playing according to the first play stop instruction.

The principles and technical effects of S611 to S612 in this embodiment can be seen in S211 to S212, which are not described herein again.

S621, generating a second play stop instruction according to a play stop operation performed by the user on the smart speaker;

S622, stopping playing the audio according to the second play stop instruction, and sending the second play stop instruction to the server, so that the server stops the process of converting the segmented text into audio and controls the terminal device to stop sending the segmented text to the server.

The principle and technical effects of S621-S622 in this embodiment can be seen in the above S221-S222, which are not described herein again.
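On the smart speaker side, the two stop paths (S611 to S612 and S621 to S622) can be sketched as below. `SpeakerPlayer` and its method names are hypothetical, and discarding audio that is already queued is an assumption of this sketch; the description only says that playing stops.

```python
from collections import deque
from typing import Deque, List


class SpeakerPlayer:
    """Hypothetical playback state of the smart speaker."""

    def __init__(self) -> None:
        self.queue: Deque[bytes] = deque()   # audio of segments, in order
        self.playing = False
        self.to_server: List[str] = []       # messages sent to the server

    def enqueue(self, audio: bytes) -> None:
        # Audio arrives segment by segment and is played in order.
        self.queue.append(audio)
        self.playing = True

    def _stop(self) -> None:
        self.queue.clear()   # assumption: queued audio is discarded
        self.playing = False

    def on_first_stop(self) -> None:
        # S611 to S612: the instruction came from the terminal via the server.
        self._stop()

    def on_user_stop(self) -> None:
        # S621 to S622: the user pressed stop on the speaker itself,
        # so the server must also be told to stop converting.
        self._stop()
        self.to_server.append("second_play_stop")
```

Only the locally generated stop is reported upstream; a stop that arrived from the server needs no reply because the server and the terminal device have already stopped.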

An embodiment of the present application provides a control device of an intelligent sound box, and fig. 5 is a structural diagram of the control device of the intelligent sound box provided in the embodiment of the present invention, and is applied to a terminal device. As shown in fig. 5, the control device 710 of the smart speaker specifically includes: a processing module 711 and a sending module 712.

The processing module 711 is configured to segment a text to be played according to a preset text length;

and the sending module 712 is configured to send the segmented texts to the server in sequence, so that the server converts the received segmented texts into audio, and sends the audio to the smart sound box for playing.

On the basis of the foregoing embodiment, the sending module 712 is configured to:

On the basis of the foregoing embodiment, optionally, the apparatus 710 further includes a control module 713, configured to:

The control device of the smart sound box provided in this embodiment may be specifically configured to execute the embodiment of the control method of the smart sound box on the terminal device side provided in fig. 2, and specific functions are not described herein again.

According to the control device of the intelligent sound box provided by the embodiment, the text to be played is segmented according to the preset text length through the terminal equipment; and sequentially sending the segmented texts to a server, so that the server converts the received segmented texts into audio and sends the audio to an intelligent sound box for playing. In the embodiment, by segmenting the text to be played, the audio playing can be performed after each segment of segmented text is converted into the audio, and the text to be played can be played without waiting for the whole text to be played to be converted into the audio through an intelligent sound box, so that the response speed is improved, and the text conversion audio can be conveniently stopped at any time and the playing can be stopped at any time; the process of converting the text into the audio is completed by the server, so that the system resource consumption and the electric quantity consumption of the terminal equipment are reduced; and the intelligent sound box plays the audio of the segmented text, so that the audio channel of the terminal equipment is not occupied, and the terminal equipment is not influenced to play other audio.

An embodiment of the present application provides a control device of an intelligent sound box, and fig. 6 is a structural diagram of the control device of the intelligent sound box provided in the embodiment of the present invention, and is applied to a server. As shown in fig. 6, the control device 720 of the smart speaker specifically includes: a receiving module 721, a processing module 722, and a sending module 723.

The receiving module 721 is configured to receive segmented texts sequentially sent by a terminal device, where the segmented texts are obtained by segmenting a text to be played by the terminal device according to a preset text length;

a processing module 722 for converting the received segmented text into audio;

and the sending module 723 is configured to sequentially send the audio of the segmented text to the smart speaker for playing.

On the basis of the foregoing embodiment, the receiving module 721 is configured to receive the current segmented text sent by the terminal device; after converting the current segmented text into audio and sending the audio to the intelligent sound box for playing, acquiring the playing progress of the intelligent sound box on the current segmented text from the intelligent sound box;

the sending module 723 is further configured to send the playing progress to the terminal device, so that the terminal device sends a next segment of segmented text to the server according to the playing progress.

On the basis of the foregoing embodiment, optionally, the receiving module 721 is further configured to receive a first play stopping instruction sent by the terminal device, where the first play stopping instruction is generated by the terminal device according to a play stopping operation of a user on the terminal device;

the processing module 722 is further configured to stop the process of converting the segmented text into audio according to the first stop playing instruction;

the sending module 723 is further configured to send the first playing stopping instruction to the smart sound box, so that the smart sound box stops playing according to the first playing stopping instruction.

On the basis of the foregoing embodiment, optionally, the receiving module 721 is further configured to receive a second playing stopping instruction sent by the smart sound box, where the second playing stopping instruction is generated by the smart sound box according to a playing stopping operation of a user on the smart sound box;

the processing module 722 is further configured to stop the process of converting the segmented text into audio according to the second stop playing instruction;

the sending module 723 is further configured to send the second play stop instruction to the terminal device, so that the terminal device stops sending the segmented text to the server according to the second play stop instruction.

The control device of the smart speaker provided in this embodiment may be specifically configured to execute the embodiment of the control method of the smart speaker on the server side provided in fig. 3, and specific functions are not described herein again.

An embodiment of the present application provides a control device of an intelligent sound box, and fig. 7 is a structural diagram of the control device of the intelligent sound box provided in the embodiment of the present invention, and is applied to the intelligent sound box. As shown in fig. 7, the control device 730 of the smart speaker specifically includes: a receiving module 731, and a playing module 732.

The receiving module 731 is configured to receive audio of segmented texts sequentially sent by a server, where the terminal device segments a text to be played according to a preset text length and sequentially sends the segmented texts to the server, and the server converts the received segmented texts into the audio;

a playing module 732, configured to play the audio of the segmented text in sequence.

On the basis of the above embodiment, the apparatus 730 further includes:

the sending module 734 is configured to send the playing progress of the audio of the current segmented text to the server.

On the basis of the foregoing embodiment, optionally, the receiving module 731 is further configured to receive a first play stopping instruction sent by the server, where the first play stopping instruction is generated by the terminal device according to a play stopping operation of a user on the terminal device and is sent to the server;

the playing module 732 is further configured to stop playing according to the first instruction to stop playing.

On the basis of the foregoing embodiment, optionally, the apparatus 730 further includes:

the control module 733, configured to generate a second play stop instruction according to a play stop operation of a user on the smart sound box;

the playing module 732 is further configured to stop playing the audio according to the second instruction to stop playing;

a sending module 734, configured to send the second play stop instruction to the server, so that the server stops a process of converting the segmented text into audio, and controls the terminal device to stop sending the segmented text to the server.

The control device of the smart sound box provided in this embodiment may be specifically configured to execute the embodiment of the control method of the smart sound box on the smart sound box side provided in fig. 4, and specific functions are not described herein again.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided. Fig. 8 is a block diagram of an electronic device for the control method of the smart speaker on the terminal device side according to the embodiment of the present application. As shown in fig. 8, the electronic device includes: one or more processors 811, a memory 812, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The electronic device may specifically be, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a desktop computer, or a smart wearable device.

The electronic device of the control method of the smart speaker may further include: an input device 813 and an output device 814. The processor 811, the memory 812, the input device 813, and the output device 814 may be connected by a bus or other means, and fig. 8 illustrates the connection by a bus as an example.

Memory 812 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the control method of the smart sound box provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the control method of the smart speaker provided by the present application.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided. Fig. 9 is a block diagram of an electronic device for the control method of the smart speaker on the server side according to the embodiment of the present application. As shown in fig. 9, the electronic device includes: one or more processors 821, a memory 822, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The electronic device may specifically be a server.

The electronic device of the control method of the smart speaker may further include: an input device 823 and an output device 824. The processor 821, the memory 822, the input device 823 and the output device 824 may be connected by a bus or other means, and the bus connection is exemplified in fig. 9.

Memory 822 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the control method of the smart sound box provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the control method of the smart speaker provided by the present application.

According to an embodiment of the present application, an electronic device and a readable storage medium are also provided. Fig. 10 is a block diagram of an electronic device for the control method of the smart speaker on the smart speaker side according to the embodiment of the present application. As shown in fig. 10, the electronic device includes: one or more processors 831, a memory 832, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The electronic device may specifically be a smart speaker.

The electronic device of the control method of the smart speaker may further include: an input device 833 and an output device 834. The processor 831, the memory 832, the input device 833 and the output device 834 may be connected by a bus or other means, and in fig. 10, the bus connection is taken as an example.

Memory 832 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the control method of the smart sound box provided by the application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the control method of the smart speaker provided by the present application.

The above-described electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.

The electronic device includes: one or more processors, memory, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 8 to fig. 10 take one processor as an example.

The memory is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the above-described methods provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the above-described method provided by the present application.

The memory, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the control method of the smart speaker in the embodiments of the present application (for example, the processing module 711 and the sending module 712 shown in fig. 5, the receiving module 721, the processing module 722, and the sending module 723 shown in fig. 6, or the receiving module 731 and the playing module 732 shown in fig. 7). The processor executes various functional applications and data processing of the electronic device by running the non-transitory software programs, instructions, and modules stored in the memory, that is, implements the method in the above-described method embodiments.

The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by use of the electronic device, and the like. Further, the memory may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and these remote memories may be connected to the electronic device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The input device may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.

The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.

The application also provides a computer program, which comprises a program code, and when the computer runs the computer program, the program code executes the control method of the intelligent sound box on the terminal equipment side according to the embodiment.

The application further provides a computer program, which includes a program code, and when the computer runs the computer program, the program code executes the control method of the smart speaker on the server side according to the above embodiment.

The present application further provides a computer program, which includes a program code, and when the computer runs the computer program, the program code executes the method for controlling an intelligent sound box on an intelligent sound box side according to the above embodiment.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.

The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A control method of a smart speaker, applied to a terminal device, the method comprising:

segmenting a text to be played according to a preset text length;

2. The method of claim 1, wherein the sending the segmented texts to the server side sequentially comprises:

3. The method according to claim 2, wherein the sequentially sending the segmented texts to the server according to the playing progress of the smart sound box comprises:

4. The method according to any one of claims 1-3, further comprising:

5. The method according to any one of claims 1-3, further comprising:

6. A control method of a smart speaker, applied to a server, the method comprising:

7. The method of claim 6, wherein the receiving of the segmented texts sequentially transmitted by the terminal device comprises:

receiving a current segmented text sent by the terminal equipment;

8. The method of claim 6 or 7, further comprising:

9. The method of claim 6 or 7, further comprising:

10. A control method of a smart speaker, applied to the smart speaker, the method comprising:

and sequentially playing the audio of the segmented text.

11. The method of claim 10, further comprising:

12. The method of claim 10 or 11, further comprising:

and stopping playing according to the first playing stopping instruction.

13. The method of claim 10 or 11, further comprising:

14. A control device of a smart speaker, applied to a terminal device, the device comprising:

15. The apparatus of claim 14, wherein the sending module is configured to:

16. The apparatus of claim 15, wherein the sending module is configured to:

17. The apparatus of any one of claims 14-16, further comprising a control module to:

18. The apparatus of any one of claims 14-16, further comprising a control module to:

19. A control device of a smart speaker, applied to a server, the device comprising:

20. The apparatus of claim 19,

the receiving module is used for receiving the current segmented text sent by the terminal equipment; after converting the current segmented text into audio and sending the audio to the intelligent sound box for playing, acquiring the playing progress of the intelligent sound box on the current segmented text from the intelligent sound box;

21. The apparatus of claim 19 or 20,

the receiving module is further configured to receive a first play stopping instruction sent by the terminal device, where the first play stopping instruction is generated by the terminal device according to a play stopping operation of a user on the terminal device;

22. The apparatus of claim 19 or 20,

the receiving module is further configured to receive a second play stopping instruction sent by the smart sound box, where the second play stopping instruction is generated by the smart sound box according to a play stopping operation of a user on the smart sound box;

23. A control device of a smart speaker, applied to the smart speaker, the device comprising:

24. The apparatus of claim 23, further comprising:

25. The apparatus of claim 23 or 24,

the receiving module is further configured to receive a first play stopping instruction sent by the server, where the first play stopping instruction is generated by the terminal device according to a play stopping operation of a user on the terminal device and is sent to the server;

26. The apparatus of claim 23 or 24, further comprising:

27. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.

28. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 6-9.

29. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 10-13.

30. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.

31. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 6-9.

32. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 10-13.

33. A control system of a smart speaker, comprising: