CN113556649B - Broadcasting control method and device of intelligent sound box

Info

Publication number: CN113556649B
Application number: CN202010329324.5A
Authority: CN
Inventors: 范冰冰
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd; Shanghai Xiaodu Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd; Shanghai Xiaodu Technology Co Ltd
Priority date: 2020-04-23
Filing date: 2020-04-23
Publication date: 2023-08-04
Anticipated expiration: 2040-04-23
Also published as: CN113556649A

Abstract

The application discloses a broadcasting control method and device of a smart speaker, and relates to the field of artificial intelligence. The specific implementation scheme is as follows: acquiring an image of a preset area; judging whether the image contains a head portrait of the user; and, if the image contains the head portrait of the user, controlling the smart speaker to perform voice broadcasting. Voice broadcasting is therefore performed only when the user is determined to intend to use the smart speaker, which avoids disturbing the user when the broadcast is falsely triggered, ensures that the broadcast is performed at an appropriate time, and thereby improves the service quality of the smart speaker.

Description

Broadcasting control method and device of intelligent sound box

Technical Field

The application relates to artificial intelligence within the technical field of image processing, and in particular to a broadcasting control method and device of a smart speaker.

Background

Text-to-speech (TTS) technology, a component of human-machine dialogue, is widely used in smart speakers.

In the related art, a smart speaker performs the related voice broadcasting as soon as it is started. However, the start may be a false trigger that deviates from the real intention of the user. For example, a "small" smart speaker is started whenever the user mentions the keyword "small" in conversation. The falsely triggered start then triggers the corresponding voice broadcasting, which disturbs the user and degrades the service quality of the smart speaker.

Disclosure of Invention

The application provides a broadcasting control method and device that perform voice broadcasting only when the user is determined to intend to use the smart speaker, so as to avoid disturbing the user when the voice broadcasting is falsely triggered.

According to a first aspect, there is provided a broadcast control method of an intelligent sound box, including: acquiring an image of a preset area; judging whether the image contains a head portrait of a user or not; and if the head portrait of the user is included, controlling the intelligent sound box to carry out voice broadcasting.

According to a second aspect, there is provided a broadcast control device of an intelligent sound box, including: the acquisition module is used for acquiring an image of a preset area; the judging module is used for judging whether the image contains the head portrait of the user or not; and the control module is used for controlling the intelligent sound box to carry out voice broadcasting when the head portrait of the user is included.

According to a third aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, where the instructions are executed by the at least one processor, so that the at least one processor can execute the method for controlling the broadcasting of the intelligent sound box according to the above embodiment.

According to a fourth aspect, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to execute the broadcast control method of the smart speaker described in the above embodiments.

According to a fifth aspect, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method for controlling the playback of a smart speaker described in the above embodiments.

The technical scheme provided by the embodiment of the application has at least the following beneficial technical effects:

When the image of the preset area is detected to contain a portrait, the smart speaker is controlled to perform voice broadcasting. Voice broadcasting is therefore performed only when the user is determined to intend to use the smart speaker, which avoids disturbing the user when the broadcast is falsely triggered, ensures that the broadcast is performed at an appropriate time, and improves the service quality of the smart speaker.

It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.

Drawings

The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:

fig. 1 is a flow chart of a method for controlling broadcasting of an intelligent sound box according to a first embodiment of the present application;

fig. 2 is a flow chart of a method for controlling broadcasting of a smart speaker according to a second embodiment of the present application;

fig. 3 is a flowchart of a method for controlling a broadcast of a smart speaker according to a third embodiment of the present application;

fig. 4 is a schematic diagram of a display interface of a smart speaker according to a fourth embodiment of the present application;

fig. 5 is a schematic diagram of a broadcast control scenario of a smart speaker according to a fifth embodiment of the present application;

fig. 6 is a schematic structural diagram of a broadcast control device of a smart speaker according to a sixth embodiment of the present application;

fig. 7 is a schematic structural diagram of a broadcast control device of a smart speaker according to a seventh embodiment of the present application;

fig. 8 is a schematic structural diagram of a broadcast control device of a smart speaker according to an eighth embodiment of the present application;

fig. 9 is a block diagram of an electronic device for implementing a method for controlling a broadcast of a smart speaker according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

The application addresses the problem that performing voice broadcasting whenever the smart speaker is started may disturb the user.

Specifically, fig. 1 is a flowchart of a method for controlling broadcasting of a smart speaker according to an embodiment of the present application. As shown in fig. 1, the method includes:

step 101, acquiring an image of a preset area.

The preset area may be an area whose included angle with the screen of the smart speaker falls within a preset range, or an area where the user is usually located when using the smart speaker. The usual area may be determined by counting the historical positions from which the user used the smart speaker within a preset time period; when the number of occurrences of a historical position is greater than a preset threshold, that historical position is taken as the preset area.
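As a non-limiting illustration, the counting of historical positions described above might be sketched as follows. The position labels, the function name `preset_areas`, and the threshold value are assumptions introduced only for this example; the application does not specify how positions are represented.

```python
from collections import Counter
from typing import Hashable, Iterable, List


def preset_areas(history: Iterable[Hashable], threshold: int) -> List[Hashable]:
    """Return the historical positions used more often than `threshold`."""
    counts = Counter(history)
    # The description says "greater than a preset threshold", so the
    # comparison is strict.
    return [position for position, n in counts.items() if n > threshold]


# Hypothetical position labels logged during the preset time period.
usage_log = ["sofa", "desk", "sofa", "kitchen", "sofa", "desk"]
print(preset_areas(usage_log, threshold=2))  # prints ['sofa']
```

More than one position may exceed the threshold, so the sketch returns a list rather than a single area.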

Specifically, an image of the preset area is acquired so that whether the user intends to use the smart speaker can be determined from the image content. In practical applications, the image of the preset area may be captured by a camera in the smart speaker, or the smart speaker may be connected to another camera-equipped device in the home and control that device to capture the image.

Step 102, judging whether the image contains a head portrait of the user.

Specifically, whether the image contains the head portrait of the user may be judged by extracting image features of the image and identifying whether the image features contain face features. Alternatively, contour information in the image may be identified, and whether the image contains the head portrait of the user may be judged according to whether the extracted contour information contains the contour of a human face.

In actual implementation, the user's head portrait may be any user's head portrait, or may be a preset head portrait of a specific user.

Step 103, if the image contains the head portrait of the user, controlling the smart speaker to perform voice broadcasting.

It can be appreciated that, since the preset area generally corresponds to an area where the user uses the smart speaker, the user is considered to intend to use the smart speaker when the image of the preset area contains the head portrait of the user, and the smart speaker is therefore controlled to perform voice broadcasting. Of course, when the image does not contain the head portrait of the user, it may also be determined, in order to further improve the service quality, whether a condition for proactively performing voice broadcasting is otherwise satisfied.

In one embodiment of the present application, if the smart speaker is in a screen saver state, the user is considered to have used the smart speaker recently. To prevent the smart speaker from entering the sleep state directly without the user knowing, this embodiment may also detect whether the smart speaker is in the screen saver state, for example by detecting whether the image currently displayed by the smart speaker is a preset screen saver image, or by detecting the interface of the program currently running on the smart speaker and determining from that interface whether the screen saver application is running.

Further, if the smart speaker is in the screen saver state, voice broadcasting is performed when the screen saver state is exited, in order to inform the user, for example with "Master, I am going to sleep".

If the intelligent sound box is not in the screen protection state, the intelligent sound box is determined to not meet the condition of active voice broadcasting, and accordingly voice broadcasting is not conducted.

In another embodiment of the present application, even if no portrait of the user is acquired in the preset area, the user may actively turn on the smart speaker, and in this case, the condition of voice broadcasting is obviously satisfied.

Specifically, the start state of the smart speaker is detected. For example, the start state may be determined from the current upper-layer interface call of the smart speaker, or from the object that was triggered. If the start state is an active start state (for example, the object called by the upper-layer interface is a start key, the power supply is connected, or a restart key is triggered), the smart speaker is controlled to perform voice broadcasting. If the start state is a passive start state (for example, automatic power-on or silent restart), the smart speaker is controlled not to perform voice broadcasting.
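The distinction between active and passive start states could, for example, be represented as below. This is only a sketch: the enum `StartTrigger` and its member names are hypothetical labels for the examples given in the description, and how a real device reports its start trigger is not specified in the application.

```python
from enum import Enum


class StartTrigger(Enum):
    # Hypothetical names for the example triggers in the description.
    START_KEY = "start_key"
    POWER_CONNECTED = "power_connected"
    RESTART_KEY = "restart_key"
    AUTO_POWER_ON = "auto_power_on"
    SILENT_RESTART = "silent_restart"


# Triggers that reflect a deliberate user action (active start state).
ACTIVE_TRIGGERS = frozenset(
    {StartTrigger.START_KEY, StartTrigger.POWER_CONNECTED, StartTrigger.RESTART_KEY}
)


def should_broadcast_on_start(trigger: StartTrigger) -> bool:
    """Broadcast only for an active start; stay silent for a passive start."""
    return trigger in ACTIVE_TRIGGERS
```

Treating every trigger outside `ACTIVE_TRIGGERS` as passive is a conservative design choice of the sketch: an unrecognized start cause results in silence rather than a possibly unwanted broadcast.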

Thus, when broadcasting, the smart speaker in this embodiment considers the real intention of the user from multiple dimensions and determines whether to perform voice broadcasting according to that intention. To help those skilled in the art understand more clearly, a specific application scenario is described below.

In this scenario, the image of the preset area is collected by the camera in the smart speaker. As shown in fig. 2, it is first detected whether the camera of the smart speaker supports face detection. If not, voice broadcasting is not performed; in this case a message reminding the user to upgrade the version of the smart speaker may also be sent, so that face detection can be performed subsequently.

If face detection is supported, it is judged whether the camera is turned on. If the camera is turned on, the image of the preset area is acquired and checked for a human face. If the image contains a face, voice broadcasting is performed. If not, it is detected whether the smart speaker is in the screen saver state; if so, voice broadcasting is performed after the screen saver state is exited, where the exit may occur automatically after a preset time is reached or when the user triggers the corresponding control. If the smart speaker is not in the screen saver state, it continues waiting to recognize a portrait and does not broadcast.

If the camera of the intelligent sound box is not opened, detecting the starting state of the current intelligent sound box, if the starting state is an active starting state, controlling the intelligent sound box to carry out voice broadcasting, and if the starting state is a passive starting state, controlling the intelligent sound box not to carry out voice broadcasting.
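The branching of this scenario can be summarized in a single decision function. The sketch below assumes that each condition has already been evaluated into a boolean; the names `SpeakerState`, `Decision`, and `decide` are hypothetical and serve only to make the order of the checks explicit.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    BROADCAST = "broadcast"
    BROADCAST_AFTER_SCREENSAVER = "broadcast_after_screensaver_exit"
    NO_BROADCAST = "no_broadcast"


@dataclass(frozen=True)
class SpeakerState:
    supports_face_detection: bool
    camera_open: bool
    face_in_preset_area: bool
    in_screensaver: bool
    active_start: bool


def decide(state: SpeakerState) -> Decision:
    # 1. Without face detection support, never broadcast.
    if not state.supports_face_detection:
        return Decision.NO_BROADCAST
    # 2. Camera closed: fall back to the start state.
    if not state.camera_open:
        return Decision.BROADCAST if state.active_start else Decision.NO_BROADCAST
    # 3. Camera open: a detected face means the user intends to use the speaker.
    if state.face_in_preset_area:
        return Decision.BROADCAST
    # 4. No face: broadcast only once the screen saver state is exited.
    if state.in_screensaver:
        return Decision.BROADCAST_AFTER_SCREENSAVER
    # 5. Otherwise keep waiting for a portrait and stay silent.
    return Decision.NO_BROADCAST
```

For instance, `decide(SpeakerState(True, True, False, True, False))` yields `Decision.BROADCAST_AFTER_SCREENSAVER`, matching the screen saver branch of the scenario.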

In summary, according to the broadcasting control method of the smart speaker of the embodiment of the present application, the smart speaker is controlled to perform voice broadcasting when the image of the preset area is detected to contain a portrait. Voice broadcasting is therefore performed only when the user is determined to intend to use the smart speaker, which avoids disturbing the user when the broadcast is falsely triggered, ensures that the broadcast is performed at an appropriate time, and thereby improves the service quality of the smart speaker.

Based on the above embodiment, in order to further improve the service quality of the intelligent sound box, the voice broadcasting mode can be flexibly determined according to the current scene requirement.

In one embodiment of the present application, as shown in fig. 3, the step 103 includes:

step 201, age information and gender information of a user are obtained according to the head portrait of the user.

It is easy to understand that the head portrait of the user reflects the age and gender of the user, so the age information and gender information of the user can be acquired according to the head portrait of the user.

It should be noted that, in different application scenarios, different ways may be adopted to obtain age information and gender information of the user according to the head portraits of the user:

In one embodiment of the present application, portrait features corresponding to different age information and gender information may be constructed in advance. For example, the portrait features corresponding to females include long-hair features, makeup features, and the like, and the portrait features corresponding to males include beard features and the like. The features of the head portrait of the user are then extracted and compared with the pre-constructed portrait features, and the age information and gender information of the user are determined according to the comparison result.

In another embodiment of the present application, a deep learning model may be trained in advance on a large number of sample images. The input of the deep learning model is an image containing a portrait region, and the output is age information and gender information, so the image may be input into the deep learning model to obtain the corresponding age information and gender information.

Step 202, acquiring broadcasting recommended content according to the age information and the gender information, and performing voice broadcasting according to the broadcasting recommended content.

Specifically, users of different ages and genders are interested in different content; for example, middle-aged men are generally interested in financial information and the like. The broadcasting recommended content is therefore obtained according to the age information and the gender information, for example by querying a preset correspondence to determine the broadcasting recommended content that corresponds to the age information and the gender information, and voice broadcasting is then performed according to that content.
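A preset correspondence of this kind might be queried as in the following sketch. Only the entry mapping middle-aged men to financial information comes from the description; the age band boundaries, the band names, the fallback content, and the function names are assumptions made for illustration.

```python
from typing import Dict, Tuple

# Hypothetical correspondence table. Only the ("middle_aged", "male") entry
# reflects the example in the description.
RECOMMENDATIONS: Dict[Tuple[str, str], str] = {
    ("middle_aged", "male"): "financial information",
}
DEFAULT_CONTENT = "general news"  # assumed fallback, not from the source


def age_band(age: int) -> str:
    """Map an estimated age to a coarse band (boundaries are assumptions)."""
    if age < 18:
        return "minor"
    if age < 40:
        return "young_adult"
    if age < 60:
        return "middle_aged"
    return "senior"


def recommended_content(age: int, gender: str) -> str:
    """Look up the broadcasting recommended content for an age and gender."""
    return RECOMMENDATIONS.get((age_band(age), gender), DEFAULT_CONTENT)
```

Keying the table on an age band rather than an exact age keeps the correspondence small and tolerates the error inherent in estimating age from an image.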

Of course, to further avoid disturbing the user, the broadcasting time may also be selected when performing voice broadcasting. In some possible examples, voice broadcasting is performed at a time suited to the user: the historical times at which the user has used the smart speaker are collected according to the portrait information of the user, the current time is obtained, and voice broadcasting is performed when the current time matches the historical use time.

In other possible examples, historical broadcast time data of a large number of users for the broadcasting recommended content may be obtained, the time at which the content is most frequently broadcast is determined from that data, the current time is obtained, and voice broadcasting is performed when the current time matches that most frequent time.
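Both examples reduce to comparing the current time against the most frequent time in a history. A minimal sketch follows, assuming that times are compared at hour granularity and that "matches" means within a tolerance of one hour; neither assumption is stated in the application, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Optional


def most_frequent_hour(history: Iterable[datetime]) -> Optional[int]:
    """Return the hour of day that occurs most often, or None if empty."""
    counts = Counter(ts.hour for ts in history)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def time_matches(now: datetime, history: Iterable[datetime],
                 tolerance_hours: int = 1) -> bool:
    """True if the current hour is within the tolerance of the peak hour."""
    peak = most_frequent_hour(history)
    if peak is None:
        return False  # no history: do not broadcast proactively
    diff = abs(now.hour - peak)
    # Wrap around midnight so that 23:00 and 00:00 are one hour apart.
    return min(diff, 24 - diff) <= tolerance_hours
```

The same functions serve either example: the history is the individual user's usage times in the first case and the aggregated broadcast times of many users in the second.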

In another embodiment of the present application, as shown in fig. 4, the smart speaker has a display screen. Before the smart speaker performs voice broadcasting, a recommended function card may therefore be displayed on the display screen, where the function card corresponds to the content of the voice broadcasting, so that the user can also see the corresponding content while the voice broadcasting is performed.

In still another embodiment of the present application, to further improve the user experience, the age information of the user may be determined according to the portrait information of the user, and a preset database may be queried according to the age information to obtain the matching broadcast voice information and volume information. For example, as shown in fig. 5, when the user is 10 years old, the voice broadcasting uses a cartoon voice at medium volume, and when the user is 60 years old, the voice broadcasting uses an announcer-style voice.
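The database query by age could be sketched as a range lookup. The two end points below follow the example of fig. 5 (a cartoon voice for a 10-year-old, an announcer-style voice for a 60-year-old); the age boundaries, the middle band, and the volume assigned to the oldest band are assumptions, as are all identifiers.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceProfile:
    timbre: str
    volume: str


# Inclusive lower age bound of each band, in ascending order (assumed values).
AGE_BOUNDS = [0, 18, 55]
PROFILES = [
    VoiceProfile("cartoon", "medium"),    # children, per the fig. 5 example
    VoiceProfile("standard", "medium"),   # assumed default band
    VoiceProfile("announcer", "high"),    # timbre per fig. 5; volume assumed
]


def voice_profile_for(age: int) -> VoiceProfile:
    """Return the voice profile of the band that contains `age`."""
    if age < 0:
        raise ValueError("age must be non-negative")
    # bisect_right finds the first bound greater than age; the band is the
    # one just before it.
    return PROFILES[bisect_right(AGE_BOUNDS, age) - 1]
```

With these assumed bands, `voice_profile_for(10)` selects the cartoon voice and `voice_profile_for(60)` selects the announcer-style voice.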

In summary, the broadcasting control method of the intelligent sound box can flexibly select a voice broadcasting mode according to an actual scene, and further improves the service quality of the intelligent sound box.

To implement the above embodiments, the application further provides a broadcasting control device of a smart speaker. Fig. 6 is a schematic structural diagram of a broadcasting control device of a smart speaker according to an embodiment of the present application. As shown in fig. 6, the broadcasting control device of the smart speaker includes: an acquisition module 61, a judging module 62, and a control module 63, wherein,

an acquisition module 61, configured to acquire an image of a preset area;

specifically, the acquisition module 61 acquires an image of the preset area so that whether the user intends to use the smart speaker can be determined from the image content. In practical applications, the image of the preset area may be captured by a camera in the smart speaker, or the smart speaker may be connected to another camera-equipped device in the home and control that device to capture the image.

A judging module 62, configured to judge whether the image includes a head portrait of the user;

specifically, the judging module 62 may judge whether the image contains the head portrait of the user by extracting image features of the image and identifying whether the image features contain face features. Alternatively, contour information in the image may be identified, and whether the image contains the head portrait of the user may be judged according to whether the extracted contour information contains the contour of a human face.

a control module 63, configured to control the smart speaker to perform voice broadcasting when the image contains the head portrait of the user.
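One possible way to compose the three modules in software is shown below, purely as a sketch. Each module is modeled as an injected callable so that the camera, the face detector, and the speech output remain interchangeable; the class name, field names, and the byte-string stand-in for an image are hypothetical and not taken from the application.

```python
from dataclasses import dataclass
from typing import Callable, List

Image = bytes  # placeholder image type for the sketch


@dataclass
class BroadcastControlDevice:
    acquire: Callable[[], Image]            # role of acquisition module 61
    contains_head: Callable[[Image], bool]  # role of judging module 62
    speak: Callable[[str], None]            # output driven by control module 63

    def run_once(self, message: str) -> bool:
        """Acquire one image and broadcast only if a head portrait is found."""
        image = self.acquire()
        if self.contains_head(image):
            self.speak(message)
            return True
        return False


# Usage with stand-in callables instead of a real camera and detector.
spoken: List[str] = []
device = BroadcastControlDevice(
    acquire=lambda: b"frame-with-face",
    contains_head=lambda img: b"face" in img,  # stand-in, not a real detector
    speak=spoken.append,
)
device.run_once("Good morning")
```

After the call, `spoken` holds the single broadcast message, because the stand-in detector reports a face in the stand-in frame.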

Specifically, as shown in fig. 7, on the basis of the one shown in fig. 6, the apparatus further includes: a first detection module 64, wherein,

a first detection module 64, configured to detect whether the smart speaker is in a screen saver state when the image does not include the head portrait of the user;

in the present embodiment, the control module 63 is further configured to:

when the smart speaker is in the screen saver state, perform voice broadcasting when the screen saver state is exited.

In one embodiment of the present application, even if no portrait of the user is acquired in the preset area, the user may actively turn on the smart speaker, and in this case, the condition of voice broadcasting is obviously satisfied.

As shown in fig. 8, the apparatus further includes, on the basis of that shown in fig. 6: a second detection module 65, wherein,

the second detection module 65 is configured to detect a start state of the smart sound box;

the control module 63 is further configured to:

when the starting state is the active starting state, controlling the intelligent sound box to play voice;

and when the starting state is a passive starting state, controlling the intelligent sound box not to play voice.

Specifically, the second detection module 65 detects the start state of the smart speaker. For example, the second detection module 65 may determine the start state from the current upper-layer interface call of the smart speaker, or from the object that was triggered. If the start state is an active start state (for example, the object called by the upper-layer interface is a start key, the power supply is connected, or a restart key is triggered), the control module 63 controls the smart speaker to perform voice broadcasting. If the start state is a passive start state (for example, automatic power-on or silent restart), the control module 63 controls the smart speaker not to perform voice broadcasting.

It should be noted that the foregoing explanation of the broadcasting control method of the smart speaker also applies to the broadcasting control device of the smart speaker of this embodiment; the implementation principle is similar and is not repeated here.

In summary, the broadcasting control device of the smart speaker of the embodiment of the present application controls the smart speaker to perform voice broadcasting when the image of the preset area is detected to contain a portrait. Voice broadcasting is therefore performed only when the user is determined to intend to use the smart speaker, which avoids disturbing the user when the broadcast is falsely triggered, ensures that the broadcast is performed at an appropriate time, and thereby improves the service quality of the smart speaker.

In one embodiment of the present application, the control module 63 is specifically configured to:

acquire age information and gender information of the user according to the head portrait of the user; and

acquire broadcasting recommended content according to the age information and the gender information, and perform voice broadcasting according to the broadcasting recommended content.

In one embodiment of the present application, the smart speaker has a display screen, and the control module 63 is further configured to:

and displaying the recommended function card on the display screen, wherein the function card corresponds to the voice broadcasting content.

In sum, the intelligent sound box broadcasting control device can flexibly select a voice broadcasting mode according to actual scenes, and further improves the service quality of the intelligent sound box.

According to embodiments of the present application, an electronic device and a readable storage medium are also provided.

Fig. 9 is a block diagram of an electronic device for implementing the method for controlling the broadcasting of a smart speaker according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.

As shown in fig. 9, the electronic device includes: one or more processors 901, a memory 902, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). In fig. 9, one processor 901 is taken as an example.

Memory 902 is a non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by the at least one processor, so that the at least one processor executes the method for controlling the broadcasting of the intelligent sound box. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of controlling the broadcasting of the smart speaker provided by the present application.

The memory 902, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the method for controlling the broadcasting of the smart speaker in the embodiments of the present application (e.g., the acquisition module 61, the judging module 62, and the control module 63 shown in fig. 6). The processor 901 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 902, that is, it implements the method for controlling the broadcasting of the smart speaker in the above method embodiments.

The memory 902 may include a program storage area and a data storage area, where the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the electronic device for broadcasting control of the smart speaker, and the like. In addition, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 902 optionally includes memory remotely located relative to the processor 901, which may be connected via a network to the electronic device for broadcasting control of the smart speaker. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The electronic device for the method for controlling the broadcasting of the smart speaker may further include: an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 9.

The input device 903 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for broadcasting control of the smart speaker; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 904 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.

Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.

These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.

The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system or a server that incorporates a blockchain.

The application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the broadcasting control method of the intelligent sound box.
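As an illustration only, the core decision of the broadcasting control method (broadcast when the head portrait of the user is found in the image, otherwise check the screen protection state) might be sketched as follows. The names `Action` and `decide_broadcast` are hypothetical and do not appear in the application; image acquisition and head portrait detection are assumed to have already produced the two boolean inputs.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes of the broadcast decision."""
    BROADCAST_NOW = "broadcast_now"
    BROADCAST_ON_SCREENSAVER_EXIT = "broadcast_on_screensaver_exit"
    NO_BROADCAST = "no_broadcast"


def decide_broadcast(head_portrait_found: bool, in_screensaver: bool) -> Action:
    """Map the two detection results to a broadcast action.

    head_portrait_found: assumed result of checking the image of the
        preset area for the head portrait of the user.
    in_screensaver: assumed result of checking whether the sound box is
        in the screen protection state.
    """
    if head_portrait_found:
        # A user is present in the preset area: broadcast immediately.
        return Action.BROADCAST_NOW
    if in_screensaver:
        # No user seen, but the screen saver is active: defer the
        # broadcast until the screen protection state is exited.
        return Action.BROADCAST_ON_SCREENSAVER_EXIT
    # No user seen and no screen saver: treat the start as a false trigger.
    return Action.NO_BROADCAST
```

In this sketch the screen protection state is only consulted when no head portrait is found, which mirrors the order of the steps in the method.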

It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.

The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims

1. A broadcasting control method of an intelligent sound box, characterized by comprising the following steps:

when the intelligent sound box is started, acquiring an image of a preset area;

judging whether the image contains a head portrait of a user;

if the image contains the head portrait of the user, controlling the intelligent sound box to carry out voice broadcasting;

if the image does not contain the head portrait of the user, detecting whether the intelligent sound box is in a screen protection state;

if the intelligent sound box is in the screen protection state, carrying out voice broadcasting when the intelligent sound box exits the screen protection state to enter a sleep state;

wherein the controlling the intelligent sound box to carry out voice broadcasting comprises:

acquiring age information and gender information of the user according to the head portrait of the user; and

acquiring broadcasting recommended content according to the age information and the gender information, and performing voice broadcasting according to the broadcasting recommended content.

2. The broadcasting control method of the intelligent sound box according to claim 1, further comprising:

acquiring the current time, wherein the broadcasting recommended content is acquired according to the age information, the gender information and the current time.

3. The broadcasting control method of the intelligent sound box according to claim 1, further comprising:

if the intelligent sound box is not in the screen protection state, not performing voice broadcasting.

4. The broadcasting control method of the intelligent sound box according to claim 1, wherein during the start-up of the intelligent sound box, the method further comprises:

detecting the starting state of the intelligent sound box;

if the starting state is an active starting state, controlling the intelligent sound box to carry out voice broadcasting; and

if the starting state is a passive starting state, controlling the intelligent sound box not to carry out voice broadcasting.

5. The broadcasting control method of the intelligent sound box according to claim 1, wherein the intelligent sound box is provided with a display screen, and before the controlling the intelligent sound box to carry out voice broadcasting, the method further comprises:

displaying a recommended function card on the display screen, wherein the function card corresponds to the voice broadcasting content.

6. A broadcasting control device of an intelligent sound box, characterized by comprising:

the acquisition module is used for acquiring an image of a preset area when the intelligent sound box is started;

the judging module is used for judging whether the image contains the head portrait of the user;

the control module is used for controlling the intelligent sound box to carry out voice broadcasting when the image contains the head portrait of the user;

the first detection module is used for detecting whether the intelligent sound box is in a screen protection state when the image does not contain the head portrait of the user;

the control module is further configured to:

when the intelligent sound box is in the screen protection state, carry out voice broadcasting when the intelligent sound box exits the screen protection state to enter a sleep state;

the control module is specifically configured to:

7. The apparatus of claim 6, wherein the apparatus further comprises:

the second detection module is used for detecting the starting state of the intelligent sound box;

the control module is further configured to:

when the starting state is an active starting state, controlling the intelligent sound box to carry out voice broadcasting;

8. The apparatus of claim 6, wherein the smart speaker has a display screen, the control module further to:

9. An electronic device, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein,

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the broadcasting control method of the intelligent sound box of any one of claims 1-5.

10. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the broadcasting control method of the intelligent sound box according to any one of claims 1 to 5.
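For illustration only, acquiring broadcasting recommended content according to the age information, the gender information and the current time might look like the following minimal sketch. Everything in it is an assumption made for the example: the lookup table, the age bands, the day-part boundaries, the placeholder content labels, and the names `recommend_content`, `age_band` and `day_part`. The application does not specify how the recommendation is computed.

```python
from datetime import time

# Hypothetical lookup table; keys and content labels are placeholders only.
RECOMMENDATIONS = {
    ("child", "female", "morning"): "content_A",
    ("child", "male", "morning"): "content_B",
    ("adult", "female", "evening"): "content_C",
    ("adult", "male", "evening"): "content_D",
}
DEFAULT_CONTENT = "content_default"


def age_band(age: int) -> str:
    """Assumed two-band split; a real system could use finer bands."""
    return "child" if age < 18 else "adult"


def day_part(now: time) -> str:
    """Assumed day-part boundaries at 12:00 and 18:00."""
    if now < time(12):
        return "morning"
    if now < time(18):
        return "afternoon"
    return "evening"


def recommend_content(age: int, gender: str, now: time) -> str:
    """Look up content for the estimated age, gender and current time,
    falling back to a default when no table entry matches."""
    key = (age_band(age), gender, day_part(now))
    return RECOMMENDATIONS.get(key, DEFAULT_CONTENT)
```

With this table, `recommend_content(8, "female", time(9, 30))` selects `content_A`, while a combination with no entry, such as an adult in the afternoon, falls back to `content_default`.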