CN110858883A

CN110858883A - Intelligent sound box and use method thereof

Info

Publication number: CN110858883A
Application number: CN201810973579.8A
Authority: CN
Inventors: 邱振青; 吴海全; 张恩勤; 曹磊; 师瑞文
Original assignee: Shenzhen Grandsun Electronics Co Ltd
Current assignee: Shenzhen Grandsun Electronics Co Ltd
Priority date: 2018-08-24
Filing date: 2018-08-24
Publication date: 2020-03-03
Also published as: WO2020038494A1

Abstract

The invention belongs to the technical field of smart homes and provides a smart speaker and a method for using the smart speaker. The method comprises the following steps: a microphone array collects voice information and determines a sound source direction according to the voice information; according to the sound source direction, a control module controls a screen corresponding to the sound source direction to display image information collected by a camera and/or image information received by a wireless communication module, and controls the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module. The invention supports a variety of application scenarios, improves the utilization rate of the smart speaker, and has strong usability and practicality.

Description

Intelligent sound box and use method thereof

Technical Field

The invention relates to the technical field of smart homes, in particular to a smart sound box, a method for using the smart sound box and a computer readable storage medium.

Background

With the rise of internet technology, the communication modes among people are greatly enriched, and people in different regions can communicate more conveniently. Among them, the video conference system is an important remote communication technology, and is well received by people due to its advantages of convenience, high efficiency, etc.

However, smart speaker devices currently on the market that support video calls can generally display the video picture in only one direction. They cannot meet the requirements of users in a group video conference scene, and their utilization rate is low.

Disclosure of Invention

In view of this, embodiments of the present invention provide an intelligent sound box and a method for using the intelligent sound box, which can display image information of a video conference in multiple directions simultaneously, and can improve the utilization rate of the intelligent sound box while facilitating the use of a user.

A first aspect of an embodiment of the present invention provides an intelligent sound box, including:

a control module, a microphone array, a wireless communication module, a camera and at least two screens;

the microphone array, the wireless communication module, the camera and the screen are all connected with the control module;

the microphone array is used for acquiring voice information and determining the direction of a sound source according to the voice information;

the control module is used for controlling a screen corresponding to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module according to the sound source direction, and controlling the intelligent sound box to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

A second aspect of the embodiments of the present invention provides a method for using a smart speaker, where the smart speaker includes:

a control module, a microphone array, a wireless communication module, a camera and at least two screens, wherein the microphone array, the wireless communication module, the camera and the screens are all connected with the control module, and the method comprises the following steps:

the microphone array collects voice information and determines the direction of a sound source according to the voice information;

the control module controls a screen corresponding to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module according to the sound source direction, and controls the intelligent sound box to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

A third aspect of an embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the second aspect mentioned above.

Compared with the prior art, the embodiments of the present invention have the following beneficial effects. In this embodiment, the smart speaker includes a control module, a microphone array, a wireless communication module, a camera and at least two screens, and the microphone array, the wireless communication module, the camera and the screens are all connected with the control module. The microphone array collects voice information and determines a sound source direction according to the voice information; according to the sound source direction, the control module controls a screen corresponding to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module, and controls the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module. With the embodiments of the present invention, participants in all directions can clearly see the picture of the video conference while hearing the sound, so that the utilization rate of the smart speaker is greatly improved, and the smart speaker has strong usability and practicality.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings based on these drawings without inventive effort.

Fig. 1 is a schematic structural diagram of an intelligent sound box according to a first embodiment of the present invention;

fig. 2 is a schematic structural diagram of an intelligent sound box according to a second embodiment of the present invention;

fig. 3 is a schematic flow chart of a method for using the smart sound box according to the third embodiment of the present invention;

fig. 4 is a schematic diagram of a specific implementation process of a method for using an intelligent sound box according to a fourth embodiment of the present invention;

fig. 5 is a schematic diagram of a specific implementation process of a method for using an intelligent sound box according to a fifth embodiment of the present invention;

fig. 6 is a schematic structural diagram of a video conference system according to a sixth embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the specification of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon" or "in response to a determination" or "in response to a detection". Similarly, the phrase "if it is determined" or "if a [ described condition or event ] is detected" may be interpreted contextually to mean "upon determining" or "in response to determining" or "upon detecting [ described condition or event ]" or "in response to detecting [ described condition or event ]".

It should be understood that, the sequence numbers of the steps in this embodiment do not mean the execution sequence, and the execution sequence of each process should be determined by the function and the inherent logic of the process, and should not constitute any limitation on the implementation process of the embodiment of the present invention.

It should be noted that any number of smart speakers may be included in the present invention to enable two or more users to perform a video conference, where the smart speakers include wireless speakers.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Example one

Fig. 1 is a schematic structural diagram of an intelligent sound box according to an embodiment of the present invention, where the intelligent sound box may include:

a control module 11, a microphone array 12, a wireless communication module 13, a camera 14 and a screen 15.

In the embodiment of the present invention, the microphone array 12, the wireless communication module 13, the camera 14, and the screen 15 are all connected to the control module 11.

The microphone array 12 is configured to collect voice information and determine a sound source direction according to the voice information, wherein the sound source direction may be determined based on a positioning algorithm of a time difference of arrival. It should be understood that the microphone array 12 is a system consisting of a number of microphones for sampling and processing the spatial characteristics of the sound field. Optionally, the number of the microphones is 7, and the microphones are arranged in a ring shape.
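The time-difference-of-arrival idea mentioned above can be sketched for a single pair of microphones. This is only an illustrative sketch, not the positioning algorithm of the invention: it assumes a far-field source, a hypothetical two-microphone pair rather than the optional seven-microphone ring, and a brute-force cross-correlation; the names `estimate_delay_samples` and `tdoa_to_angle`, the sample rate and the spacing are all assumptions introduced for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for air at about 20 degrees C


def estimate_delay_samples(sig_a, sig_b, max_lag):
    """Return the lag (in samples) at which sig_b best matches sig_a.

    A positive lag means the sound reaches microphone B later than
    microphone A. Brute-force cross-correlation over [-max_lag, max_lag].
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def tdoa_to_angle(delay_s, mic_spacing_m):
    """Far-field angle of arrival in degrees, measured from broadside."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp: noise can push |ratio| past 1
    return math.degrees(math.asin(ratio))


# Hypothetical example: an impulse reaches microphone B three samples late.
sample_rate = 48000          # Hz, assumed
spacing = 0.10               # m between the two microphones, assumed
mic_a = [0.0] * 32
mic_b = [0.0] * 32
mic_a[10] = 1.0
mic_b[13] = 1.0

lag = estimate_delay_samples(mic_a, mic_b, max_lag=8)
angle = tdoa_to_angle(lag / sample_rate, spacing)
```

With more than two microphones, pairwise delays of this kind could be combined to resolve a full 360° direction; how that is done is not specified in this description.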

The wireless communication module 13 is configured to interact with a server, so as to send locally collected voice information and/or image information to the server and to receive, throughout the video conference, the voice information and/or image information collected by the opposite terminal. Optionally, the wireless communication module 13 may include a WiFi communication sub-module and a Bluetooth communication sub-module. Further, voice information other than the voice information collected by the microphone array 12, and/or image information other than the image information collected by the camera 14, is received via the server. It should be noted that, because the smart speaker in the present application is mainly applied to a video conference scene, voice information collected by the microphone array on the smart speaker also needs to be played through the smart speaker so that local users can hear it.

The camera 14 is configured to collect image information of a user. It should be noted that the type and number of the cameras 14 can be flexibly selected according to actual situations, including but not limited to a common camera, a 360-degree panoramic camera, or a camera array.

The screen 15 is configured to display image information acquired by the camera 14 and/or image information received by the wireless communication module 13. Optionally, the number of screens is at least 2.

The control module 11 is configured to control, according to the sound source direction, a screen closest to the sound source direction to display image information acquired by the camera 14 and/or image information received by the wireless communication module 13; in addition, the control module 11 is further configured to control the smart speaker to play the voice information collected by the microphone array 12 and/or the voice information received by the wireless communication module 13. Optionally, the control module 11 includes a main control chip, and the main control chip is an APQ8009 chip.
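The selection of the screen closest to the sound source direction can be sketched as follows. This is a minimal sketch under assumptions not stated in the description: each screen is represented by a hypothetical center direction in degrees measured from a common reference, and "closest" is taken to mean the smallest angular distance on the circle. The function names are illustrative only.

```python
def angular_distance(a_deg, b_deg):
    """Smallest absolute angle between two directions, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def nearest_screen(source_deg, screen_centers_deg):
    """Return the index of the screen whose center direction is closest
    to the sound source direction."""
    return min(
        range(len(screen_centers_deg)),
        key=lambda i: angular_distance(source_deg, screen_centers_deg[i]),
    )


# Hypothetical layout: two screens facing opposite directions.
centers = [0.0, 180.0]
front = nearest_screen(350.0, centers)   # wraps around 0 degrees
back = nearest_screen(100.0, centers)
```

The wrap-around handling matters because a source at 350° is only 10° away from a screen centered at 0°, not 350°.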

In the embodiment of the invention, the microphone array is used for acquiring voice information, the sound source direction is determined according to the voice information, the control module is used for controlling the screen corresponding to the sound source direction to display the image information acquired by the camera and/or the image information received by the wireless communication module according to the sound source direction, and simultaneously controlling the intelligent sound box to play the voice information acquired by the microphone array and/or the voice information received by the wireless communication module, so that the scene requirement of a multi-person video conference can be met, the practicability of the intelligent sound box is improved, the functions are more complete, and the use by people is more convenient.

Example two

Fig. 2 is a schematic structural diagram of a smart speaker according to a second embodiment of the present invention, where the smart speaker may include:

a control module 21, a microphone array 22, a wireless communication module 23, a camera 24, a screen 25, a wake-up module 26, an audio processing module 27 and a key module 28.

The microphone array 22, the wireless communication module 23, the camera 24, the screen 25, the wake-up module 26, the audio processing module 27 and the key module 28 are all connected with the control module 21. It should be noted that the control module 21, the microphone array 22, the wireless communication module 23, the camera 24 and the screen 25 are the same as the control module 11, the microphone array 12, the wireless communication module 13, the camera 14 and the screen 15 in the first embodiment, and are not described again here.

The wake-up module 26 wakes up the smart speaker after detecting a preset wake-up keyword, so that the smart speaker enters a working state.

The audio processing module 27 includes a digital signal processor, a power amplifier and a speaker, all of which are connected with the control module 21; the output of the digital signal processor is connected with the input of the power amplifier, and the output of the power amplifier is connected with the input of the speaker. It should be understood that the voice information collected by the microphone array 22 and/or the voice information received by the wireless communication module 23 contains considerable noise; if it is played directly, the playback quality suffers and the user experience is degraded. Optionally, the voice information collected by the microphone array 22 and/or the voice information received by the wireless communication module 23 is therefore processed by a digital signal processing system including the audio processing module 27.

The key module 28 is configured to receive a key instruction of a user, and control, through the control module, adjustment of the volume of the smart sound box.

As can be seen from the above, compared with the first embodiment, this embodiment adds the wake-up module, which wakes up the smart speaker into the working state after detecting the preset wake-up keyword. It also adds the audio processing module, which improves the quality of the voice played by the smart speaker. In addition, it adds the key module, which, in combination with the control module, allows the volume of the smart speaker to be adjusted. Different requirements of users in different application scenarios are thereby met, the user experience is improved, and the smart speaker has strong usability and practicality.

Example three

Fig. 3 is a schematic flow chart of a method for using the smart speaker according to the third embodiment of the present invention, and the method may include the following steps:

S301: the microphone array collects voice information and determines the direction of a sound source according to the voice information.

The smart speaker includes a control module, a microphone array, a wireless communication module, a camera and at least two screens, and the microphone array, the wireless communication module, the camera and the screens are all connected with the control module.

Optionally, the microphone array is used to collect voice information, process the voice information into voice data, and determine a sound source direction corresponding to the voice information according to the voice data.

S302: the control module controls a screen corresponding to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module according to the sound source direction, and controls the intelligent sound box to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

It should be understood that the image displayed by the screen may be only the image information collected by the camera, that is, the image information of the local party; or only the image information received by the wireless communication module, that is, the image information of the other party; or both the image information collected by the camera and the image information received by the wireless communication module, that is, the image information of the local party and the other party displayed simultaneously. The information actually displayed can be flexibly set according to actual requirements and the size of the screen. Optionally, the screen displays the image information collected by the camera and the image information received by the wireless communication module at different scales.

It should also be understood that the voice played by the smart speaker may be only the voice information collected by the microphone array, that is, the voice information of the local party; or only the voice information received by the wireless communication module, that is, the voice information of the other party; or both the voice information collected by the microphone array and the voice information received by the wireless communication module, that is, the voice information of the local party and the other party played simultaneously. The information actually played can be flexibly set according to actual requirements and the processing effect of the audio processing module. Optionally, the smart speaker plays the voice information received by the wireless communication module.

As can be seen from the above, in the embodiment of the present invention, the microphone array collects voice information and determines the sound source direction according to the voice information, and the control module, according to the sound source direction, controls a screen corresponding to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module, and controls the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module. In this way, users in all directions can not only hear the voice of the person they are talking to but also see that person's expressions and actions, so that people in different places can communicate as if they were in the same meeting room. This improves the user experience and the utilization rate of the smart speaker, and has strong usability and practicality.

Example four

Fig. 4 is a schematic diagram of a specific implementation process of the method for using the smart speaker according to the fourth embodiment of the present invention, which further refines and explains steps S301 and S302 in the third embodiment. The method may include the following steps:

S401: the microphone array collects voice information.

Step S401 is substantially the same as step S301 in the third embodiment, and is not described again here.

S402: detecting whether the voice information contains a preset wake-up keyword, and if the preset wake-up keyword is detected, waking up the smart speaker.

The wake-up keyword is a word defined in advance and used for switching the smart speaker from a standby state to a working state. Optionally, the preset wake-up keyword can be flexibly set according to the preference of the user.
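The standby-to-working transition described in step S402 can be sketched as a small state machine. This is only an illustrative sketch: it assumes the voice information has already been converted to text by some speech recognizer, which this description does not specify, and the names `SpeakerState`, `WakeModule`, `on_transcript` and the keyword "hello speaker" are hypothetical.

```python
from enum import Enum


class SpeakerState(Enum):
    STANDBY = "standby"
    WORKING = "working"


class WakeModule:
    """Switches the speaker from standby to working on a preset keyword."""

    def __init__(self, wake_keyword):
        # The keyword is user-configurable; matching is case-insensitive here.
        self.wake_keyword = wake_keyword.lower()
        self.state = SpeakerState.STANDBY

    def on_transcript(self, text):
        """Feed one recognized utterance (assumed to be text already)."""
        if (self.state is SpeakerState.STANDBY
                and self.wake_keyword in text.lower()):
            self.state = SpeakerState.WORKING
        return self.state


wake = WakeModule("hello speaker")  # hypothetical keyword
first = wake.on_transcript("what time is it")
second = wake.on_transcript("Hello Speaker, start the meeting")
```

Only after the state becomes `WORKING` would the later steps, such as determining the sound source direction, be carried out.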

S403: after the smart speaker is woken up, determining the direction of a sound source according to the voice information.

Step S403 is substantially the same as step S301 in the third embodiment, and is not described again here.

S404: when the determined sound source direction is one, the control module controls a screen closest to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module, and controls the intelligent sound box to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

It should be understood that the application scenarios contemplated in the present invention include a one-to-one video conference mode, a one-to-many group video conference mode and a many-to-many group video conference mode, so there may be one or more sound source directions. This embodiment takes a single sound source direction as an example; for the case of a plurality of sound source directions, see the fifth embodiment.

It should also be understood that, when there is one sound source direction, the screen closest to the sound source direction is controlled to display the image information collected by the camera and/or the image information received by the wireless communication module, so that the user can watch a clear video picture to the greatest extent. The distance from the sound source to the screen can be derived from the distance from the sound source to the microphone array.

As can be seen from the above, compared with the third embodiment, this embodiment adds a voice wake-up step and a step of judging the number of sound source directions. Through the voice wake-up step, the smart speaker can be switched from the standby state to the working state in time, which increases the data processing speed. In addition, when there is only one sound source direction, the screen closest to the sound source direction is controlled to display the image information collected by the camera and/or the image information received by the wireless communication module, so that a better viewing effect is obtained, the utilization rate of the smart speaker is improved, and the smart speaker has strong usability and practicality.

Example five

Fig. 5 is a schematic diagram of a specific implementation process of the method for using the smart speaker according to the fifth embodiment of the present invention, which further refines and explains steps S301 and S302 in the third embodiment. The method may include the following steps:

S501: the microphone array collects voice information.

S502: detecting whether the voice information contains a preset wake-up keyword, and if the preset wake-up keyword is detected, waking up the smart speaker.

S503: after the intelligent sound box is awakened, the microphone array determines the sound source direction according to the voice information.

Steps S501 to S503 are substantially the same as steps S401 to S403 in the fourth embodiment; reference may be made to the related description in that embodiment, which is not repeated here.

S504: when a plurality of sound source directions are determined, the control module determines the angle formed by each of the sound source directions and a preset reference direction; when the viewing angle range corresponding to a screen contains the angle, the control module controls that screen to display the image information collected by the camera and/or the image information received by the wireless communication module, and controls the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

Optionally, the preset reference direction is a reference direction set when the microphone array is installed.

The viewing angle range refers to the maximum angle range within which a user can clearly observe all contents on the screen from different directions. It should be understood that the viewing angle range is related to the number of screens.

Illustratively, in a specific application scenario, the smart speaker is equipped with three screens: the viewing angle range corresponding to the first screen is (0°, 120°], that of the second screen is (120°, 240°], and that of the third screen is (240°, 360°]. When the control module determines that the angle formed by the sound source direction and the preset reference direction is less than or equal to 120°, it controls the first screen to be in a working state and to display the image information collected by the camera and/or the image information received by the wireless communication module. When the angle falls within the (120°, 240°] interval, it controls the second screen to be in a working state and to display the image information collected by the camera and/or the image information received by the wireless communication module. When the angle falls within the (240°, 360°] interval, it controls the third screen to be in a working state and to display the image information collected by the camera and/or the image information received by the wireless communication module.
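The three-screen example above can be sketched as a mapping from an angle to a screen index. This is a minimal sketch, assuming equal-width viewing angle ranges that are open at the lower bound and closed at the upper bound, as in the (120°, 240°] interval; treating an angle of exactly 0° as equivalent to 360° is an assumption made here, since the example does not cover it. The function names are illustrative only.

```python
import math


def screen_for_angle(angle_deg, num_screens=3):
    """Return the zero-based index of the screen whose viewing angle range
    (k * w, (k + 1) * w] contains the angle, where w = 360 / num_screens."""
    width = 360.0 / num_screens
    a = angle_deg % 360.0
    if a == 0.0:
        a = 360.0  # assumption: 0 degrees is treated the same as 360 degrees
    return math.ceil(a / width) - 1


def screens_for_sources(angles_deg, num_screens=3):
    """Return the sorted indices of all screens that should be switched to
    the working state for a set of sound source directions."""
    return sorted({screen_for_angle(a, num_screens) for a in angles_deg})


# Hypothetical meeting: three speakers at 30, 110 and 200 degrees.
active = screens_for_sources([30.0, 110.0, 200.0])
```

Here the sources at 30° and 110° both fall within the first screen's range, so only the first and second screens are activated.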

It should be further understood that, in the above application scenario, in order to enable local and remote terminals to perform the same communication and display, while displaying the image information collected by the camera and/or the image information received by the wireless communication module, the control module may further control the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

Therefore, compared with the third embodiment, this embodiment provides a specific implementation for the case of a plurality of sound source directions, so that the working state of each screen can be better controlled, the utilization rate of the smart speaker is improved, and the smart speaker has strong usability and practicality.

Example six

Fig. 6 is a schematic structural diagram of a video conference system according to a sixth embodiment of the present invention, where the video conference system may include:

at least two smart speakers and a server connected with each of the smart speakers, wherein the smart speakers are described in detail in the first embodiment and are not described again here.

The following describes the video conference system in an embodiment of the present invention by taking a specific application scenario as an example. The video conference system shown in fig. 6 includes a first smart speaker 61, a second smart speaker 62 and a server 63, where the first smart speaker 61 is used by a local user and the second smart speaker 62 is used by a remote user at the opposite end. It should be noted that the present application does not limit the number of local users or the number of remote users; each may be one or more, and the specific number depends on the actual situation. When the local user and the opposite-end user each start their own smart speaker, the first smart speaker 61 collects local image information and voice information through the camera and the microphone array arranged on it, and sends the collected image information and voice information to the server through the wireless communication module. When the server receives a request message from the second smart speaker 62, the server forwards the image information and voice information sent by the first smart speaker 61 to the second smart speaker 62 and receives the image information and voice information sent by the second smart speaker 62. When the server receives a request message from the first smart speaker 61, the server forwards the image information and voice information sent by the second smart speaker 62 to the first smart speaker 61. When the first smart speaker 61 determines the sound source direction according to the locally collected voice information, it controls the screen corresponding to the sound source direction to display the image information collected locally and/or at the opposite end, and controls the first smart speaker 61 to play the voice information collected locally and/or at the opposite end, so that the local user can hear the voice of the other party and at the same time see a picture containing the image of the other party.
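The forwarding role of the server in this scenario can be sketched with an in-memory relay. This is only an illustrative sketch under stated assumptions: the description does not specify a transport, a message format or a pairing mechanism, so the class `RelayServer`, its methods `pair`, `upload` and `request`, the identifiers `speaker_61` and `speaker_62`, and the packet dictionary are all hypothetical.

```python
from collections import defaultdict, deque


class RelayServer:
    """Holds media uploaded by one speaker until its peer requests it."""

    def __init__(self):
        self._peers = {}                    # speaker id -> peer speaker id
        self._inbox = defaultdict(deque)    # speaker id -> pending packets

    def pair(self, speaker_a, speaker_b):
        """Register two speakers as the two ends of one conference."""
        self._peers[speaker_a] = speaker_b
        self._peers[speaker_b] = speaker_a

    def upload(self, sender, packet):
        """Store image/voice information sent by one speaker for its peer."""
        self._inbox[self._peers[sender]].append((sender, packet))

    def request(self, receiver):
        """Forward everything pending for the requesting speaker."""
        pending = list(self._inbox[receiver])
        self._inbox[receiver].clear()
        return pending


server = RelayServer()
server.pair("speaker_61", "speaker_62")

# The first speaker uploads locally collected image and voice information.
server.upload("speaker_61", {"image": b"frame-0", "voice": b"audio-0"})

# The second speaker's request message triggers forwarding.
delivered = server.request("speaker_62")
```

A real deployment would stream continuously over a network rather than poll an in-memory queue; the sketch only shows the direction in which information flows between the two speakers.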

It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided in the present invention, it should be understood that the disclosed terminal device and method may be implemented in other ways. The above-described terminal device embodiments are merely illustrative: the division of the modules or units is only one logical function division, and other divisions are possible in actual implementation. For example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling, direct coupling, or communication connection may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

The integrated module, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, etc. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. An intelligent sound box, comprising:

2. The smart sound box of claim 1, further comprising:

a wake-up module;

the wake-up module is connected with the control module;

and the wake-up module wakes up the smart sound box after detecting a preset wake-up keyword.

3. The smart sound box according to claim 1 or 2, wherein when the determined sound source direction is one, the control module is specifically configured to control a screen closest to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module, and control the smart sound box to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

4. The smart sound box of claim 3, further comprising:

an audio processing module comprising a digital signal processor, a power amplifier and a speaker;

and the digital signal processor, the power amplifier and the loudspeaker are all connected with the control module.

5. The smart sound box according to claim 1 or 2, wherein when the determined sound source directions are multiple, the control module is specifically configured to determine an angle formed by each of the sound source directions and a preset reference direction, and when there is a viewing angle range corresponding to a screen that includes the angle, control the screen to display image information collected by the camera and/or image information received by the wireless communication module, and control the smart sound box to play voice information collected by the microphone array and/or voice information received by the wireless communication module.

6. The smart sound box of claim 1, further comprising:

a key module;

the key module is connected with the control module;

and the control module is used for controlling the adjustment of the volume of the intelligent sound box when the key module receives a key instruction.

7. A method for using a smart sound box, the smart sound box comprising: a control module, a microphone array, a wireless communication module, a camera and at least two screens, wherein the microphone array, the wireless communication module, the camera and the screens are all connected with the control module, and the method comprises the following steps:

8. The method according to claim 7, wherein controlling the screen corresponding to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module, and controlling the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module comprises:

when the determined sound source direction is one, the control module controls a screen closest to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module, and controls the intelligent sound box to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

9. The method according to claim 7, wherein controlling the screen corresponding to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module, and controlling the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module further comprises:

when the determined sound source directions are multiple, the control module determines an angle formed by each of the sound source directions and a preset reference direction, and when there is a viewing angle range corresponding to a screen that includes the angle, controls the screen to display the image information collected by the camera and/or the image information received by the wireless communication module, and controls the smart sound box to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 7 to 9.

11. A video conferencing system, comprising: at least two smart sound boxes as claimed in any one of claims 1 to 6.

12. The video conferencing system of claim 11, wherein the video conferencing system further comprises: a server connected to each of the at least two smart sound boxes.