WO2020038494A1

WO2020038494A1 - Intelligent speaker and method for using intelligent speaker

Info

Publication number: WO2020038494A1
Application number: PCT/CN2019/107869
Authority: WO
Inventors: 邱振青; 吴海全; 张恩勤; 曹磊; 师瑞文
Original assignee: 深圳市冠旭电子股份有限公司
Priority date: 2018-08-24
Filing date: 2019-09-25
Publication date: 2020-02-27
Also published as: CN110858883A

Abstract

The present application is applicable to the technical field of intelligent homes, and provides an intelligent speaker and a method for using the intelligent speaker. The method for using the intelligent speaker comprises: a microphone array collecting voice information and determining a sound source direction according to the voice information; and a control module controlling, according to the sound source direction, a screen corresponding to the sound source direction to display image information collected by a camera and/or image information received by a wireless communication module, and controlling an intelligent speaker to play the voice information collected by the microphone array and/or voice information received by the wireless communication module. The present application can be used in multiple application scenarios, thereby increasing a usage rate of the intelligent speaker, and achieving stronger usability and practicability.

Description

Intelligent speaker and method for using intelligent speaker

Technical field

The invention relates to the technical field of smart homes, and in particular, to a smart speaker, a method for using the smart speaker, and a computer-readable storage medium.

Background art

With the rise of Internet technology, people's communication methods have been greatly enriched, making it easier for people in different regions to communicate. Among them, the video conference system, as an important remote communication technology, has been well received by people because of its convenience and efficiency.

However, when smart speaker devices on the market support video calls, they can generally only display the current video picture in one direction, which cannot meet the needs of users in a group video conference scenario, and the utilization rate is low.

Technical problem

In view of this, embodiments of the present invention provide a smart speaker and a method for using the smart speaker, which can simultaneously display image information of a video conference in multiple directions, and can improve the utilization rate of the smart speaker while being convenient for users.

Technical solutions

A first aspect of the embodiments of the present invention provides a smart speaker, including:

A control module, a microphone array, a wireless communication module, a camera, and at least two screens;

The microphone array, the wireless communication module, the camera, and the screen are all connected to the control module;

The microphone array is configured to collect voice information and determine a sound source direction according to the voice information;

The control module is configured to control, according to the sound source direction, the screen corresponding to the sound source direction to display image information collected by the camera and/or image information received by the wireless communication module, and to control the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module.

A second aspect of the embodiments of the present invention provides a method for using a smart speaker. The smart speaker includes:

A control module, a microphone array, a wireless communication module, a camera, and at least two screens, and the microphone array, the wireless communication module, the camera, and the screen are all connected to the control module, and the method includes:

Collecting voice information by the microphone array, and determining a sound source direction according to the voice information;

Controlling, by the control module according to the sound source direction, the screen corresponding to the sound source direction to display image information collected by the camera and/or image information received by the wireless communication module, and controlling the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module.

A third aspect of the embodiments of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method of the second aspect is implemented.

Beneficial effect

Compared with the prior art, the embodiments of the present invention have the following beneficial effects. The smart speaker includes a control module, a microphone array, a wireless communication module, a camera, and at least two screens, and the microphone array, the wireless communication module, the camera, and the screens are all connected to the control module. In the method, the microphone array collects voice information and determines a sound source direction according to the voice information; the control module then controls, according to the sound source direction, the screen corresponding to the sound source direction to display image information collected by the camera and/or image information received by the wireless communication module, and controls the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module. With the embodiments of the present invention, participants in all directions can clearly see the video conference picture while hearing the sound, which greatly improves the utilization rate of the smart speaker and provides strong ease of use and practicality.

Brief description of the drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings used in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is a schematic structural diagram of a smart speaker according to Embodiment 1 of the present invention;

FIG. 2 is a schematic diagram of a specific structure of a smart speaker according to Embodiment 2 of the present invention;

FIG. 3 is a schematic flowchart of a method for using a smart speaker according to Embodiment 3 of the present invention;

FIG. 4 is a schematic diagram of a specific implementation process of a method for using a smart speaker according to Embodiment 4 of the present invention;

FIG. 5 is a schematic diagram of a specific implementation process of a method for using a smart speaker according to Embodiment 5 of the present invention;

FIG. 6 is a schematic structural diagram of a video conference system according to a sixth embodiment of the present invention.

Embodiments of the invention

In the following description, for the purpose of illustration rather than limitation, specific details such as a specific system structure and technology are provided in order to thoroughly understand the embodiments of the present invention. However, it should be clear to a person skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary details.

It should be understood that, when used in this specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof.

It should also be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the description of the invention and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms unless the context clearly indicates otherwise.

It should be further understood that the term "and/or" used in the present description and the appended claims refers to any combination of one or more of the listed items and all possible combinations, and includes these combinations.

As used in this specification and the appended claims, the term "if" can be construed, depending on the context, as "when", "once", "in response to determining", or "in response to detecting". Similarly, the phrase "if it is determined" or "if [the described condition or event] is detected" can be construed, depending on the context, as "once it is determined", "in response to determining", "once [the described condition or event] is detected", or "in response to detecting [the described condition or event]".

It should be understood that the size of the sequence numbers of the steps in this embodiment does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiment of the present invention.

It should be noted that the present invention may include any number of smart speakers to enable two or more users to conduct a video conference, wherein the smart speakers include wireless speakers.

In order to explain the technical solution of the present invention, the following description is made through specific embodiments.

Embodiment 1

FIG. 1 is a schematic structural diagram of a smart speaker according to a first embodiment of the present invention. The smart speaker may include:

The control module 11, the microphone array 12, the wireless communication module 13, the camera 14 and the screen 15.

In the embodiment of the present invention, the microphone array 12, the wireless communication module 13, the camera 14 and the screen 15 are all connected to the control module 11.

The microphone array 12 is configured to collect voice information and determine a sound source direction according to the voice information; the sound source direction may be determined using a localization algorithm based on the time difference of arrival (TDOA). It should be understood that the microphone array 12 is a system composed of a number of microphones that samples and processes the spatial characteristics of the sound field. Optionally, there are seven microphones, arranged in a ring.
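As a rough illustration of how a time difference of arrival can be turned into a direction, the following minimal sketch handles a single pair of microphones under a far-field assumption. The function name, the microphone spacing, and the speed-of-sound value are assumptions made for illustration; the application does not specify the localization algorithm at this level of detail, and a seven-microphone ring would combine several such pairs.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees Celsius


def tdoa_to_angle_deg(delay_s: float, mic_spacing_m: float) -> float:
    """Estimate the arrival angle, in degrees from broadside, for one microphone pair.

    Far-field model (an assumption): delay = spacing * sin(angle) / c.
    """
    if mic_spacing_m <= 0:
        raise ValueError("mic_spacing_m must be positive")
    ratio = SPEED_OF_SOUND_M_S * delay_s / mic_spacing_m
    # Clamp to [-1, 1] so measurement noise cannot push asin outside its domain.
    ratio = max(-1.0, min(1.0, ratio))
    return math.degrees(math.asin(ratio))
```

A delay of zero corresponds to a source directly in front of the pair, and the sign of the delay indicates on which side of the pair the source lies.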

The wireless communication module 13 is configured to interact with the server, so as to send locally collected voice information and/or image information to the server, and to receive voice information and/or image information collected by the peer during the video conference. Optionally, the wireless communication module 13 may include a WiFi communication sub-module and a Bluetooth communication sub-module. Further, the voice information received from the server is voice information other than that collected by the microphone array 12, and the image information received from the server is image information other than that collected by the camera 14. It should be noted that, because the smart speaker in this application is mainly used in video conference scenarios, voice information collected by the microphone array of the smart speaker needs to be played by the smart speaker so that local users can hear it.

The camera 14 is configured to collect image information of a user. It should be noted that the type and number of the cameras 14 can be flexibly selected according to actual conditions, including, but not limited to, a common camera, a 360-degree panoramic camera, or a camera array.

The screen 15 is configured to display image information collected by the camera 14 and / or image information received by the wireless communication module 13. Optionally, the number of the screens is at least two.

The control module 11 is configured to control, according to the sound source direction, the screen closest to the sound source direction to display image information collected by the camera 14 and/or image information received by the wireless communication module 13. The control module 11 is further configured to control the smart speaker to play the voice information collected by the microphone array 12 and/or the voice information received by the wireless communication module 13. Optionally, the control module 11 includes a main control chip, and the main control chip is an APQ8009 chip.

In the embodiment of the present invention, voice information is collected through the microphone array, and a sound source direction is determined according to the voice information. According to the sound source direction, the control module controls the screen corresponding to the sound source direction to display image information collected by the camera and/or image information received by the wireless communication module, and at the same time controls the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module. This meets the needs of multi-person video conference scenarios and makes the smart speaker more practical, more fully featured, and more convenient to use.

Embodiment 2

FIG. 2 is a detailed structural diagram of a smart speaker provided in Embodiment 2 of the present invention. The smart speaker may include:

The control module 21, the microphone array 22, the wireless communication module 23, the camera 24, the screen 25, the wake-up module 26, the audio processing module 27, and the key module 28.

The microphone array 22, the wireless communication module 23, the camera 24, the screen 25, the wake-up module 26, the audio processing module 27, and the key module 28 are all connected to the control module 21. It should be noted that the control module 21, the microphone array 22, the wireless communication module 23, the camera 24, and the screen 25 are the same as the control module 11, the microphone array 12, the wireless communication module 13, the camera 14, and the screen 15 in Embodiment 1, and are not described again here.

The wake-up module 26 wakes up the smart speaker after detecting a preset wake-up keyword, so that the smart speaker is in a working state.

The audio processing module 27 includes a digital signal processor, a power amplifier, and a speaker, all of which are connected to the control module 21. The output terminal of the digital signal processor is connected to the input terminal of the power amplifier, and the output terminal of the power amplifier is connected to the input terminal of the speaker. It should be understood that the voice information collected by the microphone array 22 and/or the voice information received by the wireless communication module 23 contains a lot of noise; playing it directly would degrade the final playback quality and the user experience. Optionally, the voice information collected by the microphone array 22 and/or the voice information received by the wireless communication module 23 is therefore processed by the audio processing module 27, which includes the digital signal processor, before being played.

The key module 28 is configured to receive a key instruction from a user and control the volume adjustment of the smart speaker through the control module.

As can be seen from the above, compared with Embodiment 1, this embodiment adds a wake-up module, which wakes up the smart speaker into the working state after detecting a preset wake-up keyword. It also adds an audio processing module, which makes the voice played by the smart speaker more pleasant to listen to. In addition, it adds a key module, which works with the control module to adjust the volume of the smart speaker. Together these meet the different needs of users in different application scenarios and improve the user experience, providing strong ease of use and practicality.

Embodiment 3

FIG. 3 is a schematic flowchart of a method for using a smart speaker according to Embodiment 3 of the present invention. The method may include the following steps:

S301: The microphone array collects voice information, and determines a sound source direction according to the voice information.

The smart speaker includes a control module, a microphone array, a wireless communication module, a camera, and at least two screens. The microphone array, the wireless communication module, the camera, and the screens are all connected to the control module.

Optionally, voice information is collected through the microphone array, the voice information is processed into voice data, and the direction of the sound source corresponding to the voice information is determined according to the voice data.

S302: According to the sound source direction, the control module controls the screen corresponding to the sound source direction to display image information collected by the camera and/or image information received by the wireless communication module, and controls the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module.

It should be understood that the image displayed on the screen may be only the image information collected by the camera, that is, the local party's own image; or only the image information received by the wireless communication module, that is, the other party's image; or both, so that the images of the local party and the other party are displayed at the same time. What is displayed can be set flexibly according to actual needs and the size of the screen. Optionally, the screen simultaneously displays the image information collected by the camera and the image information received by the wireless communication module at different ratios.

It should also be understood that the voice played by the smart speaker may be only the voice information collected by the microphone array, that is, the local party's own voice; or only the voice information received by the wireless communication module, that is, the other party's voice; or both, so that the voices of the local party and the other party are played at the same time. What is played can be set flexibly according to actual needs and the processing effect of the audio processing module. Optionally, the smart speaker plays voice information received by the wireless communication module.

As can be seen from the above, in the embodiment of the present invention, voice information is collected through the microphone array, and a sound source direction is determined according to the voice information. According to the sound source direction, the control module controls the screen corresponding to the sound source direction to display image information collected by the camera and/or image information received by the wireless communication module, and controls the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module. Users in all directions can therefore not only hear the voice of the person they are talking to but also see that person's expressions and actions, so that people in different places can communicate as if they were in the same conference room. This improves the user experience and increases the utilization rate of the smart speaker, providing strong ease of use and practicality.

Embodiment 4

FIG. 4 is a schematic diagram of a specific implementation process of the method for using the smart speaker according to Embodiment 4 of the present invention, which further refines and describes steps S301 and S302 in Embodiment 3. The method may include the following steps:

S401: The microphone array collects voice information.

The step S401 is basically the same as step S301 in the third embodiment, and details are not described herein again.

S402: Detect whether the voice information includes a preset wakeup keyword, and if a preset wakeup keyword is detected, wake up the smart speaker.

The wake-up keyword is a predefined word that switches the smart speaker from a standby state to a working state. Optionally, the preset wakeup keywords are flexibly set according to a user's preference.
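The standby-to-working switch described in S402 can be sketched as a small state transition. This is a minimal sketch that assumes the collected voice information has already been transcribed to text by some speech recognizer, which the application does not describe; the names `SpeakerState` and `next_state` and the example keyword are hypothetical.

```python
from enum import Enum


class SpeakerState(Enum):
    STANDBY = "standby"
    WORKING = "working"


def next_state(state: SpeakerState, transcript: str, wake_keyword: str) -> SpeakerState:
    """Return the speaker state after hearing `transcript`.

    The speaker leaves standby only when the preset wake-up keyword appears in
    the transcript; a speaker that is already working stays in the working state.
    """
    heard = wake_keyword.lower() in transcript.lower()
    # An empty keyword never wakes the speaker (an assumption of this sketch).
    if state is SpeakerState.STANDBY and wake_keyword and heard:
        return SpeakerState.WORKING
    return state
```

Because the keyword is passed in as a parameter, it can be set according to a user's preference, as the text allows.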

S403: After waking up the smart speaker, determine a sound source direction according to the voice information.

The step S403 is basically the same as step S301 in the third embodiment, and details are not described herein again.

S404: When only one sound source direction is determined, the control module controls the screen closest to the sound source direction to display image information collected by the camera and/or image information received by the wireless communication module, and controls the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module.

It should be understood that the application scenarios of the present invention include a one-to-one single-person video conference mode, a one-to-many group video conference mode, and a many-to-many group video conference mode, so there may be one or more sound source directions. This embodiment describes only the case where there is a single sound source direction. For the case of multiple sound source directions, refer to Embodiment 5.

It should also be understood that, when there is one sound source direction, controlling the screen closest to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module ensures, to the greatest extent, that users see a clear video picture. The distance from the sound source to the screen can be obtained by converting the distance from the sound source to the microphone array.
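One way to picture the "closest screen" selection is sketched below. It uses angular distance between the sound source direction and each screen's facing direction as a stand-in for the distance conversion mentioned above, which the application does not spell out. The function names and the idea of describing each screen by a facing angle are assumptions made for illustration.

```python
from typing import Sequence


def angular_distance_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two directions, in degrees (0 to 180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def closest_screen(source_angle_deg: float, screen_angles_deg: Sequence[float]) -> int:
    """Return the 0-based index of the screen whose facing angle is nearest the source.

    `screen_angles_deg` holds one facing angle per screen, measured from the same
    reference direction as the sound source angle (hypothetical layout data).
    """
    if not screen_angles_deg:
        raise ValueError("at least one screen is required")
    return min(
        range(len(screen_angles_deg)),
        key=lambda i: angular_distance_deg(source_angle_deg, screen_angles_deg[i]),
    )
```

The wrap-around handling matters near the reference direction: a source at 350 degrees is 10 degrees away from a screen facing 0 degrees, not 350.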

As can be seen from the above, compared with Embodiment 3, this embodiment adds a voice wake-up step and a step of judging the sound source direction. The voice wake-up step promptly switches the smart speaker from the standby state to the working state, which speeds up data processing. In addition, when there is only one sound source direction, controlling the screen closest to the sound source direction to display the image information collected by the camera and/or the image information received by the wireless communication module gives a better viewing result. The utilization rate of the smart speaker is thereby improved, with strong ease of use and practicality.

Embodiment 5

FIG. 5 is a schematic diagram of a specific implementation process of the method for using the smart speaker according to Embodiment 5 of the present invention, which further refines and describes steps S301 and S302 in Embodiment 3. The method may include the following steps:

S501: The microphone array collects voice information.

S502: Detect whether the voice information includes a preset wakeup keyword, and if a preset wakeup keyword is detected, wake up the smart speaker.

S503: After the smart speaker is woken up, the microphone array determines a sound source direction according to the voice information.

The steps S501-S503 and steps S401-S403 in the fourth embodiment are basically the same, and reference may be made to related descriptions in the foregoing embodiments, which are not described herein again.

S504: When multiple sound source directions are determined, the control module determines the angle formed by each sound source direction and a preset reference direction and, according to the viewing angle range in which each angle falls, controls the corresponding screen to display the image information collected by the camera and/or the image information received by the wireless communication module, and controls the smart speaker to play the voice information collected by the microphone array and/or the voice information received by the wireless communication module.

Optionally, the preset reference direction is a reference direction set when the microphone array is installed.

The viewing angle range refers to a maximum angle range in which a user can clearly observe all content on the screen from different directions. It should be understood that the viewing angle range is related to the number of screens.

For example, in a specific application scenario, if the smart speaker is equipped with three screens, the viewing angle range corresponding to the first screen is (0°, 120°], the viewing angle range corresponding to the second screen is (120°, 240°], and the viewing angle range corresponding to the third screen is (240°, 360°]. When the control module determines that the angle formed by the sound source direction and the preset reference direction is less than or equal to 120°, it controls the first screen to be in a working state and to display image information collected by the camera and/or image information received by the wireless communication module. When the control module determines that the angle falls in the (120°, 240°] interval, it controls the second screen to be in a working state and to display image information collected by the camera and/or image information received by the wireless communication module. When the control module determines that the angle falls in the (240°, 360°] interval, it controls the third screen to be in a working state and to display image information collected by the camera and/or image information received by the wireless communication module.
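The mapping from angle to screen in this example can be sketched as follows. This is a minimal sketch that assumes the screens divide the full circle into equal half-open ranges, as in the three-screen example; the function names are hypothetical, and treating an angle of exactly 0° as 360° is an assumption, since the stated ranges exclude 0°.

```python
import math
from typing import Iterable, List


def screen_for_angle(angle_deg: float, num_screens: int = 3) -> int:
    """Return the 1-based index of the screen whose range (lo, hi] contains the angle."""
    if num_screens < 1:
        raise ValueError("num_screens must be at least 1")
    span = 360.0 / num_screens
    # Normalise into (0, 360]; an angle of exactly 0 is treated as 360 (assumption).
    normalised = angle_deg % 360.0
    if normalised == 0.0:
        normalised = 360.0
    # ceil places a boundary value such as 120 in the lower range, matching (0, 120].
    return min(num_screens, math.ceil(normalised / span))


def active_screens(source_angles_deg: Iterable[float], num_screens: int = 3) -> List[int]:
    """Return the sorted screen indices to switch on for several sound sources."""
    return sorted({screen_for_angle(a, num_screens) for a in source_angles_deg})
```

With three screens, sound sources at 30°, 100°, and 200° would switch on the first and second screens while the third stays idle.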

It should also be understood that, in the above application scenario, in order to keep communication and display consistent between the local and remote ends, the control module may, while displaying image information collected by the camera and/or image information received by the wireless communication module, also control the smart speaker to play voice information collected by the microphone array and/or voice information received by the wireless communication module.

As can be seen from the above, compared with Embodiment 3, this embodiment provides a specific implementation for the case of multiple sound source directions, which gives better control over the working state of the screens and thereby improves the utilization rate of the smart speaker, with strong ease of use and practicality.

Embodiment 6

FIG. 6 is a schematic structural diagram of a video conference system provided by Embodiment 6 of the present invention. The video conference system may include:

Two or more smart speakers and a server respectively connected to the at least two smart speakers, wherein the smart speakers have been described in detail in the first embodiment, and are not repeated here.

The following uses a specific application scenario as an example to describe the video conference system in the embodiment of the present invention. The video conference system shown in FIG. 6 includes a first smart speaker 61, a second smart speaker 62, and a server 63. The first smart speaker 61 is used by a local user, and the second smart speaker 62 is used by a remote user at the opposite end. It should be noted that, in this application, the number of local users and the number of remote users are not limited; each may be one or more, and the specific number depends on the circumstances. When both the local user and the remote user turn on their respective smart speakers, the first smart speaker 61 collects local image information and voice information through its own camera and microphone array, and sends the collected image information and voice information to the server through the wireless communication module. When the server receives a request message from the second smart speaker 62, it forwards the image information and voice information sent by the first smart speaker 61 to the second smart speaker 62. Likewise, after receiving the image information and voice information sent by the second smart speaker 62, the server forwards them to the first smart speaker 61 when it receives a request message from the first smart speaker 61. After determining the sound source direction from the locally collected voice information, the first smart speaker 61 controls the screen corresponding to the sound source direction to display the image information collected locally and/or at the opposite end, and plays the voice information collected locally and/or at the opposite end. In this way, local users can hear the other party's voice while seeing the other party's picture.
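The upload, request, and forward exchange in this scenario can be sketched with an in-memory stand-in for the server. This is a minimal sketch only: the class and method names (`MediaPacket`, `RelayServer`, `register`, `upload`, `request`) are hypothetical, no network transport is modelled, and the application does not describe the server's internal structure.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MediaPacket:
    sender: str   # identifier of the smart speaker that collected the data
    image: bytes  # image information from the camera
    voice: bytes  # voice information from the microphone array


class RelayServer:
    """Queues each uploaded packet for every other registered smart speaker."""

    def __init__(self) -> None:
        self._speakers = set()
        self._pending = defaultdict(deque)  # receiver id -> packets awaiting a request

    def register(self, speaker_id: str) -> None:
        self._speakers.add(speaker_id)

    def upload(self, packet: MediaPacket) -> None:
        # Locally collected data is never sent back to the speaker that produced it.
        for speaker_id in self._speakers:
            if speaker_id != packet.sender:
                self._pending[speaker_id].append(packet)

    def request(self, speaker_id: str) -> List[MediaPacket]:
        """Forward, in arrival order, everything queued for the requesting speaker."""
        queue = self._pending[speaker_id]
        packets = list(queue)
        queue.clear()
        return packets
```

In this sketch the first smart speaker would call `upload` with its own packet, and the second smart speaker would obtain it by calling `request`; the roles are symmetric in the other direction.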

Those skilled in the art can clearly understand that, for convenience and brevity of description, only the above division of functional units and modules is used as an example. In practical applications, the above functions can be assigned to different functional units and modules as needed; that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing them from each other, and are not used to limit the protection scope of the present application. For specific working processes of the units and modules in the foregoing system, reference may be made to corresponding processes in the foregoing method embodiments, and details are not described herein again.

In the above embodiments, the description of each embodiment has its own emphasis. For a part that is not detailed or recorded in an embodiment, reference may be made to related descriptions of other embodiments.

Those of ordinary skill in the art may realize that the units and algorithm steps of each example described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed terminal device and method may be implemented in other manners. For example, the terminal device embodiments described above are only schematic. The division of the modules or units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling, or communication connection displayed or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.

When the integrated module is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the methods of the above embodiments of the present invention may also be completed by a computer program instructing related hardware. The computer program may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented. The computer program includes computer program code, and the computer program code may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), electrical carrier signals, telecommunication signals, and software distribution media. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the relevant jurisdiction. For example, in some jurisdictions, the computer-readable medium excludes electrical carrier signals and telecommunication signals.

The above-mentioned embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or make equivalent replacements of some of the technical features. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims

A smart speaker, comprising:

A control module, a microphone array, a wireless communication module, a camera, and at least two screens;

The microphone array, the wireless communication module, the camera, and the screen are all connected to the control module;

The microphone array is configured to collect voice information and determine a sound source direction according to the voice information;

The control module is configured to control, according to the direction of the sound source, a screen corresponding to the direction of the sound source to display image information collected by the camera and / or image information received by the wireless communication module, and to control the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module.
The smart speaker according to claim 1, wherein the smart speaker further comprises:

A wake-up module;

The wake-up module is connected to the control module;

The wake-up module wakes up the smart speaker after detecting a preset wake-up keyword.
The smart speaker according to claim 1 or 2, wherein when one sound source direction is determined, the control module is specifically configured to control the screen closest to the direction of the sound source to display image information collected by the camera and / or image information received by the wireless communication module, and to control the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module.
The smart speaker according to claim 3, wherein the smart speaker further comprises:

An audio processing module including a digital signal processor, a power amplifier, and a speaker;

The digital signal processor, power amplifier and speaker are all connected to the control module.
The smart speaker according to claim 1 or 2, characterized in that when a plurality of sound source directions are determined, the control module is specifically configured to determine an angle formed between each of the sound source directions and a preset reference direction, and, when the viewing angle range corresponding to a screen includes the angle, to control that screen to display image information collected by the camera and / or image information received by the wireless communication module, and to control the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module.
The smart speaker according to claim 1, wherein the smart speaker further comprises:

A key module;

The key module is connected to the control module;

The control module is configured to control the volume adjustment of the smart speaker when the key module receives a key instruction.
A method for using a smart speaker, characterized in that the smart speaker includes a control module, a microphone array, a wireless communication module, a camera, and at least two screens, the microphone array, the wireless communication module, the camera, and the screens are all connected to the control module, and the method includes:

Collecting voice information by the microphone array, and determining a sound source direction according to the voice information;

According to the sound source direction, the control module controls a screen corresponding to the sound source direction to display image information collected by the camera and / or image information received by the wireless communication module, and controls the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module.
The method according to claim 7, characterized in that controlling a screen corresponding to the direction of the sound source to display image information collected by the camera and / or image information received by the wireless communication module, and controlling the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module, includes:

When one sound source direction is determined, the control module controls the screen closest to the sound source direction to display image information collected by the camera and / or image information received by the wireless communication module, and controls the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module.
The method according to claim 7, characterized in that controlling a screen corresponding to the direction of the sound source to display image information collected by the camera and / or image information received by the wireless communication module, and controlling the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module, further includes:

When a plurality of sound source directions are determined, the control module determines an angle formed between each sound source direction and a preset reference direction, and, when the viewing angle range corresponding to a screen includes the angle, controls that screen to display image information collected by the camera and / or image information received by the wireless communication module, and controls the smart speaker to play voice information collected by the microphone array and / or voice information received by the wireless communication module.
A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the steps of the method according to any one of claims 7 to 9 are implemented.
A video conference system, comprising: at least two smart speakers according to any one of claims 1 to 6.
The video conference system according to claim 11, further comprising: a server connected to each of the at least two smart speakers.
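
For illustration only, the screen-selection logic recited in the claims above (the closest screen when one sound source direction is determined; every screen whose viewing angle range includes the angle when several are determined) might be sketched as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `Screen` type, its `center_deg` and `fov_deg` fields, the function names, and the convention of measuring all angles in degrees from the preset reference direction are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Screen:
    """Hypothetical screen description (not taken from the application)."""
    name: str
    center_deg: float  # direction the screen faces, measured from the reference direction
    fov_deg: float     # assumed total width of the screen's viewing angle range


def angular_distance(a_deg: float, b_deg: float) -> float:
    """Smallest absolute difference between two angles, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def closest_screen(screens: Sequence[Screen], source_deg: float) -> Screen:
    """Single sound source: pick the screen facing closest to the source direction."""
    return min(screens, key=lambda s: angular_distance(s.center_deg, source_deg))


def screens_covering(screens: Sequence[Screen], source_degs: Sequence[float]) -> List[Screen]:
    """Multiple sound sources: keep each screen whose viewing range includes any source angle."""
    return [
        s for s in screens
        if any(angular_distance(s.center_deg, a) <= s.fov_deg / 2.0 for a in source_degs)
    ]


def select_screens(screens: Sequence[Screen], source_degs: Sequence[float]) -> List[Screen]:
    """Return the screens that should display the image information."""
    if not source_degs:
        return []
    if len(source_degs) == 1:
        return [closest_screen(screens, source_degs[0])]
    return screens_covering(screens, source_degs)
```

With two hypothetical back-to-back screens, each assumed to cover 180 degrees, a single speaker at 350 degrees would select only the screen centered at 0 degrees, while speakers at 10 and 200 degrees would select both screens.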