WO2020063675A1 - Smart speaker and method for using a smart speaker - Google Patents

Smart speaker and method for using a smart speaker

Info

Publication number
WO2020063675A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice information
image information
module
smart speaker
information
Prior art date
Application number
PCT/CN2019/107871
Other languages
English (en)
French (fr)
Inventor
黄环
吴海全
张忠海
张恩勤
曹磊
师瑞文
Original Assignee
深圳市冠旭电子股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市冠旭电子股份有限公司
Publication of WO2020063675A1

Classifications

    • G PHYSICS
    • G03 PHOTOGRAPHY; CINEMATOGRAPHY; ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ELECTROGRAPHY; HOLOGRAPHY
    • G03B APPARATUS OR ARRANGEMENTS FOR TAKING PHOTOGRAPHS OR FOR PROJECTING OR VIEWING THEM; APPARATUS OR ARRANGEMENTS EMPLOYING ANALOGOUS TECHNIQUES USING WAVES OTHER THAN OPTICAL WAVES; ACCESSORIES THEREFOR
    • G03B31/00 Associated working of cameras or projectors with sound-recording or sound-reproducing means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00 Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60 Control of cameras or camera modules
    • H04N23/66 Remote control of cameras or camera parts, e.g. by remote control devices
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups

Definitions

  • the invention relates to the technical field of smart homes, and in particular, to a smart speaker, a method for using the smart speaker, and a computer-readable storage medium.
  • embodiments of the present invention provide a smart speaker and a method for using the smart speaker, which can perform image feedback at the same time as voice interaction, which greatly enriches the functions of the smart speaker.
  • a first aspect of the embodiments of the present invention provides a smart speaker, including:
  • a control module, a camera, a microphone array, a wireless communication module, and a projection module;
  • the camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module;
  • the camera collects image information
  • the control module controls the projection module to project image information onto a preset screen, and controls the smart speaker to play voice information, wherein the projected image information includes image information collected by the camera and/or image information received by the wireless communication module, and the played voice information includes voice information collected by the microphone array and/or voice information received by the wireless communication module.
  • a second aspect of the embodiments of the present invention provides a method for using a smart speaker, including:
  • a third aspect of the embodiments of the present invention provides a computer-readable storage medium, including: the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the method mentioned in the second aspect is implemented.
  • the smart speaker includes a control module, a camera, a microphone array, a wireless communication module, and a projection module; the camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module; the camera collects image information; the microphone array collects voice information; the wireless communication module sends the collected image information and voice information to a remote device and receives image information and voice information sent by the remote device; and the control module controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information, wherein the projected image information includes image information collected by the camera and/or image information received by the wireless communication module, and the played voice information includes voice information collected by the microphone array and/or voice information received by the wireless communication module.
  • the smart speaker can not only answer questions raised by users, but also can interact with pictures and text, which greatly improves the usage rate of the smart speaker in the display function.
  • FIG. 1 is a schematic structural diagram of a smart speaker according to a first embodiment of the present invention
  • FIG. 2 is a schematic diagram of a specific structure of a smart speaker provided in Embodiment 2 of the present invention.
  • FIG. 3 is a schematic flowchart of a method for using a smart speaker according to a third embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a specific implementation process of a method for using a smart speaker according to a fourth embodiment of the present invention.
  • the term “if” can be construed as “when”, “once”, “in response to determining”, or “in response to detecting”, depending on the context.
  • the phrase “if determined” or “if [the described condition or event] is detected” can be construed, depending on the context, as “once determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
  • the present invention may include any number of smart speakers to enable two or more users to have a video session, wherein the smart speakers include wireless speakers.
  • FIG. 1 is a schematic structural diagram of a smart speaker according to a first embodiment of the present invention.
  • the smart speaker may include:
  • the camera 12, the microphone array 13, the wireless communication module 14 and the projection module 15 are all connected to the control module 11.
  • the camera 12 collects image information.
  • the image information is image information of a user.
  • the microphone array 13 collects voice information.
  • the voice information is voice information of a user.
  • the microphone array 13 is a system composed of a certain number of microphones for sampling and processing the spatial characteristics of the sound field.
  • the number of the microphones is seven, and the microphones are arranged in a ring shape.
  • the wireless communication module 14 sends the collected image information and / or voice information to a remote device, and receives the image information and / or voice information sent by the remote device.
  • the wireless communication module 14 includes a WiFi communication unit and a Bluetooth communication unit.
  • the image information and/or voice information sent by the remote device includes image information and/or voice information obtained by a remote server and image information and/or voice information obtained by a remote Bluetooth speaker; it should also be understood that when local image information and/or voice information is received, the wireless communication module 14 transmits it to the remote server or the remote Bluetooth speaker.
  • the projection module 15 projects image information onto a preset screen. It should be understood that the projection includes a projection in a horizontal direction and / or a projection in a vertical direction, and the screen includes a curtain, a desktop, and a wall.
  • the control module 11 controls the projection module 15 to project image information onto a preset screen, and controls the smart speaker to play voice information.
  • the projected image information includes image information collected by the camera 12 and/or image information received by the wireless communication module 14, and the played voice information includes voice information collected by the microphone array 13 and/or voice information received by the wireless communication module 14.
  • the control module 11 includes a main control chip, and the main control chip is an APQ8009 chip.
  • the smart speaker includes a control module, a camera, a microphone array, a wireless communication module, and a projection module; the camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module; the camera collects image information; the microphone array collects voice information; the wireless communication module sends the collected image information and/or voice information to a remote device and receives image information and/or voice information sent by the remote device; and the control module controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information.
  • the interaction of the smart speaker can be extended from the sound level to a higher level where sound and image interaction coexist, so that the functions of the smart speaker are more diversified, and have stronger ease of use and practicability.
  • FIG. 2 is a detailed structural diagram of a smart speaker provided in Embodiment 2 of the present invention.
  • the smart speaker may include:
  • the control module 21, the camera 22, the microphone array 23, the wireless communication module 24, the projection module 25, the ranging module 26, the LED light source module 27, the key module 28, and the audio processing module 29.
  • the camera 22, the microphone array 23, the wireless communication module 24, the projection module 25, the ranging module 26, the LED light source module 27, the key module 28, and the audio processing module 29 are all connected to the control module 21.
  • the control module 21, camera 22, microphone array 23, wireless communication module 24, and projection module 25 are basically the same as the control module 11, camera 12, microphone array 13, wireless communication module 14, and projection module 15 in Embodiment 1, and are not described again here.
  • the microphone array 23 can further determine the sound source direction according to the voice information; the control module 21 can also control the camera 22 to rotate to the sound source direction, so that the camera 22 is focused on the sound source direction and accurately captures the user's image information.
  • the camera 22 is a 360-degree panoramic camera.
  • the control module 21 adjusts the picture size on the screen according to the information fed back by the ranging module 26.
  • the information includes a distance from the projection module to the screen.
  • the ranging module includes an infrared proximity sensor.
  • the control module 21 controls the LED light source module 27 to display the current state of the smart speaker according to the transmission status of image information and/or voice information.
  • the current state of the smart speaker includes at least one of the following: listening, thinking, and speaking. It should be noted that the listening state indicates that the smart speaker is acquiring the user's information, and information is being transmitted from the user side to the smart speaker; the thinking state indicates that the smart speaker is obtaining the answer to the user's question, and information is being transmitted from the smart speaker to the server or response information is being transmitted from the server to the smart speaker; the speaking state indicates that the smart speaker is giving the answer the user wants, information is being transmitted from the smart speaker to the user side, and the smart speaker is in a playback and/or display state. Optionally, the current state of the smart speaker is displayed in different colors.
  • when the key module 28 receives a key instruction, the smart speaker is controlled to adjust the playback mode or the volume.
  • the number of the keys is at least one.
  • the audio processing module 29 processes and plays voice information collected by the microphone array 23 and / or voice information received by the wireless communication module 24.
  • the audio processing module 29 includes a digital signal processor, a power amplifier, and a speaker.
  • the output of the digital signal processor is connected to the input of the power amplifier.
  • the output of the power amplifier is connected to the input of the speaker.
  • the smart speaker further includes a GPS positioning module that acquires current position information of the smart speaker.
  • compared with Embodiment 1, this embodiment of the present invention adds a ranging module, which can flexibly adjust the size of the projection area according to the distance from the speaker to the projection surface and gives the user a better visual effect.
  • a key module is added, which can be combined with the control module to adjust the playback mode and volume of the smart speaker.
  • an audio processing module is added, which makes the voice played by the smart speaker more pleasant to listen to, and images can be displayed synchronously while the sound is playing, which improves the user experience and provides strong ease of use and practicality.
  • FIG. 3 is a schematic flowchart of a method for using a smart speaker according to Embodiment 3 of the present invention; the method may include the following steps:
  • S301 Collect image information.
  • the smart speaker may include a control module, a camera, a microphone array, a wireless communication module, and a projection module; the camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module; and the camera includes, but is not limited to, a 360-degree panoramic camera and a 3D sensing lens.
  • the image information of the user is collected through a camera.
  • S302 Collect voice information.
  • the user's voice information can be collected through a microphone array.
  • S303 Send the collected image information and / or voice information to a remote device, and receive the image information and / or voice information sent by the remote device.
  • the wireless communication module may receive local image information and/or voice information as well as remote image information and/or voice information, and send out the local image information and/or voice information and the remote image information and/or voice information.
  • S304 Control the image information to be projected onto a preset screen, and control the smart speaker to play voice information.
  • the projected image information includes collected image information and / or received image information
  • the played voice information includes collected voice information and / or received voice information.
  • the controller can be used to control the projection module to project image information onto a preset screen and control the smart speaker to play voice information.
  • when the user is using the smart speaker for a video call, the image information includes the image information collected by the camera and the other party's image information received by the wireless communication module; in this case the projection module plays an auxiliary role in the interaction and provides real-time display of the video call, giving the smart speaker more social attributes.
  • when there is only one smart speaker and the user interacts with it to request playback of a specified video, the image information includes the image information returned by the remote server and received by the wireless communication module.
  • correspondingly, when the user is using the smart speaker for a video call, the voice information includes the voice information collected by the microphone array and the other party's voice information received by the wireless communication module.
  • when there is only one smart speaker and the user interacts with it to request playback of a specified song, the voice information includes the song returned by the remote server and received by the wireless communication module.
  • the relevant steps in the above method for using a smart speaker can be implemented by corresponding virtual modules as well as by specific hardware devices; for example, an application program can be used to control the camera to collect image information.
  • this embodiment of the present invention first collects image information and voice information, then sends the collected image information and/or voice information to a remote device and receives image information and/or voice information sent by the remote device, and then controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information. This gives the smart speaker an image feedback function, effectively enriches and diversifies its functions, meets the user's auditory and visual needs at the same time, is closer to the concept of artificial intelligence, makes users' lives more convenient, and provides strong ease of use and practicality.
  • FIG. 4 is a schematic diagram of a specific implementation process of the method for using a smart speaker provided in Embodiment 4 of the present invention, which further refines and explains steps S301 and S302 in Embodiment 3.
  • the method may include the following steps:
  • S401 Collect image information.
  • step S401 is the same as the step S301 in the third embodiment, and details are not described herein again.
  • S402 The identity of the user is authenticated according to the collected image information. If the identity authentication is passed, voice information is collected, and the sound source direction is further determined according to the voice information.
  • this embodiment uses face recognition technology so that subsequent interaction starts only after the user's identity has been authenticated.
  • the sound source direction is determined using a localization algorithm based on time difference of arrival.
  • the direction of the sound source may be determined in combination with the image information collected in the above step S401.
  • S403 Control the camera to rotate to the direction of the sound source, and continue to collect image information.
  • adjusting the camera orientation according to the sound source estimate achieves the purpose of focusing, so that only the image information of interest is captured without missing important image information.
  • S404 Send the collected image information and / or voice information to a remote device, and receive the image information and / or voice information sent by the remote device.
  • S405 Control the image information to be projected onto a preset screen, and control the smart speaker to play voice information.
  • steps S404-S405 are the same as the steps S303-S304 in the third embodiment, and are not repeated here.
  • the embodiment of the present invention adds a step of user identity authentication, which can protect the privacy of the user from being leaked.
  • a step of focusing the camera is added so that image information containing only the user can be obtained, which simplifies subsequent image processing operations and provides strong ease of use and practicality.
  • modules, units, and / or method steps of the embodiments described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A person skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the present invention.
  • the disclosed systems, devices, and methods may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the unit is only a logical function division.
  • in actual implementation there may be other ways of division; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • when the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing related hardware.
  • the computer program may be stored in a computer-readable storage medium.
  • when the computer program is executed by a processor, the steps of the foregoing method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code may be in a source code form, an object code form, an executable file, or some intermediate form.
  • the computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), electric carrier signals, telecommunication signals, and software distribution media. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium excludes electric carrier signals and telecommunication signals.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

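The present application is applicable to the technical field of smart homes and provides a smart speaker and a method for using the smart speaker. The smart speaker includes a control module, a camera, a microphone array, a wireless communication module, and a projection module. The camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module. The camera collects image information; the microphone array collects voice information; the wireless communication module sends the collected image information and/or voice information to a remote device and receives image information and/or voice information sent by the remote device; and the control module controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information. The present application enables real-time image feedback, making the smart speaker more fully featured, with strong ease of use and practicality.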

Description

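Smart speaker and method for using a smart speaker
Technical Field
The present invention relates to the technical field of smart homes, and in particular to a smart speaker, a method for using the smart speaker, and a computer-readable storage medium.
Background
As a branch of artificial intelligence technology, smart speakers have, thanks to their voice interaction capability, brought many ordinary households into the era of voice interaction. With simple voice commands, a user can have the speaker automatically play music, news, and so on, which in a sense frees the user's hands.
However, most speakers currently on the market cannot provide visual interaction; their functions are relatively limited, and they struggle to meet users' actual needs.
Technical Problem
In view of this, embodiments of the present invention provide a smart speaker and a method for using the smart speaker that can provide image feedback during voice interaction, greatly enriching the functions of the smart speaker.
Technical Solution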
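A first aspect of the embodiments of the present invention provides a smart speaker, including:
a control module, a camera, a microphone array, a wireless communication module, and a projection module;
the camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module;
the camera collects image information;
the microphone array collects voice information;
the wireless communication module sends the collected image information and/or voice information to a remote device, and receives image information and/or voice information sent by the remote device;
the control module controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information, wherein the projected image information includes image information collected by the camera and/or image information received by the wireless communication module, and the played voice information includes voice information collected by the microphone array and/or voice information received by the wireless communication module.
A second aspect of the embodiments of the present invention provides a method for using a smart speaker, including:
collecting image information;
collecting voice information;
sending the collected image information and/or voice information to a remote device, and receiving image information and/or voice information sent by the remote device;
controlling the projection module to project image information onto a preset screen and controlling the smart speaker to play voice information, wherein the projected image information includes collected image information and/or received image information, and the played voice information includes collected voice information and/or received voice information.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method of the second aspect.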
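Beneficial Effects
Compared with the prior art, the embodiments of the present invention have the following beneficial effects. In this embodiment, the smart speaker includes a control module, a camera, a microphone array, a wireless communication module, and a projection module. The camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module. The camera collects image information; the microphone array collects voice information; the wireless communication module sends the collected image information and voice information to a remote device and receives image information and voice information sent by the remote device; and the control module controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information, wherein the projected image information includes image information collected by the camera and/or image information received by the wireless communication module, and the played voice information includes voice information collected by the microphone array and/or voice information received by the wireless communication module. With the embodiments of the present invention, the smart speaker can not only answer the user's questions but also interact using both pictures and text, which greatly increases how much the smart speaker's display function is used.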
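Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic structural diagram of the smart speaker provided in Embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the specific structure of the smart speaker provided in Embodiment 2 of the present invention;
FIG. 3 is a schematic flowchart of the method for using a smart speaker provided in Embodiment 3 of the present invention;
FIG. 4 is a schematic diagram of the specific implementation process of the method for using a smart speaker provided in Embodiment 4 of the present invention.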
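Embodiments of the Invention
In the following description, for purposes of explanation rather than limitation, specific details such as particular system structures and techniques are set forth to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention may also be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
It should be understood that, when used in this specification and the appended claims, the term “including” indicates the presence of the described features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or collections thereof.
It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present invention. As used in this specification and the appended claims, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term “and/or” as used in this specification and the appended claims refers to any combination and all possible combinations of one or more of the associated listed items, and includes these combinations.
As used in this specification and the appended claims, the term “if” may be construed, depending on the context, as “when”, “once”, “in response to determining”, or “in response to detecting”. Similarly, the phrase “if determined” or “if [the described condition or event] is detected” may be construed, depending on the context, as “once determined”, “in response to determining”, “once [the described condition or event] is detected”, or “in response to detecting [the described condition or event]”.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an order of execution; the order of execution of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
It should be noted that the present invention may include any number of smart speakers so that two or more users can hold a video session, wherein the smart speakers include wireless speakers.
To illustrate the technical solutions of the present invention, specific embodiments are described below.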
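Embodiment 1
FIG. 1 is a schematic structural diagram of the smart speaker provided in Embodiment 1 of the present invention. The smart speaker may include:
a control module 11, a camera 12, a microphone array 13, a wireless communication module 14, and a projection module 15.
In one embodiment, the camera 12, the microphone array 13, the wireless communication module 14, and the projection module 15 are all connected to the control module 11.
The camera 12 collects image information. In one embodiment, the image information is image information of a user.
The microphone array 13 collects voice information. In one embodiment, the voice information is voice information of a user. It should be understood that the microphone array 13 is a system composed of a certain number of microphones that samples and processes the spatial characteristics of the sound field. Optionally, there are seven microphones, arranged in a ring.
The wireless communication module 14 sends the collected image information and/or voice information to a remote device, and receives image information and/or voice information sent by the remote device. Optionally, the wireless communication module 14 includes a WiFi communication unit and a Bluetooth communication unit. It should be understood that the image information and/or voice information sent by the remote device includes image information and/or voice information obtained by a remote server and image information and/or voice information obtained by a remote Bluetooth speaker; it should also be understood that when local image information and/or voice information is received, the wireless communication module 14 transmits it to the remote server or the remote Bluetooth speaker.
The projection module 15 projects image information onto a preset screen. It should be understood that the projection includes projection in the horizontal direction and/or projection in the vertical direction, and the screen includes a curtain, a desktop, and a wall.
The control module 11 controls the projection module 15 to project image information onto a preset screen and controls the smart speaker to play voice information. The projected image information includes image information collected by the camera 12 and/or image information received by the wireless communication module 14, and the played voice information includes voice information collected by the microphone array 13 and/or voice information received by the wireless communication module 14. Optionally, the control module 11 includes a main control chip, and the main control chip is an APQ8009 chip.
In this embodiment of the present invention, the smart speaker includes a control module, a camera, a microphone array, a wireless communication module, and a projection module. The camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module. The camera collects image information; the microphone array collects voice information; the wireless communication module sends the collected image information and/or voice information to a remote device and receives image information and/or voice information sent by the remote device; and the control module controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information. With this embodiment, the interaction of the smart speaker can be extended from sound alone to a higher level at which sound and image interaction coexist, making the functions of the smart speaker more diverse, with strong ease of use and practicality.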
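Embodiment 2
FIG. 2 is a schematic diagram of the specific structure of the smart speaker provided in Embodiment 2 of the present invention. The smart speaker may include:
a control module 21, a camera 22, a microphone array 23, a wireless communication module 24, a projection module 25, a ranging module 26, an LED light source module 27, a key module 28, and an audio processing module 29.
The camera 22, the microphone array 23, the wireless communication module 24, the projection module 25, the ranging module 26, the LED light source module 27, the key module 28, and the audio processing module 29 are all connected to the control module 21. It should be noted that the control module 21, the camera 22, the microphone array 23, the wireless communication module 24, and the projection module 25 are basically the same as the control module 11, the camera 12, the microphone array 13, the wireless communication module 14, and the projection module 15 in Embodiment 1, and are not described again here. In addition, the microphone array 23 can further determine the sound source direction according to the voice information, and the control module 21 can control the camera 22 to rotate to the sound source direction, so that the camera 22 is focused on the sound source direction and accurately captures the user's image information. Optionally, the camera 22 is a 360-degree panoramic camera.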
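The control module 21 adjusts the picture size on the screen according to the information fed back by the ranging module 26, where the information includes the distance from the projection module to the screen. Optionally, the ranging module includes an infrared proximity sensor.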
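The size adjustment described above can be sketched as follows. This is a minimal illustration, not the method disclosed in the application: it assumes a projector with a fixed throw ratio (throw distance divided by picture width) and a purely digital scaling of the frame. The names `native_image_width` and `digital_scale` and the default throw ratio of 1.2 are hypothetical.

```python
def native_image_width(distance_m: float, throw_ratio: float) -> float:
    """Width of the unscaled projected picture at the given throw distance."""
    if distance_m <= 0 or throw_ratio <= 0:
        raise ValueError("distance and throw ratio must be positive")
    return distance_m / throw_ratio


def digital_scale(distance_m: float, target_width_m: float,
                  throw_ratio: float = 1.2) -> float:
    """Scale factor (0 < s <= 1) to shrink the frame toward target_width_m.

    The distance would come from the ranging module; the throw ratio is an
    assumed property of the projection optics, not a value from the source.
    """
    if target_width_m <= 0:
        raise ValueError("target width must be positive")
    native = native_image_width(distance_m, throw_ratio)
    # A projector cannot digitally enlarge beyond its native picture size.
    return min(1.0, target_width_m / native)
```

For example, at a measured distance of 2.4 m with a throw ratio of 1.2, the unscaled picture would be about 2.0 m wide, so a 1.0 m target width corresponds to a scale factor of about 0.5.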
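The control module 21 controls the LED light source module 27 to display the current state of the smart speaker according to the transmission status of image information and/or voice information. The current state of the smart speaker includes at least one of the following: listening, thinking, and speaking. It should be noted that the listening state indicates that the smart speaker is acquiring the user's information, and information is being transmitted from the user side to the smart speaker; the thinking state indicates that the smart speaker is obtaining the answer to the user's question, and information is being transmitted from the smart speaker to the server or response information is being transmitted from the server to the smart speaker; the speaking state indicates that the smart speaker is giving the answer the user wants, information is being transmitted from the smart speaker to the user side, and the smart speaker is in a playback and/or display state. Optionally, the current state of the smart speaker is displayed in different colors.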
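One way to read the mapping from transmission status to displayed state is as a small lookup table. The sketch below is illustrative only: the enum names, the `led_color_for` helper, and the specific colors are assumptions, since the text says only that different colors may be used.

```python
from enum import Enum


class Transfer(Enum):
    """Direction in which information is currently flowing."""
    USER_TO_SPEAKER = "user_to_speaker"
    SPEAKER_TO_SERVER = "speaker_to_server"
    SERVER_TO_SPEAKER = "server_to_speaker"
    SPEAKER_TO_USER = "speaker_to_user"


class SpeakerState(Enum):
    LISTENING = "listening"
    THINKING = "thinking"
    SPEAKING = "speaking"


# State shown for each transfer direction, following the description above.
STATE_FOR_TRANSFER = {
    Transfer.USER_TO_SPEAKER: SpeakerState.LISTENING,
    Transfer.SPEAKER_TO_SERVER: SpeakerState.THINKING,
    Transfer.SERVER_TO_SPEAKER: SpeakerState.THINKING,
    Transfer.SPEAKER_TO_USER: SpeakerState.SPEAKING,
}

# Hypothetical color assignment; the source does not name any colors.
LED_COLOR = {
    SpeakerState.LISTENING: "blue",
    SpeakerState.THINKING: "yellow",
    SpeakerState.SPEAKING: "green",
}


def led_color_for(transfer: Transfer) -> str:
    """Return the LED color for the current transfer direction."""
    return LED_COLOR[STATE_FOR_TRANSFER[transfer]]
```

Both server-bound and server-originated traffic map to the thinking state, which matches the description that either direction of speaker-server exchange counts as thinking.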
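When the key module 28 receives a key instruction, the smart speaker is controlled to adjust the playback mode or the volume. Optionally, there is at least one key.
The audio processing module 29 processes and plays the voice information collected by the microphone array 23 and/or the voice information received by the wireless communication module 24. The audio processing module 29 includes a digital signal processor, a power amplifier, and a speaker; the output of the digital signal processor is connected to the input of the power amplifier, and the output of the power amplifier is connected to the input of the speaker.
Optionally, the smart speaker further includes a GPS positioning module that obtains the current position information of the smart speaker.
As can be seen from the above, compared with Embodiment 1, this embodiment of the present invention adds a ranging module, which can flexibly adjust the size of the projection area according to the distance from the speaker to the projection surface and gives the user a better visual effect. It also adds a key module, which can be combined with the control module to adjust the playback mode and volume of the smart speaker. In addition, it adds an audio processing module, which makes the voice played by the smart speaker more pleasant to listen to; images can be displayed synchronously while the sound is playing, which improves the user experience and provides strong ease of use and practicality.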
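Embodiment 3
FIG. 3 is a schematic flowchart of the method for using a smart speaker provided in Embodiment 3 of the present invention. The method may include the following steps:
S301: Collect image information.
In one embodiment, the smart speaker may include a control module, a camera, a microphone array, a wireless communication module, and a projection module; the camera, the microphone array, the wireless communication module, and the projection module are all connected to the control module; and the camera includes, but is not limited to, a 360-degree panoramic camera and a 3D sensing lens.
In one embodiment, the user's image information is collected by the camera.
S302: Collect voice information.
In one example, the user's voice information can be collected by the microphone array.
S303: Send the collected image information and/or voice information to a remote device, and receive image information and/or voice information sent by the remote device.
In one example, the wireless communication module may receive local image information and/or voice information as well as remote image information and/or voice information, and send out the local image information and/or voice information and the remote image information and/or voice information.
S304: Control the projection of image information onto a preset screen, and control the smart speaker to play voice information.
The projected image information includes collected image information and/or received image information, and the played voice information includes collected voice information and/or received voice information.
In one embodiment, a controller can be used to control the projection module to project image information onto a preset screen and to control the smart speaker to play voice information.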
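It should be understood that when the user is using the smart speaker for a video call, the image information includes the image information collected by the camera and the other party's image information received by the wireless communication module; in this case the projection module plays an auxiliary role in the interaction and provides real-time display of the video call, giving the smart speaker more social attributes. In addition, when there is only one smart speaker and the user interacts with it to request playback of a specified video, the image information includes the image information returned by the remote server and received by the wireless communication module.
Correspondingly, when the user is using the smart speaker for a video call, the voice information includes the voice information collected by the microphone array and the other party's voice information received by the wireless communication module. Furthermore, when there is only one smart speaker and the user interacts with it to request playback of a specified song, the voice information includes the song returned by the remote server and received by the wireless communication module.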
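The two paragraphs above amount to choosing media sources by usage scenario. The following sketch restates that choice as a lookup; the mode strings, source labels, and the `select_sources` function are hypothetical names introduced for illustration, and only the cases stated in the text are filled in.

```python
def select_sources(mode: str) -> dict:
    """Return which sources feed the projection and the playback for a mode."""
    if mode == "video_call":
        return {
            "image": ["camera", "remote_party"],
            "voice": ["microphone_array", "remote_party"],
        }
    if mode == "video_request":
        # The text names only the image source for this case.
        return {"image": ["remote_server"], "voice": []}
    if mode == "song_request":
        # The text names only the voice source for this case.
        return {"image": [], "voice": ["remote_server"]}
    raise ValueError(f"unknown mode: {mode!r}")
```

Leaving the unstated entries empty keeps the sketch faithful to the description rather than guessing, for instance, where the audio track of a requested video comes from.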
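It should be noted that the relevant steps in the above method for using a smart speaker can be implemented by corresponding virtual modules as well as by specific hardware devices; for example, an application program can be used to control the camera to collect image information.
As can be seen from the above, this embodiment of the present invention first collects image information and voice information, then sends the collected image information and/or voice information to a remote device and receives image information and/or voice information sent by the remote device, and then controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information. This gives the smart speaker an image feedback function, effectively enriches and diversifies its functions, meets the user's auditory and visual needs at the same time, is closer to the concept of artificial intelligence, makes users' lives more convenient, and provides strong ease of use and practicality.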
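Embodiment 4
FIG. 4 is a schematic diagram of the specific implementation process of the method for using a smart speaker provided in Embodiment 4 of the present invention, which further refines and explains steps S301 and S302 in Embodiment 3. The method may include the following steps:
S401: Collect image information.
Step S401 is the same as step S301 in Embodiment 3 and is not described again here.
S402: Authenticate the user's identity according to the collected image information; if the identity authentication passes, collect voice information and further determine the sound source direction according to the voice information.
It should be noted that, because a corresponding record is kept each time a user uses the smart speaker, this embodiment uses face recognition technology to protect the user's privacy to the greatest extent, so that subsequent interaction starts only after the user's identity has been authenticated.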
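Optionally, the sound source direction is determined using a localization algorithm based on time difference of arrival.
Further, the sound source direction may be determined in combination with the image information collected in step S401.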
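A minimal sketch of a time-difference-of-arrival estimate is given below. It is not the algorithm of the application, which gives no details: it assumes a single pair of microphones rather than the full array, a far-field source, a brute-force cross-correlation to find the delay, and a nominal speed of sound of 343 m/s. The function names are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def delay_in_samples(sig_a, sig_b, max_lag):
    """Lag at which sig_b best matches sig_a.

    A positive result means sig_b is delayed relative to sig_a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(sig_a):
            j = i + lag
            if 0 <= j < len(sig_b):
                score += a * sig_b[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(delay_s, mic_spacing_m):
    """Far-field angle of arrival, measured from the broadside of the pair."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # guard against measurement noise
    return math.degrees(math.asin(ratio))
```

With a sampling rate `fs`, the delay in seconds is `delay_in_samples(...) / fs`. A real array would combine several microphone pairs to resolve the direction around the full circle, which a single pair cannot do.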
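S403: Control the camera to rotate to the sound source direction, and continue to collect image information.
In this embodiment, adjusting the camera orientation according to the sound source estimate achieves the purpose of focusing, so that only the image information of interest is captured without missing important image information.
S404: Send the collected image information and/or voice information to a remote device, and receive image information and/or voice information sent by the remote device.
S405: Control the projection of image information onto a preset screen, and control the smart speaker to play voice information.
Steps S404 to S405 are the same as steps S303 to S304 in Embodiment 3 and are not described again here.
As can be seen from the above, compared with Embodiment 3, this embodiment of the present invention adds a user identity authentication step, which protects the user's privacy from being leaked. It also adds a camera focusing step, so that image information containing only the user can be obtained, which simplifies subsequent image processing operations and provides strong ease of use and practicality.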
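The ordering of steps S401 to S403 can be sketched as a small control routine. This is an illustrative outline under stated assumptions: the five callables are hypothetical stand-ins for the camera, face recognition, microphone array, localization, and camera motor, none of which the source specifies at the code level.

```python
def run_session(capture_image, authenticate, capture_voice,
                locate_source, rotate_camera):
    """Run S401 to S403 once; return None if authentication fails."""
    image = capture_image()            # S401: collect image information
    if not authenticate(image):        # S402: identity check on the image
        return None                    # no voice is collected for strangers
    voice = capture_voice()            # S402: collect voice after passing
    direction = locate_source(voice)   # S402: estimate sound source direction
    rotate_camera(direction)           # S403: turn the camera to the source
    focused = capture_image()          # S403: continue collecting images
    return {"image": focused, "voice": voice, "direction": direction}
```

Passing the hardware operations in as callables keeps the sketch testable without devices; the returned dictionary is what steps S404 and S405 would then transmit, project, and play.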
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统,装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各实施例的模块、单元和/或方法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明实现上述实施例方法中的全部或部分流程,也可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一计算机可读存储介质中,该计算机程序在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机程序包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质可以包括:能够携带所述计算机程序代码的任何实体或装置、记录介质、U盘、移动硬盘、磁碟、光盘、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质等。需要说明的是,所述计算机可读介质包含的内容可以根据司法管辖区内立法和专利实践的要求进行适当的增减,例如在某些司法管辖区,根据立法和专利实践,计算机可读介质不包括电载波信号和电信信号。
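The above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.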

Claims (10)

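  1. A smart speaker, characterized by comprising:
    a control module, a camera, a microphone array, a wireless communication module, and a projection module;
    wherein the camera, the microphone array, the wireless communication module, and the projection module are each connected to the control module;
    the camera collects image information;
    the microphone array collects voice information;
    the wireless communication module sends the collected image information and/or voice information to a remote device and receives image information and/or voice information sent by the remote device; and the control module controls the projection module to project image information onto a preset screen and controls the smart speaker to play voice information, wherein the projected image information includes the image information collected by the camera and/or the image information received by the wireless communication module, and the played voice information includes the voice information collected by the microphone array and/or the voice information received by the wireless communication module.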
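  2. The smart speaker according to claim 1, characterized in that the microphone array collects voice information and further determines a sound source direction based on the voice information, and the control module controls the camera to rotate toward the sound source direction.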
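  3. The smart speaker according to claim 1, characterized in that the smart speaker further comprises a ranging module;
    the ranging module is connected to the control module; and
    the control module adjusts the picture size on the screen according to information fed back by the ranging module, wherein the information includes the distance from the projection module to the screen.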
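  4. The smart speaker according to claim 1, characterized in that the smart speaker further comprises:
    an LED light source module;
    the LED light source module is connected to the control module; and
    the control module controls the LED light source module to display the current state of the smart speaker according to the transmission state of the image information and/or voice information, wherein the current state of the smart speaker includes at least one of the following: listening, thinking, and speaking.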
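  5. The smart speaker according to claim 1, characterized in that the smart speaker further comprises:
    a button module;
    the button module is connected to the control module; and
    the control module, when the button module receives a button instruction, controls the smart speaker to adjust the playback mode or the volume.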
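  6. The smart speaker according to any one of claims 1 to 5, characterized in that the smart speaker further comprises:
    an audio processing module, the audio processing module comprising a digital signal processor, an audio power amplifier, and a loudspeaker;
    wherein the output of the digital signal processor is connected to the input of the audio power amplifier, and the output of the audio power amplifier is connected to the input of the loudspeaker.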
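  7. A method for using a smart speaker, characterized by comprising:
    collecting image information;
    collecting voice information;
    sending the collected image information and/or voice information to a remote device, and receiving image information and/or voice information sent by the remote device; and
    controlling the projection module to project image information onto a preset screen, and controlling the smart speaker to play voice information, wherein the projected image information includes the collected image information and/or the received image information, and the played voice information includes the collected voice information and/or the received voice information.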
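  8. The method according to claim 7, characterized in that the collecting of voice information comprises:
    authenticating the user's identity based on the collected image information, and, if the identity authentication succeeds, collecting voice information and further determining a sound source direction based on the voice information.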
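  9. The method according to claim 8, characterized in that, after the sound source direction is further determined based on the voice information, the method further comprises:
    controlling the camera to rotate toward the sound source direction and continuing to collect image information.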
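  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 7 to 9.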
PCT/CN2019/107871 2018-09-27 2019-09-25 Smart speaker and method for using a smart speaker WO2020063675A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811129605.5 2018-09-27
CN201811129605.5A CN110958537A (zh) 2018-09-27 2018-09-27 Smart speaker and method for using a smart speaker

Publications (1)

Publication Number Publication Date
WO2020063675A1 true WO2020063675A1 (zh) 2020-04-02

Family

ID=69953323

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/107871 WO2020063675A1 (zh) 2018-09-27 2019-09-25 Smart speaker and method for using a smart speaker

Country Status (2)

Country Link
CN (1) CN110958537A (zh)
WO (1) WO2020063675A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
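CN113160724A (zh) * 2021-02-02 2021-07-23 上海锦子昌电子科技有限公司 Intelligent broadcasting system for public-welfare publicity
CN113523609A (zh) * 2021-08-13 2021-10-22 彭晓静 Voice-enabled intelligent robot for 3D-modeling laser cutting
CN113873392A (zh) * 2021-09-06 2021-12-31 深圳市海创嘉科技有限公司 Smart speaker array system
CN114006971A (zh) * 2021-10-28 2022-02-01 努比亚技术有限公司 Screen-casting window audio control method, device, and computer-readable storage medium
CN114071323A (zh) * 2021-11-08 2022-02-18 广州番禺巨大汽车音响设备有限公司 Control method and control device for a TWS speaker based on panoramic playback
CN114089655A (zh) * 2021-10-22 2022-02-25 西京学院 Sitting-posture monitoring and correction device based on visual image processing
CN114489317A (zh) * 2020-11-13 2022-05-13 上海擎感智能科技有限公司 Interaction method, interaction device, terminal, and computer-readable storage medium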

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113766278B (zh) * 2020-08-11 2024-04-12 北京沃东天骏信息技术有限公司 Audio playback method, audio playback device, and audio playback system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103179475A (zh) * 2011-12-22 2013-06-26 深圳市三诺电子有限公司 Wireless speaker and wireless speaker system thereof
US20130329361A1 (en) * 2012-06-08 2013-12-12 Hon Hai Precision Industry Co., Ltd. Projection device
CN204897064U (zh) * 2015-08-25 2015-12-23 黄波 Elevator projection device
CN106445455A (zh) * 2016-09-29 2017-02-22 深圳前海弘稼科技有限公司 Planting equipment and control method for planting equipment

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
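CN114489317A (zh) * 2020-11-13 2022-05-13 上海擎感智能科技有限公司 Interaction method, interaction device, terminal, and computer-readable storage medium
CN114489317B (zh) * 2020-11-13 2023-11-03 上海擎感智能科技有限公司 Interaction method, interaction device, terminal, and computer-readable storage medium
CN113160724A (zh) * 2021-02-02 2021-07-23 上海锦子昌电子科技有限公司 Intelligent broadcasting system for public-welfare publicity
CN113523609A (zh) * 2021-08-13 2021-10-22 彭晓静 Voice-enabled intelligent robot for 3D-modeling laser cutting
CN113873392A (zh) * 2021-09-06 2021-12-31 深圳市海创嘉科技有限公司 Smart speaker array system
CN114089655A (zh) * 2021-10-22 2022-02-25 西京学院 Sitting-posture monitoring and correction device based on visual image processing
CN114006971A (zh) * 2021-10-28 2022-02-01 努比亚技术有限公司 Screen-casting window audio control method, device, and computer-readable storage medium
CN114006971B (zh) * 2021-10-28 2024-03-19 努比亚技术有限公司 Screen-casting window audio control method, device, and computer-readable storage medium
CN114071323A (zh) * 2021-11-08 2022-02-18 广州番禺巨大汽车音响设备有限公司 Control method and control device for a TWS speaker based on panoramic playback
CN114071323B (zh) * 2021-11-08 2023-09-12 广州番禺巨大汽车音响设备有限公司 Control method and control device for a TWS speaker based on panoramic playback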

Also Published As

Publication number Publication date
CN110958537A (zh) 2020-04-03

Similar Documents

Publication Publication Date Title
WO2020063675A1 (zh) Smart speaker and method for using a smart speaker
US11061643B2 (en) Devices with enhanced audio
US20140347565A1 (en) Media devices configured to interface with information appliances
US20140342660A1 (en) Media devices for audio and video projection of media presentations
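KR102538775B1 (ko) Audio playback method and audio playback apparatus, electronic device, and storage medium
CN106454644B (zh) Audio playback method and device
US11206372B1 (en) Projection-type video conference system
WO2023051126A1 (zh) Video processing method and related electronic device
CN114245267B (zh) Method and system for multi-device cooperative operation, and electronic device
WO2018166081A1 (zh) Earphone
CN106453032B (zh) Information push method, device, and system
JP2023544483A (ja) Persistent co-presence group video conferencing system
WO2020038494A1 (zh) Smart speaker and method for using a smart speaker
WO2023231686A9 (zh) Video processing method and terminal
US20230370801A1 (en) Information processing device, information processing terminal, information processing method, and program
CN113709652B (zh) Audio playback control method and electronic device
WO2023212883A1 (zh) Audio output method and apparatus, communication apparatus, and storage medium
CN110213531A (zh) Surveillance video processing method and device
JP2012248990A (ja) Electronic device and videophone method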
US11363236B1 (en) Projection-type video conference system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19864644; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19864644; Country of ref document: EP; Kind code of ref document: A1)