CN111176438A

CN111176438A - Intelligent sound box control method based on three-dimensional gesture motion recognition and intelligent sound box

Info

Publication number: CN111176438A
Application number: CN201911134679.2A
Authority: CN
Inventors: 尚宇翔
Original assignee: Shenzhen China Star Optoelectronics Technology Co Ltd
Current assignee: TCL China Star Optoelectronics Technology Co Ltd
Priority date: 2019-11-19
Filing date: 2019-11-19
Publication date: 2020-05-19

Abstract

The embodiments of the invention relate to the technical field of intelligent sound boxes and disclose an intelligent sound box control method based on three-dimensional gesture motion recognition, and an intelligent sound box. The method comprises the following steps: detecting the hand of a user by utilizing a first camera module of the intelligent sound box; if the hand of the user is detected, controlling the first camera module to shoot the hand of the user to obtain a first gesture image, and controlling a second camera module of the intelligent sound box to shoot the hand of the user to obtain a second gesture image; recognizing the first gesture image to obtain first gesture information, and recognizing the second gesture image to obtain second gesture information; acquiring target gesture information according to the first gesture information and the second gesture information; and controlling the intelligent sound box to execute the operation corresponding to the target gesture information. The method improves the accuracy of gesture motion recognition.

Description

Intelligent sound box control method based on three-dimensional gesture motion recognition and intelligent sound box

Technical Field

The invention relates to the technical field of intelligent sound boxes, in particular to an intelligent sound box control method based on three-dimensional gesture motion recognition and an intelligent sound box.

Background

At present, intelligent sound boxes support several control modes; key control, voice control, and gesture control are common. Among them, gesture control is a recent trend, and gesture recognition is one of its key technical points. A common way to recognize gesture actions is camera-based detection: an image sequence is shot, and gesture action information is obtained by recognizing that sequence. In practice, however, the accuracy of recognizing gesture actions from the image sequence alone is not ideal, which leads to misoperation.

Disclosure of Invention

The embodiment of the invention discloses an intelligent sound box control method based on three-dimensional gesture motion recognition and an intelligent sound box, which are used for improving the accuracy of gesture motion recognition.

The first aspect of the embodiment of the invention discloses an intelligent sound box control method based on three-dimensional gesture motion recognition, which comprises the following steps:

detecting a hand of a user by utilizing a first camera module of the intelligent sound box;

if the user hand is detected, controlling the first camera module to shoot the user hand to obtain a first gesture image, and controlling the second camera module of the intelligent sound box to shoot the user hand to obtain a second gesture image;

recognizing the first gesture image to obtain first gesture information, and recognizing the second gesture image to obtain second gesture information;

acquiring target gesture information according to the first gesture information and the second gesture information;

and controlling the intelligent sound box to execute the corresponding operation of the target gesture information.
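
The five steps above can be sketched as a single control pass. This is a minimal illustration under stated assumptions, not the patented implementation: the `Camera` interface, the `recognize` and `fuse` callables, and the `operations` table are hypothetical names introduced here, and the source does not specify any of them.

```python
from typing import Callable, Dict, Optional, Protocol


class Camera(Protocol):
    """Hypothetical camera interface; not defined in the source."""

    def detect_hand(self) -> bool: ...
    def capture(self) -> bytes: ...


def control_step(
    first_cam: Camera,
    second_cam: Camera,
    recognize: Callable[[bytes], str],
    fuse: Callable[[str, str], str],
    operations: Dict[str, Callable[[], None]],
) -> Optional[str]:
    """Run one pass of the method; return the target gesture, or None."""
    # Step 1: only the first camera module is used for hand detection.
    if not first_cam.detect_hand():
        return None
    # Step 2: both camera modules shoot the same hand.
    first_image = first_cam.capture()
    second_image = second_cam.capture()
    # Step 3: each image is recognized independently.
    first_info = recognize(first_image)
    second_info = recognize(second_image)
    # Step 4: combine the two results into the target gesture information.
    target = fuse(first_info, second_info)
    # Step 5: execute the operation mapped to the target gesture, if any.
    action = operations.get(target)
    if action is not None:
        action()
    return target
```

Passing the recognizer and the fusion rule in as callables keeps the sketch independent of any particular recognition model, which the source leaves open.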

As an optional implementation manner, in the first aspect of the embodiment of the present invention, before the detecting the hand of the user by using the first camera module of the smart sound box, the method further includes:

judging whether a human body is detected;

if the human body is detected, the step of detecting the hand of the user by utilizing the first camera module of the intelligent sound box is executed.

As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the hand of the user is detected and before the first camera module is controlled to shoot the hand of the user to obtain a first gesture image and the second camera module of the intelligent sound box is controlled to shoot the hand of the user to obtain a second gesture image, the method further includes:

detecting whether the hand of the user generates three-dimensional gesture motion;

if the three-dimensional gesture action of the user hand is detected, the first camera module is controlled to shoot the user hand to obtain a first gesture image, and the second camera module of the intelligent sound box is controlled to shoot the user hand to obtain a second gesture image.

As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the hand of the user is detected and before detecting whether the hand of the user generates the three-dimensional gesture motion, the method further includes:

detecting whether a wearable device is worn on the hand of the user;

if wearable equipment is worn on the hand of the user, judging whether the intelligent sound box is connected with the wearable equipment or not;

and if the intelligent sound box is connected with the wearable device, executing the step of detecting whether the hand of the user generates the three-dimensional gesture action.

As an optional implementation manner, in the first aspect of this embodiment of the present invention, the method further includes:

if the hand of the user is detected, a display device connected with the intelligent sound box is turned on;

after obtaining the target gesture information according to the first gesture information and the second gesture information, the method further includes:

generating a target virtual gesture according to the target gesture information;

displaying the target virtual gesture on the display device.

The second aspect of the embodiment of the present invention discloses an intelligent sound box, which includes:

the first detection unit is used for detecting the hand of a user by utilizing the first camera module of the intelligent sound box;

the first control unit is used for controlling the first camera module to shoot the user hand to obtain a first gesture image and controlling the second camera module of the intelligent sound box to shoot the user hand to obtain a second gesture image if the first detection unit detects the user hand;

the first recognition unit is used for recognizing the first gesture image to obtain first gesture information and recognizing the second gesture image to obtain second gesture information;

the information acquisition unit is used for acquiring target gesture information according to the first gesture information and the second gesture information;

and the second control unit is used for controlling the intelligent sound box to execute the corresponding operation of the target gesture information.

As an optional implementation manner, in the second aspect of the embodiment of the present invention, the smart sound box further includes:

the second detection unit is used for judging whether a human body is detected or not before the first detection unit detects the hand of the user by using the first camera module of the intelligent sound box;

the first detection unit is specifically used for detecting the hand of the user by utilizing the first camera module of the intelligent sound box when the second detection unit detects the human body;

the third detection unit is used for detecting whether the user hand generates a three-dimensional gesture action after the first detection unit detects the user hand and before the first control unit controls the first camera module to shoot the user hand to obtain a first gesture image and controls the second camera module of the intelligent sound box to shoot the user hand to obtain a second gesture image;

the first control unit is specifically used for, if the third detection unit detects that the user hand generates the three-dimensional gesture action, controlling the first camera module to shoot the user hand to obtain a first gesture image and controlling the second camera module of the intelligent sound box to shoot the user hand to obtain a second gesture image;

the fourth detection unit is used for detecting whether a wearable device is worn on the hand of the user after the first detection unit detects the hand of the user and before the third detection unit detects whether the hand of the user generates a three-dimensional gesture action;

the connection judging unit is used for judging whether the intelligent sound box is connected with the wearable device or not if the fourth detecting unit detects that the wearable device is worn on the hand of the user;

the third detection unit is specifically configured to detect whether the hand of the user has a three-dimensional gesture action if the connection determination unit determines that the smart sound box is connected to the wearable device.

the third control unit is used for turning on the display equipment connected with the intelligent sound box if the first detection unit detects the hand of the user;

the display unit is used for generating a target virtual gesture according to the target gesture information after the information acquisition unit acquires the target gesture information according to the first gesture information and the second gesture information; and displaying the target virtual gesture on the display device.

The third aspect of the embodiments of the present invention discloses an intelligent speaker, which may include:

a memory storing executable program code;

a processor coupled with the memory;

the processor calls the executable program code stored in the memory to execute the intelligent sound box control method based on the three-dimensional gesture motion recognition disclosed by the first aspect of the embodiment of the invention.

The fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium, which stores a computer program, wherein the computer program enables a computer to execute the method for controlling an intelligent sound box based on three-dimensional gesture motion recognition disclosed in the first aspect of the embodiments of the present invention.

A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.

A sixth aspect of the present embodiment discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where the computer program product is configured to, when running on a computer, cause the computer to perform part or all of the steps of any one of the methods in the first aspect.

Compared with the prior art, the embodiment of the invention has the following beneficial effects:

in the embodiment of the invention, the intelligent sound box detects the hand of the user by using the first camera module, after the hand of the user is detected, the first camera module is controlled to shoot the hand of the user to obtain a first gesture image, the second camera module is controlled to shoot the hand of the user to obtain a second gesture image, the first gesture information is obtained by identifying the first gesture image, the second gesture information is obtained by identifying the second gesture image, the target gesture information is obtained further according to the first gesture information and the second gesture information, and finally the corresponding operation of the target gesture information is executed. Therefore, by implementing the embodiment of the invention, the first camera module and the second camera module of the intelligent sound box can be fully utilized to shoot the hand of the same user from different angles, so as to obtain the first gesture image and the second gesture image at different angles, and the first gesture image and the second gesture image are combined to obtain more accurate target gesture information, thereby being beneficial to improving the accuracy of gesture motion recognition.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a schematic structural diagram of an intelligent sound box disclosed in an embodiment of the present invention;

fig. 2 is a schematic flow chart of a method for controlling an intelligent sound box based on three-dimensional gesture recognition according to an embodiment of the present invention;

fig. 3 is a schematic flow chart of a method for controlling an intelligent sound box based on three-dimensional gesture recognition according to another embodiment of the present invention;

fig. 4 is a schematic structural diagram of an intelligent sound box disclosed in an embodiment of the present invention;

fig. 5 is a schematic structural view of an intelligent sound box disclosed in another embodiment of the present invention;

fig. 6 is a schematic structural diagram of an intelligent sound box disclosed in another embodiment of the present invention;

fig. 7 is a schematic structural diagram of an intelligent sound box disclosed in another embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second", "third", and "fourth" and the like in the description and the claims of the present invention are used for distinguishing different objects, and are not used for describing a specific order. The terms "comprises," "comprising," and any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The intelligent sound box disclosed in the embodiments of the invention is an integrally formed upright device. It may be a small intelligent device placed on a desktop or a floor-standing intelligent device. The intelligent sound box comprises a main case body on which a display screen is arranged. In an optional application scenario, the intelligent sound box further includes a camera, and the display screen and the camera are detachably mounted on the main case body: in use, the display screen and/or the camera are installed at positions reserved on the main case body; when not in use, they can be removed, which makes the intelligent sound box easier to move and protects the display screen and the camera. In another optional application scenario, the intelligent sound box has no built-in camera but is supplied with an additional camera, which can be mounted on glasses worn by the user during use. Alternatively, the intelligent sound box includes a camera and is also supplied with an additional camera, and the additional camera can be mounted on glasses worn by the user during use. In another optional application scenario, the intelligent sound box further includes a camera connected to the main case body through a pull cord, and the camera can be pulled out and fixed at any position on the main case body. In another optional application scenario, the intelligent sound box adopts a dual-camera design comprising a top camera and a bottom camera, wherein the top camera can rotate and be raised and lowered and is used for shooting the desktop, and the bottom camera is fixed in the main case body and can be used for recognizing gestures. In another optional application scenario, wheels are provided at the bottom of the main case body so that the main case body can be pushed along. Further optionally, a control circuit electrically connected with the wheels is arranged inside the main case body; the main case body provides a walking path, and the control circuit can control the intelligent sound box to travel along that path, so that the intelligent sound box moves automatically. In another optional application scenario, the display screen of the intelligent sound box may be a foldable screen, which solves the problem of switching between landscape and portrait orientation. In another optional application scenario, the intelligent sound box is further provided with a fill light source, which may include a bulb, a light strip, or a light strip combined with an external component (e.g., a louver), and the like.

Fig. 1 shows only the main case body, the display screen, the top camera, and the bottom camera; other components are not shown. It can be understood that Fig. 1 illustrates only the intelligent sound box of some embodiments of the present invention; other intelligent sound boxes that are optimized or modified based on the intelligent sound box of Fig. 1 and can implement the technical solution of the present invention all fall within the protection scope of the technical solution of the present invention and are not listed here.

The technical solution of the present invention will be described in detail through specific embodiments from the perspective of the smart speaker.

Example one

Referring to fig. 2, fig. 2 is a schematic flow chart of a method for controlling an intelligent sound box based on three-dimensional gesture recognition according to an embodiment of the present invention; as shown in fig. 2, the method for controlling a smart speaker based on three-dimensional gesture motion recognition may include:

201. Detect the hand of the user by utilizing the first camera module of the intelligent sound box.

It should be noted that the execution subject of the embodiment of the present invention may be an intelligent sound box. The first camera module of the smart speaker may be a top camera in the smart speaker shown in fig. 1.

As an optional implementation manner, before the detection of the hand of the user by using the first camera module of the smart sound box is performed, the following steps may be further performed:

starting a first camera module, and detecting whether the first camera module starts a multi-angle scanning mode or not;

if the multi-angle scanning mode is started, the multi-angle scanning mode is closed, and the shooting angle of the first camera module is adjusted to be a preset shooting angle.

By implementing the above embodiment, the shooting angle of the first camera module is controlled so that the hand of the user is detected within a specific shooting range, which avoids false detections caused by omnidirectional detection and improves the accuracy of gesture control.
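
The preparation step described above can be sketched as follows. The `CameraModule` fields and the `PRESET_ANGLE_DEG` value are assumptions made for illustration; the source names neither a data structure nor a concrete preset angle, and it does not say what happens to the angle when multi-angle scanning is already off, so the sketch leaves it unchanged in that case.

```python
from dataclasses import dataclass

PRESET_ANGLE_DEG = 30.0  # hypothetical preset shooting angle, not from the source


@dataclass
class CameraModule:
    """Hypothetical state of a camera module; field names are assumptions."""

    powered: bool = False
    multi_angle_scan: bool = False
    angle_deg: float = 0.0


def prepare_for_hand_detection(
    cam: CameraModule, preset: float = PRESET_ANGLE_DEG
) -> CameraModule:
    """Start the module and lock it to the preset angle if it was scanning."""
    # Start the first camera module.
    cam.powered = True
    # If multi-angle scanning is on, turn it off and move to the preset angle.
    if cam.multi_angle_scan:
        cam.multi_angle_scan = False
        cam.angle_deg = preset
    return cam
```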

202. If the hand of the user is detected, control the first camera module to shoot the hand of the user to obtain a first gesture image, and control the second camera module of the intelligent sound box to shoot the hand of the user to obtain a second gesture image.

It can be understood that the first camera module may be the top camera of the intelligent sound box and the second camera module may be the bottom camera of the intelligent sound box, so the shooting angle of the first camera module differs from that of the second camera module. The first gesture image and the second gesture image therefore show the gesture action of the hand of the user from different angles, so that the real gesture action of the user can be analyzed accurately.

As an alternative implementation, step 202 may include the following steps:

if the hand of the user is detected, the shooting angle of the first camera shooting module is adjusted to be a first shooting angle, the shooting angle of the second camera shooting module is adjusted to be a second shooting angle, the first camera shooting module is controlled to shoot the hand of the user based on the first shooting angle to obtain a first gesture image, and the second camera shooting module is controlled to shoot the hand of the user based on the second shooting angle to obtain a second gesture image.

Through this embodiment, the shooting angles of the first camera module and the second camera module can be adjusted appropriately so that gesture images that are easier to recognize are captured, which improves gesture recognition accuracy.

The shooting angle used by the first camera module when detecting the hand of the user may be identical to, or partially coincide with, the first shooting angle of the first camera module. After the first camera module is controlled to shoot the hand of the user at the first shooting angle to obtain the first gesture image, the shooting angle of the first camera module is restored to the original shooting angle, so that user hand detection is again performed at the same shooting angle (a specific shooting range); this avoids false detections caused by omnidirectional detection and improves the accuracy of gesture control.
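
The adjust, shoot, and restore sequence can be sketched with a context manager that guarantees the original angle is put back. `AdjustableCamera`, `shooting_angle`, and `capture_pair` are hypothetical names; the source describes the behavior but no programming interface.

```python
from contextlib import contextmanager
from typing import Iterator, List, Tuple


class AdjustableCamera:
    """Hypothetical camera with a settable shooting angle."""

    def __init__(self, angle_deg: float) -> None:
        self.angle_deg = angle_deg
        self.shots: List[float] = []  # angles at which images were taken

    def capture(self) -> str:
        self.shots.append(self.angle_deg)
        return f"image@{self.angle_deg}"


@contextmanager
def shooting_angle(cam: AdjustableCamera, angle_deg: float) -> Iterator[AdjustableCamera]:
    """Temporarily set the shooting angle, then restore the original one."""
    original = cam.angle_deg
    cam.angle_deg = angle_deg
    try:
        yield cam
    finally:
        cam.angle_deg = original


def capture_pair(
    first: AdjustableCamera,
    second: AdjustableCamera,
    first_angle: float,
    second_angle: float,
) -> Tuple[str, str]:
    # The first module returns to its detection angle after shooting.
    with shooting_angle(first, first_angle):
        first_image = first.capture()
    # The source describes restoring only the first module's angle, so the
    # second module simply keeps the second shooting angle here.
    second.angle_deg = second_angle
    second_image = second.capture()
    return first_image, second_image
```

Restoring the angle in a `finally` clause means hand detection resumes in the same shooting range even if a capture fails.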

203. The first gesture image is recognized to obtain first gesture information, and the second gesture image is recognized to obtain second gesture information.

204. And obtaining target gesture information according to the first gesture information and the second gesture information.

Optionally, obtaining the target gesture information according to the first gesture information and the second gesture information includes:

constructing a target gesture according to the first gesture information and the second gesture information to obtain the target gesture information, which correctly indicates the real gesture action of the user so that the intention of the user is accurately identified.

For example, the first camera module is located at the top and shoots the first gesture image from a top view, and the second camera module is located at the bottom and shoots the second gesture image from a bottom view. The gesture action recognized from the first gesture image provides one dimension (the top view), and the gesture action recognized from the second gesture image provides another dimension (the bottom view); from the gesture actions in these two dimensions, a three-dimensional gesture action can be accurately constructed, so that the real gesture action of the user is obtained.
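
One possible way to combine two views into a three-dimensional motion is sketched below. The geometry is an assumption made purely for illustration: the sketch supposes the first view reports a hand displacement as (x, y) and the second view reports it as (x, z) in the same units, so the shared x axis can be averaged. The source does not state how the two views are combined, and `Motion3D`, `fuse_views`, and `dominant_axis` are hypothetical names.

```python
from typing import NamedTuple, Tuple


class Motion3D(NamedTuple):
    x: float
    y: float
    z: float


def fuse_views(top_xy: Tuple[float, float], bottom_xz: Tuple[float, float]) -> Motion3D:
    """Combine two 2-D hand displacements into one 3-D displacement.

    Assumption (not from the source): the first view yields (x, y) and the
    second view yields (x, z), both in the same units.
    """
    top_x, top_y = top_xy
    bottom_x, bottom_z = bottom_xz
    # Both views observe x, so average the two estimates.
    x = (top_x + bottom_x) / 2.0
    return Motion3D(x=x, y=top_y, z=bottom_z)


def dominant_axis(motion: Motion3D) -> str:
    """Name the axis with the largest absolute displacement."""
    return max(motion._fields, key=lambda name: abs(getattr(motion, name)))
```

With these assumptions, a motion that looks small in one view alone can still be classified by the axis that only the other view observes.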

205. And controlling the intelligent sound box to execute corresponding operation of the target gesture information.

In the embodiment of the invention, the intelligent sound box detects the hand of the user by using the first camera module, after the hand of the user is detected, the first camera module is controlled to shoot the hand of the user to obtain a first gesture image, the second camera module is controlled to shoot the hand of the user to obtain a second gesture image, the first gesture information is obtained by identifying the first gesture image, the second gesture information is obtained by identifying the second gesture image, the target gesture information is obtained further according to the first gesture information and the second gesture information, and finally the corresponding operation of the target gesture information is executed. Therefore, by implementing the embodiment of the invention, the first camera module and the second camera module of the intelligent sound box can be fully utilized to shoot the hand of the same user from different angles, so as to obtain the first gesture image and the second gesture image at different angles, and the more accurate target gesture information can be obtained according to the combination of the first gesture image and the second gesture image, thereby being beneficial to improving the accuracy of gesture motion recognition.

Example two

Referring to fig. 3, fig. 3 is a schematic flow chart of a method for controlling an intelligent sound box based on three-dimensional gesture motion recognition according to another embodiment of the present invention; as shown in fig. 3, the method for controlling a smart speaker based on three-dimensional gesture motion recognition may include:

301. Judge whether a human body is detected. If a human body is detected, go to step 302; if no human body is detected, the process is ended.

The intelligent sound box has a human body detection function that detects when a human body approaches the intelligent sound box; after a human body is detected, user hand detection is performed to enable gesture control of the intelligent sound box.

As an optional implementation, the determining whether the human body is detected may include:

the first camera module or the second camera module of the intelligent sound box is used for carrying out video monitoring to judge whether a human body appears in a preset monitoring range, if so, the human body is determined to be detected, and if not, the human body is determined not to be detected.

In the above embodiment, video monitoring is performed through a camera module to detect whether a human body is approaching. This can be implemented directly with the existing camera of the intelligent sound box, without additional components, at lower cost.
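
The range check in this embodiment can be sketched as a rectangle-overlap test. The sketch assumes that some person detector (not specified in the source) has already produced bounding boxes for a video frame and that the preset monitoring range is a rectangle in the same pixel coordinates; `Box`, `overlaps`, and `human_detected` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in image pixels."""

    left: int
    top: int
    right: int
    bottom: int


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share any interior area."""
    return (
        a.left < b.right
        and b.left < a.right
        and a.top < b.bottom
        and b.top < a.bottom
    )


def human_detected(person_boxes: Iterable[Box], monitor_range: Box) -> bool:
    """True if any detected person box overlaps the preset monitoring range."""
    return any(overlaps(box, monitor_range) for box in person_boxes)
```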

302. Detect the hand of the user by utilizing the first camera module of the intelligent sound box.

In some alternative embodiments, step 302 may include: the intelligent sound box controls the first camera module to slide from a recovery position to a shooting position and, when the first camera module is at the shooting position, controls the first camera module to detect the hand of the user. The recovery position is where the first camera module is stored when not in use; when shooting is required, the first camera module slides from the recovery position and extends outside the intelligent sound box to reach the shooting position.

303. And if the hand of the user is detected, detecting whether the hand of the user generates the three-dimensional gesture motion. If the three-dimensional gesture motion of the hand of the user is detected, turning to step 304; if the three-dimensional gesture motion of the hand of the user is not detected, the process is ended.

As an optional implementation manner, if the hand of the user is detected, it is detected whether a wearable device is worn on the hand of the user; if the wearable device is worn on the hand of the user, it is judged whether the intelligent sound box is connected with the wearable device; and if the intelligent sound box is connected with the wearable device, the step of detecting whether the hand of the user generates the three-dimensional gesture action is executed.

Through this embodiment, gesture control is triggered only via a connected wearable device, which improves control accuracy and avoids misoperation.

Further, if the wearable device is worn on the hand of the user, it is detected whether a connection request initiated by the wearable device is received; if the connection request is received, the connection between the intelligent sound box and the wearable device is established, and the step of detecting whether the hand of the user generates the three-dimensional gesture action is executed. In this embodiment, because the wearable device actively initiates the connection, it can be determined more accurately that the user intends to control the intelligent sound box by gesture.
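
The wearable-device gate can be sketched as a small decision function. `WearableState` and `may_check_3d_gesture` are hypothetical names. The source does not say what happens when no wearable device is worn or no connection exists; this sketch assumes the flow simply does not proceed in those cases.

```python
from dataclasses import dataclass


@dataclass
class WearableState:
    """Hypothetical snapshot of what the sound box knows about a wearable."""

    worn_on_hand: bool
    connected: bool = False
    connection_requested: bool = False


def may_check_3d_gesture(state: WearableState) -> bool:
    """Decide whether to proceed to three-dimensional gesture detection."""
    # Assumption: without a wearable on the hand, the flow does not continue.
    if not state.worn_on_hand:
        return False
    # An existing connection is enough to proceed.
    if state.connected:
        return True
    # Further variant: accept a connection request initiated by the wearable.
    if state.connection_requested:
        state.connected = True
        return True
    return False
```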

304. Control the first camera module to shoot the hand of the user to obtain a first gesture image, and control the second camera module of the intelligent sound box to shoot the hand of the user to obtain a second gesture image.

Controlling the second camera module of the intelligent sound box to shoot the hand of the user to obtain the second gesture image may include: the intelligent sound box detects whether the second camera module is at the shooting position; if so, the second camera module shoots the hand of the user to obtain the second gesture image; if not, the second camera module is controlled to slide from the recovery position to the shooting position and then shoots the hand of the user to obtain the second gesture image. The recovery position is where the second camera module is stored when not in use; it may be a storage cavity provided on the side of the intelligent sound box near the bottom surface. When shooting is required, the second camera module is controlled to slide out of the storage cavity and extend outside the intelligent sound box to reach the shooting position.
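
The position check before shooting can be sketched as follows. `Position`, `SlidingCamera`, and `capture_second_image` are hypothetical names, and the string returned by `capture` stands in for real image data.

```python
from enum import Enum


class Position(Enum):
    RECOVERY = "recovery"
    SHOOTING = "shooting"


class SlidingCamera:
    """Hypothetical retractable camera module."""

    def __init__(self) -> None:
        self.position = Position.RECOVERY
        self.slides = 0  # number of slide-out movements performed

    def slide_out(self) -> None:
        self.position = Position.SHOOTING
        self.slides += 1

    def capture(self) -> str:
        if self.position is not Position.SHOOTING:
            raise RuntimeError("camera is not in the shooting position")
        return "gesture-image"


def capture_second_image(cam: SlidingCamera) -> str:
    """Shoot the second gesture image, sliding the module out first if needed."""
    # Slide out only when the module is not already in the shooting position.
    if cam.position is not Position.SHOOTING:
        cam.slide_out()
    return cam.capture()
```

Calling `capture_second_image` twice in a row moves the module only once, which matches the check-then-slide order described above.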

305. The first gesture image is recognized to obtain first gesture information, and the second gesture image is recognized to obtain second gesture information.

306. And obtaining target gesture information according to the first gesture information and the second gesture information.

307. And controlling the intelligent sound box to execute corresponding operation of the target gesture information.
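
Steps 301 to 307 form a chain of guards followed by capture, recognition, and execution. The sketch below shows that ordering only; every callable is a hypothetical placeholder supplied by the caller. The source states that the process ends when no human body or no three-dimensional gesture action is detected; ending when no hand is detected is an assumption of this sketch.

```python
from typing import Callable, Optional


def example_two_flow(
    human_present: Callable[[], bool],
    hand_present: Callable[[], bool],
    gesture_in_progress: Callable[[], bool],
    capture_and_recognize: Callable[[], str],
    execute: Callable[[str], None],
) -> Optional[str]:
    """Return the executed target gesture, or None if the flow ended early."""
    if not human_present():  # step 301
        return None
    if not hand_present():  # step 302 (ending here is an assumption)
        return None
    if not gesture_in_progress():  # step 303
        return None
    target = capture_and_recognize()  # steps 304-306
    execute(target)  # step 307
    return target
```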

In an optional implementation mode, if the hand of the user is detected, the display device connected with the intelligent sound box is turned on;

further, after target gesture information is obtained according to the first gesture information and the second gesture information, a target virtual gesture is generated according to the target gesture information; displaying the target virtual gesture on the display device.

It is understood that, in the above embodiment, the target virtual gesture obtained according to the first gesture information and the second gesture information can be output and displayed on the display device to prompt the user to enable the gesture control of the smart sound box through the gesture action.

Further, in the above embodiment, the user may be further prompted by voice that the gesture control function is turned on.

In this embodiment, the smart speaker first performs human body detection. If a human body is detected, it further detects the hand of the user; after the hand is detected, it detects whether a three-dimensional gesture action occurs on the hand. If so, the first camera module is controlled to shoot the hand of the user to obtain the first gesture image, and the second camera module is controlled to shoot the hand of the user to obtain the second gesture image. The first gesture image is recognized to obtain the first gesture information, the second gesture image is recognized to obtain the second gesture information, the target gesture information is obtained according to the first gesture information and the second gesture information, and finally the operation corresponding to the target gesture information is executed. Therefore, by implementing this embodiment of the invention, the first camera module and the second camera module of the smart speaker can both be used to shoot the hand of the same user from different angles, yielding a first gesture image and a second gesture image at different angles. Combining the two images yields more accurate target gesture information, which helps improve the accuracy of gesture motion recognition.
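The gated flow summarized above can be sketched as a single function in which each check must pass before the next stage runs. All parameter names are hypothetical; the detection, capture, recognition, and combination steps are passed in as callables because the text does not fix how they are implemented.

```python
from typing import Callable, Optional


def run_gesture_control(
    human_detected: Callable[[], bool],
    hand_detected: Callable[[], bool],
    gesture_in_progress: Callable[[], bool],
    capture_first: Callable[[], object],
    capture_second: Callable[[], object],
    recognize: Callable[[object], str],
    combine: Callable[[str, str], str],
    execute: Callable[[str], None],
) -> Optional[str]:
    # Each check gates the next stage; stop as soon as one fails.
    if not human_detected():
        return None
    if not hand_detected():
        return None
    if not gesture_in_progress():
        return None
    # Shoot the same hand with both camera modules.
    first_info = recognize(capture_first())
    second_info = recognize(capture_second())
    # Derive the target gesture information and act on it.
    target = combine(first_info, second_info)
    execute(target)
    return target
```

Returning `None` at each failed gate is a choice made for this sketch; the text only describes the path on which every check succeeds.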

Example three

Referring to fig. 4, fig. 4 is a schematic structural diagram of an intelligent sound box disclosed in an embodiment of the present invention; as shown in fig. 4, the smart speaker may include:

the first detection unit 401 is configured to detect a hand of a user by using a first camera module of the smart speaker;

a first control unit 402, configured to control the first camera module to capture the hand of the user to obtain a first gesture image and control the second camera module of the smart speaker to capture the hand of the user to obtain a second gesture image if the first detection unit 401 detects the hand of the user;

a first recognition unit 403 for recognizing the first gesture image to obtain first gesture information and recognizing the second gesture image to obtain second gesture information;

an information obtaining unit 404, configured to obtain target gesture information according to the first gesture information and the second gesture information;

and a second control unit 405, configured to control the smart sound box to perform a corresponding operation of the target gesture information.

As an optional implementation, the first detection unit 401 is further configured to, before detecting the hand of the user by using the first camera module of the smart speaker, start the first camera module and detect whether the multi-angle scanning mode of the first camera module is enabled; if the multi-angle scanning mode is enabled, the multi-angle scanning mode is turned off and the shooting angle of the first camera module is adjusted to a preset shooting angle.

The first control unit 402 is specifically configured to, if the hand of the user is detected, adjust the shooting angle of the first camera module to a first shooting angle, adjust the shooting angle of the second camera module to a second shooting angle, control the first camera module to shoot the hand of the user based on the first shooting angle to obtain a first gesture image, and control the second camera module to shoot the hand of the user based on the second shooting angle to obtain a second gesture image.
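A minimal sketch of the two behaviours just described: leaving the multi-angle scanning mode before hand detection, and shooting the hand from two different angles. The `CameraModule` class, its fields, and the angle values used in the usage note are hypothetical and only illustrate the control flow.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class CameraModule:
    """Hypothetical model of a camera module with an adjustable angle."""
    name: str
    angle_deg: float = 0.0
    multi_angle_scan: bool = False

    def capture(self) -> dict:
        # Placeholder for a real frame; records which view produced it.
        return {"camera": self.name, "angle_deg": self.angle_deg}


def prepare_first_camera(cam: CameraModule, preset_angle_deg: float) -> None:
    # If scanning is enabled, turn it off and return to the preset angle.
    if cam.multi_angle_scan:
        cam.multi_angle_scan = False
        cam.angle_deg = preset_angle_deg


def capture_two_views(first: CameraModule, second: CameraModule,
                      first_angle_deg: float,
                      second_angle_deg: float) -> Tuple[dict, dict]:
    # Point each module at its own shooting angle, then shoot the hand.
    first.angle_deg = first_angle_deg
    second.angle_deg = second_angle_deg
    return first.capture(), second.capture()
```

For instance, `capture_two_views(first, second, 30.0, -30.0)` would return one placeholder image per module, each tagged with the angle it was taken from.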

By implementing the above smart speaker, the first camera module and the second camera module of the smart speaker can both be used to shoot the hand of the same user from different angles, yielding a first gesture image and a second gesture image at different angles. Combining the first gesture image and the second gesture image yields more accurate target gesture information, which helps improve the accuracy of gesture motion recognition.

Example four

Referring to fig. 5, fig. 5 is a schematic structural diagram of an intelligent sound box according to another embodiment of the present invention; the smart sound box shown in fig. 5 is obtained by performing optimization on the basis of the smart sound box shown in fig. 4, and the smart sound box shown in fig. 5 further includes:

a second detection unit 501, configured to determine whether a human body is detected before the first detection unit 401 detects the hand of the user by using the first camera module of the smart speaker;

the first detection unit 401 is specifically configured to detect the hand of the user by using the first camera module of the smart speaker when the second detection unit 501 detects a human body.

In some optional embodiments, the second detection unit 501 determines whether a human body is detected as follows: video monitoring is performed by using the first camera module or the second camera module of the smart speaker to judge whether a human body appears within a preset monitoring range; if so, it is determined that a human body is detected, and if not, it is determined that no human body is detected.
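The range check can be illustrated as below. The text does not say how the preset monitoring range is defined; this sketch assumes it is a maximum distance from the speaker and that the video monitoring yields labelled detections with an estimated distance. `Detection` and `human_in_range` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Detection(NamedTuple):
    kind: str          # e.g. "human" or "pet" (labels are hypothetical)
    distance_m: float  # estimated distance from the speaker, in meters


def human_in_range(detections: Iterable[Detection],
                   max_range_m: float) -> bool:
    # A human body counts as detected only inside the monitoring range.
    return any(d.kind == "human" and d.distance_m <= max_range_m
               for d in detections)
```

With a 3 m range, a human detected at 2 m counts as detected, while a human at 5 m or a pet at 1 m does not.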

In some implementations, the smart speaker shown in fig. 5 further includes:

a third detection unit 502, configured to, after the first detection unit 401 detects the hand of the user and before the first control unit 402 controls the first camera module to shoot the hand of the user to obtain a first gesture image and controls the second camera module of the smart speaker to shoot the hand of the user to obtain a second gesture image, detect whether a three-dimensional gesture action occurs on the hand of the user;

the first control unit 402 is specifically configured to control the first camera module to shoot the hand of the user to obtain a first gesture image if the third detection unit 502 detects that the hand of the user generates a three-dimensional gesture motion, and control the second camera module of the smart sound box to shoot the hand of the user to obtain a second gesture image.

In other implementations, the smart speaker shown in fig. 5 further includes:

a fourth detection unit 503, configured to, after the first detection unit 401 detects the hand of the user and before the third detection unit 502 detects whether a three-dimensional gesture action occurs on the hand of the user, detect whether a wearable device is worn on the hand of the user;

a connection determining unit 504, configured to determine whether the smart speaker is connected to the wearable device if the fourth detection unit 503 detects that a wearable device is worn on the hand of the user;

the third detection unit 502 is specifically configured to detect whether a three-dimensional gesture action occurs on the hand of the user if the connection determining unit 504 determines that the smart speaker is connected to the wearable device.
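The chain of conditions above can be written as a small predicate. This passage describes only the path in which a wearable device is worn and connected; what happens on the other branches is not stated here, so the sketch conservatively returns `False` for them. That choice, and the function name, are assumptions.

```python
def should_check_3d_gesture(hand_detected: bool,
                            wearable_worn: bool,
                            wearable_connected: bool) -> bool:
    """Return True only on the path described in the text.

    Assumption: branches the text does not describe (no wearable device,
    or a wearable device that is not connected) do not proceed.
    """
    if not hand_detected:
        return False
    if not wearable_worn:
        return False
    return wearable_connected
```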

Example five

Referring to fig. 6, fig. 6 is a schematic structural diagram of an intelligent sound box according to another embodiment of the present invention; the smart sound box shown in fig. 6 is obtained by performing optimization on the basis of the smart sound box shown in fig. 4, and the smart sound box shown in fig. 6 further includes:

a third control unit 601, configured to turn on a display device connected to the smart speaker if the first detection unit 401 detects a hand of a user;

a display unit 602, configured to generate a target virtual gesture according to the target gesture information after the information obtaining unit 404 obtains the target gesture information according to the first gesture information and the second gesture information; and displaying the target virtual gesture on a display device.

Example six

Referring to fig. 7, fig. 7 is a schematic structural diagram of an intelligent sound box according to another embodiment of the present invention; the smart speaker shown in fig. 7 may include: at least one processor 710, such as a CPU; a memory 720; and a communication bus 730 used to enable communication connections between these components. The memory 720 may be a high-speed RAM memory or a non-volatile memory (e.g., at least one disk memory). The memory 720 may optionally be at least one memory device located remotely from the processor 710. The processor 710 may be combined with the smart speaker described in fig. 4 to fig. 6; a set of program codes is stored in the memory 720, and the processor 710 calls the program codes stored in the memory 720 to perform the following operations:

detecting the hand of a user by utilizing a first camera module of the intelligent sound box;

if the hands of the user are detected, controlling a first camera module to shoot the hands of the user to obtain a first gesture image, and controlling a second camera module of the intelligent sound box to shoot the hands of the user to obtain a second gesture image;

recognizing a first gesture image to obtain first gesture information, and recognizing a second gesture image to obtain second gesture information;

and controlling the intelligent sound box to execute corresponding operation of the target gesture information.

In some implementations, the processor 710 is further configured to perform the following steps:

before a first camera module of the intelligent sound box is used for detecting the hands of a user, judging whether a human body is detected or not;

if a human body is detected, detecting the hand of the user by using the first camera module of the smart speaker;

if the hand of the user is detected, detecting whether a three-dimensional gesture action occurs on the hand of the user;

if the hand of the user is detected, detecting whether a wearable device is worn on the hand of the user;

An embodiment of the present invention further discloses a computer-readable storage medium storing a computer program, where the computer program causes a computer to execute the smart speaker control method based on three-dimensional gesture motion recognition disclosed in fig. 2 to fig. 3.

An embodiment of the present invention further discloses a computer program product, which, when running on a computer, causes the computer to execute part or all of the steps of any one of the methods disclosed in fig. 2 to 3.

An embodiment of the present invention further discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where when the computer program product runs on a computer, the computer is enabled to execute part or all of the steps of any one of the methods disclosed in fig. 2 to fig. 3.

It will be understood by those skilled in the art that all or part of the steps in the methods of the embodiments described above may be implemented by instructions associated with a program, which may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other memory, magnetic disk, magnetic tape, or any other computer-readable medium that can be used to carry or store data.

The smart speaker control method based on three-dimensional gesture motion recognition and the smart speaker disclosed in the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims

1. A control method of an intelligent sound box based on three-dimensional gesture motion recognition is characterized by comprising the following steps:

2. The method of claim 1, wherein prior to detecting a user's hand using the first camera module of the smart speaker, the method further comprises:

judging whether a human body is detected;

3. The method according to claim 1 or 2, wherein after the user's hand is detected and before controlling the first camera module to capture the user's hand to obtain a first gesture image and controlling the second camera module of the smart speaker to capture the user's hand to obtain a second gesture image, the method further comprises:

4. The method of claim 3, wherein after the user's hand is detected and before detecting whether a three-dimensional gesture action occurs on the user's hand, the method further comprises:

detecting whether a wearable device is worn on the hand of the user;

5. The method of claim 1, further comprising:

displaying the target virtual gesture on the display device.

6. A smart sound box, comprising:

7. The smart sound box of claim 6, further comprising:

8. The smart sound box of claim 6 or 7, further comprising:

9. The smart sound box of claim 8, further comprising:

10. The smart sound box of claim 6, further comprising: