CN109886070A - Device control method, apparatus, storage medium and device - Google Patents
- Publication number: CN109886070A (application CN201811584166.7A)
- Authority
- CN
- China
- Prior art keywords
- gesture
- gestures
- image
- frame images
- continuous
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- User Interface Of Digital Computer (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a device control method, apparatus, storage medium and device. The method comprises: when a voice command is received from a user, identifying the sound source direction of the voice command so as to determine the direction in which the user is located; driving the camera of the device to turn toward the direction of the user, and capturing N consecutive frames of gesture images of the user's dynamic gesture; and performing gesture recognition on the captured N consecutive gesture-image frames to identify the meaning of the dynamic gesture, so that the device is controlled according to that gesture meaning. The scheme provided by the invention can expand the effective field of view of the camera.
Description
Technical field
The present invention relates to the field of device control, and more particularly to a device control method, apparatus, storage medium and device.
Background technique
Currently, the camera on an air conditioner has a limited field of view, so a user performing gesture control must remain within that field of view; otherwise the camera cannot capture the user's gesture motion. Moreover, existing gesture recognition generally captures static gestures and classifies the meaning each gesture expresses, which makes for a poor user experience.
Summary of the invention
A primary object of the present invention is to overcome the above defects of the prior art and to provide a device control method, apparatus, storage medium and device, so as to solve the prior-art problem that the camera's field of view is limited and the user must stay within it when performing gesture control.
One aspect of the present invention provides a device control method, comprising: when a voice command is received from a user, identifying the sound source direction of the voice command so as to determine the direction in which the user is located; driving the camera of the device to turn toward the direction of the user, and capturing N consecutive frames of gesture images of the user's dynamic gesture; and performing gesture recognition on the captured N consecutive gesture-image frames to identify the gesture meaning of the dynamic gesture, so as to control the device according to that gesture meaning.
Optionally, performing gesture recognition on the captured N consecutive gesture-image frames to identify the gesture meaning of the dynamic gesture comprises: identifying the gesture region of the dynamic gesture in each frame of the N consecutive gesture-image frames, and performing preset processing based on the gesture region to obtain a gesture-region image for each frame; performing dual-channel convolution on the gesture-region image of each frame to obtain N consecutive feature maps of the gesture-region images; performing convolution on the N consecutive feature maps with a 3D convolutional neural network model to obtain motion feature information of the gesture region; and recognizing the N consecutive feature maps with a preset recurrent neural network model and the motion feature information of the gesture region, thereby obtaining the gesture meaning of the dynamic gesture.
Optionally, performing preset processing based on the gesture region to obtain the gesture-region image of each frame comprises: blurring or removing the image regions of each frame other than the gesture region, so as to obtain the gesture-region image of each frame.
Optionally, performing dual-channel convolution on the gesture-region image of each frame to obtain the N consecutive feature maps of the gesture-region images comprises: convolving the color image and the depth image of the gesture-region image of each frame separately to obtain a color feature map and a depth feature map; and merging the color feature map and the depth feature map to obtain the feature map of the gesture-region image of that frame.
Optionally, the device comprises an air conditioner, and the method further comprises: after determining the direction in which the user is located, controlling the air-outlet direction of the air conditioner according to the direction of the user.
Another aspect of the present invention provides a device control apparatus, comprising: a voice recognition unit for identifying, when a voice command is received from a user, the sound source direction of the voice command so as to determine the direction in which the user is located; a driving unit for driving the camera of the device to turn toward the direction of the user; an image acquisition unit for capturing N consecutive frames of gesture images of the user's dynamic gesture; an image recognition unit for performing gesture recognition on the N consecutive gesture-image frames captured by the image acquisition unit to identify the gesture meaning of the dynamic gesture; and a control unit for controlling the device according to the gesture meaning identified by the image recognition unit.
Optionally, the image recognition unit comprises: a gesture region recognition unit for identifying the gesture region of the dynamic gesture in each frame of the N consecutive gesture-image frames and performing preset processing based on the gesture region to obtain a gesture-region image for each frame; a dual-channel processing unit for performing dual-channel convolution on the gesture-region image of each frame to obtain N consecutive feature maps of the gesture-region images; a 3D convolution processing unit for performing convolution on the N consecutive feature maps with a 3D convolutional neural network model to obtain motion feature information of the gesture region; and a gesture meaning recognition unit for recognizing the N consecutive feature maps with a preset recurrent neural network model and the motion feature information of the gesture region, thereby obtaining the gesture meaning of the dynamic gesture.
Optionally, the gesture region recognition unit performs the preset processing based on the gesture region by blurring or removing the image regions of each frame other than the gesture region, so as to obtain the gesture-region image of each frame.
Optionally, the dual-channel processing unit performs the dual-channel convolution on the gesture-region image of each frame by: convolving the color image and the depth image of the gesture-region image of each frame separately to obtain a color feature map and a depth feature map; and merging the color feature map and the depth feature map to obtain the feature map of the gesture-region image of that frame.
Optionally, the device comprises an air conditioner, and the control unit is further configured to control, after the direction in which the user is located has been determined, the air-outlet direction of the air conditioner according to the direction of the user.
Another aspect of the present invention provides a storage medium on which a computer program is stored; when the program is executed by a processor, the steps of any of the foregoing methods are implemented.
A further aspect of the present invention provides a device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of any of the foregoing methods are implemented.
A further aspect of the present invention provides a device comprising any of the foregoing device control apparatuses.
According to the technical scheme of the present invention, the direction of the user is determined by identifying the sound source direction of the user's voice, and the camera is driven to turn according to that direction; this expands the camera's effective field of view, fuses voice and image interaction, and improves the intelligent interaction between the user and the air conditioner. The invention performs gesture-region recognition on the gesture images and applies preset processing to obtain gesture-region images; by blurring or cropping away irrelevant information in the images, the amount of image data to be processed is reduced. Dual-channel convolution of the N consecutive captured gesture-image frames yields N consecutive feature maps, which are recognized with a 3D convolutional neural network model to obtain motion features and thereby identify the meaning of the dynamic gesture; this improves the recognition rate of gesture recognition, enables recognition of dynamic gestures, and enhances the degree of gesture discrimination.
Detailed description of the invention
The drawings described herein are provided to aid further understanding of the present invention and constitute a part of it; the illustrative embodiments of the invention and their descriptions explain the invention and do not improperly limit it. In the drawings:
Fig. 1 is a schematic diagram of one embodiment of the device control method provided by the present invention;
Fig. 2 is a flow diagram of a specific embodiment of the step of performing gesture recognition on the captured N consecutive gesture-image frames to identify the gesture meaning of the dynamic gesture, according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of dual-channel convolution of the gesture-region image of each frame, according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of a specific embodiment of performing gesture recognition on continuously captured gesture images, according to an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of one embodiment of the device control apparatus provided by the present invention;
Fig. 6 is a structural schematic diagram of a specific embodiment of the image recognition unit, according to an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the invention are described clearly and completely below in conjunction with specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in sequences other than those illustrated or described. Moreover, the terms "comprising" and "having", and any variants thereof, are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Fig. 1 is a schematic diagram of one embodiment of the device control method provided by the present invention. As shown in Fig. 1, according to one embodiment of the invention, the device control method includes at least step S110, step S120 and step S130.
Step S110: when a voice command is received from a user, identify the sound source direction of the voice command so as to determine the direction in which the user is located. Specifically, the sound source direction of the voice command is identified by a deep-learning voice module; the sound source direction is the direction of the user who issued the voice command, i.e. the direction of the user relative to the device.
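The patent attributes sound-source localization to an unspecified deep-learning voice module. As a hedged illustration of the underlying geometry only, the sketch below estimates direction from the time difference of arrival (TDOA) between two microphones — a classical stand-in, not the patented method; all names and values are hypothetical:

```python
import numpy as np

SOUND_SPEED = 343.0  # speed of sound in air, m/s

def estimate_direction(sig_left, sig_right, mic_distance_m, sample_rate):
    """Estimate the sound-source azimuth from the inter-microphone delay.

    Classical TDOA via cross-correlation; the patent instead uses an
    unspecified deep-learning voice module, so this is only an analogue.
    """
    corr = np.correlate(sig_left, sig_right, mode="full")
    lag = np.argmax(corr) - (len(sig_right) - 1)  # delay in samples
    tdoa = lag / sample_rate                      # delay in seconds
    # Clamp to the physically possible range before taking arcsin.
    sin_theta = np.clip(tdoa * SOUND_SPEED / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))  # 0 deg = straight ahead

# Simulated capture: an impulse that reaches the left microphone
# 5 samples before the right one (source is off to the left).
fs, mic_spacing = 16000, 0.2
pulse = np.zeros(256)
pulse[100] = 1.0
left = pulse
right = np.roll(pulse, 5)  # same impulse, delayed at the right mic
angle = estimate_direction(left, right, mic_spacing, fs)
# angle comes out negative, i.e. toward the left microphone
```

The sign convention (negative toward the left microphone) is an arbitrary choice of this sketch.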
Step S120: drive the camera of the device to turn toward the direction in which the user is located, and capture N consecutive frames of gesture images of the user's dynamic gesture. Specifically, the camera of the device is driven to turn toward the user's direction. To recognize the user's dynamic gesture, N consecutive gesture-image frames of the dynamic gesture must be captured continuously; the N consecutive frames are then recognized to obtain the gesture meaning of the user's dynamic gesture. The value of N is determined by the speed at which the user performs a motion; what matters most is that the meaning of a gesture can be recognized.
Step S130: perform gesture recognition on the captured N consecutive gesture-image frames to identify the gesture meaning of the dynamic gesture, so as to control the device according to that gesture meaning.
Fig. 2 is a flow diagram of a specific embodiment of the step of performing gesture recognition on the captured N consecutive gesture-image frames to identify the gesture meaning of the dynamic gesture, according to an embodiment of the present invention. As shown in Fig. 2, step S130 includes step S131, step S132, step S133 and step S134.
Step S131: identify the gesture region of the dynamic gesture in each frame of the N consecutive gesture-image frames, and perform preset processing based on the gesture region to obtain a gesture-region image for each frame.
The gesture region of the dynamic gesture is the region of interest (ROI) in the gesture image; the same region of interest, i.e. the gesture region, is identified in each frame of the N consecutive frames. A convolutional feature recognition model for the gesture region can be trained in advance, so that the gesture region of each frame is identified with this model. After the gesture region in each frame has been identified, preset processing can be performed based on it to obtain the gesture-region image of that frame. Specifically, the image regions of each frame other than the gesture region (i.e. the regions of non-interest) are blurred or removed, yielding the gesture-region image of each frame. Blurring or removing the image regions carrying irrelevant information reduces the image data to be processed and thus improves processing speed.
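The blurring or removal of regions of non-interest can be sketched as a simple ROI mask. In this numpy stand-in (the function name and the mean-fill "blur" are assumptions, not the patent's implementation), background pixels stop carrying information while the gesture region is preserved:

```python
import numpy as np

def isolate_gesture_region(frame, roi, mode="remove"):
    """Keep only the gesture ROI of a frame; blur or zero out the rest.

    frame: HxW array; roi: (top, bottom, left, right). Mode "remove"
    zeroes the background; "blur" replaces it with its own mean value,
    a crude stand-in for the blurring the patent describes.
    """
    top, bottom, left, right = roi
    mask = np.zeros(frame.shape, dtype=bool)
    mask[top:bottom, left:right] = True
    out = frame.astype(float).copy()
    if mode == "remove":
        out[~mask] = 0.0
    else:  # flatten background detail to its mean intensity
        out[~mask] = out[~mask].mean()
    return out

frame = np.arange(36, dtype=float).reshape(6, 6)
roi = (1, 4, 2, 5)  # rows 1-3, cols 2-4 are assumed to hold the hand
clean = isolate_gesture_region(frame, roi)
# Background pixels are zeroed; ROI pixels keep their original values.
```

Either mode leaves the frame dimensions unchanged, so downstream convolution code needs no special cases.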
Optionally, while the camera of the device is driven to turn toward the direction of the user, the attitude angle of the camera can be acquired in real time. To expand the camera's effective field of view when the user is not within it, the user's direction is judged by sound, the camera is driven to turn toward that direction, and the camera's attitude angle is acquired as it rotates. The attitude angle, together with the known position and mounting height of the camera, is used to determine the distance between the user and the device; knowing that distance makes it easier to determine the size of the gesture region.
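The distance estimate from attitude angle and mounting height reduces to simple trigonometry, and a pinhole-camera relation then bounds the expected gesture-region size. A sketch under assumed geometry (all numbers and names are illustrative, not from the patent):

```python
import math

def user_distance(camera_height_m, pitch_down_deg):
    """Horizontal distance to the user from the camera's downward pitch
    angle and known mounting height. The patent only states that the
    attitude angle plus position/height yield the distance; this is one
    plausible geometric reading.
    """
    return camera_height_m / math.tan(math.radians(pitch_down_deg))

def expected_roi_width_px(real_hand_width_m, distance_m, focal_px):
    """Pinhole-camera scaling: a nearer hand covers more pixels, which
    is how knowing the distance helps bound the gesture-region size."""
    return focal_px * real_hand_width_m / distance_m

d = user_distance(2.0, 30.0)            # camera 2 m up, tilted 30 deg down
w = expected_roi_width_px(0.18, d, 800) # ~18 cm hand, 800 px focal length
# d is about 3.46 m, so the hand should span roughly 42 px
```

With the ROI width bounded this way, the gesture-region detector can reject candidate regions that are implausibly large or small for the measured distance.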
Step S132: perform dual-channel convolution on the gesture-region image of each frame to obtain the N consecutive feature maps of the gesture-region images of the N consecutive gesture-image frames.
Specifically, the color image and the depth image of the gesture-region image of each frame are convolved separately to obtain a color feature map and a depth feature map; the color feature map and the depth feature map are merged to obtain the feature map of the gesture-region image of that frame, and finally the N consecutive feature maps of the gesture-region images of the N consecutive frames are obtained.
Fig. 3 is a schematic diagram of dual-channel convolution of the gesture-region image of each frame, according to an embodiment of the present invention. As shown in Fig. 3, the two image channels are used to obtain richer feature information about the gesture region. The color image is the original image, and the depth image is generally a gray-level image of the original that must be mapped to the same location information as the color image. The color image passes through several convolution stages to produce a color feature map, and the depth image likewise passes through several convolution stages to produce a depth feature map; the color feature map and the depth feature map are then merged into the feature map of the gesture-region image of the current frame.
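The two-branch scheme of Fig. 3 — one convolution path per image channel, merged feature maps — can be illustrated per frame as follows. This numpy sketch uses a single convolution per branch and channel stacking as the merge; a real model would use learned kernels and deeper stacks, so everything here is a simplified assumption:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid-mode 2D convolution (really cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def dual_channel_features(color, depth, k_color, k_depth):
    """One color and one depth feature map, merged by channel stacking,
    standing in for the patent's two convolution branches."""
    f_color = conv2d(color, k_color)
    f_depth = conv2d(depth, k_depth)
    return np.stack([f_color, f_depth])  # shape (2, H', W')

rng = np.random.default_rng(0)
color = rng.random((8, 8))   # gesture-region color channel (toy data)
depth = rng.random((8, 8))   # aligned depth channel (toy data)
k = np.ones((3, 3)) / 9.0    # simple averaging kernels for both branches
feat = dual_channel_features(color, depth, k, k)
# feat.shape == (2, 6, 6): two fused channels for this frame
```

Stacking is only one possible merge; concatenation along the channel axis followed by a 1x1 convolution would be another common choice.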
Step S133: perform convolution on the N consecutive feature maps with a 3D convolutional neural network model to obtain motion feature information of the gesture region.
Specifically, 3D convolution is applied by the 3D convolutional neural network model to the N consecutive feature maps of the gesture-region images, yielding motion-related information of the gesture region, such as the direction of motion.
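How a 3D convolution exposes motion across the stack of N feature maps can be seen with a hand-rolled kernel that differences consecutive frames. This toy example (the kernel and the moving blob are illustrative, not the patented network) responds only where the gesture region moves:

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid-mode 3D convolution over a (frames, H, W) feature volume."""
    kt, kh, kw = kernel.shape
    t = volume.shape[0] - kt + 1
    h = volume.shape[1] - kh + 1
    w = volume.shape[2] - kw + 1
    out = np.empty((t, h, w))
    for a in range(t):
        for i in range(h):
            for j in range(w):
                out[a, i, j] = np.sum(
                    volume[a:a+kt, i:i+kh, j:j+kw] * kernel)
    return out

# A hand-shaped blob moving one pixel to the right per frame, 4 frames.
frames = np.zeros((4, 5, 8))
for t in range(4):
    frames[t, 2, t + 1] = 1.0

# Temporal-difference kernel: responds where the value changes between
# consecutive frames, i.e. exactly where there is motion.
motion_kernel = np.zeros((2, 1, 1))
motion_kernel[0, 0, 0], motion_kernel[1, 0, 0] = -1.0, 1.0
motion = conv3d(frames, motion_kernel)
# Positive response where the blob arrives, negative where it leaves,
# so the sign pattern along the row encodes the direction of motion.
```

A trained 3D CNN learns many such spatio-temporal kernels instead of one fixed difference, but the mechanism is the same.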
Step S134: recognize the N consecutive feature maps with a preset recurrent neural network model and the motion feature information of the gesture region, obtaining the gesture meaning of the dynamic gesture.
Specifically, the N consecutive feature maps and the motion feature information of the gesture region are input to a preset RNN (recurrent neural network) model, and the softmax function in the model derives the gesture meaning of the dynamic gesture from the N consecutive feature maps — for example, the semantics of the gesture or the device control command to which the gesture corresponds.
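The RNN-plus-softmax step can be sketched with a vanilla recurrent cell run over per-frame feature vectors. The weights below are untrained random placeholders; the point is only the data flow from frame features to gesture-class probabilities, not the patent's actual model:

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def rnn_classify(frame_feats, Wxh, Whh, Why):
    """Run a vanilla RNN over per-frame feature vectors and apply softmax
    to the final hidden state, yielding gesture-class probabilities."""
    h = np.zeros(Whh.shape[0])
    for x in frame_feats:          # one step per captured frame
        h = np.tanh(Wxh @ x + Whh @ h)
    return softmax(Why @ h)

rng = np.random.default_rng(1)
n_frames, feat_dim, hidden, n_classes = 6, 10, 8, 4
feats = rng.standard_normal((n_frames, feat_dim))  # toy frame features
probs = rnn_classify(
    feats,
    rng.standard_normal((hidden, feat_dim)) * 0.1,   # input weights
    rng.standard_normal((hidden, hidden)) * 0.1,     # recurrent weights
    rng.standard_normal((n_classes, hidden)) * 0.1,  # output weights
)
gesture_id = int(np.argmax(probs))  # index of the recognized gesture class
```

In practice the class index would be mapped to a gesture meaning or device command by a lookup table produced during training.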
The above steps can also be understood with reference to Fig. 4, which is a schematic diagram of a specific embodiment of performing gesture recognition on continuously captured gesture images, according to an embodiment of the present invention. As shown in Fig. 4, the consecutive images are the gesture images of the user's dynamic gesture continuously captured by the camera, i.e. the consecutive frames acquired during image acquisition. They are convolved by the dual-channel model; the motion feature information of the gesture region is obtained with the 3D_CNN convolution model and fed into the RNN recurrent neural network model; and the meaning of the user's dynamic gesture is obtained via the softmax function in the model.
Once the gesture meaning of the dynamic gesture has been identified, the device can be controlled according to that meaning. The device may specifically be an electric appliance, such as an air conditioner. Optionally, when the device is an air conditioner, the air-outlet direction of the air conditioner can be controlled according to the direction of the user after that direction has been determined.
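Controlling the air-outlet direction from the user's direction, plus dispatching a recognized gesture to a device command, might look like the following sketch; the louver swing range, the command names, and the gesture-id table are all hypothetical, not taken from the patent:

```python
def airflow_louver_angle(user_azimuth_deg, max_swing_deg=60.0):
    """Map the user's azimuth (from sound-source localization) to a
    louver angle, clamped to the air conditioner's mechanical swing
    range. Range and convention are illustrative assumptions."""
    return max(-max_swing_deg, min(max_swing_deg, user_azimuth_deg))

# Hypothetical mapping from recognized gesture class index to a command.
GESTURE_COMMANDS = {
    0: "power_toggle",
    1: "temperature_up",
    2: "temperature_down",
    3: "fan_speed_cycle",
}

angle = airflow_louver_angle(75.0)       # user far to one side -> clamped
cmd = GESTURE_COMMANDS.get(2, "noop")    # unknown ids fall back to no-op
```

The fallback to a no-op keeps an unrecognized or low-confidence gesture from triggering an unintended device action.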
Fig. 5 is a structural schematic diagram of one embodiment of the device control apparatus provided by the present invention. As shown in Fig. 5, the device control apparatus 100 includes a voice recognition unit 110, a driving unit 120, an image acquisition unit 130, an image recognition unit 140 and a control unit 150.
The voice recognition unit 110 is configured to identify, when a voice command is received from a user, the sound source direction of the voice command so as to determine the direction in which the user is located; the driving unit 120 is configured to drive the camera of the device to turn toward the direction of the user; the image acquisition unit 130 is configured to capture N consecutive frames of gesture images of the user's dynamic gesture; the image recognition unit 140 is configured to perform gesture recognition on the N consecutive gesture-image frames captured by the image acquisition unit so as to identify the gesture meaning of the dynamic gesture; and the control unit 150 is configured to control the device according to the gesture meaning identified by the image recognition unit.
When a voice command is received from the user, the voice recognition unit 110 identifies the sound source direction of the voice command so as to determine the direction in which the user is located, and the driving unit 120 drives the camera of the device to turn toward that direction. Specifically, the voice recognition unit 110 identifies the sound source direction of the voice command with a deep-learning voice module; the sound source direction is the direction of the user who issued the voice command, i.e. the direction of the user relative to the device. The driving unit 120 then drives the camera of the device to turn toward the direction in which the user is located.
The image acquisition unit 130 captures N consecutive frames of gesture images of the user's dynamic gesture. Specifically, to recognize the user's dynamic gesture, N consecutive gesture-image frames of the dynamic gesture must be captured continuously; the N consecutive frames are then recognized to obtain the gesture meaning of the user's dynamic gesture. The value of N is determined by the speed at which the user performs a motion; what matters most is that the meaning of a gesture can be recognized.
The image recognition unit 140 performs gesture recognition on the N consecutive gesture-image frames captured by the image acquisition unit 130 to identify the gesture meaning of the dynamic gesture, so that the device can be controlled according to that meaning.
Fig. 6 is a structural schematic diagram of a specific embodiment of the image recognition unit, according to an embodiment of the present invention. As shown in Fig. 6, the image recognition unit 140 includes a gesture region recognition unit 141, a dual-channel processing unit 142, a 3D convolution processing unit 143 and a gesture meaning recognition unit 144.
The gesture region recognition unit 141 is configured to identify the gesture region of the dynamic gesture in each frame of the N consecutive gesture-image frames and to perform preset processing based on the gesture region to obtain a gesture-region image for each frame.
The gesture region of the dynamic gesture is the region of interest (ROI) in the gesture image; the gesture region recognition unit 141 identifies the same region of interest, i.e. the gesture region, in each frame of the N consecutive frames. A convolutional feature recognition model for the gesture region can be trained in advance, so that the gesture region recognition unit 141 identifies the gesture region of each frame with this model. After the gesture region in each frame has been identified, preset processing can be performed based on it to obtain the gesture-region image of that frame. Specifically, the gesture region recognition unit 141 blurs or removes the image regions of each frame other than the gesture region (i.e. the regions of non-interest), yielding the gesture-region image of each frame. Blurring or removing the image regions carrying irrelevant information reduces the image data to be processed and thus improves processing speed.
Optionally, while the camera of the device is driven to turn toward the direction of the user, the attitude angle of the camera can be acquired in real time. To expand the camera's effective field of view when the user is not within it, the user's direction is judged by sound, the camera is driven to turn toward that direction, and the camera's attitude angle is acquired as it rotates. The attitude angle, together with the known position and mounting height of the camera, is used to determine the distance between the user and the device; knowing that distance makes it easier to determine the size of the gesture region.
The dual-channel processing unit 142 is configured to perform dual-channel convolution on the gesture-region image of each frame so as to obtain the N consecutive feature maps of the gesture-region images of the N consecutive gesture-image frames. Specifically, the dual-channel processing unit 142 convolves the color image and the depth image of the gesture-region image of each frame separately to obtain a color feature map and a depth feature map, merges the color feature map and the depth feature map to obtain the feature map of the gesture-region image of that frame, and finally obtains the N consecutive feature maps of the gesture-region images of the N consecutive frames.
As described above with reference to Fig. 3, the two image channels are used to obtain richer feature information about the gesture region: the color image and the depth image (the latter mapped to the same location information as the color image) each pass through several convolution stages, and the resulting color and depth feature maps are merged into the feature map of the gesture-region image of the current frame.
The 3D convolution processing unit 143 is configured to perform convolution on the N consecutive feature maps with a 3D convolutional neural network model to obtain the motion feature information of the gesture region. Specifically, the 3D convolution processing unit 143 applies 3D convolution, via the 3D convolutional neural network model, to the N consecutive feature maps of the gesture-region images of the N consecutive frames, yielding motion-related information of the gesture region, such as the direction of motion.
The gesture meaning recognition unit 144 is configured to recognize the N consecutive feature maps with a preset recurrent neural network model and the motion feature information of the gesture region, obtaining the gesture meaning of the dynamic gesture. Specifically, the gesture meaning recognition unit 144 inputs the N consecutive feature maps and the motion feature information of the gesture region to a preset RNN recurrent neural network model, and the softmax function in the model derives the gesture meaning of the dynamic gesture from the N consecutive feature maps — for example, the semantics of the gesture or the device control command to which the gesture corresponds.
After the image recognition unit 140 has identified the gesture meaning of the dynamic gesture, the control unit 150 can control the device according to that meaning. Optionally, the device may specifically be an electric appliance, such as an air conditioner. When the device is an air conditioner, the control unit 150 can control the air-outlet direction of the air conditioner according to the direction in which the user is located, after the voice recognition unit 110 has determined that direction.
The present invention also provides a storage medium corresponding to the device control method, on which a computer program is stored; when the program is executed by a processor, the steps of any of the foregoing methods are implemented. The present invention also provides a device corresponding to the device control method, comprising a processor, a memory, and a computer program stored in the memory and executable on the processor; when the processor executes the program, the steps of any of the foregoing methods are implemented. The present invention also provides a device corresponding to the device control apparatus, comprising any of the foregoing device control apparatuses.
Accordingly, the solution provided by the present invention determines the direction of the user by identifying the sound source direction of the user's voice,
drives the camera to turn according to the user's direction while obtaining the camera attitude angle in real time, and ties this to the related intelligent dialogue. This
enlarges the camera's field of view, fuses voice interaction with image interaction, and improves the intelligence of the interaction between the user and the air conditioner. The present invention
identifies the gesture region in the gesture image and applies default processing to obtain a gesture region image; by blurring or removing
the irrelevant information in the image, the amount of image data to be processed is reduced. Dual-channel convolution is applied to the collected continuous N frames of gesture images
to obtain continuous N frames of feature maps, which are then recognized using a 3D convolutional neural network model to
obtain motion features and identify the meaning of the dynamic gesture. This improves the recognition rate of gesture recognition and enables dynamic
gesture recognition.
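The region masking and dual-channel (color + depth) convolution steps summarized above can be sketched with a toy NumPy implementation. This is a minimal stand-in, not the patented model: the "removal" variant of the default processing is shown, and a single shared kernel replaces the learned convolutional layers:

```python
import numpy as np

def mask_gesture_region(frame: np.ndarray, box: tuple) -> np.ndarray:
    """Zero out everything outside the gesture bounding box (the
    'removal' variant of the default processing); box = (y0, y1, x0, x1)."""
    y0, y1, x0, x1 = box
    out = np.zeros_like(frame)
    out[y0:y1, x0:x1] = frame[y0:y1, x0:x1]
    return out

def conv2d_valid(img: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive single-channel 'valid' 2-D convolution."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def dual_channel_features(color: np.ndarray, depth: np.ndarray,
                          kernel: np.ndarray) -> np.ndarray:
    """Convolve the color and depth channels separately, then stack
    (fuse) the two feature maps - a toy stand-in for the dual-channel
    convolution processing described above."""
    return np.stack([conv2d_valid(color, kernel),
                     conv2d_valid(depth, kernel)])
```

Repeating `dual_channel_features` over N consecutive masked frames yields the continuous N frames of feature maps that a 3D convolutional network would then consume.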
The functions described herein may be implemented in hardware, in software executed by a processor, in firmware, or in any combination thereof.
If implemented in software executed by a processor, the functions may be stored on, or transmitted over, a computer-readable
medium as one or more instructions or code. Other examples and implementations are within the scope and spirit of the present invention and the appended
claims. For example, owing to the nature of software, the functions described above can be implemented using software executed by a processor,
hardware, firmware, hard wiring, or any combination of these. In addition, the functional units may be integrated
in one processing unit, each unit may exist alone physically, or two or more units may be integrated
in one unit.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other
ways. The apparatus embodiments described above are merely illustrative; for example, the division into units may be
a division by logical function, and there may be other ways of dividing in actual implementation: multiple units or components may be combined or
integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual
coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection of units or modules
through some interfaces, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as the control
apparatus may or may not be physical units; they may be located in one place or distributed over multiple
units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product,
it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present invention,
or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software
product. The computer software product is stored in a storage medium and includes several instructions to cause a computer
device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the
embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive,
a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a removable hard disk, a magnetic
disk, or an optical disc.
The above are only embodiments of the present invention and are not intended to limit the present invention. For those skilled in the
art, the present invention may have various modifications and variations. Any modification,
equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall be included within the scope of the claims of the present invention.
Claims (13)
1. A device control method, characterized by comprising:
when a voice command of a user is received, identifying the sound source direction of the voice command to determine the direction in which the
user is located;
driving a camera of the device to turn to the direction in which the user is located, and collecting continuous
N frames of gesture images of a dynamic gesture of the user;
performing gesture recognition on the collected continuous N frames of gesture images to identify the gesture meaning of the dynamic gesture,
so as to control the device according to the gesture meaning.
2. The method according to claim 1, characterized in that performing gesture recognition on the collected continuous N frames of gesture
images to identify the gesture meaning of the dynamic gesture comprises:
identifying the gesture region of the dynamic gesture in each frame of the continuous N frames of gesture images, and performing
default processing based on the gesture region to obtain a gesture region image of each frame of gesture image;
performing dual-channel convolution processing on the gesture region image of each frame of gesture image, to obtain continuous N frames of
feature maps of the gesture region images of the continuous N frames of gesture images;
performing convolution processing on the continuous N frames of feature maps using a 3D convolutional neural network model, to obtain
motion feature information of the gesture region;
recognizing the continuous N frames of feature maps using a preset recurrent neural network model and the motion feature information
of the gesture region, to obtain the gesture meaning of the dynamic gesture.
3. The method according to claim 2, characterized in that performing default processing based on the gesture region to obtain the
gesture region image of each frame of gesture image comprises:
performing blurring processing or removal processing on image regions other than the gesture region in each frame of gesture image,
to obtain the gesture region image of each frame of gesture image.
4. The method according to claim 2 or 3, characterized in that performing dual-channel convolution processing on the gesture region
image of each frame of gesture image, to obtain the continuous N frames of feature maps of the gesture region images of the continuous N frames of gesture images,
comprises:
performing convolution processing respectively on the color image and the depth image of the gesture region image of each frame of gesture image, to obtain
a color feature map and a depth feature map;
fusing the color feature map and the depth feature map, to obtain the feature map of the gesture region image of each frame of
gesture image.
5. The method according to any one of claims 1-4, characterized in that the device comprises an air conditioner, and the method
further comprises:
after determining the direction in which the user is located, controlling the air outlet direction of the air conditioner according to the direction in which
the user is located.
6. A device control apparatus, characterized by comprising:
a voice recognition unit, configured to identify, when a voice command of a user is received, the sound source direction of the voice command, to
determine the direction in which the user is located;
a driving unit, configured to drive a camera of the device to turn to the direction in which the user is located;
an image collection unit, configured to collect continuous N frames of gesture images of a dynamic gesture of the user;
an image recognition unit, configured to perform gesture recognition on the continuous N frames of gesture images collected by the image collection unit,
to identify the gesture meaning of the dynamic gesture;
a control unit, configured to control the device according to the gesture meaning identified by the image recognition unit.
7. The apparatus according to claim 6, characterized in that the image recognition unit comprises:
a gesture region recognition unit, configured to identify the gesture region of the dynamic gesture in each frame of the continuous
N frames of gesture images, and perform default processing based on the gesture region to obtain a gesture region image of each frame of gesture image;
a dual-channel processing unit, configured to perform dual-channel convolution processing on the gesture region image of each frame of gesture
image, to obtain continuous N frames of feature maps of the gesture region images of the continuous N frames of gesture images;
a 3D convolution processing unit, configured to perform convolution processing on the continuous N frames of feature maps using a 3D convolutional neural network model,
to obtain motion feature information of the gesture region;
a gesture meaning recognition unit, configured to recognize the continuous N frames of feature maps using a preset recurrent neural network model
and the motion feature information of the gesture region, to obtain the gesture meaning of the dynamic gesture.
8. The apparatus according to claim 7, characterized in that the gesture region recognition unit performing default processing based on
the gesture region to obtain the gesture region image of each frame of gesture image comprises:
performing blurring processing or removal processing on image regions other than the gesture region in each frame of gesture image,
to obtain the gesture region image of each frame of gesture image.
9. The apparatus according to claim 7 or 8, characterized in that the dual-channel processing unit performing dual-channel convolution
processing on the gesture region image of each frame of gesture image, to obtain the continuous N frames of feature maps of the gesture region
images of the continuous N frames of gesture images, comprises:
performing convolution processing respectively on the color image and the depth image of the gesture region image of each frame of gesture image, to obtain
a color feature map and a depth feature map;
fusing the color feature map and the depth feature map, to obtain the feature map of the gesture region image of each frame of
gesture image.
10. The apparatus according to any one of claims 6-9, characterized in that the device comprises an air conditioner, and the control
unit is further configured to:
after the direction in which the user is located is determined, control the air outlet direction of the air conditioner according to the direction in which
the user is located.
11. A storage medium, characterized in that a computer program is stored thereon, and when the program is executed by a processor,
the steps of the method according to any one of claims 1-5 are implemented.
12. A device, characterized by comprising a processor, a memory, and a computer program stored in the memory and executable on the
processor, wherein the processor implements the steps of the method according to any one of claims 1-5 when executing the program.
13. A device, characterized by comprising the device control apparatus according to any one of claims 6-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811584166.7A CN109886070A (en) | 2018-12-24 | 2018-12-24 | A kind of apparatus control method, device, storage medium and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109886070A true CN109886070A (en) | 2019-06-14 |
Family
ID=66925251
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811584166.7A Pending CN109886070A (en) | 2018-12-24 | 2018-12-24 | A kind of apparatus control method, device, storage medium and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109886070A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111860346A (en) * | 2020-07-22 | 2020-10-30 | 苏州臻迪智能科技有限公司 | Dynamic gesture recognition method and device, electronic equipment and storage medium |
CN112750437A (en) * | 2021-01-04 | 2021-05-04 | 欧普照明股份有限公司 | Control method, control device and electronic equipment |
WO2021135432A1 (en) * | 2019-12-31 | 2021-07-08 | Midea Group Co., Ltd. | System and method of hand gesture detection |
WO2023169123A1 (en) * | 2022-03-11 | 2023-09-14 | 深圳地平线机器人科技有限公司 | Device control method and apparatus, and electronic device and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103491301A (en) * | 2013-09-18 | 2014-01-01 | 潍坊歌尔电子有限公司 | System and method for regulating and controlling camera of electronic equipment |
CN103994541A (en) * | 2014-04-21 | 2014-08-20 | 美的集团股份有限公司 | Air direction switching method and system based on voice control |
CN104902203A (en) * | 2015-05-19 | 2015-09-09 | 广东欧珀移动通信有限公司 | Video recording method based on rotary camera, and terminal |
CN105353634A (en) * | 2015-11-30 | 2016-02-24 | 北京地平线机器人技术研发有限公司 | Household appliance and method for controlling operation by gesture recognition |
CN107526440A (en) * | 2017-08-28 | 2017-12-29 | 四川长虹电器股份有限公司 | The intelligent electric appliance control method and system of gesture identification based on decision tree classification |
CN107808131A (en) * | 2017-10-23 | 2018-03-16 | 华南理工大学 | Dynamic gesture identification method based on binary channel depth convolutional neural networks |
CN108009499A (en) * | 2017-11-30 | 2018-05-08 | 宁波高新区锦众信息科技有限公司 | A kind of intelligent home control system based on dynamic hand gesture recognition |
CN108852349A (en) * | 2018-05-17 | 2018-11-23 | 浙江大学 | A kind of moving decoding method using Cortical ECoG signal |
Non-Patent Citations (1)
Title |
---|
Ye Renzhen: "Research on Video Image Processing: Vision Algorithms in Surveillance Scenarios", 30 September 2018, Huazhong University of Science and Technology Press * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109886070A (en) | A kind of apparatus control method, device, storage medium and equipment | |
US11188783B2 (en) | Reverse neural network for object re-identification | |
CN104331168B (en) | Display adjusting method and electronic equipment | |
Baee et al. | Medirl: Predicting the visual attention of drivers via maximum entropy deep inverse reinforcement learning | |
CN106682602B (en) | Driver behavior identification method and terminal | |
CN105095882B (en) | The recognition methods of gesture identification and device | |
CN102081918B (en) | Video image display control method and video image display device | |
CN107578023A (en) | Man-machine interaction gesture identification method, apparatus and system | |
CN106934392B (en) | Vehicle logo identification and attribute prediction method based on multi-task learning convolutional neural network | |
US20200081524A1 (en) | Method and appartus for data capture and evaluation of ambient data | |
CN103310187B (en) | Face-image based on facial quality analysis is prioritized | |
WO2018064047A1 (en) | Performing operations based on gestures | |
US11775054B2 (en) | Virtual models for communications between autonomous vehicles and external observers | |
CN106997236A (en) | Based on the multi-modal method and apparatus for inputting and interacting | |
CN111989689A (en) | Method for identifying objects within an image and mobile device for performing the method | |
JP2022530605A (en) | Child state detection method and device, electronic device, storage medium | |
CN110276229A (en) | Target object regional center localization method and device | |
US11403560B2 (en) | Training apparatus, image recognition apparatus, training method, and program | |
CN106897659A (en) | The recognition methods of blink motion and device | |
Gupta et al. | Online detection and classification of dynamic hand gestures with recurrent 3d convolutional neural networks | |
CN109190516A (en) | A kind of static gesture identification method based on volar edge contour vectorization | |
CN104281839A (en) | Body posture identification method and device | |
CN107944398A (en) | Based on depth characteristic association list diagram image set face identification method, device and medium | |
CN113266975B (en) | Vehicle-mounted refrigerator control method, device, equipment and storage medium | |
CN104021384A (en) | Face recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190614 |