CN106547356A - Intelligent interactive method and device - Google Patents
Intelligent interactive method and device
- Publication number
- CN106547356A CN106547356A CN201611025898.3A CN201611025898A CN106547356A CN 106547356 A CN106547356 A CN 106547356A CN 201611025898 A CN201611025898 A CN 201611025898A CN 106547356 A CN106547356 A CN 106547356A
- Authority
- CN
- China
- Prior art keywords
- user
- image
- hand motion
- hand
- region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
- G06V40/28—Recognition of hand or arm movements, e.g. recognition of deaf sign language
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Social Psychology (AREA)
- Psychiatry (AREA)
- Multimedia (AREA)
- Health & Medical Sciences (AREA)
- Computer Vision & Pattern Recognition (AREA)
- User Interface Of Digital Computer (AREA)
- Image Analysis (AREA)
Abstract
The present application proposes an intelligent interaction method and device. The intelligent interaction method includes: obtaining a user hand motion image, where the user hand motion image is obtained by photographing a user hand motion; determining, according to the user hand motion image, an operation category corresponding to the user hand motion; and responding to the user hand motion according to the operation category. The method enables natural and efficient human-computer interaction.
Description
Technical field
The present application relates to the field of human-computer interaction technology, and in particular to an intelligent interaction method and device.
Background technology
As artificial intelligence technology matures, daily life is becoming increasingly intelligent: smart home devices have entered ordinary households, and augmented reality equipment is approaching practical use, so human-computer interaction is becoming a daily necessity. During human-computer interaction, what users care about most is whether they can interact with a machine naturally, ideally to a degree approaching interaction with another person. Therefore, more and more engineers are studying how to achieve natural and efficient human-computer interaction.
In the related art, when a user interacts with a machine using a hand, the user must first wear a recording device on the hand, such as a stylus or a handwriting finger sleeve; the recording device then collects two-dimensional or three-dimensional coordinate data of the user's hand motion, and the collected hand data are used to recognize the hand action or the movement trajectory of the hand, so as to determine the user's operation, to which the system provides a corresponding response.
However, this interaction mode does not conform to natural interaction habits, and the collected data are prone to inaccuracy, so the interaction effect is unsatisfactory.
Content of the invention
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one purpose of the present application is to propose an intelligent interaction method that enables natural and efficient human-computer interaction.
Another purpose of the present application is to propose an intelligent interaction device.
To achieve the above purposes, the intelligent interaction method proposed by the embodiment of the first aspect of the present application includes: obtaining a user hand motion image, where the user hand motion image is obtained by photographing a user hand motion; determining, according to the user hand motion image, an operation category corresponding to the user hand motion; and responding to the user hand motion according to the operation category.
The intelligent interaction method proposed by the embodiment of the first aspect of the present application determines the operation category from the user hand motion image and responds accordingly, so the user does not need to wear special equipment on the hand, which conforms to natural interaction habits; in addition, processing the images improves the accuracy of the collected data, thereby achieving natural and efficient human-computer interaction.
To achieve the above purposes, the intelligent interaction device proposed by the embodiment of the second aspect of the present application includes: an acquisition module, configured to obtain a user hand motion image, where the user hand motion image is obtained by photographing a user hand motion; a determining module, configured to determine, according to the user hand motion image, an operation category corresponding to the user hand motion; and a response module, configured to respond to the user hand motion according to the operation category.
The intelligent interaction device proposed by the embodiment of the second aspect of the present application determines the operation category from the user hand motion image and responds accordingly, so the user does not need to wear special equipment on the hand, which conforms to natural interaction habits; in addition, processing the images improves the accuracy of the collected data, thereby achieving natural and efficient human-computer interaction.
Additional aspects and advantages of the present application will be set forth in part in the following description, will in part become apparent from the description, or will be learned by practice of the present application.
Description of the drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of an intelligent interaction method proposed by one embodiment of the present application;
Fig. 2 is a schematic flowchart of an intelligent interaction method proposed by another embodiment of the present application;
Fig. 3 is a schematic diagram of a network topology of a user hand motion recognition model in an embodiment of the present application;
Fig. 4 is a schematic diagram of a group of user hand motion images in an embodiment of the present application;
Fig. 5 is a schematic diagram of an image showing skin regions in an embodiment of the present application;
Fig. 6 is a schematic diagram of an image showing a hand region in an embodiment of the present application;
Fig. 7 is a schematic diagram of the user hand motions corresponding to a click operation in an embodiment of the present application;
Fig. 8 is a schematic diagram of the user hand motion corresponding to a text selection operation in an embodiment of the present application;
Fig. 9 is a schematic diagram of the user hand motions corresponding to several other operations in an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an intelligent interaction device proposed by one embodiment of the present application;
Fig. 11 is a schematic structural diagram of an intelligent interaction device proposed by another embodiment of the present application.
Specific embodiment
Embodiments of the present application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar modules or modules having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present application, and are not to be construed as limiting the present application. On the contrary, the embodiments of the present application include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a schematic flowchart of an intelligent interaction method proposed by one embodiment of the present application.
As shown in Fig. 1, the method of this embodiment includes:
S11: Obtain a user hand motion image, where the user hand motion image is obtained by photographing a user hand motion.
A user hand motion refers to a hand action composed of the trajectory of the user's hand movement, finger activity, and the like. The hand motion is generally used to operate content displayed on a screen or in the air, such as text, images or application programs; the specific display content is not limited in this application.
The hand motion may be a single-hand motion of the user, or a motion of both hands or even multiple hands; when a multi-hand motion is received, multiple users participate in the interaction. Examples of hand motions include making a fist, opening the palm, and extending the index finger. It should be noted that a user hand motion may be performed with one hand, two hands or even more hands, the same operation may be completed with one or more hand motions, and the specific hand motions may be defined according to application requirements; they are not limited to the hand motions described in this application.
For example, a camera or video camera is arranged on a smart device, the user hand motion is photographed by the camera to obtain the user hand motion image, and the processing system obtains the user hand motion image from the camera. When photographing the user hand motion, one frame or multiple frames may be captured; for example, after the camera continuously photographs the user's hand, continuous multi-frame user hand motion images are obtained. In practice, a camera with an RGBD sensor is generally used, which directly provides RGBD data of the user hand motion, i.e., color data (RGB) and depth data (D), so that an RGB image and a depth map of the user hand motion are obtained directly. The RGB color model is an industry color standard in which a wide range of colors is obtained by varying and superimposing the three color channels red (R), green (G) and blue (B); it covers almost all colors perceivable by human vision and is one of the most widely used color systems. The distance of each point in the scene relative to the camera can be represented by a depth map, i.e., each pixel value in the depth map represents the distance between a certain point in the scene and the camera.
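For illustration, the following is a minimal sketch of this capture step. It assumes an Intel RealSense camera and the pyrealsense2 library as the RGBD source; the patent does not name a specific sensor, so these are stand-ins only.

```python
# Hypothetical RGBD capture using an Intel RealSense camera via
# pyrealsense2 -- one possible stand-in for the RGBD camera the patent assumes.
import numpy as np
import pyrealsense2 as rs

pipeline = rs.pipeline()
config = rs.config()
# Request matching color (RGB) and depth (D) streams, as in step S11.
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
pipeline.start(config)

try:
    frames = pipeline.wait_for_frames()
    color_frame = frames.get_color_frame()
    depth_frame = frames.get_depth_frame()
    # One "group" of user hand motion images: a color frame plus the
    # corresponding depth map (each depth pixel encodes camera distance).
    color_image = np.asanyarray(color_frame.get_data())   # HxWx3, BGR order
    depth_image = np.asanyarray(depth_frame.get_data())   # HxW, uint16 depth
finally:
    pipeline.stop()
```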
It should be noted that the user can perform the hand motion with a bare hand, i.e., without wearing a special recording device. Moreover, the user operates the content displayed on the screen or in the air in a contactless manner, i.e., the operation on the display content can be completed without the user touching the screen.
It can be understood that, although the embodiments of the present application take hand motions as an example, operations performed by other body parts of the user, such as head motions and arm motions, can also be handled in the same way as hand motions and therefore belong to equivalent implementations of the embodiments of the present application.
S12: Determine, according to the user hand motion image, the operation category corresponding to the user hand motion.
The operation category corresponding to the user hand motion refers to the category of the operation the user performs on the content displayed on the screen or in the air, such as moving a cursor, grabbing content, dragging content, releasing content, handwriting, or clicking.
When determining the operation category, for example, the user hand motion category and the hand key point position are first recognized from the image, and the operation category is then determined from the recognized user hand motion category and key point position. Details are given in the subsequent description.
S13: Respond to the user hand motion according to the operation category.
The system responds according to a preset response mode for each operation category.
For example, when the current operation category is a handwriting operation, after determining the operation category the system switches to handwriting mode, receives the user's handwritten content, performs handwriting recognition, and displays the recognition result.
When the current operation category is a click, after determining the operation category the system provides the response result corresponding to the gesture; for example, if the user clicks an application program displayed on the screen or in the air, the corresponding operation is executed.
It can be understood that, before responding to the user hand motion according to the operation category, the system may further judge whether a preset condition is satisfied: the user hand motion is responded to according to the operation category only when the preset condition is satisfied, and no response is given otherwise. The preset condition may include, for example, that the function of responding to user hand motions is currently enabled, and that the operation category belongs to the categories the system supports responding to.
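As a concrete illustration of the dispatch in S13 and the preset-condition check above, the following is a minimal sketch; the operation category names and handler functions are illustrative assumptions, not defined in the patent.

```python
# Sketch of per-category response dispatch guarded by preset conditions.
# Category names ("handwrite", "click") and handlers are illustrative.
from typing import Callable, Dict

def enter_handwriting_mode() -> None:
    print("switched to handwriting mode; awaiting strokes")

def perform_click() -> None:
    print("executing click on the targeted display content")

RESPONSES: Dict[str, Callable[[], None]] = {
    "handwrite": enter_handwriting_mode,
    "click": perform_click,
}

def respond(operation: str, responses_enabled: bool = True) -> None:
    # Preset conditions: responding must be enabled and the category supported.
    if not responses_enabled or operation not in RESPONSES:
        return  # no response is given, as the description specifies
    RESPONSES[operation]()

respond("click")  # -> executing click on the targeted display content
```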
In this embodiment, the operation category is determined from the user hand motion image and a corresponding response is given according to the operation category, so the user does not need to wear special equipment on the hand, which conforms to natural interaction habits; processing the images also improves the accuracy of the collected data, thereby achieving natural and efficient human-computer interaction.
Fig. 2 is a schematic flowchart of an intelligent interaction method proposed by another embodiment of the present application.
As shown in Fig. 2, the method of this embodiment includes:
S21: Build a user hand motion recognition model.
This may specifically include:
(1) Obtain training data.
Each group of training data includes input data and output data. In this embodiment, the input data include an RGB image of the user hand region and the corresponding depth image of the user hand region, and the output data include the annotated user hand motion category and key point position, generally annotated by domain experts.
Specifically, a large number of user hand motion images are first collected, for example by photographing user hand motions with a camera equipped with an RGBD sensor, yielding a large number of mutually corresponding RGB images and depth images. The RGB image and depth image in each group of user hand motion images are then segmented to obtain the RGB image and depth image of the user hand region; the specific segmentation method is described below. In addition, each group of user hand motion images is annotated with the user hand motion category and the key point position.
The user hand motion category is determined from the movement of the user hand region or the activity of the fingers; examples include making a fist, opening the palm, and extending the index finger. It can be understood that the hand motion categories can be preset according to application requirements and are not limited to these examples. The key point positions can likewise be chosen according to application requirements; for example, the center point of the fist, or the position of the index finger, may serve as the key point position.
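As a sketch of how one group of training data could be represented under the description above, the following uses illustrative field names and an example category set; none of these identifiers come from the patent.

```python
# One training sample as described: segmented RGB and depth crops of the
# hand region as input, expert-annotated category and key point as output.
from dataclasses import dataclass
import numpy as np

MOTION_CATEGORIES = ["fist", "open_palm", "index_extended"]  # example set

@dataclass
class TrainingSample:
    hand_rgb: np.ndarray     # HxWx3 RGB crop of the user hand region
    hand_depth: np.ndarray   # HxW depth crop of the same region
    category: int            # index into MOTION_CATEGORIES (annotated)
    keypoint: tuple          # (x, y) annotated key point, e.g. fist center

sample = TrainingSample(
    hand_rgb=np.zeros((64, 64, 3), dtype=np.uint8),
    hand_depth=np.zeros((64, 64), dtype=np.uint16),
    category=MOTION_CATEGORIES.index("fist"),
    keypoint=(32, 30),
)
```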
(2) Determine the structure of the user hand motion recognition model.
The model structure can be set according to requirements; this embodiment takes a deep neural network structure as an example.
Fig. 3 gives a schematic diagram of a network topology for the user hand motion recognition model. As shown in Fig. 3, the model includes an input layer, a feature transformation layer, a fully connected layer and an output layer.
The input layer receives the RGB image of the user hand region and the corresponding depth image. The feature transformation layer transforms the input RGB image and depth image separately, producing the transformed image features and depth features of the user hand region; this layer is generally a convolutional neural network, and each of its layers transforms its input in the same way as the layers of a convolutional neural network. The transformed image features and depth features of the user hand region are then passed through a fully connected layer into the output layer, which outputs the probability that the current user hand motion image belongs to each hand motion category, together with the key point position of the current user hand motion.
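The following is a minimal PyTorch sketch of the Fig. 3 topology (the patent does not name a framework, and all layer sizes are illustrative assumptions): two convolutional feature-transformation branches for the RGB and depth inputs, a shared fully connected layer, and an output layer with a category head and a key point head.

```python
# Minimal PyTorch sketch of the Fig. 3 topology. Layer sizes are
# illustrative; the patent fixes only the overall structure.
import torch
import torch.nn as nn

class HandMotionNet(nn.Module):
    def __init__(self, num_categories: int = 3):
        super().__init__()
        def branch(in_ch):  # feature transformation layer (CNN), per modality
            return nn.Sequential(
                nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Flatten(),
            )
        self.rgb_branch = branch(3)    # RGB image of the hand region
        self.depth_branch = branch(1)  # depth image of the hand region
        feat_dim = 32 * 16 * 16 * 2    # for 64x64 inputs, both branches
        self.fc = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.category_head = nn.Linear(128, num_categories)  # class scores
        self.keypoint_head = nn.Linear(128, 2)               # (x, y) position

    def forward(self, rgb, depth):
        feats = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        h = self.fc(feats)
        return self.category_head(h), self.keypoint_head(h)

model = HandMotionNet()
logits, keypoint = model(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64))
probs = logits.softmax(dim=1)  # probability of each hand motion category
```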
(3) Train based on the training data and the structure to build the user hand motion recognition model.
For example, the input data in the training data are fed to the model as model input; after computation with the model's layer parameters, the model outputs the probability that the current user hand motion image belongs to each hand motion category, together with the key point position of the current user hand motion. The hand motion category with the highest probability is taken as the predicted user hand motion category, and the predicted category and key point position serve as the predicted values, while the output data in the training data serve as the true values. A loss function is computed from the true and predicted values, and by minimizing the loss function the model's layer parameters are obtained, thereby training the model. For specific model training methods, reference may be made to various existing or future techniques, which are not detailed here.
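Continuing the HandMotionNet sketch above, the following illustrates one plausible form of the training step just described: a joint loss over the predicted category and key point, minimized by gradient descent. The specific loss functions, optimizer and data are assumptions, not from the patent.

```python
# One plausible joint loss for the training step: cross-entropy for the
# category plus a regression loss for the key point. All choices here
# (losses, Adam, batch of random data) are illustrative assumptions.
import torch
import torch.nn as nn

category_loss = nn.CrossEntropyLoss()   # compares class scores to true label
keypoint_loss = nn.MSELoss()            # compares predicted to true key point
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

rgb = torch.randn(8, 3, 64, 64)         # batch of hand-region RGB crops
depth = torch.randn(8, 1, 64, 64)       # matching depth crops
true_cat = torch.randint(0, 3, (8,))    # annotated motion categories
true_kp = torch.rand(8, 2)              # annotated key point positions

logits, pred_kp = model(rgb, depth)
loss = category_loss(logits, true_cat) + keypoint_loss(pred_kp, true_kp)
optimizer.zero_grad()
loss.backward()                          # minimize the joint loss
optimizer.step()
```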
S22: Obtain a user hand motion image.
For example, the system receives user hand motion images sent by a camera: after the user produces a hand motion, the camera photographs it and obtains the user hand motion images.
When photographing the user hand motion, a camera with an RGBD sensor may shoot continuously, yielding multiple consecutive groups of user hand motion images, where each group includes one RGB frame and one depth frame.
Fig. 4 gives one group of user hand motion images, consisting of a mutually corresponding RGB frame and depth frame; the left side of Fig. 4 is the RGB image and the right side is the depth image. It should be noted that, owing to drawing constraints, the RGB image in Fig. 4 is shown as a grayscale image, but in actual implementation the RGB image is a color image.
S23: Determine the user hand region in the user hand motion image, and segment the user hand motion image according to the user hand region to obtain a user hand region image.
Determining the user hand region may include:
determining the skin regions in the user hand motion image according to the RGB image;
clustering the pixels in the skin regions to obtain distinct skin regions;
obtaining the depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand motion image according to the depth values.
When determining the skin regions, the RGB image is first converted to a CrCb image, for example using an existing mapping from RGB space to CrCb space; the CrCb image is then ANDed with a preset CrCb-space skin mask to determine the skin regions in the user hand motion image. The skin mask can be built in advance from a large number of collected skin images, such as hand skin images. Fig. 5 shows the image of the skin regions corresponding to the RGB image of Fig. 4. It can be understood that, to facilitate image processing, the image can be converted to a binary image in which skin pixels are shown in white and non-skin pixels in black.
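As an illustration of this skin-detection step, the following OpenCV sketch converts the image to YCrCb and gates the Cr/Cb channels. The patent's skin mask is built from collected skin images; the fixed Cr/Cb bounds below are a commonly used approximation, serving only as a stand-in for that learned mask, and the file name is illustrative.

```python
# Sketch of the skin-region step: convert to YCrCb, then keep only pixels
# whose Cr/Cb values fall in a skin range, producing a binary image as in
# Fig. 5 (skin pixels white, non-skin pixels black).
import cv2
import numpy as np

bgr = cv2.imread("hand_frame.png")           # illustrative file name
ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)

# Stand-in for ANDing with the preset CrCb skin mask: fixed Cr/Cb bounds
# (a common published approximation, not the patent's learned mask).
lower = np.array([0, 133, 77], dtype=np.uint8)     # (Y, Cr, Cb) lower bounds
upper = np.array([255, 173, 127], dtype=np.uint8)  # (Y, Cr, Cb) upper bounds
skin_mask = cv2.inRange(ycrcb, lower, upper)       # 255 = skin, 0 = non-skin
```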
After the skin regions are determined, the pixels in the skin regions can be clustered to obtain distinct skin regions. The clustering method is not limited; for example, k-means clustering can be used.
After the distinct skin regions are obtained, the depth value corresponding to each skin region can be obtained from the depth image, and the user hand region in the user hand motion image is determined according to the depth values. For example, the depth value of the pixel in the depth image corresponding to the cluster center of each skin region, or the mean of the depth values in the depth image of all pixels in a cluster, can be taken as the depth value corresponding to that skin region, so that each skin region has a corresponding depth value. Since the hand is usually located in front of the body, the skin region with the smallest depth value can be determined to be the user hand region. Fig. 6 shows the user hand region determined from the skin regions of Fig. 5.
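The clustering and depth-selection steps might look like the following sketch, which continues the skin_mask and depth_image variables from the snippets above; k-means is the clustering option the description names, while the cluster count of 3 is an illustrative assumption.

```python
# Cluster skin pixels with k-means, then pick the cluster whose mean depth
# is smallest (the hand is usually closest to the camera). k=3 is assumed.
from sklearn.cluster import KMeans
import numpy as np

ys, xs = np.nonzero(skin_mask)               # coordinates of skin pixels
coords = np.stack([xs, ys], axis=1).astype(np.float32)

kmeans = KMeans(n_clusters=3, n_init=10).fit(coords)
labels = kmeans.labels_

# Mean depth per skin region, from the depth values of its member pixels.
region_depths = [
    depth_image[ys[labels == k], xs[labels == k]].mean() for k in range(3)
]
hand_cluster = int(np.argmin(region_depths))  # hand = closest skin region
hand_pixels = coords[labels == hand_cluster]  # pixels of the hand region
```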
After the user hand region is determined, the originally captured user hand motion image can be segmented to obtain the user hand region image. For example, when the user hand motion image includes an RGB image and a depth image, the user hand region is cut directly out of the captured RGB image and depth image, yielding the RGB image of the user hand region and the depth image of the user hand region.
S24: Recognize, according to the user hand region image and the pre-built user hand motion recognition model, the user hand motion category and key point position corresponding to the user hand motion image.
Assume the user hand region image includes the RGB image and the depth image of the user hand region. Then, corresponding to the model structure shown in Fig. 3, the segmented RGB image and depth image of the user hand region are used as the inputs of the user hand motion recognition model, and the model output is obtained; the model output includes the probability of each user hand motion category and the key point position. The category with the highest probability is taken as the recognized user hand motion category, and the key point position output by the model is taken as the recognized key point position.
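A short inference sketch for S24, reusing the HandMotionNet sketch from S21 (still under the same illustrative assumptions):

```python
# Feed the segmented hand crops to the model and take the highest-probability
# category and the output key point, as S24 describes.
import torch

rgb_crop = torch.randn(1, 3, 64, 64)    # segmented RGB hand-region image
depth_crop = torch.randn(1, 1, 64, 64)  # segmented depth hand-region image

model.eval()
with torch.no_grad():
    logits, keypoint = model(rgb_crop, depth_crop)
probs = logits.softmax(dim=1)
category = int(probs.argmax(dim=1))     # recognized motion category
kp_xy = keypoint[0].tolist()            # recognized key point position
```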
S25: Determine, according to the user hand motion categories and key point positions corresponding to multiple consecutive groups of user hand motion images, the operation category corresponding to the user hand motion.
Because the hand motions that are convenient to use are relatively limited, some user operation categories must be realized by combining multiple user hand motions; for example, a click operation requires the palm to open first and then form a fist. Fig. 7 shows the user hand motions corresponding to a click operation: as shown in Fig. 7, they include the open palm shown on the left of Fig. 7 and the fist shown on the right of Fig. 7. Therefore, to determine the operation category of a user hand motion, multiple groups of user hand motion images are first obtained and the user hand motion category and key point position corresponding to each group are determined, after which the user's operation category is determined. How many groups of user hand motion images to obtain can be decided according to application requirements, e.g., 15 consecutive groups of user hand motion images.
When specifically determining the user's operation category, the determination can be made according to the predefined course of actions of each operation category, the key point positions, and the content currently displayed on the screen or in the air.
For example, suppose the screen or the air currently displays an interface for the user to query information and the user has entered the query information. If the recognized user action categories are first an open palm and then a fist, i.e., in the obtained groups of user hand motion images, the first several groups correspond to an open palm and the later groups correspond to a fist, the user's operation category can be judged to be a click.
If the screen or the air currently displays multi-line text content, the user hand motion categories of all obtained groups are a fist, and the position of the user hand key point moves in the same direction, the user's operation category can be determined to be continuous text selection; Fig. 8 is a schematic diagram of the text selection hand motion.
Further, when the combination of user hand motions over multiple frames matches no operation category, the current user hand motion is considered an invalid action, and the system gives no response or prompts the user that the hand motion is erroneous.
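One hedged way to implement the sequence matching of S25 is sketched below; the pattern table and the collapse-then-compare rule are illustrative assumptions rather than the patent's specification.

```python
# Compare the recognized per-frame motion categories against predefined
# action patterns; no match means an invalid action (no response).
from typing import List, Optional

# Predefined course of actions per operation category (simplified).
OPERATION_PATTERNS = {
    "click": ["open_palm", "fist"],   # palm opens first, then a fist
    "select_text": ["fist"],          # sustained fist + moving key point
}

def classify_operation(frame_categories: List[str]) -> Optional[str]:
    # Collapse consecutive duplicates: e.g. 15 frames of
    # [open_palm x7, fist x8] become ["open_palm", "fist"].
    collapsed = [c for i, c in enumerate(frame_categories)
                 if i == 0 or c != frame_categories[i - 1]]
    for operation, pattern in OPERATION_PATTERNS.items():
        if collapsed == pattern:
            return operation
    return None  # invalid action: no response, or prompt the user

print(classify_operation(["open_palm"] * 7 + ["fist"] * 8))  # -> click
```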
Of course, besides the user hand motions corresponding to the operation categories above, the user hand motions corresponding to other operation categories can also be predefined according to requirements, as shown in Fig. 9: figure (a) grabs display content, e.g., grabbing text or an image; figure (b) moves the cursor — if a large amount of text is currently displayed and the cursor sits before a certain character and needs to be moved to another position, this operation can be used; figure (c) releases content, an operation generally used together with grabbing or other operations, e.g., after grabbing and dragging text, the moved text is released; figure (d) is a handwriting operation, generally used to open handwriting mode when the user needs to input content on the screen or in the air. This application does not limit the user hand motions predefined for each operation category; each operation category may correspond to a combination of user hand motions, or each operation category may correspond directly to a single user hand motion.
S26: Respond to the user hand motion according to the operation category.
The system responds according to the preset response mode for each operation category.
For example, when the current operation category is a handwriting operation, after determining the operation category the system switches to handwriting mode, receives the user's handwritten content, performs handwriting recognition, and displays the recognition result.
When the current operation category is a click, after determining the operation category the system provides the response result corresponding to the gesture; for example, if the user clicks an application program displayed on the screen or in the air, the corresponding operation is executed.
In this embodiment, the operation category is determined from the user hand motion image and a corresponding response is given according to the operation category, so the user does not need to wear special equipment on the hand, which conforms to natural interaction habits; processing the images also improves the accuracy of the collected data, thereby achieving natural and efficient human-computer interaction. Determining the operation category from the user hand motion categories and key point positions of multiple consecutive groups of user hand motion images improves accuracy. Segmenting the user hand region image out of the user hand motion image improves processing efficiency. Building the user hand motion recognition model with deep neural network training improves the recognition accuracy of user hand motions.
Fig. 10 is a schematic structural diagram of an intelligent interaction device proposed by one embodiment of the present application.
As shown in Fig. 10, the device 100 of this embodiment includes an acquisition module 101, a determining module 102 and a response module 103.
The acquisition module 101 obtains a user hand motion image, where the user hand motion image is obtained by photographing a user hand motion.
The determining module 102 is configured to determine, according to the user hand motion image, the operation category corresponding to the user hand motion.
The response module 103 is configured to respond to the user hand motion according to the operation category.
In some embodiments, referring to Fig. 11, the determining module 102 includes:
a segmentation submodule 1021, configured to determine the user hand region in the user hand motion image and segment the user hand motion image according to the user hand region to obtain a user hand region image;
a recognition submodule 1022, configured to recognize, according to the user hand region image and the pre-built user hand motion recognition model, the user hand motion category and key point position corresponding to the user hand motion image;
a determination submodule 1023, configured to determine, according to the user hand motion categories and key point positions corresponding to multiple consecutive groups of user hand motion images, the operation category corresponding to the user hand motion.
In some embodiments, a single group of user hand motion images includes a mutually corresponding RGB frame and depth frame.
In some embodiments, the segmentation submodule 1021 determines the user hand region in the user hand motion image by:
determining the skin regions in the user hand motion image according to the RGB image;
clustering the skin regions to obtain the distinct clustered skin regions;
obtaining the depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand motion image according to the depth values.
In some embodiments, the segmentation submodule 1021 determines the skin regions in the user hand motion image according to the RGB image by:
converting the RGB image to a CrCb image;
ANDing the preset CrCb-space skin mask with the CrCb image to determine the skin regions in the user hand motion image.
In some embodiments, the user hand region image includes the RGB image and the depth image of the user hand region, and the recognition submodule 1022 is specifically configured to:
use the RGB image and depth image of the user hand region as the inputs of the user hand motion recognition model and obtain the model output, where the model output includes the probability of each user hand motion category and the key point position;
take the category with the highest probability as the recognized user hand motion category, and take the key point position output by the model as the recognized key point position.
In some embodiments, referring to Fig. 11, the device 100 further includes a building module 104 for building the user hand motion recognition model, the building module 104 being specifically configured to:
obtain training data, the training data including the RGB image and depth image of the user hand region and annotation information, where the RGB image and depth image of the user hand region are obtained by segmenting the collected user hand motion images, and the annotation information corresponds to the collected user hand motion images and includes the user hand motion category and key point position;
determine the structure of the user hand motion recognition model;
train based on the training data and the structure to build the user hand motion recognition model.
In some embodiments, the structure includes a deep neural network structure.
It can be understood that the device of this embodiment corresponds to the above method embodiments; for details, reference may be made to the related description of the method embodiments, which is not repeated here.
It can be understood that the same or similar parts in the above embodiments may refer to one another, and content not detailed in some embodiments may refer to the same or similar content in other embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are only used for descriptive purposes and are not to be understood as indicating or implying relative importance. In addition, in the description of the present application, unless otherwise stated, "multiple" means at least two.
Any process or method description in a flowchart, or otherwise described herein, can be understood as representing a module, fragment or portion of code including one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
It should be understood that each part of the present application can be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be implemented with software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented with any one or a combination of the following techniques known in the art: discrete logic circuits with logic gate circuits for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gate circuits, programmable gate arrays (PGA), field programmable gate arrays (FPGA), and the like.
Those skilled in the art can understand that all or part of the steps carried by the above embodiment methods can be completed by instructing the relevant hardware through a program, the program can be stored in a computer-readable storage medium, and when executed, the program includes one of the steps of the method embodiments or a combination thereof.
In addition, each functional unit in each embodiment of the present application can be integrated in one processing module, or each unit can exist alone physically, or two or more units can be integrated in one module. The above integrated module can be implemented in the form of hardware or in the form of a software function module. If the integrated module is implemented in the form of a software function module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in combination with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the described specific features, structures, materials or characteristics can be combined in a suitable manner in any one or more embodiments or examples.
Although the embodiments of the present application have been shown and described above, it can be understood that the above embodiments are exemplary and are not to be construed as limiting the present application; those of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the present application.
Claims (16)
1. An intelligent interaction method, characterized by comprising:
obtaining a user hand motion image, the user hand motion image being obtained by photographing a user hand motion;
determining, according to the user hand motion image, an operation category corresponding to the user hand motion;
responding to the user hand motion according to the operation category.
2. The method according to claim 1, characterized in that determining, according to the user hand motion image, the operation category corresponding to the user hand motion comprises:
determining a user hand region in the user hand motion image, and segmenting the user hand motion image according to the user hand region to obtain a user hand region image;
recognizing, according to the user hand region image and a pre-built user hand motion recognition model, a user hand motion category and key point position corresponding to the user hand motion image;
determining, according to the user hand motion categories and key point positions corresponding to multiple consecutive groups of user hand motion images, the operation category corresponding to the user hand motion.
3. The method according to claim 2, characterized in that a single group of user hand motion images comprises a mutually corresponding RGB frame and depth frame.
4. The method according to claim 3, characterized in that determining the user hand region in the user hand motion image comprises:
determining skin regions in the user hand motion image according to the RGB image;
clustering the skin regions to obtain distinct clustered skin regions;
obtaining a depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand motion image according to the depth values.
5. The method according to claim 4, characterized in that determining the skin regions in the user hand motion image according to the RGB image comprises:
converting the RGB image to a CrCb image;
ANDing a preset CrCb-space skin mask with the CrCb image to determine the skin regions in the user hand motion image.
6. The method according to claim 2, characterized in that the user hand region image comprises an RGB image and a depth image of the user hand region, and recognizing, according to the user hand region image and the pre-built user hand motion recognition model, the user hand motion category and key point position corresponding to the user hand motion image comprises:
using the RGB image and depth image of the user hand region as the inputs of the user hand motion recognition model to obtain a model output, the model output comprising the probability of each user hand motion category and the key point position;
taking the category with the highest probability as the recognized user hand motion category, and taking the key point position output by the model as the recognized key point position.
7. The method according to claim 2, characterized by further comprising: building the user hand motion recognition model, wherein building the user hand motion recognition model comprises:
obtaining training data, the training data comprising an RGB image and a depth image of the user hand region and annotation information, the RGB image and depth image of the user hand region being obtained by segmenting collected user hand motion images, the annotation information corresponding to the collected user hand motion images and comprising a user hand motion category and key point position;
determining the structure of the user hand motion recognition model;
training based on the training data and the structure to build the user hand motion recognition model.
8. The method according to claim 7, characterized in that the structure comprises a deep neural network structure.
9. An intelligent interaction device, characterized by comprising:
an acquisition module, configured to obtain a user hand motion image, the user hand motion image being obtained by photographing a user hand motion;
a determining module, configured to determine, according to the user hand motion image, an operation category corresponding to the user hand motion;
a response module, configured to respond to the user hand motion according to the operation category.
10. The device according to claim 9, characterized in that the determining module comprises:
a segmentation submodule, configured to determine a user hand region in the user hand motion image and segment the user hand motion image according to the user hand region to obtain a user hand region image;
a recognition submodule, configured to recognize, according to the user hand region image and a pre-built user hand motion recognition model, a user hand motion category and key point position corresponding to the user hand motion image;
a determination submodule, configured to determine, according to the user hand motion categories and key point positions corresponding to multiple consecutive groups of user hand motion images, the operation category corresponding to the user hand motion.
11. The device according to claim 10, characterized in that a single group of user hand motion images comprises a mutually corresponding RGB frame and depth frame.
12. The device according to claim 11, characterized in that the segmentation submodule determines the user hand region in the user hand motion image by:
determining skin regions in the user hand motion image according to the RGB image;
clustering the skin regions to obtain distinct clustered skin regions;
obtaining a depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand motion image according to the depth values.
13. The device according to claim 12, characterized in that the segmentation submodule determines the skin regions in the user hand motion image according to the RGB image by:
converting the RGB image to a CrCb image;
ANDing a preset CrCb-space skin mask with the CrCb image to determine the skin regions in the user hand motion image.
14. The device according to claim 10, characterized in that the user hand region image comprises an RGB image and a depth image of the user hand region, and the recognition submodule is specifically configured to:
use the RGB image and depth image of the user hand region as the inputs of the user hand motion recognition model to obtain a model output, the model output comprising the probability of each user hand motion category and the key point position;
take the category with the highest probability as the recognized user hand motion category, and take the key point position output by the model as the recognized key point position.
15. The device according to claim 10, characterized by further comprising a building module for building the user hand motion recognition model, the building module being specifically configured to:
obtain training data, the training data comprising an RGB image and a depth image of the user hand region and annotation information, the RGB image and depth image of the user hand region being obtained by segmenting collected user hand motion images, the annotation information corresponding to the collected user hand motion images and comprising a user hand motion category and key point position;
determine the structure of the user hand motion recognition model;
train based on the training data and the structure to build the user hand motion recognition model.
16. The device according to claim 15, characterized in that the structure comprises a deep neural network structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611025898.3A CN106547356B (en) | 2016-11-17 | 2016-11-17 | Intelligent interaction method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611025898.3A CN106547356B (en) | 2016-11-17 | 2016-11-17 | Intelligent interaction method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106547356A | 2017-03-29 |
CN106547356B CN106547356B (en) | 2020-09-11 |
Family
ID=58394834
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611025898.3A Active CN106547356B (en) | 2016-11-17 | 2016-11-17 | Intelligent interaction method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106547356B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107291221A (en) * | 2017-05-04 | 2017-10-24 | 浙江大学 | Across screen self-adaption accuracy method of adjustment and device based on natural gesture |
CN108257139A (en) * | 2018-02-26 | 2018-07-06 | 中国科学院大学 | RGB-D three-dimension object detection methods based on deep learning |
CN108733287A (en) * | 2018-05-15 | 2018-11-02 | 东软集团股份有限公司 | Detection method, device, equipment and the storage medium of physical examination operation |
CN109117746A (en) * | 2018-07-23 | 2019-01-01 | 北京华捷艾米科技有限公司 | Hand detection method and machine readable storage medium |
CN110414393A (en) * | 2019-07-15 | 2019-11-05 | 福州瑞芯微电子股份有限公司 | A kind of natural interactive method and terminal based on deep learning |
WO2020020146A1 (en) * | 2018-07-25 | 2020-01-30 | 深圳市商汤科技有限公司 | Method and apparatus for processing laser radar sparse depth map, device, and medium |
WO2020252918A1 (en) * | 2019-06-20 | 2020-12-24 | 平安科技(深圳)有限公司 | Human body-based gesture recognition method and apparatus, device, and storage medium |
CN112383805A (en) * | 2020-11-16 | 2021-02-19 | 四川长虹电器股份有限公司 | Method for realizing man-machine interaction at television end based on human hand key points |
CN112686231A (en) * | 2021-03-15 | 2021-04-20 | 南昌虚拟现实研究院股份有限公司 | Dynamic gesture recognition method and device, readable storage medium and computer equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737235A (en) * | 2012-06-28 | 2012-10-17 | 中国科学院自动化研究所 | Head posture estimation method based on depth information and color image |
CN102854983A (en) * | 2012-09-10 | 2013-01-02 | 中国电子科技集团公司第二十八研究所 | Man-machine interaction method based on gesture recognition |
US20140253429A1 (en) * | 2013-03-08 | 2014-09-11 | Fastvdo Llc | Visual language for human computer interfaces |
CN104598915A (en) * | 2014-01-24 | 2015-05-06 | 深圳奥比中光科技有限公司 | Gesture recognition method and gesture recognition device |
2016-11-17: Application CN201611025898.3A filed in China; granted as patent CN106547356B (status: Active)
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102737235A (en) * | 2012-06-28 | 2012-10-17 | 中国科学院自动化研究所 | Head posture estimation method based on depth information and color image |
CN102854983A (en) * | 2012-09-10 | 2013-01-02 | 中国电子科技集团公司第二十八研究所 | Man-machine interaction method based on gesture recognition |
US20140253429A1 (en) * | 2013-03-08 | 2014-09-11 | Fastvdo Llc | Visual language for human computer interfaces |
CN104598915A (en) * | 2014-01-24 | 2015-05-06 | 深圳奥比中光科技有限公司 | Gesture recognition method and gesture recognition device |
Non-Patent Citations (1)
Title |
---|
Huang Xiaolin, Dong Hongwei: "Real-time gesture recognition and virtual writing system based on depth information" (基于深度信息的实时手势识别和虚拟书写系统), Computer Engineering and Applications (《计算机工程与应用》) *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107291221A (en) * | 2017-05-04 | 2017-10-24 | 浙江大学 | Across screen self-adaption accuracy method of adjustment and device based on natural gesture |
CN107291221B (en) * | 2017-05-04 | 2019-07-16 | 浙江大学 | Across screen self-adaption accuracy method of adjustment and device based on natural gesture |
CN108257139A (en) * | 2018-02-26 | 2018-07-06 | 中国科学院大学 | RGB-D three-dimension object detection methods based on deep learning |
CN108257139B (en) * | 2018-02-26 | 2020-09-08 | 中国科学院大学 | RGB-D three-dimensional object detection method based on deep learning |
CN108733287A (en) * | 2018-05-15 | 2018-11-02 | 东软集团股份有限公司 | Detection method, device, equipment and the storage medium of physical examination operation |
CN109117746A (en) * | 2018-07-23 | 2019-01-01 | 北京华捷艾米科技有限公司 | Hand detection method and machine readable storage medium |
WO2020020146A1 (en) * | 2018-07-25 | 2020-01-30 | 深圳市商汤科技有限公司 | Method and apparatus for processing laser radar sparse depth map, device, and medium |
WO2020252918A1 (en) * | 2019-06-20 | 2020-12-24 | 平安科技(深圳)有限公司 | Human body-based gesture recognition method and apparatus, device, and storage medium |
CN110414393A (en) * | 2019-07-15 | 2019-11-05 | 福州瑞芯微电子股份有限公司 | A kind of natural interactive method and terminal based on deep learning |
CN112383805A (en) * | 2020-11-16 | 2021-02-19 | 四川长虹电器股份有限公司 | Method for realizing man-machine interaction at television end based on human hand key points |
CN112686231A (en) * | 2021-03-15 | 2021-04-20 | 南昌虚拟现实研究院股份有限公司 | Dynamic gesture recognition method and device, readable storage medium and computer equipment |
CN112686231B (en) * | 2021-03-15 | 2021-06-01 | 南昌虚拟现实研究院股份有限公司 | Dynamic gesture recognition method and device, readable storage medium and computer equipment |
WO2022193453A1 (en) * | 2021-03-15 | 2022-09-22 | 南昌虚拟现实研究院股份有限公司 | Dynamic gesture recognition method and apparatus, and readable storage medium and computer device |
Also Published As
Publication number | Publication date |
---|---|
CN106547356B (en) | 2020-09-11 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106547356A (en) | Intelligent interactive method and device | |
CN102081918B (en) | Video image display control method and video image display device | |
CN103941866B (en) | Three-dimensional gesture recognizing method based on Kinect depth image | |
CN103530613B (en) | Target person hand gesture interaction method based on monocular video sequence | |
US10671841B2 (en) | Attribute state classification | |
CN104573706B (en) | A kind of subject image recognition methods and its system | |
CN104395856B (en) | For recognizing the computer implemented method and system of dumb show | |
Qi et al. | Computer vision-based hand gesture recognition for human-robot interaction: a review | |
CN107578023A (en) | Man-machine interaction gesture identification method, apparatus and system | |
CN104838337A (en) | Touchless input for a user interface | |
CN106598227A (en) | Hand gesture identification method based on Leap Motion and Kinect | |
CN106200971A (en) | Man-machine interactive system device based on gesture identification and operational approach | |
CN110135497B (en) | Model training method, and method and device for estimating strength of facial action unit | |
Jin et al. | Real-time action detection in video surveillance using sub-action descriptor with multi-cnn | |
CN107654406A (en) | Fan air supply control equipment, fan air supply control method and device | |
Desai et al. | Human Computer Interaction through hand gestures for home automation using Microsoft Kinect | |
CN106503619B (en) | Gesture recognition method based on BP neural network | |
CN111857334A (en) | Human body gesture letter recognition method and device, computer equipment and storage medium | |
Dardas | Real-time hand gesture detection and recognition for human computer interaction | |
CN103201706A (en) | Method for driving virtual mouse | |
Zhang et al. | Emotion recognition from body movements with as-lstm | |
Lee et al. | Recognition of hand gesture to human-computer interaction | |
US10095308B2 (en) | Gesture based human machine interface using marker | |
CN110134241A (en) | Dynamic gesture exchange method based on monocular cam | |
Liu et al. | Recognizing object manipulation activities using depth and visual cues |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |