CN106547356A - Intelligent interaction method and device - Google Patents

Intelligent interaction method and device

Info

Publication number
CN106547356A
CN106547356A (application CN201611025898.3A)
Authority
CN
China
Prior art keywords
user
image
hand motion
hand
region
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611025898.3A
Other languages
Chinese (zh)
Other versions
CN106547356B (en)
Inventor
王天一
刘聪
王智国
胡国平
胡郁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
iFlytek Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by iFlytek Co Ltd filed Critical iFlytek Co Ltd
Priority to CN201611025898.3A priority Critical patent/CN106547356B/en
Publication of CN106547356A publication Critical patent/CN106547356A/en
Application granted granted Critical
Publication of CN106547356B publication Critical patent/CN106547356B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • User Interface Of Digital Computer (AREA)
  • Image Analysis (AREA)

Abstract

The application proposes an intelligent interaction method and device. The intelligent interaction method includes: obtaining a user hand action image, where the user hand action image is obtained by photographing a user hand action; determining, according to the user hand action image, the operation category corresponding to the user hand action; and responding to the user hand action according to the operation category. The method enables natural and efficient human-computer interaction.

Description

Intelligent interaction method and device
Technical field
The application relates to the technical field of human-computer interaction, and in particular to an intelligent interaction method and device.
Background art
As artificial intelligence technologies mature, daily life is becoming increasingly intelligent: smart home devices have entered ordinary households, and augmented reality devices are becoming practical. Human-computer interaction is therefore becoming ever more common and necessary. During human-computer interaction, what users care about most is whether a person can interact with a machine naturally, ideally as naturally as interacting with another person. Accordingly, more and more engineers are studying how to realize natural and efficient human-computer interaction.
In the related art, when a user interacts with a machine by hand, the user first needs to wear a recording device on the hand, such as a stylus or a handwriting finger sleeve; the two-dimensional or three-dimensional coordinate data of the user's hand action are then collected through the recording device; the hand action or the movement trajectory of the hand is then recognized from the collected hand data to determine the user's operation, and the system provides the response result of the corresponding operation.
However, this interaction mode does not conform to natural interaction habits, and the collected data are prone to inaccuracy, so the interaction effect is unsatisfactory.
Summary of the invention
The application is intended to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the application is to propose an intelligent interaction method that enables natural and efficient human-computer interaction.
A further object of the application is to propose an intelligent interaction device.
To achieve the above objects, the intelligent interaction method proposed by the embodiment of the first aspect of the application includes: obtaining a user hand action image, where the user hand action image is obtained by photographing a user hand action; determining, according to the user hand action image, the operation category corresponding to the user hand action; and responding to the user hand action according to the operation category.
In the intelligent interaction method proposed by the embodiment of the first aspect of the application, the operation category is determined from the user hand action image and the corresponding response is made according to the operation category, so the user does not need to wear special equipment on the hand. This conforms to natural interaction habits, and processing the image can also improve the accuracy of the collected data, thereby realizing natural and efficient human-computer interaction.
To achieve the above objects, the intelligent interaction device proposed by the embodiment of the second aspect of the application includes: an acquisition module for obtaining a user hand action image, where the user hand action image is obtained by photographing a user hand action; a determining module for determining, according to the user hand action image, the operation category corresponding to the user hand action; and a response module for responding to the user hand action according to the operation category.
In the intelligent interaction device proposed by the embodiment of the second aspect of the application, the operation category is determined from the user hand action image and the corresponding response is made according to the operation category, so the user does not need to wear special equipment on the hand. This conforms to natural interaction habits, and processing the image can also improve the accuracy of the collected data, thereby realizing natural and efficient human-computer interaction.
Additional aspects and advantages of the application will be set forth in part in the following description, and will in part become apparent from the description or be learned through practice of the application.
Description of the drawings
The above and/or additional aspects and advantages of the application will become apparent and easy to understand from the following description of the embodiments with reference to the accompanying drawings, in which:
Fig. 1 is a schematic flowchart of an intelligent interaction method proposed by one embodiment of the application;
Fig. 2 is a schematic flowchart of an intelligent interaction method proposed by another embodiment of the application;
Fig. 3 is a schematic diagram of a network topology of a user hand action recognition model in an embodiment of the application;
Fig. 4 is a schematic diagram of a group of user hand action images in an embodiment of the application;
Fig. 5 is a schematic diagram of an image showing skin regions in an embodiment of the application;
Fig. 6 is a schematic diagram of an image showing a hand region in an embodiment of the application;
Fig. 7 is a schematic diagram of the user hand actions corresponding to a click operation in an embodiment of the application;
Fig. 8 is a schematic diagram of the user hand action corresponding to a select-text operation in an embodiment of the application;
Fig. 9 is a schematic diagram of the user hand actions corresponding to various other operations in an embodiment of the application;
Fig. 10 is a schematic structural diagram of an intelligent interaction device proposed by one embodiment of the application;
Fig. 11 is a schematic structural diagram of an intelligent interaction device proposed by another embodiment of the application.
Detailed description of the embodiments
The embodiments of the application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar modules or modules having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and are only intended to explain the application; they should not be construed as limiting the application. On the contrary, the embodiments of the application include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.
Fig. 1 is a schematic flowchart of an intelligent interaction method proposed by one embodiment of the application.
As shown in Fig. 1, the method of this embodiment includes:
S11: obtaining a user hand action image, where the user hand action image is obtained by photographing a user hand action.
A user hand action refers to a hand action composed of the movement trajectory of the user's hand, the activity of the fingers, and so on. The hand action is generally used to operate content displayed on a screen or in the air; the displayed content may be text, images, application programs and the like shown on a screen or in the air, which is not limited in this application.
The hand action may be a one-handed action of the user, or an action of both hands of the user or of many hands; when it is a multi-hand action, multiple users participate in the interaction. The hand action is, for example, making a fist, opening the palm, or extending the index finger.
It should be noted that a user hand action may be performed by one hand, two hands or even more hands, and the same operation may be completed by one or more hand actions. The specific hand actions may also be chosen in various ways according to application requirements and are not limited to the hand actions described in this application.
For example, a camera or video camera is arranged on a smart device; the user hand action is photographed by the camera or video camera to obtain the user hand action image, and the processing system can obtain the user hand action image from the camera or video camera. When photographing the user hand action, one frame or multiple frames may be captured; for example, after the user's hand is photographed continuously, consecutive multi-frame user hand action images are obtained. In practice, a video camera with an RGBD sensor is generally used, which directly provides the RGBD data of the user hand action, that is, the colour data RGB and the depth data D, so that the RGB image and the depth map of the user hand action are obtained directly. The RGB colour model is an industry colour standard in which colours are obtained by varying and superimposing the three colour channels red (R), green (G) and blue (B); it covers almost all colours perceivable by human vision and is one of the most widely used colour systems. The distance of each point in the scene from the camera can be represented by a depth map, that is, each pixel value in the depth map represents the distance between a certain point in the scene and the camera.
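For illustration only, the following minimal Python sketch reads one aligned RGB/depth frame pair from disk; it is not part of the original disclosure, and the file names and the 16-bit millimetre depth encoding are assumptions about how such a camera's software might save its output.

```python
import cv2

# Aligned colour image and depth map saved by the RGBD camera (assumed file names).
color = cv2.imread("frame_color.png")                        # H x W x 3 BGR image
depth = cv2.imread("frame_depth.png", cv2.IMREAD_UNCHANGED)  # H x W depth map, assumed 16-bit millimetres

# Each depth pixel is the distance between the corresponding scene point and the camera.
distance_mm = depth[240, 320]  # e.g. the distance at the centre of a 640 x 480 frame
print(color.shape, depth.shape, distance_mm)
```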
It should be noted that the user can perform the hand action with a bare hand, that is, no special recording equipment needs to be worn on the hand. Moreover, the operation on content displayed on the screen or in the air may specifically be a contactless operation, that is, the operation on the displayed content can be completed without the user touching the screen.
It can be understood that the embodiments of the application take hand actions as an example; however, operations performed by other body parts of the user, such as head actions and arm actions, can also be handled in the same way as hand actions and therefore belong to equivalent implementations of the embodiments of the application.
S12: determining, according to the user hand action image, the operation category corresponding to the user hand action.
The operation category corresponding to the user hand action refers to the category of operation the user performs on the content displayed on the screen or in the air, such as moving a cursor, grabbing content, dragging content, releasing content, handwriting, or clicking.
When determining the operation category, for example, the user hand action class and the key point position of the hand are first recognized from the image, and the operation category is then determined from the recognized user hand action class and key point position. Details are given in the subsequent description.
S13: responding to the user hand action according to the operation category.
The system can make the corresponding response according to a preset response mode for each operation category.
For example, when the current operation category is a handwriting operation, after determining the user's operation category the system switches to handwriting mode, receives the user's handwritten content, performs the corresponding handwriting recognition, and displays the recognition result.
For example, when the current operation category is a click, after determining the user's operation category the system provides the response result according to the gesture; for instance, the user clicks an application program displayed on the screen or in the air, and the corresponding operation is executed.
It can be understood that, before responding to the user hand action according to the operation category, it may also be determined whether a preset condition is met; the response is made according to the operation category only when the preset condition is met, and no response is made when it is not. The preset condition includes, for example, that the function of responding to user hand actions is currently enabled, or that the operation category belongs to the response categories supported by the system.
In this embodiment, the operation category is determined from the user hand action image and the corresponding response is made according to the operation category, so the user does not need to wear special equipment on the hand, which conforms to natural interaction habits; processing the image can also improve the accuracy of the collected data, thereby realizing natural and efficient human-computer interaction.
Fig. 2 is a schematic flowchart of an intelligent interaction method proposed by another embodiment of the application.
As shown in Fig. 2, the method of this embodiment includes:
S21: building a user hand action recognition model.
This can specifically include:
(1) Obtaining training data.
Each group of training data includes input data and output data. In this embodiment, the input data include the RGB image of the user hand region and the corresponding depth image of the user hand region, and the output data include the annotated user hand action class and key point position, which are generally obtained through annotation by domain experts.
Specifically, a large number of user hand action images can first be collected, for example by photographing user hand actions with a video camera with an RGBD sensor, so that a large number of mutually corresponding RGB images and depth images are obtained. The RGB image and the depth image in each group of user hand action images are then segmented separately to obtain the RGB image of the user hand region and the depth image of the user hand region; the specific segmentation method is described below. In addition, the user hand action class and key point position are annotated for each group of user hand action images.
The user hand action class is determined from the movement of the user hand region or the activity of the fingers; hand action classes include, for example, making a fist, opening the palm, or extending the index finger.
It can be understood that the hand action classes can be preset according to application requirements and are not limited to the above examples.
The key point positions can be chosen according to application requirements; for example, the position of the centre of the fist, or the position of the index finger, can be used as the key point position.
(2) Determining the structure of the user hand action recognition model.
The model structure can be set according to requirements; this embodiment takes a deep neural network structure as an example.
Fig. 3 shows a schematic diagram of a network topology of the user hand action recognition model. As shown in Fig. 3, the model includes an input layer, a feature transformation layer, a fully connected layer and an output layer.
The input layer receives the RGB image of the user hand region and the corresponding depth image respectively. The feature transformation layer performs feature transformation on the input RGB image and depth image respectively, yielding the transformed image features and depth features of the user hand region; the feature transformation layer is generally a convolutional neural network structure, and the specific transformation method is the same as the layer-by-layer transformation of a convolutional neural network. The transformed image features and depth features of the user hand region are then passed through a fully connected layer and fed to the output layer, which outputs the probability that the current user hand action image belongs to each hand action class together with the key point position of the current user hand action. (An illustrative code sketch of this structure and of the training step is given after step (3) below.)
(3) Training based on the training data and the structure to build the user hand action recognition model.
For example, the input data in the training data are used as the model input; after computation with the parameters of each layer of the model, the model outputs the probability that the current user hand action image belongs to each hand action class and the key point position of the current user hand action. The hand action class with the highest probability is taken as the predicted user hand action class; the predicted user hand action class and key point position are used as the predicted values, and the output data in the training data are used as the true values. A loss function is formed from the true values and the predicted values, and by minimizing the loss function the parameters of each layer of the model are obtained, so that the model is trained. The specific model training method may follow various existing or future techniques and is not described in detail here.
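For illustration only, the following PyTorch sketch shows one possible two-branch structure of the kind described above (an input layer, convolutional feature-transformation layers for the RGB and depth images, a fully connected layer, and an output layer producing per-class scores and a key point position). The layer sizes, the number of classes and the number of key points are illustrative assumptions, not values given in this application.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 3    # e.g. fist, open palm, extended index finger (assumed)
NUM_KEYPOINTS = 1  # e.g. one (x, y) key point per frame (assumed)

def feature_branch(in_channels):
    # Convolutional feature-transformation sub-network applied to one input modality.
    return nn.Sequential(
        nn.Conv2d(in_channels, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d((4, 4)),
    )

class HandActionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_branch = feature_branch(3)    # RGB image of the user hand region
        self.depth_branch = feature_branch(1)  # depth image of the user hand region
        self.fc = nn.Linear(2 * 64 * 4 * 4, 256)
        self.class_head = nn.Linear(256, NUM_CLASSES)           # hand action class scores
        self.keypoint_head = nn.Linear(256, 2 * NUM_KEYPOINTS)  # key point (x, y) positions

    def forward(self, rgb, depth):
        f_rgb = self.rgb_branch(rgb).flatten(1)
        f_depth = self.depth_branch(depth).flatten(1)
        features = torch.relu(self.fc(torch.cat([f_rgb, f_depth], dim=1)))
        return self.class_head(features), self.keypoint_head(features)
```

A minimal sketch of the training step follows: the loss combines a classification term over the hand action classes with a regression term over the key point positions, and the per-class probabilities described above are obtained by applying softmax to the class scores. The optimiser choice and the weighting factor are assumptions.

```python
model = HandActionNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
class_loss_fn = nn.CrossEntropyLoss()  # hand action class term (softmax applied internally)
keypoint_loss_fn = nn.MSELoss()        # key point position term

def train_step(rgb, depth, class_labels, keypoint_labels, keypoint_weight=1.0):
    optimizer.zero_grad()
    class_scores, keypoints = model(rgb, depth)
    loss = class_loss_fn(class_scores, class_labels) \
        + keypoint_weight * keypoint_loss_fn(keypoints, keypoint_labels)
    loss.backward()  # minimise the loss to learn the parameters of each layer
    optimizer.step()
    return loss.item()
```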
S22: obtaining a user hand action image.
For example, the user hand action image sent by a camera or video camera is received; after the user produces a hand action, the camera or video camera photographs the user hand action and obtains the user hand action image.
When photographing the user hand action, a video camera with an RGBD sensor may shoot continuously, so that consecutive groups of user hand action images are obtained, each group including one frame of RGB image and one frame of depth image.
Fig. 4 shows one group of user hand action images, including a mutually corresponding frame of RGB image and frame of depth image; the left side of Fig. 4 is the RGB image and the right side is the depth image. It should be noted that, limited by the requirements for the drawings, the RGB image in Fig. 4 is shown as a greyscale image, but in actual implementation the RGB image is a colour image.
S23: determining the user hand region in the user hand action image, and segmenting the user hand action image according to the user hand region to obtain a user hand region image.
Determining the user hand region can include:
determining the skin regions in the user hand action image according to the RGB image;
clustering the pixels in the skin regions to obtain different skin regions;
obtaining the depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand action image according to the depth values.
When determining the skin regions, the RGB image can first be converted to a CrCb image; the conversion can use an existing mapping from RGB space to CrCb space. A preset skin mask in CrCb space is then ANDed with the CrCb image to determine the skin regions in the user hand action image. The skin mask can be built in advance from a large number of collected skin images, such as hand skin images. Fig. 5 shows the image of skin regions corresponding to the RGB image of Fig. 4. It can be understood that, to facilitate image processing, the image can be converted into a binary image in which pixels in skin regions are shown in white and pixels in non-skin regions are shown in black.
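For illustration only, the following OpenCV sketch performs a skin-region step of the kind described above. This application builds its CrCb skin mask from collected skin images; the fixed Cr/Cb range below is a commonly used stand-in and is an assumption, not a value taken from this application.

```python
import cv2
import numpy as np

def skin_region_mask(bgr_image):
    # Map the colour image (BGR as loaded by OpenCV) into YCrCb space.
    ycrcb = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2YCrCb)
    # Stand-in for the preset CrCb skin mask: keep pixels whose Cr/Cb values
    # fall in an assumed skin range (the AND operation over the CrCb image).
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    mask = cv2.inRange(ycrcb, lower, upper)
    return mask  # binary image: white = skin pixels, black = non-skin pixels
```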
After the skin regions are determined, the pixels in the skin regions can be clustered to obtain different skin regions. The clustering method is not limited; for example, k-means clustering can be used.
After the different skin regions are obtained, the depth value corresponding to each skin region can be obtained from the depth image, and the user hand region in the user hand action image is determined according to the depth values. For example, the depth value of the pixel in the depth image corresponding to the cluster centre of each skin region can be taken as the depth value of that skin region, or the mean of the depth values in the depth image corresponding to all pixels in a cluster can be taken as the depth value of the corresponding skin region, so that the depth value of each skin region is obtained. Since the hand is usually located in front of the body, the skin region with the smallest depth value can be determined to be the user hand region. Fig. 6 shows the user hand region determined from the skin regions of Fig. 5.
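As a hedged sketch of the clustering and depth-selection steps just described: skin pixels are clustered with k-means, the mean depth of each resulting region is computed from the depth image, and the region nearest the camera is taken as the user hand region. The number of clusters below is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def hand_region(skin_mask, depth_image, n_clusters=3):
    ys, xs = np.nonzero(skin_mask)                         # coordinates of skin pixels
    coords = np.stack([xs, ys], axis=1).astype(np.float32)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(coords)

    # Mean depth of each skin region; the hand is assumed to be the nearest region.
    mean_depths = [depth_image[ys[labels == k], xs[labels == k]].mean()
                   for k in range(n_clusters)]
    nearest = int(np.argmin(mean_depths))

    mask = np.zeros_like(skin_mask)
    mask[ys[labels == nearest], xs[labels == nearest]] = 255
    return mask  # binary mask of the user hand region
```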
After the user hand region is determined, the originally captured user hand action image can be segmented to obtain the user hand region image. For example, when the user hand action image includes an RGB image and a depth image, the image of the user hand region is segmented directly from the captured RGB image and depth image, giving the RGB image of the user hand region and the depth image of the user hand region.
S24: recognizing the user hand action class and key point position corresponding to the user hand action image according to the user hand region image and the user hand action recognition model built in advance.
Assume that the user hand region image includes the RGB image and the depth image of the user hand region; then, corresponding to the model structure shown in Fig. 3, the segmented RGB image and depth image of the user hand region are used as the inputs of the user hand action recognition model, and the model output is obtained. The model output includes the probability of each user hand action class and the key point position. The user hand action class with the highest probability is then taken as the recognized user hand action class, and the key point position output by the model is taken as the recognized key point position.
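A minimal sketch of this recognition step, reusing the illustrative HandActionNet above: the segmented RGB and depth hand-region images are fed to the trained model, the class with the highest softmax probability is taken as the recognized hand action class, and the predicted key point position is kept alongside it. The preprocessing and input formats are assumptions.

```python
import numpy as np
import torch

@torch.no_grad()
def recognise_frame(model, rgb_crop, depth_crop):
    # rgb_crop: H x W x 3 uint8 array; depth_crop: H x W depth array (assumed formats).
    rgb = torch.from_numpy(rgb_crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    depth = torch.from_numpy(depth_crop.astype(np.float32)).unsqueeze(0).unsqueeze(0)
    class_scores, keypoints = model(rgb, depth)
    probs = torch.softmax(class_scores, dim=1)           # probability of each action class
    action_class = int(probs.argmax(dim=1))              # highest-probability class
    return action_class, keypoints.squeeze(0).tolist()   # recognized class and key point
```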
S25: determining the operation category corresponding to the user hand action according to the user hand action classes and key point positions corresponding to consecutive groups of user hand action images.
Since the hand actions that are easy to use are relatively limited, some user operation categories need to be realized by combining multiple user hand actions; for example, a click operation requires the palm to open first and then close into a fist. Fig. 7 shows the user hand actions corresponding to a click operation: as shown in Fig. 7, they include the open palm shown on the left of Fig. 7 and the fist shown on the right of Fig. 7. Therefore, to determine the operation category of the user hand action, multiple groups of user hand action images are obtained first, the user hand action class and key point position corresponding to each group are determined, and the user's operation category is then determined. How many groups of user hand action images are obtained can be decided according to application requirements, for example 15 consecutive groups of user hand action images.
When specifically determining the user's operation category, the decision can be made according to the predefined action sequence of each operation category, the key point positions, and the content currently displayed on the screen or in the air.
For example, the screen or the air currently displays an interface for the user to query information and the user has entered the corresponding query information. The recognized user action classes are first an open palm and then a fist, that is, in the obtained groups of user hand action images, the user hand action class corresponding to the first several groups is an open palm and that corresponding to the later groups is a fist; the user's operation category can then be judged to be a click.
For example, the screen or the air currently displays multiple lines of text, the user hand action class corresponding to all the obtained groups of user hand action images is a fist, and the position of the user hand key point moves in the same direction; the user's operation category can then be determined to be continuously selecting text. Fig. 8 is a schematic diagram of the user hand action for selecting text.
Further, when the combination of user hand actions over multiple frames does not match any operation category, the current user hand action is considered invalid, and the system gives no response or prompts the user that the hand action is wrong.
Of course, in addition to the user hand actions corresponding to the above user operation categories, the user hand actions corresponding to other user operation categories can also be predefined according to requirements. As shown in Fig. 9, figure (a) is grabbing displayed content, such as grabbing text or grabbing an image; figure (b) is moving a cursor, used for example when a large amount of text is currently displayed and the cursor, located before a certain text character, needs to be moved to another position; figure (c) is releasing content, an operation generally used together with grabbing content or other operations, for example text is dragged after being grabbed and then released after being moved; figure (d) is a handwriting operation, generally used to open handwriting mode, which can be used when the user needs to input content on the screen or in the air. This application does not limit the user hand actions predefined for the user operation categories; each operation category may correspond to a combination of user hand actions, or each operation category may correspond directly to one user hand action. (An illustrative rule sketch for this step follows below.)
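A hedged sketch of one such rule, following the click and select-text examples above: a window of per-frame hand action classes and key point positions is mapped to an operation category, and anything that matches no rule is treated as invalid. The class identifiers, the split of the frame window and the movement threshold are assumptions.

```python
FIST, OPEN_PALM, INDEX_FINGER = 0, 1, 2  # assumed class identifiers

def operation_category(action_classes, keypoints, move_threshold=30.0):
    # action_classes: per-frame hand action classes; keypoints: per-frame (x, y) positions.
    n = len(action_classes)
    first, later = action_classes[:n // 2], action_classes[n // 2:]

    # Click: the first frames show an open palm and the later frames show a fist.
    if all(c == OPEN_PALM for c in first) and all(c == FIST for c in later):
        return "click"

    # Select text: every frame shows a fist while the key point moves in one direction.
    dx = keypoints[-1][0] - keypoints[0][0]
    if all(c == FIST for c in action_classes) and abs(dx) > move_threshold:
        return "select_text"

    return "invalid"  # no predefined operation matches; the system gives no response
```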
S26: responding to the user hand action according to the operation category.
The system can make the corresponding response according to a preset response mode for each operation category.
For example, when the current operation category is a handwriting operation, after determining the user's operation category the system switches to handwriting mode, receives the user's handwritten content, performs the corresponding handwriting recognition, and displays the recognition result.
For example, when the current operation category is a click, after determining the user's operation category the system provides the response result according to the gesture; for instance, the user clicks an application program displayed on the screen or in the air, and the corresponding operation is executed.
In this embodiment, the operation category is determined from the user hand action image and the corresponding response is made according to the operation category, so the user does not need to wear special equipment on the hand, which conforms to natural interaction habits; processing the image can also improve the accuracy of the collected data, thereby realizing natural and efficient human-computer interaction. Determining the operation category from the user hand action classes and key point positions corresponding to consecutive groups of user hand action images can improve accuracy. Segmenting the user hand region image out of the user hand action image can improve processing efficiency. Building the user hand action recognition model by deep neural network training can improve the recognition accuracy of user hand actions.
Fig. 10 is a schematic structural diagram of an intelligent interaction device proposed by one embodiment of the application.
As shown in Fig. 10, the device 100 of this embodiment includes an acquisition module 101, a determining module 102 and a response module 103.
The acquisition module 101 obtains a user hand action image, where the user hand action image is obtained by photographing a user hand action;
the determining module 102 is configured to determine, according to the user hand action image, the operation category corresponding to the user hand action;
the response module 103 is configured to respond to the user hand action according to the operation category.
In some embodiments, referring to Fig. 11, the determining module 102 includes:
a segmentation submodule 1021, configured to determine the user hand region in the user hand action image and to segment the user hand action image according to the user hand region to obtain a user hand region image;
a recognition submodule 1022, configured to recognize the user hand action class and key point position corresponding to the user hand action image according to the user hand region image and a user hand action recognition model built in advance;
a determination submodule 1023, configured to determine the operation category corresponding to the user hand action according to the user hand action classes and key point positions corresponding to consecutive groups of user hand action images.
In some embodiments, a single group of user hand action images includes a mutually corresponding frame of RGB image and frame of depth image.
In some embodiments, the segmentation submodule 1021 is configured to determine the user hand region in the user hand action image by:
determining the skin regions in the user hand action image according to the RGB image;
clustering the skin regions to obtain the different skin regions after clustering;
obtaining the depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand action image according to the depth values.
In some embodiments, the segmentation submodule 1021 is configured to determine the skin regions in the user hand action image according to the RGB image by:
converting the RGB image into a CrCb image;
ANDing a preset skin mask in CrCb space with the CrCb image to determine the skin regions in the user hand action image.
In some embodiments, the user hand region image includes the RGB image and the depth image of the user hand region, and the recognition submodule 1022 is specifically configured to:
use the RGB image and the depth image of the user hand region as the inputs of the user hand action recognition model to obtain the model output, where the model output includes the probability of each user hand action class and the key point position;
take the user hand action class with the highest probability as the recognized user hand action class, and take the key point position output by the model as the recognized key point position.
In some embodiments, referring to Fig. 11, the device 100 further includes a building module 104 for building the user hand action recognition model, and the building module 104 is specifically configured to:
obtain training data, where the training data include the RGB image and depth image of the user hand region and annotation information; the RGB image and depth image of the user hand region are obtained by segmenting collected user hand action images, and the annotation information corresponds to the collected user hand action images and includes the user hand action class and key point position;
determine the structure of the user hand action recognition model;
train based on the training data and the structure to build the user hand action recognition model.
In some embodiments, the structure includes a deep neural network structure.
It can be understood that the device of this embodiment corresponds to the above method embodiments; for specific content, reference may be made to the related description of the method embodiments, which is not repeated here.
It can be understood that the same or similar parts in the above embodiments can refer to each other, and content not described in detail in some embodiments can refer to the same or similar content in other embodiments.
It should be noted that, in the description of the application, the terms "first", "second" and the like are used only for descriptive purposes and should not be understood as indicating or implying relative importance. In addition, in the description of the application, unless otherwise stated, "multiple" means at least two.
Any process or method description in a flowchart or described otherwise herein can be understood as representing a module, fragment or portion of code that includes one or more executable instructions for realizing specific logical functions or steps of the process, and the scope of the preferred embodiments of the application includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order, depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the application belong.
It should be understood that each part of the application can be realized by hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods can be realized by software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if realized by hardware, as in another embodiment, they can be realized by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gate circuits for realizing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those skilled in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.
In addition, each functional unit in the embodiments of the application can be integrated in one processing module, or each unit can exist physically alone, or two or more units can be integrated in one module. The above integrated module can be realized in the form of hardware or in the form of a software functional module. If the integrated module is realized in the form of a software functional module and sold or used as an independent product, it can also be stored in a computer-readable storage medium.
The storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.
In the description of this specification, descriptions with reference to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" mean that specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the application. In this specification, the schematic representations of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described can be combined in any suitable manner in any one or more embodiments or examples.
Although the embodiments of the application have been shown and described above, it can be understood that the above embodiments are exemplary and should not be understood as limiting the application; those of ordinary skill in the art can change, modify, replace and vary the above embodiments within the scope of the application.

Claims (16)

1. An intelligent interaction method, characterized by comprising:
obtaining a user hand action image, wherein the user hand action image is obtained by photographing a user hand action;
determining, according to the user hand action image, an operation category corresponding to the user hand action;
responding to the user hand action according to the operation category.
2. The method according to claim 1, characterized in that determining, according to the user hand action image, the operation category corresponding to the user hand action comprises:
determining a user hand region in the user hand action image, and segmenting the user hand action image according to the user hand region to obtain a user hand region image;
recognizing, according to the user hand region image and a user hand action recognition model built in advance, a user hand action class and a key point position corresponding to the user hand action image;
determining, according to the user hand action classes and key point positions corresponding to consecutive groups of user hand action images, the operation category corresponding to the user hand action.
3. The method according to claim 2, characterized in that a single group of user hand action images comprises a mutually corresponding frame of RGB image and frame of depth image.
4. The method according to claim 3, characterized in that determining the user hand region in the user hand action image comprises:
determining skin regions in the user hand action image according to the RGB image;
clustering the skin regions to obtain different skin regions after clustering;
obtaining a depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand action image according to the depth values.
5. The method according to claim 4, characterized in that determining the skin regions in the user hand action image according to the RGB image comprises:
converting the RGB image into a CrCb image;
ANDing a preset skin mask in CrCb space with the CrCb image to determine the skin regions in the user hand action image.
6. The method according to claim 2, characterized in that the user hand region image comprises an RGB image and a depth image of the user hand region, and recognizing, according to the user hand region image and the user hand action recognition model built in advance, the user hand action class and key point position corresponding to the user hand action image comprises:
using the RGB image and the depth image of the user hand region as inputs of the user hand action recognition model to obtain a model output, wherein the model output comprises a probability of each user hand action class and the key point position;
taking the user hand action class with the highest probability as the recognized user hand action class, and taking the key point position output by the model as the recognized key point position.
7. The method according to claim 2, characterized by further comprising building the user hand action recognition model, wherein building the user hand action recognition model comprises:
obtaining training data, wherein the training data comprise the RGB image and depth image of the user hand region and annotation information, the RGB image and depth image of the user hand region are obtained by segmenting collected user hand action images, and the annotation information corresponds to the collected user hand action images and comprises a user hand action class and a key point position;
determining a structure of the user hand action recognition model;
training based on the training data and the structure to build the user hand action recognition model.
8. The method according to claim 7, characterized in that the structure comprises a deep neural network structure.
9. An intelligent interaction device, characterized by comprising:
an acquisition module, which obtains a user hand action image, wherein the user hand action image is obtained by photographing a user hand action;
a determining module, configured to determine, according to the user hand action image, an operation category corresponding to the user hand action;
a response module, configured to respond to the user hand action according to the operation category.
10. The device according to claim 9, characterized in that the determining module comprises:
a segmentation submodule, configured to determine a user hand region in the user hand action image and to segment the user hand action image according to the user hand region to obtain a user hand region image;
a recognition submodule, configured to recognize, according to the user hand region image and a user hand action recognition model built in advance, a user hand action class and a key point position corresponding to the user hand action image;
a determination submodule, configured to determine, according to the user hand action classes and key point positions corresponding to consecutive groups of user hand action images, the operation category corresponding to the user hand action.
11. The device according to claim 10, characterized in that a single group of user hand action images comprises a mutually corresponding frame of RGB image and frame of depth image.
12. The device according to claim 11, characterized in that the segmentation submodule is configured to determine the user hand region in the user hand action image by:
determining skin regions in the user hand action image according to the RGB image;
clustering the skin regions to obtain different skin regions after clustering;
obtaining a depth value corresponding to each skin region according to the depth image, and determining the user hand region in the user hand action image according to the depth values.
13. The device according to claim 12, characterized in that the segmentation submodule is configured to determine the skin regions in the user hand action image according to the RGB image by:
converting the RGB image into a CrCb image;
ANDing a preset skin mask in CrCb space with the CrCb image to determine the skin regions in the user hand action image.
14. The device according to claim 10, characterized in that the user hand region image comprises an RGB image and a depth image of the user hand region, and the recognition submodule is specifically configured to:
use the RGB image and the depth image of the user hand region as inputs of the user hand action recognition model to obtain a model output, wherein the model output comprises a probability of each user hand action class and the key point position;
take the user hand action class with the highest probability as the recognized user hand action class, and take the key point position output by the model as the recognized key point position.
15. The device according to claim 10, characterized by further comprising a building module for building the user hand action recognition model, wherein the building module is specifically configured to:
obtain training data, wherein the training data comprise the RGB image and depth image of the user hand region and annotation information, the RGB image and depth image of the user hand region are obtained by segmenting collected user hand action images, and the annotation information corresponds to the collected user hand action images and comprises a user hand action class and a key point position;
determine a structure of the user hand action recognition model;
train based on the training data and the structure to build the user hand action recognition model.
16. The device according to claim 15, characterized in that the structure comprises a deep neural network structure.
CN201611025898.3A 2016-11-17 2016-11-17 Intelligent interaction method and device Active CN106547356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611025898.3A CN106547356B (en) 2016-11-17 2016-11-17 Intelligent interaction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611025898.3A CN106547356B (en) 2016-11-17 2016-11-17 Intelligent interaction method and device

Publications (2)

Publication Number Publication Date
CN106547356A 2017-03-29
CN106547356B CN106547356B (en) 2020-09-11

Family

ID=58394834

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611025898.3A Active CN106547356B (en) 2016-11-17 2016-11-17 Intelligent interaction method and device

Country Status (1)

Country Link
CN (1) CN106547356B (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102737235A (en) * 2012-06-28 2012-10-17 中国科学院自动化研究所 Head posture estimation method based on depth information and color image
CN102854983A (en) * 2012-09-10 2013-01-02 中国电子科技集团公司第二十八研究所 Man-machine interaction method based on gesture recognition
US20140253429A1 (en) * 2013-03-08 2014-09-11 Fastvdo Llc Visual language for human computer interfaces
CN104598915A (en) * 2014-01-24 2015-05-06 深圳奥比中光科技有限公司 Gesture recognition method and gesture recognition device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄晓林, 董洪伟: "Real-time gesture recognition and virtual writing system based on depth information" (基于深度信息的实时手势识别和虚拟书写系统), Computer Engineering and Applications (《计算机工程与应用》) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107291221A (en) * 2017-05-04 2017-10-24 浙江大学 Across screen self-adaption accuracy method of adjustment and device based on natural gesture
CN107291221B (en) * 2017-05-04 2019-07-16 浙江大学 Across screen self-adaption accuracy method of adjustment and device based on natural gesture
CN108257139A (en) * 2018-02-26 2018-07-06 中国科学院大学 RGB-D three-dimension object detection methods based on deep learning
CN108257139B (en) * 2018-02-26 2020-09-08 中国科学院大学 RGB-D three-dimensional object detection method based on deep learning
CN108733287A (en) * 2018-05-15 2018-11-02 东软集团股份有限公司 Detection method, device, equipment and the storage medium of physical examination operation
CN109117746A (en) * 2018-07-23 2019-01-01 北京华捷艾米科技有限公司 Hand detection method and machine readable storage medium
WO2020020146A1 (en) * 2018-07-25 2020-01-30 深圳市商汤科技有限公司 Method and apparatus for processing laser radar sparse depth map, device, and medium
WO2020252918A1 (en) * 2019-06-20 2020-12-24 平安科技(深圳)有限公司 Human body-based gesture recognition method and apparatus, device, and storage medium
CN110414393A (en) * 2019-07-15 2019-11-05 福州瑞芯微电子股份有限公司 A kind of natural interactive method and terminal based on deep learning
CN112383805A (en) * 2020-11-16 2021-02-19 四川长虹电器股份有限公司 Method for realizing man-machine interaction at television end based on human hand key points
CN112686231A (en) * 2021-03-15 2021-04-20 南昌虚拟现实研究院股份有限公司 Dynamic gesture recognition method and device, readable storage medium and computer equipment
CN112686231B (en) * 2021-03-15 2021-06-01 南昌虚拟现实研究院股份有限公司 Dynamic gesture recognition method and device, readable storage medium and computer equipment
WO2022193453A1 (en) * 2021-03-15 2022-09-22 南昌虚拟现实研究院股份有限公司 Dynamic gesture recognition method and apparatus, and readable storage medium and computer device

Also Published As

Publication number Publication date
CN106547356B (en) 2020-09-11

Similar Documents

Publication Publication Date Title
CN106547356A (en) Intelligent interactive method and device
CN102081918B (en) Video image display control method and video image display device
CN103941866B (en) Three-dimensional gesture recognizing method based on Kinect depth image
CN103530613B (en) Target person hand gesture interaction method based on monocular video sequence
US10671841B2 (en) Attribute state classification
CN104573706B (en) A kind of subject image recognition methods and its system
CN104395856B (en) For recognizing the computer implemented method and system of dumb show
Qi et al. Computer vision-based hand gesture recognition for human-robot interaction: a review
CN107578023A (en) Man-machine interaction gesture identification method, apparatus and system
CN104838337A (en) Touchless input for a user interface
CN106598227A (en) Hand gesture identification method based on Leap Motion and Kinect
CN106200971A (en) Man-machine interactive system device based on gesture identification and operational approach
CN110135497B (en) Model training method, and method and device for estimating strength of facial action unit
Jin et al. Real-time action detection in video surveillance using sub-action descriptor with multi-cnn
CN107654406A (en) Fan air supply control equipment, fan air supply control method and device
Desai et al. Human Computer Interaction through hand gestures for home automation using Microsoft Kinect
CN106503619B (en) Gesture recognition method based on BP neural network
CN111857334A (en) Human body gesture letter recognition method and device, computer equipment and storage medium
Dardas Real-time hand gesture detection and recognition for human computer interaction
CN103201706A (en) Method for driving virtual mouse
Zhang et al. Emotion recognition from body movements with as-lstm
Lee et al. Recognition of hand gesture to human-computer interaction
US10095308B2 (en) Gesture based human machine interface using marker
CN110134241A (en) Dynamic gesture exchange method based on monocular cam
Liu et al. Recognizing object manipulation activities using depth and visual cues

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant