CN105045399A - Electronic device with 3D camera assembly - Google Patents


Info

Publication number: CN105045399A
Application number: CN201510563581.4A
Authority: CN (China)
Prior art keywords: hand, sequence, user, gesture, image
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN105045399B
Inventors: 杨晓光, 李建英, 朱磊, 韩琦
Current assignee: Harbin Yishe Technology Co Ltd
Original assignee: Harbin Yishe Technology Co Ltd
Events: application filed by Harbin Yishe Technology Co Ltd; priority to CN201510563581.4A; publication of CN105045399A; application granted; publication of CN105045399B


Abstract

The invention provides an electronic device with a 3D camera assembly. The electronic device comprises: a 3D imaging unit for capturing an image sequence to be tested, containing depth information, of the user's hands; a contour detection unit for detecting the contours of the user's hands; a feature point sequence determination unit for determining the feature point sequence to be tested of each hand; an action recognition unit for determining, among a plurality of preset feature point sequences, the matching sequence of each hand's feature point sequence to be tested, so as to determine the action name and position of that hand; a gesture determination unit for selecting, from a preset gesture table, the gesture that matches the action names and positions of the user's hands, as the gesture recognition result of the image sequence to be tested; an instruction determination unit for determining the operation instruction corresponding to the recognized gesture; and an execution unit for performing the corresponding operation on the relevant device. With this technique, the user's gestures can be recognized accurately during human-computer interaction, with high recognition accuracy and fast recognition speed.

Description

An electronic device with a 3D camera assembly
Technical field
The present invention relates to computer technology, and in particular to an electronic device with a 3D camera assembly.
Background art
As mobile computing devices evolved from notebook computers to mobile phones and tablet computers, their control methods also evolved from keyboard and mouse to phone keys and handwriting pads, and then to touch screens and virtual keyboards. Clearly, the control of mobile devices is evolving toward methods that are increasingly intuitive, convenient, and consistent with people's natural habits.
The touch-screen control method now widely used on mobile computing devices technically consists of a transparent touch-sensitive panel laminated onto a display screen. The touch-sensitive panel is in essence a positioning device that captures a touch action on the screen and obtains its position; combined with time-axis information, the action is recognized as a tap, a long press, a slide, and so on. The position and action information are then passed to the mobile computing device as an instruction, and the device makes the corresponding operational response. Because the touch-sensitive panel and the display are superimposed, the user enjoys a "touch it, get it" experience; compared with positioning devices such as a mouse or trackpad that rely on cursor feedback, screen touch control provides a better experience.
Compared with keyboard plus mouse, screen touch control better matches people's intuitive reactions and is easier to learn. However, touch control ultimately captures only the actions of the user's fingers; in settings that require richer body information input, such as motion games, simulated training, complex manipulation, and remote control, it reveals the limitation of capturing too little human body information. From the perspective of human-computer interaction, acquiring more user information conveys the user's operational intent more richly and accurately, and thus enables more convenient control and a better experience. However, the algorithms adopted by the gesture recognition processes in current human-computer interaction technology are rather complex, time-consuming, and of low recognition accuracy.
Summary of the invention
A brief summary of the present invention is given below in order to provide a basic understanding of some aspects of the present invention. It should be understood that this summary is not an exhaustive overview of the present invention. It is not intended to identify key or critical parts of the present invention, nor to limit its scope. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description discussed later.
In view of this, the present invention provides an electronic device with a 3D camera assembly, to at least solve the problems that the algorithms adopted by the gesture recognition processes in existing human-computer interaction technology are rather complex, time-consuming, and of low recognition accuracy.
According to one aspect of the present invention, an electronic device with a 3D camera assembly is provided. The electronic device comprises: a 3D imaging unit for capturing an image sequence to be tested, containing depth information, of the user's hands; a contour detection unit for detecting the user's hand contours in every frame of the image sequence to be tested according to image depth information and image color information; a feature point sequence determination unit for determining, for each of the user's hands and using a preset hand structure template, the feature point sequence to be tested of that hand in every frame of the image sequence to be tested; an action recognition unit for determining, for each of the user's hands, the matching sequence of that hand's feature point sequence to be tested among a plurality of preset feature point sequences, so as to determine the action name and position of that hand according to the matching sequence; a gesture determination unit for selecting, from a preset gesture table, the gesture that matches the action names and positions of the user's two hands, as the recognized gesture; an instruction determination unit for determining, according to a preset operation instruction table, the operation instruction corresponding to the recognized gesture; and an execution unit for performing, on the device related to the determined operation instruction, the operation corresponding to that instruction.
Further, the feature point sequence determination unit comprises: a template storage subunit for storing the preset hand structure template; a template matching subunit for determining, for each of the user's hands and using the preset hand structure template, a predetermined number of feature points of that hand in the hand contour of every frame of the image sequence to be tested; and a sequence generation subunit for obtaining, for each of the user's hands, the feature point sequence to be tested of that hand from the predetermined number of feature points corresponding to that hand in each frame of the image sequence to be tested.
Further, the 3D imaging unit is used to: capture images of the user's hands in a predetermined imaging region to obtain a visible-light image sequence and an infrared image sequence; with $I_C^i(x,y)$ denoting the pixel value at coordinate $(x,y)$ of the $i$-th frame of the visible-light image sequence and $I_I^i(x,y)$ denoting the pixel value at coordinate $(x,y)$ of the $i$-th frame of the infrared image sequence, obtain the image sequence from which the user's hand information is extracted according to the following formula:

$$I_T^i(x,y) = \begin{cases} \dfrac{\alpha I_I^i(x,y) + \beta I_C^i(x,y)}{2}, & I_I^i(x,y) \ge \lambda \\[4pt] 0, & I_I^i(x,y) < \lambda \end{cases}$$

where α, β, and λ are preset parameter thresholds, $\{I_T^i\}$ is the obtained image sequence of the user's hands containing depth information, which serves as the image sequence to be tested, and $i = 1, 2, \ldots, M$, with M being the number of frames included in the image sequence to be tested.
Further, the contour detection unit is used to: for every frame of the image sequence to be tested, delete the noise points and non-skin-color regions in that frame using color information, and apply an edge detection operator E(·) to the image $I_{Te}^i$ obtained after deleting the noise points and non-skin-color regions, obtaining the edge image $I_{Tf}^i$; the edge image is an image containing only the user's hand contours.
Further, the template matching subunit comprises: a positioning reference determination module which, for every frame of the image sequence to be tested, finds the fingertip points and finger-root joint points on the contour line according to the curvature of the contour line in that frame, using the fingertip points as positioning references; a scaling reference determination module which, for every frame processed by the positioning reference determination module, matches the finger-root joint points of each individual finger based on the positioning references found in that frame, obtaining the length of each finger as the reference for scaling; and a scaling and deformation module which, for every frame processed by the scaling reference determination module, scales and deforms the corresponding hand structure template based on the positions of the found fingertip points and finger-root joint points and the length of each finger, obtaining each knuckle feature point and the wrist midpoint feature point of each hand by matching. The hand structure template stored by the template storage subunit comprises a left-hand structure template and a right-hand structure template, each of which comprises: the fingertip feature points of each finger, each knuckle feature point, each finger-root joint feature point, the wrist midpoint feature point, and the topological relations between the feature points.
Further, the action recognition unit comprises: a segmentation subunit which, for the feature point sequence to be tested of each hand, divides that sequence into multiple subsequences according to a predetermined time window and obtains the mean position corresponding to each subsequence; a matching sequence determination subunit which, for each subsequence corresponding to each hand, matches that subsequence against each of the plurality of preset feature point sequences, so as to select the preset feature point sequence whose matching degree with that subsequence is above a preset matching threshold and is the highest, as the matching sequence of that subsequence; an association subunit which associates the mean position corresponding to each subsequence with the action name corresponding to that subsequence's matching sequence; and an action name determination subunit which, for each hand, takes the matching sequences of the subsequences corresponding to that hand as the multiple matching sequences corresponding to that hand, and takes the action names corresponding to those matching sequences as the multiple action names of that hand.
Further, the gesture determination unit comprises: a gesture table storage subunit for storing the following mapping list as the preset gesture table: the left end of each mapping in the list is an action-name pair set together with the positions of the action names, and the right end of each mapping is a gesture; and a gesture table matching subunit for matching the left end of each mapping in the preset gesture table against the action names and positions of the user's two hands, wherein the matching of action names is strict matching, and the matching of positions is realized by computing relative position information from the mean positions of the user's two hands and then computing the similarity between this relative position information and the positions at the left end of the mapping.
Further, the electronic device with a 3D camera assembly also comprises: a real-time display unit for displaying a simulated figure of the user's hands on the screen of the device based on the position of each of the user's hands.
Further, the real-time display unit is used to: obtain, according to the feature point sequence to be tested corresponding to each of the user's hands, the contour figure of that hand by connecting the bones and then expanding outward, as the simulated figure of that hand; determine the display position of each of the user's hands on the screen by applying translation calibration and proportional scaling to the relative position of the user's two hands; and display the simulated figure of the user's hands on the screen based on the simulated figure and display position of each hand.
Further, the electronic device is one of the following: a mobile phone, a multimedia playback device, a desktop computer, a notebook computer, and a tablet computer.
The above electronic device with a 3D camera assembly according to the embodiments of the present invention first recognizes single-hand actions, then recognizes the gesture from the actions of both hands, and then performs the corresponding operation according to the recognized gesture. It can accurately recognize the user's gestures during human-computer interaction, with high recognition accuracy and fast recognition speed.
In addition, because the embodiments of the present invention use a depth camera to acquire the user's body actions as the input manipulation instructions for operating a mobile computing device, the user can easily realize contactless manipulation of the mobile computing device with intuitive, natural actions, providing more convenient and accurate input and control for applications of mobile computing devices in fields such as motion games, simulated training, complex manipulation, and remote control.
The above electronic device with a 3D camera assembly of the present invention adopts a hierarchically designed algorithm with low complexity, which is easy to implement.
In addition, with the above electronic device with a 3D camera assembly of the present invention, when the definitions of actions and/or gestures need to be changed (for example, modified, added, or removed), this can be done merely by adjusting the templates (that is, changing the definition of an action by modifying the action name corresponding to a preset feature point sequence, or adding or removing actions by adding or removing preset feature point sequences and their action names) and the preset gesture table (that is, changing the definition of a gesture by modifying the actions corresponding to a gesture in the preset gesture table, or adding or removing gestures by adding or removing the gestures and corresponding actions in the gesture table), without changing the algorithm or retraining a classifier, which greatly improves the adaptability of the algorithm.
In addition, the above electronic device with a 3D camera assembly of the present invention operates in real time and is applicable to occasions with real-time interaction requirements.
These and other advantages of the present invention will become more apparent from the following detailed description of the preferred embodiments of the present invention in conjunction with the accompanying drawings.
Brief description of the drawings
The present invention may be better understood by reference to the description given below in conjunction with the accompanying drawings, in which the same or similar reference numerals are used throughout the figures to denote the same or similar parts. The accompanying drawings, together with the detailed description below, are included in and form part of this specification, and serve to further illustrate the preferred embodiments of the present invention and to explain the principles and advantages of the present invention. In the drawings:
Fig. 1 is a schematic structural diagram of an example of the electronic device with a 3D camera assembly of the present invention;
Fig. 2 is a schematic structural diagram of an example of the feature point sequence determination unit 130 in Fig. 1;
Fig. 3 is a schematic structural diagram of an example of the template matching subunit 220 in Fig. 2;
Fig. 4 is a schematic structural diagram of an example of the action recognition unit 140 in Fig. 1;
Fig. 5 is a schematic structural diagram of an example of the gesture determination unit 150 in Fig. 1;
Fig. 6 is a schematic structural diagram of another example of the electronic device with a 3D camera assembly of the present invention.
Those skilled in the art will appreciate that the elements in the drawings are shown merely for simplicity and clarity and are not necessarily drawn to scale. For example, the sizes of some elements in the drawings may be exaggerated relative to other elements to help improve understanding of the embodiments of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention will be described below in conjunction with the accompanying drawings. For clarity and conciseness, not all features of an actual implementation are described in this specification. It should be understood, however, that many implementation-specific decisions must be made in developing any such actual embodiment in order to achieve the developer's specific goals, such as compliance with system- and business-related constraints, and that these constraints may vary from one implementation to another. Moreover, it should be appreciated that although such development work may be complex and time-consuming, it is merely a routine task for those skilled in the art having the benefit of this disclosure.
It should also be noted here that, in order to avoid obscuring the present invention with unnecessary detail, only the device structures and/or processing steps closely related to the solution according to the present invention are shown in the drawings, while other details of little relevance to the present invention are omitted.
The embodiments of the present invention provide an electronic device with a 3D camera assembly. The electronic device comprises: a 3D imaging unit for capturing an image sequence to be tested, containing depth information, of the user's hands; a contour detection unit for detecting the user's hand contours in every frame of the image sequence to be tested according to image depth information and image color information; a feature point sequence determination unit for determining, for each of the user's hands and using a preset hand structure template, the feature point sequence to be tested of that hand in every frame of the image sequence to be tested; an action recognition unit for determining, for each of the user's hands, the matching sequence of that hand's feature point sequence to be tested among a plurality of preset feature point sequences, so as to determine the action name and position of that hand according to the matching sequence; a gesture determination unit for selecting, from a preset gesture table, the gesture that matches the action names and positions of the user's two hands, as the recognized gesture; an instruction determination unit for determining, according to a preset operation instruction table, the operation instruction corresponding to the recognized gesture; and an execution unit for performing, on the device related to the determined operation instruction, the operation corresponding to that instruction. The above electronic device with a 3D camera assembly may be, for example, any one of the following devices: a mobile phone, a multimedia playback device, a desktop computer, a notebook computer, and a tablet computer.
Fig. 1 shows a schematic structural diagram of an example of the electronic device with a 3D camera assembly of the present invention. As shown in Fig. 1, the electronic device 100 with a 3D camera assembly comprises a 3D imaging unit 110, a contour detection unit 120, a feature point sequence determination unit 130, an action recognition unit 140, a gesture determination unit 150, an instruction determination unit 160, and an execution unit 170.
The 3D imaging unit 110 is used to obtain the image sequence to be tested, containing depth information, of the user's hands. The 3D imaging unit 110 may comprise, for example, two 3D cameras. A 3D camera is a depth camera comprising a visible-light image sensor and an infrared image sensor; the visible-light image sensor is used to obtain the visible-light image sequence, and the infrared image sensor is used to obtain the infrared image sequence.
The contour detection unit 120 detects the user's hand contours in every frame of the image sequence to be tested according to image depth information and image color information. The detected hand contour may be the contour of both hands or of a single hand.
The feature point sequence determination unit 130, for each of the user's hands, uses the preset hand structure template to determine the feature point sequence to be tested of that hand in every frame of the image sequence to be tested.
The action recognition unit 140, for each of the user's hands, determines the matching sequence of that hand's feature point sequence to be tested among a plurality of preset feature point sequences, so as to determine the action name and position of that hand according to the matching sequence.
The gesture determination unit 150 selects, from the preset gesture table, the gesture that matches the action names and positions of the user's two hands, as the recognized gesture.
The instruction determination unit 160 determines, according to the preset operation instruction table, the operation instruction corresponding to the recognized gesture.
The execution unit 170 performs, on the device related to the determined operation instruction, the operation corresponding to that instruction. The determined operation instruction is thus sent to the relevant device, enabling personalized, natural, contactless operation and control of the relevant device, such as a mobile computing device.
According to one implementation, the 3D imaging unit 110 may be used to capture images of the user's hands in a predetermined imaging region (for example, using the visible-light image sensors and infrared image sensors in the depth cameras) to obtain a visible-light image sequence and an infrared image sequence. With $I_C^i(x,y)$ denoting the pixel value at coordinate $(x,y)$ of the $i$-th frame of the visible-light image sequence, and $I_I^i(x,y)$ denoting the pixel value at coordinate $(x,y)$ of the $i$-th frame of the infrared image sequence, the image sequence from which the user's hand information is extracted can be obtained according to the following formula:

$$I_T^i(x,y) = \begin{cases} \dfrac{\alpha I_I^i(x,y) + \beta I_C^i(x,y)}{2}, & I_I^i(x,y) \ge \lambda \\[4pt] 0, & I_I^i(x,y) < \lambda \end{cases}$$

where α, β, and λ are preset parameter thresholds. These thresholds can be set from empirical values or determined by testing (for example, by training on actual sample images collected with a depth camera of the specific model), which is not detailed here. $\{I_T^i\}$ is the obtained image sequence of the user's hands containing depth information, serving as the above image sequence to be tested. In addition, $i = 1, 2, \ldots, M$, where M is the number of frames included in the image sequence to be tested.
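For illustration, the following is a minimal sketch of this fusion formula in Python/NumPy, assuming 8-bit infrared and visible-light frames of equal size; the function names and the parameter values are illustrative assumptions, not taken from the patent.

```python
import numpy as np

def fuse_frame(I_I: np.ndarray, I_C: np.ndarray,
               alpha: float = 1.0, beta: float = 1.0,
               lambda_: float = 40.0) -> np.ndarray:
    # (alpha*I_I + beta*I_C) / 2 where the infrared response passes lambda_
    fused = (alpha * I_I.astype(np.float32) + beta * I_C.astype(np.float32)) / 2.0
    # Pixels whose infrared response is below lambda_ are treated as background.
    return np.where(I_I >= lambda_, fused, 0.0)

def fuse_sequence(infrared_seq, visible_seq):
    # The image sequence to be tested is the per-frame fusion result.
    return [fuse_frame(I_I, I_C) for I_I, I_C in zip(infrared_seq, visible_seq)]
```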
It should be noted that, depending on the number of hands used in the user's gesture (one or two), the image captured in the predetermined imaging region may contain the user's two hands or only a single hand. In addition, the image sequence to be tested may be acquired over a period of time; this period may be set in advance from empirical values and may be, for example, 10 seconds.
According to one implementation, the contour detection unit 120 may be used to: for every frame of the image sequence to be tested, delete the noise points and non-skin-color regions in that frame using color information, and apply an edge detection operator E(·) to the image $I_{Te}^i$ obtained after deleting the noise points and non-skin-color regions, thus obtaining the edge image:

$$I_{Tf}^i(x,y) = E\left(I_{Te}^i(x,y)\right)$$

The edge image $I_{Tf}^i$ is an image containing only the user's hand contours.
In the processing of "deleting the noise points and non-skin-color regions in the frame using color information", existing denoising methods can be used to delete the noise points in the image, and the skin-color region can be obtained from the image's color mean; the region outside the skin-color region is then the non-skin-color region, which can thus be deleted. For example, after the color mean of the image is obtained, a range fluctuating around this mean is taken as the color range containing the mean; if the color value of a point in the image falls within this range, that point is determined to be a skin-color point, otherwise it is not considered a skin-color point. All skin-color points form the skin-color region, and the rest is the non-skin-color region.
Thus, through the processing of the contour detection unit 120, the user's hand contours can be detected quickly, improving the speed and efficiency of the whole processing.
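A rough sketch of this contour detection step, under stated assumptions: OpenCV as the image library, Gaussian blur standing in for the unspecified denoising method, Canny as the edge operator E(·), and an illustrative tolerance for the mean-based skin-color range.

```python
import cv2
import numpy as np

def detect_hand_edges(frame: np.ndarray, tol: float = 35.0) -> np.ndarray:
    denoised = cv2.GaussianBlur(frame, (5, 5), 0)       # simple noise removal
    mean_color = denoised.reshape(-1, 3).mean(axis=0)   # per-channel color mean
    diff = np.linalg.norm(denoised.astype(np.float32) - mean_color, axis=2)
    skin_mask = (diff < tol).astype(np.uint8) * 255     # points near the mean
    skin_only = cv2.bitwise_and(denoised, denoised, mask=skin_mask)
    gray = cv2.cvtColor(skin_only, cv2.COLOR_BGR2GRAY)
    return cv2.Canny(gray, 50, 150)                     # edge image I_Tf
```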
According to one implementation, the feature point sequence determination unit 130 may comprise a template storage subunit 210, a template matching subunit 220, and a sequence generation subunit 230, as shown in Fig. 2.
The template storage subunit 210 may be used to store the preset hand structure template.
According to one implementation, the hand structure template may comprise a left-hand structure template and a right-hand structure template, each of which comprises a predetermined number of feature points and the topological relations between the feature points.
In one example, the left-hand structure template and the right-hand structure template may each comprise the following 20 feature points (20 being an example of the predetermined number, which is not limited to 20 and may also be 19, 21, or other values): the fingertip feature points of each finger (5), the knuckle feature points (9), the finger-root joint feature points (5), and the wrist midpoint feature point (1).
As shown in Fig. 2, the template matching subunit 220 may, for each of the user's hands, use the above preset hand structure template to match and align the hand contour in every frame of the image sequence to be tested with the hand structure template (the left-hand structure template and the right-hand structure template), obtaining the predetermined number (e.g., 20) of feature points in the hand contour of that frame.
Then, the sequence generation subunit 230 may, for each of the user's hands, use the predetermined number of feature points (i.e., the feature point set) corresponding to that hand in each frame of the image sequence to be tested to obtain the feature point sequence to be tested of that hand.
In this way, by matching the hand structure template against each previously obtained hand contour (i.e., the hand contour in every frame of the image sequence to be tested), the predetermined number of feature points in each hand contour can be obtained quickly and accurately, so that subsequent processing can use these feature points to realize gesture recognition; compared with the prior art, this improves the speed and accuracy of the whole human-computer interaction process.
In the prior art, when the definition of an action needs to be changed (for example, modified, added, or removed) for different application scenarios, the algorithm must be modified and the classifier retrained; in the present invention, the definition of an action can be changed merely by adjusting the action templates (i.e., the preset feature point sequences), which greatly improves the adaptability of the gesture recognition technique.
In one example, the template matching subunit 220 may comprise a positioning reference determination module 310, a scaling reference determination module 320, and a scaling and deformation module 330, as shown in Fig. 3.
According to the physiological structure of human hands, 20 (as an example of the predetermined number) feature points can be obtained for each hand through the positioning reference determination module 310, the scaling reference determination module 320, and the scaling and deformation module 330.
For every frame of the image sequence to be tested, the following processing is performed: first, the positioning reference determination module 310 finds the fingertip points and finger-root joint points on the contour line according to the curvature of the contour line in the frame; then, the scaling reference determination module 320 matches the finger-root joint points of each individual finger based on the positioning references found by the positioning reference determination module 310 on the contour line of the frame, obtaining the length of each finger as the reference for scaling; finally, the scaling and deformation module 330 scales and deforms the corresponding hand structure template based on the positions of the found fingertip points and finger-root joint points and the length of each finger, obtaining the remaining 10 feature points of each hand by matching, namely each knuckle feature point and the wrist midpoint feature point of each hand.
For example, in the process of finding the fingertip points and finger-root joint points on the contour line, the convex points with the maximum curvature can be taken as fingertip points, the concave points with the maximum curvature as the finger-web minimum points, and the distance between each fingertip point and its adjacent finger-web minimum point is defined as the unit length corresponding to that fingertip point. For every two adjacent finger-web minimum points, the midpoint of those two points is extended toward the palm by one third of the unit length (here, the unit length corresponding to the fingertip point between those two points), and the resulting point is defined as the finger-root joint point corresponding to that fingertip point; the three middle finger-root joint points of each hand can thus be obtained. In addition, for each hand, the first and last finger-root joint points of that hand can be obtained in the subsequent scaling and deformation process; alternatively, the distance between two adjacent finger-web minimum points of that hand (for example, any two adjacent ones) can be taken as the finger reference width, and each of the first and last finger-web minimum points of that hand is extended outward along the tangential direction by half a finger reference width, with the resulting points serving as the first and last finger-root joint points of that hand.
It should be noted that, if more than five convex points are found for a single hand, the redundant convex points can be removed during the process of matching and aligning with the hand structure template.
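As an illustration of this positioning-reference step, here is a simplified Python sketch under stated assumptions: the contour is an ordered (N, 2) point array, curvature is approximated by the turning angle over a fixed neighborhood k, and the contour is assumed counter-clockwise so that the cross-product sign separates convex tips from concave webs; the thresholds are illustrative, and a real implementation would additionally cluster adjacent candidates and keep at most five tips, as noted above.

```python
import numpy as np

def find_tips_and_webs(contour: np.ndarray, k: int = 15):
    n = len(contour)
    prev_pts = contour[(np.arange(n) - k) % n]
    next_pts = contour[(np.arange(n) + k) % n]
    v1 = prev_pts - contour
    v2 = next_pts - contour
    cos_ang = np.einsum('ij,ij->i', v1, v2) / (
        np.linalg.norm(v1, axis=1) * np.linalg.norm(v2, axis=1) + 1e-9)
    sharp = cos_ang > 0.3                 # sharp turns: tip/web candidates
    cross = v1[:, 0] * v2[:, 1] - v1[:, 1] * v2[:, 0]
    tips = contour[sharp & (cross > 0)]   # convex extrema -> fingertip points
    webs = contour[sharp & (cross < 0)]   # concave extrema -> web minimum points
    return tips, webs
```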
Thus, through the positioning reference determination module 310, the scaling reference determination module 320, and the scaling and deformation module 330, the 20 feature points of the left hand corresponding to each frame, $Pl = \{pl_1, pl_2, \ldots, pl_{20}\}$, and the 20 feature points of the right hand, $Pr = \{pr_1, pr_2, \ldots, pr_{20}\}$, can be obtained by matching. It should be noted that, if the user's gesture involves only a single hand, the matching yields the 20 feature points (called the feature point set) of that single hand in every frame, i.e., $Pl = \{pl_1, pl_2, \ldots, pl_{20}\}$ or $Pr = \{pr_1, pr_2, \ldots, pr_{20}\}$, where $pl_1, pl_2, \ldots, pl_{20}$ are the positions of the 20 feature points of the left hand and $pr_1, pr_2, \ldots, pr_{20}$ are the positions of the 20 feature points of the right hand.
If the user's gesture involves both hands, the above processing yields the feature point sequence to be tested of the left hand, $\{Pl_i, i=1,2,\ldots,M\}$, and that of the right hand, $\{Pr_i, i=1,2,\ldots,M\}$, where $Pl_i$ is the set of 20 (as an example of the predetermined number) feature points corresponding to the user's left hand in the $i$-th frame of the image sequence to be tested, and $Pr_i$ is the set of 20 (as an example of the predetermined number) feature points corresponding to the user's right hand in the $i$-th frame.
If the user's gesture involves only a single hand, every frame of the captured image sequence to be tested contains only that hand, and the above processing yields the feature point sequence to be tested of that single hand, i.e., $\{Pl_i, i=1,2,\ldots,M\}$ or $\{Pr_i, i=1,2,\ldots,M\}$.
According to one implementation, the action recognition unit 140 may comprise a segmentation subunit 410, a matching sequence determination subunit 420, an association subunit 430, and an action name determination subunit 440, as shown in Fig. 4.
As shown in Fig. 4, the segmentation subunit 410 may, for the feature point sequence to be tested of each hand, divide that sequence into multiple subsequences according to a predetermined time window, and obtain the mean position corresponding to each subsequence. The mean position corresponding to each subsequence may be chosen as the mean position of a specific feature point (such as the wrist midpoint, or another feature point) within that subsequence. The predetermined time window is approximately the duration of one elemental single-hand action (e.g., a single-hand clench or grab) from start to end; it can be set from empirical values or determined by testing, and may be, for example, 2.5 seconds.
In one example, suppose the feature point sequence to be tested was acquired over 10 seconds; the segmentation subunit 410 can then use a 2.5-second time window to divide the feature point sequences to be tested of the left hand and of the right hand into 4 subsequences each. Take the left hand's sequence $\{Pl_i, i=1,2,\ldots,M\}$ as an example (the right hand's $\{Pr_i, i=1,2,\ldots,M\}$ is handled similarly and is not detailed here). Assuming 10 frames are captured per second, the sequence corresponds to 100 frames, i.e., M=100; that is, $\{Pl_i\}$ comprises the 100 feature point sets $Pl_1, Pl_2, \ldots, Pl_{100}$. With the 2.5-second time window, $\{Pl_i\}$ can be divided into the 4 subsequences $\{Pl_i, i=1,\ldots,25\}$, $\{Pl_i, i=26,\ldots,50\}$, $\{Pl_i, i=51,\ldots,75\}$, and $\{Pl_i, i=76,\ldots,100\}$, each corresponding to 25 frames, i.e., each containing 25 feature point sets. With the wrist midpoint chosen as the specific feature point, take the subsequence $\{Pl_i, i=1,\ldots,25\}$ as an example (the other three subsequences are handled similarly): if the positions of the wrist midpoint in the 25 feature point sets of this subsequence are $p_1, p_2, \ldots, p_{25}$, then the mean position of the wrist midpoint in this subsequence is $(p_1 + p_2 + \cdots + p_{25})/25$, which serves as the mean position corresponding to this subsequence.
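As an illustration of this segmentation, a small sketch under stated assumptions: the sequence is stored as an (M, 20, 2) NumPy array of per-frame feature point sets, the frame rate and window follow the 10 fps / 2.5 s example above, and the wrist midpoint index (19) is an assumption about feature-point ordering.

```python
import numpy as np

WRIST = 19  # assumed index of the wrist midpoint in each feature point set

def split_with_mean(sequence: np.ndarray, fps: int = 10, window_s: float = 2.5):
    frames_per_window = int(fps * window_s)          # 25 frames per subsequence
    subsequences, mean_positions = [], []
    for start in range(0, len(sequence), frames_per_window):
        sub = sequence[start:start + frames_per_window]
        subsequences.append(sub)
        mean_positions.append(sub[:, WRIST, :].mean(axis=0))  # wrist mean position
    return subsequences, mean_positions
```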
Then, the matching sequence determination subunit 420 may, for each subsequence corresponding to each hand, match that subsequence against each of the plurality of preset feature point sequences, and select from them the preset feature point sequence whose matching degree with that subsequence is above a preset matching threshold (which can be set from empirical values or determined by testing) and is the highest, as the matching sequence of that subsequence. The matching sequence determination subunit 420 may compute the similarity between a subsequence and a preset feature point sequence as the matching degree between them.
The plurality of preset feature point sequences may be set in advance in a hand action name list that covers elemental hand actions such as wave, push, pull, open, close, and turn; each action has a unique name identifier and a template represented as a normalized hand feature point sequence (i.e., a preset feature point sequence). It should be noted that each of the user's two hands has its own such hand action name list. That is, each action in the left hand's action name list (the left-hand action name list for short) has, in addition to its own name, a left-hand template (i.e., a preset feature point sequence for the left hand); each action in the right hand's action name list (the right-hand action name list for short) has, in addition to its own name, a right-hand template (i.e., a preset feature point sequence for the right hand).
For example, denote the plurality of preset feature point sequences of a single hand as sequence $A_1$, sequence $A_2$, ..., sequence $A_H$, where H is the number of preset feature point sequences of that hand. Then, in the hand action name list of that hand: action 1 has the name identifier "wave" and the corresponding template (i.e., preset feature point sequence) $A_1$; action 2 has the name identifier "push" and the corresponding template $A_2$; ...; action H has the name identifier "turn" and the corresponding template $A_H$.
It should be noted that a matching sequence will not necessarily be found among the plurality of preset feature point sequences for every subsequence. When no matching sequence is found for some subsequence of a single hand, the matching sequence of that subsequence is recorded as "empty", but the mean position of that subsequence need not be "empty". According to one implementation, if the matching sequence of a subsequence is "empty", the mean position of that subsequence is set to "empty"; according to another implementation, if the matching sequence of a subsequence is "empty", the mean position of that subsequence is the actual mean position of the specified feature point in that subsequence; according to yet another implementation, if the matching sequence of a subsequence is "empty", the mean position of that subsequence is set to "+∞".
In addition, according to one implementation, if the specific feature point does not exist in a subsequence (i.e., there is no actual mean position of this specific feature point), the mean position of that subsequence can be set to "+∞".
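A sketch of the matching-sequence step follows, assuming each subsequence and each preset template is an array of equal length and shape (T, 20, 2); the similarity measure here (inverse mean point distance) is an illustrative stand-in for whatever normalized similarity the subunit actually uses, and returning None plays the role of the "empty" matching sequence.

```python
import numpy as np

def similarity(sub: np.ndarray, template: np.ndarray) -> float:
    # Higher when the per-frame feature points lie closer to the template's.
    return 1.0 / (1.0 + np.linalg.norm(sub - template, axis=2).mean())

def find_matching_sequence(sub: np.ndarray, templates: dict, threshold: float = 0.6):
    # templates maps action names ("wave", "push", ...) to preset sequences.
    best_name, best_score = None, threshold
    for name, tpl in templates.items():
        score = similarity(sub, tpl)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name   # None corresponds to the "empty" matching sequence
```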
Then, as shown in Fig. 4, the association subunit 430 associates the mean position corresponding to each subsequence with the action name corresponding to that subsequence's matching sequence.
In this way, the action name determination subunit 440 may, for each hand, take the matching sequences of the subsequences corresponding to that hand as the multiple matching sequences corresponding to that hand, and take the action names corresponding to those matching sequences (sorted in chronological order) as the multiple action names of that hand.
For example, suppose the subsequences of the left hand's feature point sequence to be tested are $\{Pl_i, i=1,\ldots,25\}$, $\{Pl_i, i=26,\ldots,50\}$, $\{Pl_i, i=51,\ldots,75\}$, and $\{Pl_i, i=76,\ldots,100\}$, and that among the left hand's preset feature point sequences the matching sequences found for the first three are $Pl_1'$, $Pl_2'$, and $Pl_3'$ respectively, while no matching sequence is found for $\{Pl_i, i=76,\ldots,100\}$. Suppose the action names of $Pl_1'$, $Pl_2'$, and $Pl_3'$ in the left-hand action name list are "wave", "push", and "pull" respectively, and the mean positions of the four subsequences are $pm_1$, $pm_2$, $pm_3$, and $pm_4$ respectively. The resulting action names and positions of the left hand are then: "wave" (position $pm_1$); "push" (position $pm_2$); "pull" (position $pm_3$); "empty" (position $pm_4$). It should be noted that, in different implementations, $pm_4$ may be an actual position value, or "empty", or "+∞", etc.
Thus, through the processing of the segmentation subunit 410, the matching sequence determination subunit 420, the association subunit 430, and the action name determination subunit 440, the multiple action names corresponding to each of the user's hands can be obtained (as the action names of that hand), with each action name associated with a mean position (as the position of that hand; "the position of a hand" comprises one or more mean positions, equal in number to the action names). Compared with recognition techniques that recognize only an individual action as the gesture, recognizing multiple actions and positions for each of the two hands with the structure shown in Fig. 4 provides more flexible combinations, which on the one hand makes gesture recognition more accurate and on the other hand allows more diverse and richer gestures to be recognized.
In addition, according to one implementation, the processing of the gesture determination unit 150 can be realized by the structure shown in Fig. 5. As shown in Fig. 5, the gesture determination unit 150 may comprise a gesture table storage subunit 510 and a gesture table matching subunit 520.
As shown in Fig. 5, the gesture table storage subunit 510 may store, as the preset gesture table, a predefined mapping list from the two elements of hand actions and positions to gestures: the left end of each mapping is an action-name pair set together with the positions of the action names; the right end of each mapping is a gesture HandSignal.
Here, an "action-name pair set" comprises multiple action name pairs, each comprising a left-hand action name ActName_left and a right-hand action name ActName_right; the positions of the action names comprise the relative positions of the two hands.
For example, in the preset gesture table, mapping one maps {("pull", "empty"), ("pull", "pull"), ("empty", "close"), ("empty", "empty")} (as element one) together with $\{(x_1, y_1), (x_2, y_2), (x_3, y_3), (x_4, y_4)\}$ (the relative positions, as element two) to the gesture "switch"; mapping two maps {("pull", "pull"), ("open", "open"), ("empty", "empty"), ("empty", "empty")} together with $\{(x_5, y_5), (x_6, y_6), (x_7, y_7), (x_8, y_8)\}$ to the gesture "explode"; and so on. In each action name pair (e.g., ("pull", "empty")), the left name corresponds to the left-hand action and the right name corresponds to the right-hand action.
Taking mapping one as an example, $(x_1, y_1)$ represents the relative position between the left hand's first action "pull" and the right hand's first action "empty" (i.e., the relative position of the two hands for the action pair ("pull", "empty")); $(x_2, y_2)$ represents the relative position between the left hand's second action "pull" and the right hand's second action "pull"; $(x_3, y_3)$ represents the relative position between the left hand's third action "empty" and the right hand's third action "close"; and $(x_4, y_4)$ represents the relative position between the left hand's fourth action "empty" and the right hand's fourth action "empty". The notation in the other mappings has similar meanings and is not repeated.
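For illustration, the preset gesture table could be represented as a mapping list like the following sketch, assuming four time slots per gesture; the relative-position values are placeholders, and "switch" and "explode" follow the examples above.

```python
# None stands in for the "empty" action name.
GESTURE_TABLE = [
    {
        "action_pairs": [("pull", None), ("pull", "pull"),
                         (None, "close"), (None, None)],
        "relative_positions": [(0.0, -0.2), (0.1, 0.0),
                               (0.2, 0.1), (0.0, 0.0)],   # illustrative values
        "gesture": "switch",
    },
    {
        "action_pairs": [("pull", "pull"), ("open", "open"),
                         (None, None), (None, None)],
        "relative_positions": [(0.3, 0.0), (0.4, 0.0),
                               (0.0, 0.0), (0.0, 0.0)],
        "gesture": "explode",
    },
]
```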
In this way, the gesture table matching subunit 520 can match the left end of each mapping in the preset gesture table against the action names and positions of the user's two hands, taking the gesture of the mapping that matches the user's two-hand action names and positions as the recognized gesture.
Here, the matching of action names is strict matching, i.e., two action names match only if they are exactly identical; the matching of positions is realized by computing relative position information from the mean positions of the user's two hands and then computing the similarity between this relative position information and the positions at the left end of the mapping (for example, a similarity threshold may be set, and the positions are judged to match when the computed similarity is greater than or equal to this threshold).
For example, suppose the action recognition unit 140 yields the action names of the user's two hands as {("pull", "pull"), ("open", "open"), ("empty", "empty"), ("empty", "empty")}, and the positions as $\{(x_{11}, y_{12}), (x_{21}, y_{22}), (x_{31}, y_{32}), (x_{41}, y_{42})\}$ (for the left hand) and $\{(x'_{11}, y'_{12}), (x'_{21}, y'_{22}), (x'_{31}, y'_{32}), (x'_{41}, y'_{42})\}$ (for the right hand).
The gesture table matching subunit 520 then matches the action names of the user's two hands against the left end of each mapping in the preset gesture table.
When matching against mapping one, it is found that the action names of the user's two hands do not match the action names at the left end of that mapping, so mapping one is ignored and matching continues with mapping two.
When matching against mapping two, it is found that the action names of the user's two hands fully match the action names at the left end of mapping two, so the positions of the user's two hands are then matched against the relative positions at the left end of mapping two.
In the process of matching the positions of the user's two hands against the relative positions at the left end of mapping two, the relative positions of the user's two hands are first computed as $\{(x'_{11}-x_{11}, y'_{12}-y_{12}), (x'_{21}-x_{21}, y'_{22}-y_{22}), (x'_{31}-x_{31}, y'_{32}-y_{32}), (x'_{41}-x_{41}, y'_{42}-y_{42})\}$. Then, these computed relative positions are matched against the relative positions $\{(x_5, y_5), (x_6, y_6), (x_7, y_7), (x_8, y_8)\}$ at the left end of mapping two, i.e., the similarity between the two sets is computed; suppose the computed similarity is 95%. In this example, if the similarity threshold is 80%, the computed relative positions of the user's two hands are judged to match the relative positions at the left end of mapping two. Thus, in this example, the result of the human-computer interaction is "explode".
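A sketch of this matching, following the structure of GESTURE_TABLE above; the similarity measure (inverse mean offset distance) and the 0.8 threshold are illustrative assumptions rather than the patent's own formulas.

```python
import math

def match_gesture(action_pairs, left_pos, right_pos, table, threshold=0.8):
    # Relative position of the two hands, per time slot.
    rel = [(xr - xl, yr - yl) for (xl, yl), (xr, yr) in zip(left_pos, right_pos)]
    for entry in table:
        if entry["action_pairs"] != action_pairs:   # strict name matching
            continue
        dist = sum(math.hypot(a - c, b - d)
                   for (a, b), (c, d) in zip(rel, entry["relative_positions"]))
        sim = 1.0 / (1.0 + dist / len(rel))         # similarity in (0, 1]
        if sim >= threshold:
            return entry["gesture"]
    return None
```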
Thus, with the gesture table matching subunit 520, the user's gesture is determined by matching the multiple actions and positions of the two hands against the preset gesture table, which makes recognition more accurate. When the definition of a gesture needs to be changed (for example, modified, added, or removed) for different application scenarios, there is no need to modify the algorithm or retrain a classifier; the definition of a gesture can be changed merely by adjusting the gesture names in the preset gesture table or the action names corresponding to a gesture, which greatly improves the adaptability of the algorithm.
According to one implementation, the instruction determination unit 160 may establish a mapping table between gesture names and operation instructions as the above preset operation instruction table. The preset operation instruction table comprises multiple mappings; the left side of each mapping is the name of a preset gesture, and the right side is the operation instruction corresponding to that preset gesture (for example, basic operation instructions for a mobile computing device's graphical interface, such as focus move, click, double-click, click-and-drag, zoom in, zoom out, rotate, and long press). The operation instruction OptCom corresponding to the recognized gesture HandSignal can thus be obtained by a table lookup.
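The table lookup can be as simple as the following sketch; the gesture-to-instruction pairings are illustrative assumptions.

```python
OPERATION_TABLE = {
    "switch": "focus_move",
    "explode": "zoom_in",
    "pinch": "zoom_out",       # hypothetical extra entry
}

def determine_instruction(hand_signal: str):
    # OptCom corresponding to the recognized gesture HandSignal, if any.
    return OPERATION_TABLE.get(hand_signal)
```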
In addition, as shown in Fig. 6, in another example of the electronic device with a 3D camera assembly according to the embodiments of the present invention, the electronic device 600 with a 3D camera assembly may comprise, in addition to a 3D imaging unit 610, a contour detection unit 620, a feature point sequence determination unit 630, an action recognition unit 640, a gesture determination unit 650, an instruction determination unit 660, and an execution unit 670, a real-time display unit 680. The 3D imaging unit 610, contour detection unit 620, feature point sequence determination unit 630, action recognition unit 640, gesture determination unit 650, instruction determination unit 660, and execution unit 670 shown in Fig. 6 may have the same structures and functions as the corresponding units of the electronic device 100 with a 3D camera assembly shown in Fig. 1, and can achieve similar effects, which are not repeated here.
According to one implementation, the real-time display unit 680 can display a simulated figure of the user's hands on the screen of the electronic device based on the position of each of the user's hands.
For example, the real-time display unit 680 can be used to: obtain, according to the feature point sequence to be tested corresponding to each of the user's hands (e.g., the 20 feature points of that hand in every frame of the image sequence to be tested), the contour figure of that hand by connecting the bones and then expanding outward, as the simulated figure of that hand; determine the display position of each of the user's hands on the screen by applying translation calibration and proportional scaling to the relative position of the user's two hands; and display the simulated figure of the user's hands on the screen based on the simulated figure and display position of each hand.
In this way, visual feedback can be provided to the user by displaying translucent hand figures on the screen of the mobile computing device, helping the user adjust hand positions and operations. It should be noted that, when performing the processing of "applying translation calibration and proportional scaling to the relative position of the user's two hands", if the recognized gesture involves only a single hand of the user, there is no relative position (or the relative position is recorded as infinity); in that case, the single hand can be displayed at a specified initial position. In addition, when performing the processing of "displaying the simulated figure of the user's hands on the screen based on the simulated figure and display position of each hand", if the recognized gesture involves both hands, the simulated figures of both hands are displayed; if it involves only a single hand, only the simulated figure of that hand is displayed.
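As a rough illustration of the translation calibration and proportional scaling, a sketch that maps a hand's camera-space position to a screen display position; the screen dimensions, calibration offset, and scale factor are all illustrative assumptions.

```python
SCREEN_W, SCREEN_H = 1920, 1080

def to_screen(hand_pos, offset=(0.0, 0.0), scale=3.0):
    x, y = hand_pos
    sx = int((x + offset[0]) * scale + SCREEN_W / 2)   # center on the screen
    sy = int((y + offset[1]) * scale + SCREEN_H / 2)
    # Clamp so the simulated hand figure stays visible on screen.
    return min(max(sx, 0), SCREEN_W - 1), min(max(sy, 0), SCREEN_H - 1)
```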
For example, in practical applications, the above technique of the present invention can be applied to a tablet or notebook computer to realize contactless gesture operation. In this application scenario, the depth cameras are installed above the screen of the tablet or notebook computer, facing the user; the user raises both hands in front of the screen and performs the related gesture operations: 1. replacing the physical mouse for moving and clicking the cursor; 2. in games or related software, realizing scene navigation by gestures, as well as operations such as zooming, rotating, and translating objects.
The above electronic device with a 3D camera assembly according to the embodiments of the present invention first recognizes single-hand actions, then recognizes the gesture from the actions of both hands, and then performs the corresponding operation according to the recognized gesture. It can accurately recognize the user's gestures during human-computer interaction, with high recognition accuracy and fast recognition speed.
In addition, because the embodiments of the present invention use a depth camera to acquire the user's body actions as the input manipulation instructions for operating a mobile computing device, the user can easily realize contactless manipulation of the mobile computing device with intuitive, natural actions, providing more convenient and accurate input and control for applications of mobile computing devices in fields such as motion games, simulated training, complex manipulation, and remote control.
Although the present invention has been described with reference to a limited number of embodiments, those skilled in the art, having the benefit of the above description, will appreciate that other embodiments are conceivable within the scope of the invention thus described. It should also be noted that the language used in this specification has been chosen mainly for readability and instructional purposes rather than to explain or limit the subject matter of the present invention. Therefore, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. As for the scope of the present invention, the disclosure made herein is illustrative rather than restrictive, and the scope of the present invention is defined by the appended claims.

Claims (10)

1. An electronic device with a 3D camera assembly, characterized in that the electronic device with a 3D camera assembly comprises:
a 3D imaging unit for capturing an image sequence to be tested, containing depth information, of the user's hands;
a contour detection unit for detecting the hand contours of the user in every frame of the image sequence to be tested according to image depth information and image color information;
a feature point sequence determination unit for determining, for each of the user's hands and using a preset hand structure template, the feature point sequence to be tested of that hand in every frame of the image sequence to be tested;
an action recognition unit for determining, for each of the user's hands, the matching sequence of that hand's feature point sequence to be tested among a plurality of preset feature point sequences, so as to determine the action name and position of that hand according to the matching sequence;
a gesture determination unit for selecting, from a preset gesture table, the gesture that matches the action names and positions of the user's two hands, as the recognized gesture;
an instruction determination unit for determining, according to a preset operation instruction table, the operation instruction corresponding to the recognized gesture; and
an execution unit for performing, on the device related to the determined operation instruction, the operation corresponding to that instruction.
2. The electronic device with a 3D camera assembly according to claim 1, characterized in that the feature point sequence determination unit comprises:
a template storage subunit for storing the preset hand structure template;
a template matching subunit for determining, for each hand of the user, a predetermined number of feature points of that hand in the hand contour of each frame of the image sequence to be tested by using the preset hand structure template;
a sequence generation subunit for obtaining, for each hand of the user, the feature point sequence to be tested of that hand from the predetermined number of feature points corresponding to that hand in each frame of the image sequence to be tested.
3. The electronic device with a 3D camera assembly according to claim 1 or 2, characterized in that the 3D camera unit is configured to:
capture images of the user's hands in a predetermined imaging region to obtain a visible-light image sequence and an infrared image sequence, and, denoting the pixel value at coordinate (x, y) of the i-th frame of the visible-light image sequence by $I_C^i(x, y)$ and the pixel value at coordinate (x, y) of the i-th frame of the infrared image sequence by $I_I^i(x, y)$, obtain an image sequence extracting the user's two-hand information according to the following formula:

$$I_T^i(x, y) = \begin{cases} \dfrac{\alpha I_I^i(x, y) + \beta I_C^i(x, y)}{2}, & I_I^i(x, y) \ge \lambda \\ 0, & I_I^i(x, y) < \lambda \end{cases}$$

where α, β, and λ are preset parameter thresholds, and the resulting image sequence $I_T^i(x, y)$, which contains the user's two hands together with depth information, serves as the image sequence to be tested, with i = 1, 2, ..., M, M being the number of frames in the image sequence to be tested.
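By way of illustration only, a minimal NumPy sketch of the fusion formula above; the visible-light frame is assumed to have been converted to a single channel, and the values of α, β, and λ are arbitrary placeholders, not values fixed by the claim:

```python
import numpy as np

def fuse_frame(ir: np.ndarray, color: np.ndarray,
               alpha: float = 0.5, beta: float = 0.5,
               lam: float = 30.0) -> np.ndarray:
    """Per-pixel fusion: I_T = (alpha*I_I + beta*I_C)/2 where I_I >= lambda,
    and 0 elsewhere (distant pixels reflect little infrared light)."""
    ir = ir.astype(np.float32)
    color = color.astype(np.float32)
    fused = (alpha * ir + beta * color) / 2.0
    return np.where(ir >= lam, fused, 0.0)

# Usage on an M-frame sequence of aligned IR and visible-light frames:
# sequence_T = [fuse_frame(ir_i, color_i) for ir_i, color_i in zip(irs, colors)]
```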
4. The electronic device with a 3D camera assembly according to claim 1 or 2, characterized in that the contour detection unit is configured to:
for each frame of the image sequence to be tested, delete the noise points and the non-skin-color regions in that frame by combining the color information, and perform edge detection with an edge detection operator E(·) on the image $I_{Te}^i(x, y)$ obtained after deleting the noise points and the non-skin-color regions, obtaining the edge image

$$I_{Tf}^i(x, y) = E\left(I_{Te}^i(x, y)\right)$$

where the edge image $I_{Tf}^i(x, y)$ is an image containing only the user's hand contours.
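By way of illustration only, a sketch of this contour detection step, assuming a YCrCb skin-color mask for the color-based deletion and the Canny operator as the edge detection operator E(·); neither choice is fixed by the claim:

```python
import cv2
import numpy as np

def hand_edges(frame_bgr: np.ndarray, fused: np.ndarray) -> np.ndarray:
    """Delete noise points and non-skin regions using color information,
    then apply an edge operator to obtain the hand-contour edge image."""
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
    # Commonly used Cr/Cb skin-color bounds (an assumption, not from the patent).
    skin = cv2.inRange(ycrcb, np.array([0, 133, 77]), np.array([255, 173, 127]))
    skin = cv2.medianBlur(skin, 5)                       # remove isolated noise points
    masked = cv2.bitwise_and(fused.astype(np.uint8),
                             fused.astype(np.uint8), mask=skin)  # I_Te
    return cv2.Canny(masked, 50, 150)                    # I_Tf = E(I_Te)
```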
5. The electronic device with a 3D camera assembly according to claim 2, characterized in that the template matching subunit comprises:
a positioning reference determination module for finding, for each frame of the image sequence to be tested, the fingertip points and finger-root joint points on the contour line according to the curvature of the contour curve in that image, and using the fingertip points as positioning references;
a scaling reference determination module for matching, for each frame processed by the positioning reference determination module, the finger-root joint point of each single finger based on the positioning references found in that frame, obtaining the length of each single finger as the reference for scaling;
a scaling and deformation module for scaling and deforming, for each frame processed by the scaling reference determination module, the corresponding hand structure template based on the positions of the found fingertip points and finger-root joint points and on the length of each single finger, obtaining by matching the knuckle feature points and the wrist midpoint feature point of each hand;
wherein the hand structure template stored by the template storage subunit comprises a left-hand structure template and a right-hand structure template, each of which comprises: the fingertip feature point, knuckle feature points, and finger-root joint feature point of each finger, a wrist midpoint feature point, and the topological relations between these feature points.
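By way of illustration only, the curvature-based search for fingertip and finger-root points could be realized with a k-curvature test, as sketched below; this is one plausible reading of the module, not the patent's prescribed method:

```python
import numpy as np

def curvature_peaks(contour: np.ndarray, k: int = 10,
                    angle_thresh_deg: float = 60.0) -> list:
    """Return contour points whose k-neighborhood angle is sharp; such
    curvature peaks are fingertip / finger-root joint candidates.
    `contour` is an (N, 2) array of (x, y) points along the hand outline."""
    peaks = []
    n = len(contour)
    thresh = np.radians(angle_thresh_deg)
    for i in range(n):
        p = contour[i].astype(float)
        v1 = contour[(i - k) % n] - p        # vector to the point k steps back
        v2 = contour[(i + k) % n] - p        # vector to the point k steps ahead
        denom = np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9
        angle = np.arccos(np.clip(np.dot(v1, v2) / denom, -1.0, 1.0))
        if angle < thresh:                   # sharp angle => curvature peak
            peaks.append(tuple(contour[i]))
    return peaks
```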
6. The electronic device with a 3D camera assembly according to claim 1 or 2, characterized in that the action recognition unit comprises:
a segmentation subunit for dividing, for the feature point sequence to be tested of each hand, that feature point sequence into multiple subsequences according to a predetermined time window, and obtaining the mean position corresponding to each subsequence;
a matching sequence determination subunit for matching, for each subsequence corresponding to each hand, that subsequence against each of the plurality of preset feature point sequences, so as to select the preset feature point sequence whose matching degree with that subsequence is above a preset matching threshold and is the highest, as the matching sequence of that subsequence;
an association subunit for associating the mean position corresponding to each subsequence with the action name corresponding to the matching sequence of that subsequence;
an action name determination subunit for taking, for each hand, the matching sequences of the subsequences corresponding to that hand as the multiple matching sequences corresponding to that hand, and taking the action names respectively corresponding to these matching sequences as the multiple action names of that hand.
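By way of illustration only, a sketch of the segmentation and matching-sequence subunits; the matching degree used here is a simple inverse mean point distance, an assumed choice the claim leaves open:

```python
import numpy as np

def recognize_actions(seq: np.ndarray, window: int,
                      templates: dict, match_thresh: float) -> list:
    """Split a hand's feature point sequence (frames x points x 2) into
    time-window subsequences; for each, pick the preset sequence whose
    matching degree exceeds the threshold and is the highest.
    `templates` maps action names to arrays shaped like a subsequence."""
    results = []
    for start in range(0, len(seq) - window + 1, window):
        sub = seq[start:start + window]
        mean_pos = sub.mean(axis=(0, 1))     # mean position of the subsequence
        best_name, best_score = None, match_thresh
        for name, tpl in templates.items():
            if tpl.shape != sub.shape:
                continue
            # Matching degree: inverse of mean point-wise distance (assumed).
            score = 1.0 / (1.0 + np.linalg.norm(sub - tpl, axis=-1).mean())
            if score > best_score:
                best_name, best_score = name, score
        # Associate the mean position with the matched action name.
        results.append((best_name, tuple(mean_pos)))
    return results
```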
7. The electronic device with a 3D camera assembly according to claim 1 or 2, characterized in that the gesture recognition unit comprises:
a gesture table storage subunit for storing the following mapping list as the preset gesture table: the left end of each mapping in the mapping list is a pair of action names together with the relative position of the two action names, and the right end of each mapping in the mapping list is a gesture;
a gesture table matching subunit for matching the left end of each mapping in the preset gesture table with the action names and positions of the user's two hands, wherein the matching of action names is performed strictly, while position matching is realized by computing relative position information from the respective mean positions of the user's two hands and then computing the similarity between this relative position information and the position at the left end of the mapping.
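By way of illustration only, a sketch of the gesture table matching: action names are compared strictly, while the position similarity is computed here as the cosine similarity of relative-position vectors, an assumed choice the claim does not fix:

```python
import numpy as np

def match_gesture(left_action: str, right_action: str,
                  left_pos, right_pos, gesture_table: list,
                  sim_thresh: float = 0.8):
    """`gesture_table` is a list of (((left_name, right_name), rel_template),
    gesture) mappings, mirroring the preset gesture table of claim 7."""
    rel = np.subtract(right_pos, left_pos)   # relative position of the hands
    for (name_pair, rel_template), gesture in gesture_table:
        if name_pair != (left_action, right_action):   # strict name matching
            continue
        t = np.asarray(rel_template, dtype=float)
        denom = np.linalg.norm(rel) * np.linalg.norm(t)
        sim = float(np.dot(rel, t) / denom) if denom else 1.0
        if sim >= sim_thresh:
            return gesture
    return None
```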
8. The electronic device with a 3D camera assembly according to claim 1 or 2, characterized in that the electronic device with a 3D camera assembly further comprises:
a real-time display unit for displaying a mimic diagram of the user's hands on the screen of the device based on the position of each hand of the user.
9. The electronic device with a 3D camera assembly according to claim 8, characterized in that the real-time display unit is configured to: obtain, according to the feature point sequence to be tested corresponding to each hand of the user, the outline figure of that hand by connecting the bones and extending outward, as the mimic diagram of that hand; determine the display position of each hand of the user on the screen by performing translation calibration and proportional scaling on the relative position of the user's two hands; and display the mimic diagram of the user's hands on the screen based on the mimic diagram and the display position of each hand of the user.
10. The electronic device with a 3D camera assembly according to claim 1, characterized in that the electronic device is one of the following: a mobile phone, a multimedia playing device, a desktop computer, a notebook computer, and a tablet computer.
CN201510563581.4A 2015-09-07 2015-09-07 A kind of electronic equipment with 3D camera assemblies Active CN105045399B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510563581.4A CN105045399B (en) 2015-09-07 2015-09-07 A kind of electronic equipment with 3D camera assemblies

Publications (2)

Publication Number Publication Date
CN105045399A true CN105045399A (en) 2015-11-11
CN105045399B CN105045399B (en) 2018-08-14

Family

ID=54451991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510563581.4A Active CN105045399B (en) 2015-09-07 2015-09-07 A kind of electronic equipment with 3D camera assemblies

Country Status (1)

Country Link
CN (1) CN105045399B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102426480A (en) * 2011-11-03 2012-04-25 康佳集团股份有限公司 Man-machine interactive system and real-time gesture tracking processing method for same
CN104598915A (en) * 2014-01-24 2015-05-06 深圳奥比中光科技有限公司 Gesture recognition method and gesture recognition device
CN103941866A (en) * 2014-04-08 2014-07-23 河海大学常州校区 Three-dimensional gesture recognizing method based on Kinect depth image
CN104281265A (en) * 2014-10-14 2015-01-14 京东方科技集团股份有限公司 Application program control method, application program control device and electronic equipment
CN104750397A (en) * 2015-04-09 2015-07-01 重庆邮电大学 Somatosensory-based natural interaction method for virtual mine

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250700A (en) * 2016-08-09 2016-12-21 京东方科技集团股份有限公司 Interactive rehabilitation system, interactive convalescence device and interactive method of rehabilitation
CN107436685B (en) * 2017-07-31 2020-07-07 京东方科技集团股份有限公司 Display device, self-luminous display panel and gesture recognition method
CN107436685A (en) * 2017-07-31 2017-12-05 京东方科技集团股份有限公司 Display device, self luminous display panel and gesture identification method
US10802596B2 (en) 2017-07-31 2020-10-13 Boe Technology Group Co., Ltd. Display device, self-luminous display panel and gesture recognition method
CN108470373A (en) * 2018-02-14 2018-08-31 天目爱视(北京)科技有限公司 It is a kind of based on infrared 3D 4 D datas acquisition method and device
CN108870757A (en) * 2018-06-29 2018-11-23 哈尔滨拓博科技有限公司 A kind of controlling device for water heater and control method based on plane gesture identification
CN108970084A (en) * 2018-06-29 2018-12-11 西安深睐信息科技有限公司 A kind of moving scene analogy method of Behavior-based control identification
CN109492578A (en) * 2018-11-08 2019-03-19 北京华捷艾米科技有限公司 A kind of gesture remote control method and device based on depth camera
CN110287891A (en) * 2019-06-26 2019-09-27 北京字节跳动网络技术有限公司 Gestural control method, device and electronic equipment based on human body key point
CN111062312A (en) * 2019-12-13 2020-04-24 RealMe重庆移动通信有限公司 Gesture recognition method, gesture control method, device, medium and terminal device
WO2021115181A1 (en) * 2019-12-13 2021-06-17 RealMe重庆移动通信有限公司 Gesture recognition method, gesture control method, apparatuses, medium and terminal device
CN111062312B (en) * 2019-12-13 2023-10-27 RealMe重庆移动通信有限公司 Gesture recognition method, gesture control device, medium and terminal equipment
CN112581426A (en) * 2020-11-06 2021-03-30 上海达适医疗科技有限公司 Method for identifying left leg and right leg of infrared thermal imaging image
WO2023283934A1 (en) * 2021-07-16 2023-01-19 Huawei Technologies Co.,Ltd. Devices and methods for gesture-based selection
CN114706502A (en) * 2022-03-17 2022-07-05 Tcl华星光电技术有限公司 Display module and display device

Also Published As

Publication number Publication date
CN105045399B (en) 2018-08-14

Similar Documents

Publication Publication Date Title
CN105045399A (en) Electronic device with 3D camera assembly
CN105045398A (en) Virtual reality interaction device based on gesture recognition
Shriram et al. Deep learning-based real-time AI virtual mouse system using computer vision to avoid COVID-19 spread
Zhou et al. A novel finger and hand pose estimation technique for real-time hand gesture recognition
CN105068662A (en) Electronic device used for man-machine interaction
JP6079832B2 (en) Human computer interaction system, hand-to-hand pointing point positioning method, and finger gesture determination method
CN105160323A (en) Gesture identification method
CN103294996B (en) A kind of 3D gesture identification method
CN205080499U (en) Mutual equipment of virtual reality based on gesture recognition
CN105302294A (en) Interactive virtual reality presentation device
CN105302295A (en) Virtual reality interaction device having 3D camera assembly
CN105046249A (en) Human-computer interaction method
CN105069444A (en) Gesture recognition device
Badi et al. Hand posture and gesture recognition technology
Störring et al. Computer vision-based gesture recognition for an augmented reality interface
CN205080498U (en) Mutual equipment of virtual reality with 3D subassembly of making a video recording
Adhikari et al. A Novel Machine Learning-Based Hand Gesture Recognition Using HCI on IoT Assisted Cloud Platform.
CN205080500U (en) Electronic equipment with 3D subassembly of making a video recording
CN205080497U (en) Interactive virtual reality presentation device
CN113961067B (en) Non-contact doodling drawing method and recognition interaction system based on deep learning
Srinivas et al. Virtual Mouse Control Using Hand Gesture Recognition
Rautaray et al. Adaptive hand gesture recognition system for multiple applications
Tang et al. CUBOD: a customized body gesture design tool for end users
Sandra et al. GESTURE-CONTROL VIRTUAL-MOUSE
Annachhatre et al. Virtual Mouse Using Hand Gesture Recognition-A Systematic Literature Review

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 150016 Heilongjiang Province, Harbin Economic Development Zone haping Road District Dalian road and Xingkai road junction

Applicant after: HARBIN YISHE TECHNOLOGY CO., LTD.

Address before: 150016 Heilongjiang City, Harbin province Daoli District, quiet street, unit 54, unit 2, layer 4, No. 3

Applicant before: HARBIN YISHE TECHNOLOGY CO., LTD.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant