CN108682021A - Rapid hand tracking method, device, terminal and storage medium - Google Patents

Rapid hand tracking method, device, terminal and storage medium

Info

Publication number
CN108682021A
CN108682021A (application CN201810349972.XA)
Authority
CN
China
Prior art keywords
region
calibration frame
hand
calibration
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810349972.XA
Other languages
Chinese (zh)
Other versions
CN108682021B (en)
Inventor
阮晓雯 (Ruan Xiaowen)
王健宗 (Wang Jianzong)
肖京 (Xiao Jing)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810349972.XA priority Critical patent/CN108682021B/en
Priority to PCT/CN2018/100227 priority patent/WO2019200785A1/en
Publication of CN108682021A publication Critical patent/CN108682021A/en
Application granted granted Critical
Publication of CN108682021B publication Critical patent/CN108682021B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 - Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/60 - Analysis of geometric attributes
    • G06T7/66 - Analysis of geometric attributes of image moments or centre of gravity
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G06V40/28 - Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; Image sequence
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20092 - Interactive image processing based on input by user
    • G06T2207/20104 - Interactive definition of region of interest [ROI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30196 - Human being; Person

Abstract

A rapid hand tracking method, comprising: displaying, on a display interface, a video containing a human hand region captured by an imaging device; receiving a calibration frame that a user marks on the video containing the human hand region; extracting the histogram of oriented gradients (HOG) feature of the region marked by the calibration frame, and segmenting the region marked by the calibration frame according to the HOG feature to obtain a hand image; and tracking the hand image with a continuously adaptive mean-shift operator. The present invention also provides a rapid hand tracking device, a terminal, and a storage medium. The present invention can quickly extract the HOG features within the marked calibration frame, accurately segment the hand region according to the HOG features, and obtain a good tracking effect.

Description

Rapid hand tracking method, device, terminal and storage medium
Technical field
The present invention relates to the field of hand tracking techniques, and in particular to a rapid hand tracking method, device, terminal, and storage medium.
Background technology
As an important means of natural interaction, gestures have significant research value and broad application prospects. The first and most important step of gesture recognition and hand tracking is to segment the hand region from the image. The quality of hand region segmentation directly affects the results of subsequent gesture recognition and gesture tracking.
During human-robot interaction, when the video capture device installed on the robot is at some distance from the human body, the captured image contains the whole body. Because such images contain a large amount of background and the hand region occupies only a small part of the picture, how to detect the hand among large background areas and segment it quickly and accurately is a problem worth studying.
Summary of the invention
In view of the foregoing, it is necessary to propose a rapid hand tracking method, device, terminal, and storage medium that can shorten the time needed to extract the hand region and improve the accuracy and efficiency of hand recognition and hand tracking, with particularly good tracking performance for hand regions under complex backgrounds.
A first aspect of the present invention provides a rapid hand tracking method, the method comprising:
displaying, on a display interface, a video containing a human hand region captured by an imaging device;
receiving a calibration frame that a user marks on the video containing the human hand region;
extracting the histogram of oriented gradients feature of the region marked by the calibration frame, and segmenting the region marked by the calibration frame according to the histogram of oriented gradients feature to obtain a hand image; and
tracking the hand image with a continuously adaptive mean-shift operator, wherein tracking the hand image with the continuously adaptive mean-shift operator specifically comprises:
transforming the color space of the hand image into the HSV color space and separating out the hue-component hand image; based on the hue-component hand image I(i, j) and the centroid position and size of an initialized search window, calculating the centroid position (M10/M00, M01/M00) of the current search window and the current search window size s = 2*sqrt(M00/256), where M10 = Σi Σj i*I(i, j) and M01 = Σi Σj j*I(i, j) are the first-order moments of the current search window, M00 = Σi Σj I(i, j) is the zeroth-order moment of the current search window, i is the pixel coordinate of I(i, j) in the horizontal direction, and j is the pixel coordinate of I(i, j) in the vertical direction.
In a preferred implementation, displaying on the display interface the video containing the human hand region captured by the imaging device further comprises:
displaying a preset standard calibration frame in a preset display mode, the preset display mode comprising one or more of the following:
displaying the preset standard calibration frame when a display instruction is received;
hiding the preset standard calibration frame when a hide instruction is received;
after the preset standard calibration frame has been displayed upon receiving the display instruction, automatically hiding the preset standard calibration frame when the time during which no further instruction is received exceeds a preset time period.
In a preferred implementation, receiving the calibration frame that the user marks on the video containing the human hand region comprises:
receiving a standard calibration frame that the user marks on the video containing the human hand region, including:
receiving a rough calibration frame drawn by the user in the video containing the human hand region;
matching, by a fuzzy matching method, the preset standard calibration frame corresponding to the rough calibration frame; and
marking the video containing the human hand region according to the matched standard calibration frame and displaying the marked standard calibration frame, wherein the geometric center of the rough calibration frame is identical to the geometric center of the matched standard calibration frame.
In a preferred implementation, receiving the calibration frame that the user marks on the video containing the human hand region further comprises:
receiving a standard calibration frame that the user marks on the video containing the human hand region, including:
directly receiving a standard calibration frame chosen by the user, marking the video containing the human hand region according to the standard calibration frame, and displaying the marked standard calibration frame.
In a preferred implementation, receiving the standard calibration frame that the user marks on the video containing the human hand region further comprises: when an enlarge, shrink, move, or delete instruction is received, enlarging, shrinking, moving, or deleting the displayed standard calibration frame accordingly.
In a preferred implementation, the method further comprises:
preprocessing the region marked by the standard calibration frame, the preprocessing comprising one or more of the following: grayscale conversion and correction processing.
In a preferred implementation, the method further comprises:
obtaining the depth information of the part of the video containing the human hand region that corresponds to the region marked by the calibration frame, and normalizing the hand image according to the depth information, the normalization being computed as S2 × (H2/H1), where S1 is the size of the hand image segmented from the region marked by the first standard calibration frame, H1 is the depth information corresponding to the first marked calibration region, S2 is the size of the hand image segmented from the region marked by the current standard calibration frame, and H2 is the depth information corresponding to the current marked calibration region.
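The scaling rule above amounts to a single multiplication. A minimal sketch, assuming H denotes a scalar depth reading and S a hand-image size in pixels (function and parameter names are illustrative, not from the patent):

```python
def normalize_hand_size(s_current, depth_current, depth_reference):
    """Scale the current segmented hand-image size to the reference depth.

    A hand farther from the camera (larger depth) appears smaller, so
    multiplying by depth_current / depth_reference maps the current size
    back to the scale of the first calibration. This sketches the
    S2 * (H2 / H1) rule described in the text.
    """
    return s_current * (depth_current / depth_reference)

# Example: the first calibration saw the hand at 0.5 m; the current frame
# sees a 120-pixel-wide hand at 1.0 m, so at the reference depth it would
# measure 240 pixels.
print(normalize_hand_size(120, 1.0, 0.5))  # -> 240.0
```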
A second aspect of the present invention provides a rapid hand tracking device, the device comprising:
a display module for displaying, on a display interface, the video containing a human hand region captured by an imaging device;
a calibration module for receiving the calibration frame that a user marks on the video containing the human hand region;
a segmentation module for extracting the histogram of oriented gradients feature of the region marked by the calibration frame, and segmenting the region marked by the calibration frame according to the histogram of oriented gradients feature to obtain a hand image; and
a tracking module for tracking the hand image with a continuously adaptive mean-shift operator.
A third aspect of the present invention provides a terminal comprising a processor and a memory, the processor being configured to implement the rapid hand tracking method when executing a computer program stored in the memory.
A fourth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the rapid hand tracking method when executed by a processor.
The rapid hand tracking method, device, terminal, and storage medium of the present invention first roughly calibrate the hand region to obtain a calibration frame, then extract the HOG features of the region marked by the calibration frame and, according to the HOG features, accurately segment the hand region out of the region marked by the calibration frame. This reduces the area of the region from which HOG features are extracted and effectively shortens the time needed to extract them, so that hand region segmentation and tracking can be performed rapidly. Second, the depth information of the video containing the hand is obtained, which further ensures the clarity of the hand contour; the gain in tracking performance is especially notable for hand regions under complex backgrounds.
Description of the drawings
To explain the technical solutions in the embodiments of the present invention or the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show only embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Fig. 1 is a flowchart of the rapid hand tracking method provided by Embodiment 1 of the present invention.
Fig. 2 is a flowchart of the rapid hand tracking method provided by Embodiment 2 of the present invention.
Fig. 3 is a structural diagram of the rapid hand tracking device provided by Embodiment 3 of the present invention.
Fig. 4 is a structural diagram of the rapid hand tracking device provided by Embodiment 4 of the present invention.
Fig. 5 is a schematic diagram of the terminal provided by Embodiment 5 of the present invention.
The present invention is further described in the following detailed description in conjunction with the above drawings.
Detailed description of the embodiments
To make the objects, features, and advantages of the present invention easier to understand, the present invention is described in detail below in conjunction with the accompanying drawings and specific embodiments. It should be noted that, where no conflict arises, the embodiments of the present invention and the features in the embodiments can be combined with each other.
Many details are set forth in the following description to facilitate a thorough understanding of the present invention; the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the description of the present invention are intended only to describe specific embodiments and are not intended to limit the present invention.
The rapid hand tracking method of the embodiments of the present invention is applied to one or more terminals. It can also be applied to a hardware environment consisting of a terminal and a server connected to the terminal through a network. Networks include, but are not limited to: wide area networks, metropolitan area networks, and local area networks. The rapid hand tracking method of the embodiments of the present invention can be executed by the server, by the terminal, or jointly by the server and the terminal.
A terminal that needs rapid hand tracking can directly integrate the rapid hand tracking function provided by the method of the present invention, or install a client for realizing the method. Alternatively, the method provided by the present invention can run on a device such as a server in the form of a Software Development Kit (SDK): the interface of the rapid hand tracking function is provided in SDK form, and the terminal or other devices can realize hand tracking through the provided interface.
Embodiment 1
Fig. 1 is a flowchart of the rapid hand tracking method provided by Embodiment 1 of the present invention. According to different requirements, the execution order in the flowchart can change, and certain steps can be omitted.
101: Display, on a display interface, the video containing a human hand region captured by an imaging device.
In this embodiment, the terminal provides a display interface that simultaneously displays the video containing the human hand region captured by the imaging device. The imaging device is a 2D camera.
102: Receive the calibration frame that a user marks on the video containing the human hand region.
In this embodiment, when the user finds hand information of interest in the video containing the human hand region shown on the display interface, the user indicates the hand information of interest by adding a calibration frame on the display interface.
The user can touch the display interface with a finger, a stylus, or any other suitable object, preferably a finger, to add a calibration frame on the display interface.
103: Extract the histogram of oriented gradients feature of the region marked by the calibration frame, and segment the region marked by the calibration frame according to the histogram of oriented gradients feature to obtain a hand image.
The detailed process of extracting the histogram of oriented gradients (HOG) feature of the region marked by the calibration frame includes:
11) Calculate the gradient information of each pixel in the region marked by the calibration frame, the gradient information including gradient magnitude and gradient direction.
First-derivative templates such as the one-dimensional centered [1, 0, -1], the one-dimensional non-centered [-1, 1], the one-dimensional cubic-corrected [1, -8, 0, 8, -1], or the Sobel operator may be used to calculate the gradient of each pixel in the region marked by the calibration frame in the horizontal and vertical directions; the gradient magnitude and gradient direction of the region marked by the calibration frame are then calculated from the horizontal and vertical gradients.
In this preferred embodiment, the one-dimensional centered template [1, 0, -1] is used as an example to calculate the gradient information of each pixel in the region marked by the calibration frame. Denote the region marked by the calibration frame as I(x, y). The gradients of a pixel in the horizontal and vertical directions are calculated as shown in formula (1-1):
Gh(x, y) = I(x+1, y) - I(x-1, y), Gv(x, y) = I(x, y+1) - I(x, y-1)    (1-1)
where Gh(x, y) and Gv(x, y) denote the gradient values of pixel (x, y) in the horizontal and vertical directions, respectively.
The gradient magnitude (also called gradient intensity) and gradient direction of pixel (x, y) are calculated as shown in formula (1-2):
M(x, y) = sqrt(Gh(x, y)^2 + Gv(x, y)^2), θ(x, y) = arctan(Gv(x, y)/Gh(x, y))    (1-2)
where M(x, y) and θ(x, y) denote the gradient magnitude and gradient direction of pixel (x, y), respectively.
Further, the range of the gradient direction is restricted. An unsigned range is generally used, that is, the sign of the gradient direction angle is ignored. The unsigned gradient direction can be expressed as shown in formula (1-3):
θ(x, y) = θ(x, y) + 180°, if θ(x, y) < 0°    (1-3)
After the calculation of formula (1-3), the gradient direction of each pixel in the region marked by the calibration frame is restricted to the range 0 to 180 degrees.
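The per-pixel gradient step above can be sketched in a few lines of NumPy. This is an illustrative reconstruction under the [1, 0, -1] template described in the text, not the patented code; the function name is ours:

```python
import numpy as np

def gradient_info(region):
    """Per-pixel gradient magnitude and unsigned direction (degrees, [0, 180))."""
    region = region.astype(np.float64)
    gh = np.zeros_like(region)
    gv = np.zeros_like(region)
    gh[:, 1:-1] = region[:, 2:] - region[:, :-2]   # horizontal gradient Gh
    gv[1:-1, :] = region[2:, :] - region[:-2, :]   # vertical gradient Gv
    magnitude = np.sqrt(gh ** 2 + gv ** 2)          # M(x, y)
    direction = np.degrees(np.arctan2(gv, gh)) % 180.0  # unsigned direction
    return magnitude, direction

# A vertical edge: gradient magnitude 255, direction 0 degrees (horizontal).
region = np.tile(np.array([0, 0, 255, 255]), (4, 1))
mag, ang = gradient_info(region)
print(mag[1, 1], ang[1, 1])  # -> 255.0 0.0
```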
12) Divide the region marked by the calibration frame into multiple blocks, and divide each block into multiple cell units, each cell unit containing multiple pixels.
In this embodiment, the size of a cell unit is 8×8 pixels, and adjacent cell units do not overlap.
For example, assuming the region I(x, y) marked by the calibration frame is 64×128 in size, each block is 16×16, and each cell unit is 8×8, the region marked by the calibration frame can be divided into 105 blocks, each block containing 4 cell units and each cell unit containing 64 pixels.
Dividing cell units without overlap, as in this embodiment, makes computing the gradient direction histogram of each block faster.
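The 105-block figure in the example above follows from the standard HOG sliding-block arithmetic: a 16×16 block stepping by one 8×8 cell, so blocks overlap even though the cells themselves do not. A small sketch under that assumption (names illustrative):

```python
def hog_block_count(width, height, block=16, stride=8):
    """Number of positions of a block x block window sliding with the given
    stride: the usual HOG layout, in which blocks overlap by one cell even
    though the 8x8 cells themselves do not overlap."""
    nx = (width - block) // stride + 1   # positions across the width
    ny = (height - block) // stride + 1  # positions down the height
    return nx * ny

print(hog_block_count(64, 128))  # -> 105  (7 positions x 15 positions)
```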
13) Quantize the gradient information of each pixel in each cell unit to obtain the gradient histogram of the region marked by the calibration frame.
In this embodiment, the gradient direction of each pixel of each cell unit is first divided into 9 bins (9 direction channels), the 9 bins serving as the horizontal axis of the gradient histogram: [0°, 20°), [20°, 40°), [40°, 60°), [60°, 80°), [80°, 100°), [100°, 120°), [120°, 140°), [140°, 160°), [160°, 180°); then the gradient magnitudes of the pixels falling into each bin are accumulated to serve as the vertical axis of the gradient histogram.
14) Normalize the gradient histogram of each block to obtain the normalized gradient histogram of each block.
In this preferred embodiment, a normalization function may be used to normalize the gradient histogram of each block; the normalization function can be the L2 norm or the L1 norm.
Because of changes in local illumination and in foreground-background contrast, the gradient magnitudes of pixels vary over a very wide range. Normalization compresses illumination, shadow, and edge variation, so that the HOG feature vector space is robust to changes in illumination, shadows, and edges.
15) Concatenate the normalized gradient histograms of all blocks to obtain the final HOG feature of the region marked by the calibration frame.
16) According to the final HOG feature, segment the hand region from the region marked by the calibration frame.
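Steps 11)-15) above, with the parameters given in the text (8×8 cells, 9 unsigned bins, L2 block normalization), can be condensed into a NumPy sketch. This is an illustrative reconstruction, not the patent's implementation; in practice one might instead use OpenCV's cv2.HOGDescriptor:

```python
import numpy as np

def hog_features(region, cell=8, block=2, bins=9):
    """Compact HOG sketch: centered gradients, 8x8 non-overlapping cells,
    9 unsigned-direction bins, L2-normalized 2x2-cell blocks, concatenated
    into one feature vector."""
    region = region.astype(np.float64)
    gh = np.zeros_like(region)
    gv = np.zeros_like(region)
    gh[:, 1:-1] = region[:, 2:] - region[:, :-2]
    gv[1:-1, :] = region[2:, :] - region[:-2, :]
    mag = np.sqrt(gh ** 2 + gv ** 2)
    ang = np.degrees(np.arctan2(gv, gh)) % 180.0  # unsigned, [0, 180)
    ch, cw = region.shape[0] // cell, region.shape[1] // cell
    hist = np.zeros((ch, cw, bins))
    bin_width = 180.0 / bins
    for yc in range(ch):                 # accumulate magnitudes per cell/bin
        for xc in range(cw):
            m = mag[yc*cell:(yc+1)*cell, xc*cell:(xc+1)*cell]
            a = ang[yc*cell:(yc+1)*cell, xc*cell:(xc+1)*cell]
            idx = np.minimum((a // bin_width).astype(int), bins - 1)
            for b in range(bins):
                hist[yc, xc, b] = m[idx == b].sum()
    feats = []
    for yb in range(ch - block + 1):     # overlapping blocks of 2x2 cells
        for xb in range(cw - block + 1):
            v = hist[yb:yb+block, xb:xb+block].ravel()
            feats.append(v / (np.linalg.norm(v) + 1e-6))  # L2 normalization
    return np.concatenate(feats)

# A 64x128 region yields 105 blocks x 4 cells x 9 bins = 3780 features.
region = np.random.default_rng(0).integers(0, 256, (128, 64))
print(hog_features(region).shape)  # -> (3780,)
```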
104: Track the hand image with a continuously adaptive mean-shift operator.
In this embodiment, the continuously adaptive mean-shift (CamShift) algorithm is a method based on color information. It can track a target using the target's particular color, automatically adjusting the size and location of the search window to locate the size and center of the tracked target, and it uses the result of the previous frame (i.e., the search window size and centroid) as the size and centroid of the target in the next frame.
Tracking the hand image with the continuously adaptive mean-shift operator specifically includes:
21) Transform the color space of the hand image into the HSV (hue, saturation, value) color space and separate out the hue (H) component hand image.
22) Based on the H-component hand image, initialize the centroid position and size S of the search window W.
23) Calculate the moments of the current search window.
The zeroth-order moment of the current search window is calculated according to formula (1-4), and the first-order moments according to formula (1-5):
M00 = Σi Σj I(i, j)    (1-4)
M10 = Σi Σj i*I(i, j), M01 = Σi Σj j*I(i, j)    (1-5)
24) Calculate the centroid position (M10/M00, M01/M00) of the current search window from its moments.
25) Calculate the current search window size s = 2*sqrt(M00/256) from its moments.
Compare the currently calculated search window with a preset search window threshold. When the currently calculated search window is greater than or equal to the preset threshold, repeat steps 21)-25); when the currently calculated search window is smaller than the preset threshold, tracking ends, and the position of the centroid of the search window is then the current position of the tracked target.
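The moment arithmetic of steps 23)-25) can be sketched directly in NumPy as one window update on a back-projection image. This is an illustrative reconstruction with assumed names; a production system would typically call OpenCV's cv2.CamShift instead:

```python
import numpy as np

def search_window_update(prob, x, y, w, h):
    """One CamShift-style update on a back-projection image `prob`.

    Computes the zeroth- and first-order moments of the current search
    window, then returns the new centroid (M10/M00, M01/M00) and the
    adapted window size s = 2*sqrt(M00/256).
    """
    win = prob[y:y + h, x:x + w].astype(np.float64)
    jj, ii = np.mgrid[0:h, 0:w]             # j: vertical index, i: horizontal
    m00 = win.sum()                          # zeroth-order moment
    m10 = (ii * win).sum()                   # first-order moment in i
    m01 = (jj * win).sum()                   # first-order moment in j
    cx, cy = x + m10 / m00, y + m01 / m00    # new centroid in image coords
    s = 2.0 * np.sqrt(m00 / 256.0)           # adapted search window size
    return (cx, cy), s

# A bright 16x16 patch centered at (23.5, 23.5) in a 64x64 back-projection:
prob = np.zeros((64, 64))
prob[16:32, 16:32] = 255.0
(cx, cy), s = search_window_update(prob, 0, 0, 64, 64)
print(round(cx, 1), round(cy, 1))  # -> 23.5 23.5
```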
In summary, in the rapid hand tracking method of the present invention, after the user marks the hand information of interest in the video containing the human hand region with a calibration frame, the HOG features of the region marked by the calibration frame are extracted, and the hand region is segmented from the region marked by the calibration frame according to the HOG features. Thus, only the HOG features within the region marked by the calibration frame need to be computed. Compared with computing the entire video image containing the human hand region, the present invention, by receiving the calibration frame marked by the user, can reduce the area of the region from which HOG features are extracted, effectively shortening the HOG feature extraction time, so that the hand region can be quickly segmented from the video containing the human hand region.
In addition, because the gradient information of each pixel in the region marked by the calibration frame is processed with the cell unit as the processing unit, the computed HOG features preserve the geometric and optical characteristics of the hand region. Second, the block-wise division into cell units for calculation allows the relationships among the pixels of the hand region to be well characterized. Finally, the normalization that is applied can partially offset the influence of illumination changes, thereby ensuring the clarity of the extracted hand region and accurately segmenting the hand region.
Embodiment 2
Fig. 2 is a flowchart of the rapid hand tracking method provided by Embodiment 2 of the present invention. According to different requirements, the execution order in the flowchart can change, and certain steps can be omitted.
201: Display, on a display interface, the video containing a human hand region captured by an imaging device, while displaying a preset standard calibration frame in a preset display mode.
In this embodiment, the terminal provides a display interface that simultaneously displays the video containing the human hand region captured by the imaging device; the display interface also displays a standard calibration frame at the same time.
The imaging device is a 3D depth camera. A 3D depth camera differs from a 2D camera in that it can simultaneously capture gray-scale image information and three-dimensional information including depth. After the video containing the human hand region is captured with the 3D depth camera, the video containing the human hand region is simultaneously displayed on the display interface of the terminal.
In this embodiment, the preset standard calibration frame is provided for the user to mark on the displayed video containing the human hand region in order to obtain hand information of interest.
The preset display mode includes one or more of the following:
1) When a display instruction is received, display the preset standard calibration frame.
The display instruction corresponds to a display operation input by the user. The display operation input by the user includes, but is not limited to: clicking any position of the display interface, touching any position of the display interface for longer than a first preset time period (for example, 1 second), or uttering a first preset voice command (for example, "calibration frame").
When it is detected that the user has performed a click operation on the display interface, or that the duration of a touch operation performed by the user on the display interface exceeds the preset time, or that the user has uttered the first preset voice command, the terminal determines that a display instruction has been received and displays the preset standard calibration frame.
2) When a hide instruction is received, hide the preset standard calibration frame.
The hide instruction corresponds to a hide operation input by the user. The hide operation input by the user includes, but is not limited to: clicking any position of the display interface, touching any position of the display interface for longer than a second preset time period (for example, 2 seconds), or uttering a second preset voice command (for example, "exit").
When it is detected that the user has performed a click operation on the display interface, or that the duration of a touch operation performed by the user on the display interface exceeds the second preset time, or that the user has uttered the second preset voice command, the terminal determines that a hide instruction has been received and hides the preset standard calibration frame.
The hide instruction can be identical to the display instruction, or different. The first preset time period can be identical to the second preset time period, or not. Preferably, the first preset time period is shorter than the second preset time period: setting a shorter first preset time period allows the preset standard calibration frame to be displayed quickly, while setting a longer second preset time period prevents the preset standard calibration frame from being hidden because of an unconscious action or an operating error by the user.
Displaying the preset standard calibration frame when a display instruction is received enables the user to mark a hand region of interest while the display interface shows the video containing the human hand region. Meanwhile, not displaying the preset standard calibration frame when no display instruction is received, or hiding it when a hide instruction is received, prevents the displayed video containing the human hand region from being blocked by the preset standard calibration frame for a long time, which could cause important information to be missed or cause visual discomfort to the user when viewing the video containing the human hand region.
3) the pre-set standard calibration frame is shown receiving the idsplay order, and be not received by appoint later When the time of what instruction is more than third preset time period, the pre-set standard calibration frame is hidden automatically.
After showing the pre-set standard calibration frame, when user no longer inputs any operation and is more than the third When preset time period, pre-set standard calibration frame is hidden automatically, can be triggered at unconscious to avoid user Idsplay order and for a long time the case where showing pre-set standard calibration frame occur, secondly, automatically by pre-set standard Calibration frame is hidden, it helps promotes the interactive experience of user.
In the present embodiment, the pre-set standard calibration frame can be circle, ellipse, rectangle, square etc..
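As a sketch, the three display-mode rules above can be combined into a single visibility decision. The threshold values, function name, and voice strings here are illustrative assumptions, not taken from the embodiment:

```python
# Sketch of the three display-mode rules described above; all names and
# thresholds are assumptions for illustration only.
SHOW_HOLD_S = 1.0    # first preset time period (trigger show)
HIDE_HOLD_S = 2.0    # second, longer preset time period (trigger hide)
IDLE_HIDE_S = 5.0    # third preset time period (auto-hide when idle)

def frame_visible(visible, touch_seconds, voice, idle_seconds):
    """Decide whether the standard calibration frame should be shown."""
    if not visible:
        # Rule 1: show on a long-enough touch or the first preset voice.
        return touch_seconds >= SHOW_HOLD_S or voice == "calibration frame"
    # Rule 2: hide on a longer touch or the second preset voice.
    if touch_seconds >= HIDE_HOLD_S or voice == "exit":
        return False
    # Rule 3: hide automatically after too long with no instruction at all.
    return idle_seconds <= IDLE_HIDE_S
```

Making the show threshold shorter than the hide threshold matches the preference stated above: showing should be quick, while hiding should be hard to trigger by accident.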
202: Receive the standard calibration frame demarcated by the user on the video containing the human hand region.

In this embodiment, when the user finds hand information of interest in the video containing the human hand region shown on the display interface, the user indicates the hand information of interest by adding a standard calibration frame on the display interface.

In this embodiment, receiving the standard calibration frame demarcated by the user on the video containing the human hand region covers the following two cases:
The first case: receive a rough calibration frame drawn by the user on the video containing the human hand region; match, by fuzzy matching, the pre-set standard calibration frame corresponding to the rough calibration frame; and, according to the matched standard calibration frame, demarcate the video containing the human hand region and display the demarcated standard calibration frame, where the geometric centre of the rough calibration frame is the same as the geometric centre of the matched standard calibration frame.

In this embodiment, the shape of a calibration frame drawn by the user's finger on the display interface is neither standard nor precise; for example, a circle drawn by hand is rarely exact. Therefore, after the terminal receives the rough calibration frame drawn by the user, it matches the approximate shape of the rough calibration frame against the corresponding pre-set standard calibration frame shape. Matching the corresponding standard calibration frame by fuzzy matching makes it easier to subsequently crop the region demarcated by the frame.
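The fuzzy shape matching described above could, for example, compare the area a rough stroke encloses with the area of its bounding box (a hand-drawn circle fills about pi/4 of its box, a rectangle nearly all of it). This is only an illustrative matcher under that assumption, not the embodiment's actual method; the function and its names are hypothetical:

```python
import math

def match_standard_frame(points):
    """Fuzzy-match a rough hand-drawn stroke to a preset standard frame shape.

    A minimal sketch (not the patent's actual matcher): compares the stroke's
    enclosed area with the area of its bounding box. A circle fills ~pi/4 of
    its bounding box; a rectangle fills ~all of it.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    w = max(xs) - min(xs)
    h = max(ys) - min(ys)
    # Shoelace formula for the area enclosed by the (closed) stroke.
    area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        area += x1 * y2 - x2 * y1
    area = abs(area) / 2.0
    fill = area / (w * h)
    shape = "circle" if abs(fill - math.pi / 4) < abs(fill - 1.0) else "rectangle"
    # The matched standard frame keeps the rough frame's geometric centre,
    # as required by the first case above.
    center = ((max(xs) + min(xs)) / 2.0, (max(ys) + min(ys)) / 2.0)
    return shape, center
```

Returning the geometric centre alongside the matched shape mirrors the requirement that the standard frame share the rough frame's centre.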
The second case: directly receive a standard calibration frame chosen by the user, demarcate the video containing the human hand region according to that standard calibration frame, and display the demarcated standard calibration frame.

In this embodiment, the user inputs a display operation to trigger the display instruction, so that multiple pre-set standard calibration frames are shown. The user touches one of the standard calibration frames; after the terminal detects the touch signal on that frame, it determines that the frame has been chosen. The user then drags the chosen standard calibration frame onto the video containing the human hand region, and the terminal displays the dragged standard calibration frame on the video containing the human hand region.

Preferably, step 202 may further include: upon receiving an enlarge, shrink, move, or delete instruction, enlarging, shrinking, moving, or deleting the displayed standard calibration frame accordingly.
203: Pre-process the region demarcated by the standard calibration frame.

In this embodiment, the pre-processing may include one or both of: grayscale conversion and correction processing.

Grayscale conversion converts the image of the region demarcated by the standard calibration frame into a gray-level image. Because colour information has little influence on the extraction of the histogram-of-oriented-gradients feature, converting the region image to grayscale does not affect the subsequent computation of each pixel's gradient information in the demarcated region, and it also reduces the amount of computation needed for that gradient information.

The correction processing may use gamma correction, because in the texture intensity of an image, local surface exposure contributes a large proportion; an image after gamma correction effectively reduces local shadow and illumination variation.
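A minimal sketch of the two pre-processing steps (grayscale conversion followed by gamma correction); the BT.601 luma weights and the gamma value 0.5 are assumptions, not specified in the embodiment:

```python
import numpy as np

def preprocess(region_bgr, gamma=0.5):
    """Pre-process the region demarcated by the standard calibration frame.

    A sketch of the two steps described above (the luma weights are the
    common ITU-R BT.601 coefficients; the gamma value is an assumption):
    1) grayscale conversion, since colour contributes little to HOG features;
    2) gamma (power-law) correction, compressing local shadow and
       illumination changes.
    """
    b = region_bgr[..., 0].astype(np.float64)
    g = region_bgr[..., 1].astype(np.float64)
    r = region_bgr[..., 2].astype(np.float64)
    gray = 0.299 * r + 0.587 * g + 0.114 * b      # grayscale conversion
    corrected = 255.0 * (gray / 255.0) ** gamma   # gamma correction
    return corrected
```

With gamma below 1, dark (shadowed) regions are brightened relative to highlights, which is what reduces the influence of local shadow.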
204:The gradient orientation histogram feature in the pretreated fixed region of standard calibration collimation mark is extracted, and According to the gradient orientation histogram feature, the region fixed to the standard calibration collimation mark is split to obtain hand images.
Step 204 in this embodiment is the same as step 103 described in Embodiment One, and is not detailed again here.
205:Using continuous adaptive Mean Shift operator to the hand images into line trace.
Step 205 in this embodiment is the same as step 104 described in Embodiment One, and is not detailed again here.
Further, in order to make full use of depth information, the method further includes, after step 204 and before step 205: obtaining the depth information of the portion of the video containing the human hand region that corresponds to the region demarcated by the calibration frame, and normalizing the hand image according to the depth information.

The depth information is obtained from the 3D depth camera. The specific process of normalizing the hand image according to the depth information is as follows: the size of the hand image obtained by segmenting the region demarcated by the standard calibration frame for the first time is recorded as the standard size S1, and the depth-of-field information corresponding to that first demarcated region is recorded as the standard depth of field H1; the size of the hand image obtained by segmenting the currently demarcated region is recorded as S2, and the depth-of-field information corresponding to the currently demarcated region is recorded as H2; the hand image obtained by segmenting the currently demarcated region is then normalized to S2 * (H2 / H1).

Normalizing the size of the hand images ensures that the finally extracted HOG feature representations share a unified standard, i.e. have the same dimensionality, which improves the accuracy of hand tracking.
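The normalization rule S2 * (H2 / H1) above is simple arithmetic; as a sketch (function name is an assumption):

```python
def normalize_hand_size(s_current, h_current, h_standard):
    """Scale a segmented hand image's size by relative depth of field.

    Implements the rule S2 * (H2 / H1) described above: a hand farther from
    the 3D depth camera (larger H2) appears smaller in the image, so its
    size is scaled up relative to the first (standard) calibration, and a
    closer hand is scaled down.
    """
    return s_current * (h_current / h_standard)
```

So a hand segmented at twice the standard depth has its size doubled before feature extraction, and one at half the standard depth is halved.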
In conclusion rapid hand tracking of the present invention, provides two kinds of standard calibration frames to described comprising people The video of body hand region is demarcated, and it is standard calibration frame to enable to the calibration frame that user demarcates, and then divides and obtain The shape of hand region is standard, and the calibration frame of the standard based on the segmentation carries out hand tracking effect more preferably.
It should be noted that quick dynamic hand tracking of the present invention can be adapted for single hand with Track is readily applicable to the tracking of multiple hands.Tracking for multiple hands, using the method for Parallel Tracking into line trace, It is substantially the process of multiple single hand tracking, and herein without detailed description, any thought using the present invention carries out The method of hand tracking all should be within the scope of the present invention.
The above is only the specific implementation mode of the present invention, but scope of protection of the present invention is not limited thereto, for For those skilled in the art, without departing from the concept of the premise of the invention, improvement, but these can also be made It all belongs to the scope of protection of the present invention.
With reference to Figs. 3 to 5, the function modules and hardware structure of the terminal implementing the above rapid hand tracking method are introduced below.

It should be understood that the described embodiments are for illustration only, and the structure does not limit the patent claims.
Embodiment three
Fig. 3 is the functional block diagram in rapid hand tracks of device preferred embodiment of the present invention.
In some embodiments, the rapid hand tracks of device 30 is run in terminal.The rapid hand tracking dress It may include multiple function modules being made of program code segments to set 30.Each journey in the rapid hand tracks of device 30 The program code of sequence section can be stored in memory, and performed by least one processor, with execution (refer to Fig. 1 and its Associated description) tracking to hand region.
In the present embodiment, the function of the rapid hand tracks of device 30 of the terminal performed by it can be divided For multiple function modules.The function module may include:Display module 301, demarcating module 302, segmentation module 303 and tracking Module 304.The so-called module of the present invention, which refers to one kind, performed by least one processor and capable of completing fixed work( The series of computation machine program segment of energy, is stored in the memory.It in some embodiments, will about the function of each module It is described in detail in subsequent embodiment.
Display module 301, the video for including human hands region for showing imaging device acquisition on display interface.
In this embodiment, the terminal provides a display interface, and the display interface synchronously displays the video containing the human hand region captured by the imaging device. The imaging device is a 2D camera.
Demarcating module 302, the calibration frame demarcated on the video comprising human hands region for receiving user.
In the present embodiment, when user is found that sense in the video comprising human hands region that the display interface is shown When the hand information of interest, indicate that the interested hand calibrated is believed by adding a calibration frame on the display interface Breath.
User can use finger, stylus or other any suitable objects to touch the display interface, preferably hand Refer to and touches the display interface and add a calibration frame on the display interface.
Divide module 303, the gradient orientation histogram feature for extracting the fixed region of the calibration collimation mark, and according to institute The gradient orientation histogram feature region fixed to the calibration collimation mark is stated to be split to obtain hand images.
The segmentation module 303 extracts gradient orientation histogram (the Histogram Of in the fixed region of the calibration collimation mark Gradient, HOG) feature specifically includes:
11) gradient information of each pixel in the fixed region of the calibration collimation mark is calculated, the gradient information includes ladder Spend amplitude and gradient direction;
One-Dimensional Center [1,0, -1], one-dimensional non-central [- 1,1], one-dimensional cube of amendment [1, -8, -8, -1], rope may be used The first differentials templates such as Bell (Soble) operator calculate separately each pixel in the fixed region of the calibration collimation mark in level side Gradient in upward and vertical direction;It is fixed that the calibration collimation mark is calculated according to the gradient in the gradient and vertical direction in horizontal direction Region gradient magnitude and gradient direction.
In this preferred embodiment, to calculate each of the fixed region of the calibration collimation mark for One-Dimensional Center [1,0, -1] template The gradient information of a pixel.The fixed region of the calibration collimation mark is denoted as I (x, y), pixel is calculated in the horizontal direction and hangs down On histogram to gradient respectively as shown in following formula (1-1):
Wherein, Gh(x, y) and Gv(x, y) indicates the gradient of pixel (x, y) in the horizontal direction and the vertical direction respectively Value.
The gradient magnitude (also called the gradient intensity) and the gradient direction of pixel (x, y) are computed as shown in formula (1-2):

M(x, y) = sqrt(Gh(x, y)^2 + Gv(x, y)^2),  θ(x, y) = arctan(Gv(x, y) / Gh(x, y))    (1-2)

where M(x, y) and θ(x, y) denote the gradient magnitude and the gradient direction of pixel (x, y), respectively.
Further, the range of the gradient direction is restricted. The unsigned range is generally used, i.e. the sign of the gradient-direction angle is ignored. The unsigned gradient direction can be expressed as shown in formula (1-3):

θ(x, y) = θ(x, y) mod 180°    (1-3)

After the computation of formula (1-3), the gradient direction of each pixel in the region demarcated by the calibration frame is restricted to the range of 0 to 180 degrees.
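The per-pixel computation of formulas (1-1) to (1-3) can be sketched with the centred template [1, 0, -1]; border handling by replicate padding is an assumption of this sketch:

```python
import numpy as np

def pixel_gradients(region):
    """Per-pixel gradient magnitude and unsigned direction, as in (1-1)-(1-3).

    Uses the one-dimensional centred template [1, 0, -1]; border pixels are
    handled by replicate padding (an assumption of this sketch).
    """
    I = region.astype(np.float64)
    p = np.pad(I, 1, mode="edge")
    gh = p[1:-1, 2:] - p[1:-1, :-2]          # horizontal gradient Gh(x, y)
    gv = p[2:, 1:-1] - p[:-2, 1:-1]          # vertical gradient Gv(x, y)
    mag = np.sqrt(gh ** 2 + gv ** 2)         # gradient magnitude M(x, y)
    theta = np.degrees(np.arctan2(gv, gh))   # signed direction in (-180, 180]
    theta = np.mod(theta, 180.0)             # unsigned direction in [0, 180)
    return mag, theta
```

On a horizontal intensity ramp the interior gradient is purely horizontal (direction 0 degrees); on a vertical ramp it is 90 degrees, matching the unsigned range of formula (1-3).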
12) it is multiple pieces by the fixed region division of the calibration collimation mark, each block is divided into multiple cell factories, Mei Gexi Born of the same parents' unit includes multiple pixels;
In this embodiment, the size of a cell unit is 8*8 pixels, and adjacent cell units do not overlap.

For example, assume that the region I(x, y) demarcated by the calibration frame is of size 64*128, the size of each block is set to 16*16, and the size of each cell unit is 8*8; the region can then be divided into 105 blocks (with adjacent blocks overlapping at a stride of 8 pixels), each block containing 4 cell units and each cell unit containing 64 pixels.

Dividing the cell units without overlap, as in this embodiment, makes the computation of each block's gradient histogram faster.
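The worked 64*128 example above can be checked numerically; the 8-pixel block stride is an assumption needed to reproduce the 105-block count:

```python
def layout(width, height, cell=8, block=16, stride=8):
    """Block/cell layout for the worked 64x128 example above.

    Cells do not overlap; blocks overlap with an 8-pixel stride (the stride
    value is an assumption required to reproduce the 105-block count in the
    text). Returns (number of blocks, cells per block, pixels per cell).
    """
    blocks_x = (width - block) // stride + 1    # 7 for width 64
    blocks_y = (height - block) // stride + 1   # 15 for height 128
    cells_per_block = (block // cell) ** 2      # 4 cells of 8x8 per 16x16 block
    pixels_per_cell = cell * cell               # 64 pixels per cell
    return blocks_x * blocks_y, cells_per_block, pixels_per_cell
```

This yields 7 * 15 = 105 blocks, each with 4 cell units of 64 pixels, matching the figures in the example.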
13) Quantize the gradient information of each pixel in each cell unit to obtain the gradient histogram of the region demarcated by the calibration frame;

In this embodiment, the gradient directions of the pixels in each cell unit are first divided into 9 bins (9 direction channels), which serve as the horizontal axis of the gradient histogram: [0°, 20°), [20°, 40°), [40°, 60°), [60°, 80°), [80°, 100°), [100°, 120°), [120°, 140°), [140°, 160°), [160°, 180°); the gradient magnitudes of the pixels falling into each bin are then accumulated to form the vertical axis of the gradient histogram.
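A sketch of the per-cell quantization step; hard assignment of each pixel to a single bin is assumed here (practical HOG implementations often interpolate between neighbouring bins):

```python
import numpy as np

def cell_histogram(magnitude, direction):
    """9-bin gradient histogram for one cell unit, as described above.

    Each pixel's unsigned direction (0-180 degrees) selects one of nine
    20-degree bins (the histogram's horizontal axis); the pixel's gradient
    magnitude is accumulated into that bin (the vertical axis). Hard
    assignment is an assumption of this sketch.
    """
    hist = np.zeros(9)
    bins = np.minimum((direction // 20).astype(int), 8)  # 180.0 maps to bin 8
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist
```

For an 8*8 cell whose 64 pixels all have direction 30 degrees and magnitude 1, the whole mass lands in the second bin [20°, 40°).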
14) each piece of histogram of gradients is normalized, obtains each piece of histogram of gradients normalization knot Fruit;
In this preferred embodiment, a normalization function may be used to normalize the gradient histogram of each block; the normalization function can be the L2 norm or the L1 norm.

Because of local illumination changes and changes in foreground/background contrast, the gradient magnitudes of pixels vary over a very wide range. Normalization compresses illumination, shadow, and edges, so that the histogram-of-oriented-gradients feature vector is robust to changes in illumination, shadow, and edges.
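A sketch of the L2-norm block normalization; the small epsilon is an assumed guard against division by zero, not specified in the text:

```python
import numpy as np

def l2_normalize_block(block_hist, eps=1e-6):
    """L2-norm normalization of a block's concatenated cell histograms.

    Computes v / sqrt(||v||^2 + eps^2). The epsilon (an assumption of this
    sketch) avoids division by zero on empty blocks; the normalization makes
    the feature robust to illumination and contrast changes as described.
    """
    return block_hist / np.sqrt(np.sum(block_hist ** 2) + eps ** 2)
```

After normalization each block's vector has (approximately) unit length, so a uniform brightening that scales all magnitudes cancels out.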
15) all pieces of histogram of gradients normalization result is attached, obtains the fixed region of the calibration collimation mark most Whole HOG features;
16) according to the final HOG features, hand region is split from the fixed region of the calibration collimation mark.
Tracking module 304, for using continuous adaptive Mean Shift operator to the hand images into line trace.
In this embodiment, the continuously adaptive mean-shift (Continuously Adaptive Mean Shift, CamShift) algorithm is a colour-based method that can track a target by its particular colour, automatically adjusting the size and location of the search window to locate the size and centre of the tracked target; the result of the previous frame (i.e. the search-window size and centroid) is used as the size and centroid of the target in the image of the next frame.
Tracking the hand images using the continuously adaptive mean-shift operator specifically includes:

21) converting the colour space of the hand images to the HSV (Hue, Saturation, Value) colour space and separating out the hue (H) component of the hand images;

22) initializing the centroid position and size S of the search window W based on the hue (H) component of the hand images;
23) computing the moments of the current search window;

The zeroth-order moment of the current search window is computed according to formula (1-4), and the first-order moments according to formula (1-5):

M00 = Σx Σy I(x, y)    (1-4)

M10 = Σx Σy x·I(x, y),  M01 = Σx Σy y·I(x, y)    (1-5)

where I(x, y) here denotes the value of the colour-probability image at pixel (x, y) within the search window.
24) computing the centroid position (M10/M00, M01/M00) of the current search window from its moments;

25) computing the size of the current search window from its moments, for example as s = 2·sqrt(M00/256) for an 8-bit colour-probability image;

Compare the currently computed search-window size with a preset search-window threshold. When the currently computed size is greater than or equal to the preset threshold, repeat the above steps 21)-25); when the currently computed size is smaller than the preset threshold, tracking ends, and the position of the search-window centroid at that moment is the current position of the tracked target.
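Steps 23) to 25) above amount to one moment-based window update per iteration; a sketch over a colour-probability image (the sizing rule s = 2*sqrt(M00/256), the usual CamShift convention, assumes 8-bit probability values):

```python
import numpy as np

def camshift_step(prob, x0, y0, w, h):
    """One search-window update of CamShift over a colour-probability image.

    A sketch of steps 23)-25): compute the zeroth and first moments of the
    window (formulas (1-4)/(1-5)), move the window to the centroid
    (M10/M00, M01/M00), and re-derive its size as s = 2*sqrt(M00/256)
    (an assumed 8-bit probability scale).
    """
    win = prob[y0:y0 + h, x0:x0 + w]
    ys, xs = np.mgrid[0:win.shape[0], 0:win.shape[1]]
    m00 = win.sum()                      # zeroth-order moment, formula (1-4)
    if m00 == 0:
        return x0, y0, w, h              # empty window: leave it unchanged
    m10 = (xs * win).sum()               # first-order moment in x, (1-5)
    m01 = (ys * win).sum()               # first-order moment in y, (1-5)
    cx = x0 + m10 / m00                  # centroid x = M10/M00
    cy = y0 + m01 / m00                  # centroid y = M01/M00
    s = 2.0 * np.sqrt(m00 / 256.0)       # adaptive window size
    return cx, cy, s, s
```

Iterating this step until the window size falls below the preset threshold reproduces the loop described above; the final centroid is the tracked target's current position.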
In conclusion rapid hand tracks of device 30 of the present invention, by user to described comprising human hands region Video in interested hand information with calibration collimation mark it is fixed after, then extract the HOG features in the fixed region of the calibration collimation mark, root Hand region is split from described demarcate in the fixed region of collimation mark according to the HOG features.Thus, it is only necessary to calculate the calibration HOG features in the fixed region of collimation mark, compared to the video image for entirely including human hands region is calculated, the present invention is by connecing Receive user calibration calibration frame, can reduce extraction HOG features region area, to effectively shortening extract HOG features when Between, it is thus possible to quickly hand region is split from the video comprising human hands region.
In addition, being processing since the gradient information of each pixel in the fixed region of the calibration collimation mark is with cell factory Unit, thus the HOG features being calculated can keep the geometry and optical characteristics of hand region;Secondly, piecemeal divides cell factory Calculation processing mode, may make that the relationship between each pixel of hand region can be characterized well;Finally take Normalized, the influence that can be brought with partial offset illumination variation, and then ensure that the clarity of the hand region extracted, Hand region is accurately split.
Example IV
Fig. 4 is the functional block diagram in the preferred embodiment of rapid hand tracks of device of the present invention.
In some embodiments, the rapid hand tracks of device 40 is run in terminal.The rapid hand tracking dress It may include multiple function modules being made of program code segments to set 40.Each journey in the rapid hand tracks of device 40 The program code of sequence section can be stored in memory, and performed by least one processor, with execution (refer to Fig. 2 and its Associated description) tracking to hand region.
In the present embodiment, the function of the rapid hand tracks of device of the terminal performed by it can be divided into Multiple function modules.The function module may include:Display module 401, demarcating module 402, preprocessing module 403, segmentation Module 404, tracking module 405 and normalizing block 406.The so-called module of the present invention refers to that one kind can be by least one processing Device is performed and can complete the series of computation machine program segment of fixed function, is stored in the memory.At some In embodiment, the function about each module will be described in detail in subsequent embodiment.
Display module 401 includes:First display sub-module 4010 and the second display sub-module 4012.Wherein, described first Display sub-module 4010 is used to show the video for including human hands region that imaging device acquires on display interface, described the Two display sub-modules 4012 are used to show pre-set standard calibration frame with pre-set display mode.
In the present embodiment, the terminal provides a display interface, and the display interface is adopted to simultaneous display imaging device The video for including human hands region of collection, the display interface also show standard calibration frame simultaneously.
The imaging device is a 3D depth camera. A 3D depth camera differs from a 2D camera in that it can simultaneously capture grayscale image information and three-dimensional information including depth. After the video containing the human hand region is captured with the 3D depth camera, the video is displayed synchronously on the display interface of the terminal.
In the present embodiment, the pre-set standard calibration frame is for user shown comprising human hands region It is demarcated on video to obtain interested hand information.
The pre-set display mode includes the combination of one or more of:
1) when receiving idsplay order, the pre-set standard calibration frame is shown;
The idsplay order corresponds to display operation input by user, and the display operation input by user includes, but unlimited In:Display interface any position is clicked, or the time of touch display interface any position is more than the first preset time period (example Such as, 1 second), or send out the first default voice (for example, " calibration frame ") etc..
Clicking operation is performed on the display interface when detecting user, or is worked as and detected user in the display The touch operation time executed on interface is more than preset time, or has issued the described first default voice when detecting user When, the terminal determination has received idsplay order, shows the pre-set standard calibration frame.
2) when receiving hiding instruction, the pre-set standard calibration frame is hidden;
The corresponding hiding operation input by user of the hiding instruction, the hiding operation input by user include, but unlimited In:Display interface any position is clicked, or the time of touch display interface any position is more than the second preset time period (example Such as, 2 seconds), or send out the second default voice (for example, " exiting ") etc..
Clicking operation is performed on the display interface when detecting user, or is worked as and detected user in the display The touch operation time executed on interface is more than the second preset time, or has issued the second default voice when detecting user When, the terminal determination has received hiding instruction, hides the pre-set standard calibration frame.
The hiding instruction may be the same as the display instruction, or different. The first preset time period may be the same as the second preset time period, or not. Preferably, the first preset time period is shorter than the second preset time period: a shorter first preset time period allows the pre-set standard calibration frame to be shown quickly, while a longer second preset time period prevents the pre-set standard calibration frame from being hidden through an unconscious action or an operating error by the user.
The pre-set standard calibration frame is shown when receiving idsplay order, enables to display interface in display institute When stating the video comprising human hands region, user can demarcate interested hand region;It is not receiving simultaneously When to the idsplay order, the pre-set standard calibration frame is not shown, or is received the hiding instruction and hidden institute Pre-set standard calibration frame is stated, the video comprising human hands region of display can be avoided for a long time by described pre- The standard calibration frame being first arranged blocks, to cause the omission of important information or check that described includes human body hand to user Visual sense of discomfort is brought when the video in portion region.
3) the pre-set standard calibration frame is shown receiving the idsplay order, and be not received by appoint later When the time of what instruction is more than third preset time period, the pre-set standard calibration frame is hidden automatically.
After showing the pre-set standard calibration frame, when user no longer inputs any operation and is more than the third When preset time period, pre-set standard calibration frame is hidden automatically, can be triggered at unconscious to avoid user Idsplay order and for a long time the case where showing pre-set standard calibration frame occur, secondly, automatically by pre-set standard Calibration frame is hidden, it helps promotes the interactive experience of user.
In the present embodiment, the pre-set standard calibration frame can be circle, ellipse, rectangle, square etc..
Demarcating module 402, the standard calibration demarcated on the video comprising human hands region for receiving user Frame.
In the present embodiment, when user is found that sense in the video comprising human hands region that the display interface is shown When the hand information of interest, the interested hand calibrated is indicated by adding a standard calibration frame on the display interface Portion's information.
In the present embodiment, the demarcating module 402 further includes the first calibration submodule 4020, second calibration submodule 4022 And third demarcates submodule 4024.
It is described first calibration submodule 4020, for receive user it is described comprising the video in human hands region in draw Rough calibration frame;Pre-set standard mark corresponding with the rough calibration frame is matched by the method for fuzzy matching Determine frame;According to the standard calibration frame matched to being demarcated in the video comprising human hands region and showing calibration Standard calibration frame, wherein the geometric center of the rough calibration frame is identical as the geometric center of the standard calibration frame matched.
In the present embodiment, due to the shape and non-standard of the calibration frame that user is drawn by finger on the display interface Or standard, for example, the circular calibration frame that user draws not is very precisely, thus when terminal receives what user drew After the shape of rough rough calibration frame, matched according to the general shape of the rough calibration frame corresponding pre-set The shape of standard calibration frame.Corresponding standard calibration frame is matched by the method for fuzzy matching, convenient for subsequently to the calibration frame The region of calibration is cut.
The second calibration submodule 4022, the standard calibration frame chosen for directly receiving user, according to the standard Calibration frame is demarcated and is shown the standard calibration frame of calibration on the video comprising human hands region.
In the present embodiment, user inputs display operation and triggers idsplay order, to show pre-set multiple standard marks Determine frame, user touches standard calibration frame, after terminal detects the touch signal on standard calibration frame, determines the standard calibration frame quilt It chooses.User moves the standard calibration frame being selected and is pulled on the video comprising human hands region, terminal Dragged standard calibration frame is shown on the video comprising human hands region.
The third demarcates submodule 4024, when instruction for receiving amplification, diminution, movement, deletion, to display Standard calibration frame is amplified, reduces, moves, deletes.
Preprocessing module 403 is pre-processed for the region fixed to the standard calibration collimation mark.
In the present embodiment, the pretreatment may include the combination of one or more of:Gray processing is handled, at correction Reason.
The gray processing processing refers to converting the fixed area image of the standard calibration collimation mark to gray level image, because of color Multimedia message influences extraction gradient orientation histogram feature little, thus the fixed area image of the standard calibration collimation mark is converted into Gray level image had not both interfered with the gradient information for each pixel for subsequently calculating the fixed region of standard calibration collimation mark, also The calculation amount of the gradient information of each pixel can be reduced.
Gamma (Gamma) correction may be used in the correction process, because in the texture strength of image, local surface layer exposes The proportion of light contribution is larger, and the image after Gamma correction process can be effectively reduced local shade and illumination variation.
Divide module 404, the gradient direction for extracting the pretreated fixed region of standard calibration collimation mark Histogram feature, and be split to obtain according to the gradient orientation histogram feature region fixed to the standard calibration collimation mark Hand images.
Tracking module 405, for using continuous adaptive Mean Shift operator to the hand images into line trace.
Further, the rapid hand tracks of device 40 further includes normalizing block 406, for obtaining the calibration frame Depth information in the corresponding video comprising human hands region in region of calibration, according to the depth information to the hand Image standardizes.
The depth information is obtained from the 3D depth camera. The specific process of normalizing the hand image according to the depth information is as follows: the size of the hand image obtained by segmenting the region demarcated by the standard calibration frame for the first time is recorded as the standard size S1, and the depth-of-field information corresponding to that first demarcated region is recorded as the standard depth of field H1; the size of the hand image obtained by segmenting the currently demarcated region is recorded as S2, and the depth-of-field information corresponding to the currently demarcated region is recorded as H2; the hand image obtained by segmenting the currently demarcated region is then normalized to S2 * (H2 / H1).

Normalizing the size of the hand images ensures that the finally extracted HOG feature representations share a unified standard, i.e. have the same dimensionality, which improves the accuracy of hand tracking.
In conclusion rapid hand tracks of device 40 of the present invention, provide two kinds of standard calibration frames includes to described The video in human hands region is demarcated, and it is standard calibration frame to enable to the calibration frame that user demarcates, and then divides and obtain The shape of hand region be standard, the calibration frame of the standard based on the segmentation carries out hand tracking effect more preferably.
It should be noted that the rapid dynamic hand tracking devices 30 and 40 of the present invention are applicable to the tracking of a single hand as well as to the tracking of multiple hands. Multiple hands are tracked by parallel tracking, which is essentially multiple single-hand tracking processes and is not detailed here. Any device that performs hand tracking using the idea of the present invention shall fall within the protection scope of the present invention.
The above integrated unit implemented in the form of a software function module may be stored in a computer-readable storage medium. The software function module is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a dual-screen device, a network device, or the like) or a processor to execute part of the method described in each embodiment of the present invention.
Embodiment Five
Fig. 5 is a schematic diagram of the terminal provided by Embodiment Five of the present invention.
The terminal 5 includes a memory 51, at least one processor 52, a computer program 53 stored in the memory 51 and executable on the at least one processor 52, at least one communication bus 54, and an imaging device 55.
When executing the computer program 53, the at least one processor 52 implements the steps in the above rapid hand tracking method embodiments, such as steps 101 to 104 shown in Fig. 1 or steps 201 to 205 shown in Fig. 2. Alternatively, when executing the computer program 53, the at least one processor 52 implements the functions of each module/unit in the above device embodiments, such as modules 301 to 304 in Fig. 3 or modules 401 to 406 in Fig. 4.
Illustratively, the computer program 53 may be divided into one or more modules/units, which are stored in the memory 51 and executed by the at least one processor 52 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 53 in the terminal 5. For example, the computer program 53 may be divided into the display module 301, calibration module 302, segmentation module 303, and tracking module 304 in Fig. 3, or into the display module 401, calibration module 402, preprocessing module 403, segmentation module 404, tracking module 405, and normalization module 406 in Fig. 4. The display module 401 includes a first display sub-module 4010 and a second display sub-module 4012; the calibration module 402 includes a first calibration sub-module 4020, a second calibration sub-module 4022, and a third calibration sub-module 4024. For the specific functions of each module, refer to Embodiments One and Two and their corresponding descriptions.
The imaging device 55 includes a 2D camera, a 3D depth camera, and the like. The imaging device 55 may be installed in the terminal 5, or may be separate from the terminal 5 and exist as an independent element.
The terminal 5 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. Those skilled in the art will appreciate that the schematic diagram 5 is only an example of the terminal 5 and does not limit the structure of the terminal 5; the terminal 5 may include more or fewer components than illustrated, may combine certain components, or may have different components. For example, the terminal 5 may also include input/output devices, network access devices, buses, and the like.
The at least one processor 52 may be a central processing unit (Central Processing Unit, CPU), or may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 52 may be a microprocessor, or the processor 52 may be any conventional processor. The processor 52 is the control center of the terminal 5 and connects all parts of the entire terminal 5 through various interfaces and lines.
The memory 51 may be configured to store the computer program 53 and/or the modules/units. The processor 52 implements the various functions of the terminal 5 by running or executing the computer program and/or modules/units stored in the memory 51 and by calling the data stored in the memory 51. The memory 51 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the terminal 5 (such as audio data or a phone book). In addition, the memory 51 may include a high-speed random access memory, and may also include a non-volatile memory, such as a hard disk, an internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one disk storage device, a flash memory device, or another solid-state storage device.
If the integrated modules/units of the terminal 5 are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the above embodiment methods of the present invention may also be completed by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or certain intermediate forms. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
In the several embodiments provided by the present invention, it should be understood that the disclosed terminal and method may be implemented in other ways. For example, the terminal embodiments described above are merely illustrative; for instance, the division of the units is only a division by logical function, and there may be other division manners in actual implementation.
In addition, the functional units in the embodiments of the present invention may be integrated in the same processing unit, each unit may exist alone physically, or two or more units may be integrated in the same unit. The above integrated unit may be implemented in the form of hardware, or in the form of hardware plus software function modules.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from the spirit or essential attributes of the invention. Therefore, from whatever point of view, the embodiments are to be considered illustrative and not restrictive, and the scope of the present invention is defined by the appended claims rather than by the above description; it is therefore intended that all changes falling within the meaning and scope of equivalency of the claims be included in the present invention. Any reference signs in the claims shall not be construed as limiting the claims involved. Furthermore, it is clear that the word "comprising" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units or devices stated in the system claims may also be realized by one unit or device through software or hardware. Words such as "first" and "second" are used to denote names and do not denote any particular order.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the present invention has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the invention.

Claims (10)

1. A rapid hand tracking method, characterized in that the method comprises:
displaying, on a display interface, a video containing a human hand region acquired by an imaging device;
receiving a calibration frame marked by a user on the video containing the human hand region;
extracting a histogram of oriented gradients (HOG) feature of the region marked by the calibration frame, and segmenting the region marked by the calibration frame according to the HOG feature to obtain a hand image; and
tracking the hand image using a continuously adaptive mean shift operator, wherein the tracking the hand image using the continuously adaptive mean shift operator specifically comprises:
converting the color space of the hand image into the HSV color space and separating out the hue-component hand image; based on the hue-component hand image I(i, j) and the centroid position and size of an initialized search window, computing the centroid position (M10/M00, M01/M00) of the current search window and the current search window size s = 2·sqrt(M00/256), wherein M10 = Σ_i Σ_j i·I(i, j) and M01 = Σ_i Σ_j j·I(i, j) are the first-order moments of the current search window, M00 = Σ_i Σ_j I(i, j) is the zeroth-order moment of the current search window, i is the coordinate of pixel I(i, j) in the horizontal direction, and j is the coordinate of pixel I(i, j) in the vertical direction.
2. The method according to claim 1, characterized in that displaying, on the display interface, the video containing the human hand region acquired by the imaging device further comprises:
displaying a preset standard calibration frame in a preset display mode, the preset display mode comprising one or a combination of the following:
displaying the preset standard calibration frame when a display instruction is received;
hiding the preset standard calibration frame when a hide instruction is received;
displaying the preset standard calibration frame when the display instruction is received, and automatically hiding the preset standard calibration frame when no instruction is subsequently received for more than a preset time period.
3. The method according to claim 2, characterized in that receiving the calibration frame marked by the user on the video containing the human hand region comprises:
receiving a standard calibration frame marked by the user on the video containing the human hand region, including:
receiving a rough calibration frame drawn by the user in the video containing the human hand region;
matching a preset standard calibration frame corresponding to the rough calibration frame by a fuzzy matching method; and
performing calibration in the video containing the human hand region according to the matched standard calibration frame and displaying the calibrated standard calibration frame, wherein the geometric center of the rough calibration frame is identical to the geometric center of the matched standard calibration frame.
4. The method according to claim 2, characterized in that receiving the calibration frame marked by the user on the video containing the human hand region comprises:
receiving a standard calibration frame marked by the user on the video containing the human hand region, including:
directly receiving a standard calibration frame chosen by the user, and performing calibration in the video containing the human hand region according to the standard calibration frame and displaying the calibrated standard calibration frame.
5. The method according to claim 3 or 4, characterized in that receiving the standard calibration frame marked by the user on the video containing the human hand region further comprises:
enlarging, shrinking, moving, or deleting the displayed standard calibration frame when an enlarge, shrink, move, or delete instruction is received.
6. The method according to claim 5, characterized in that the method further comprises:
preprocessing the region marked by the standard calibration frame, the preprocessing comprising one or a combination of the following: grayscale processing and correction processing.
7. The method according to claim 6, characterized in that the method further comprises:
obtaining depth information of the video containing the human hand region corresponding to the region marked by the calibration frame, and normalizing the hand image according to the depth information, the normalization process being: S2*(H2/H1), wherein S1 is the size of the hand image obtained by segmenting the region marked by the standard calibration frame for the first time, H1 is the depth-of-field information corresponding to the region marked by the calibration frame for the first time, S2 is the size of the hand image obtained by segmenting the region marked by the current standard calibration frame, and H2 is the depth-of-field information corresponding to the region marked by the current calibration frame.
8. A rapid hand tracking device, characterized in that the device comprises:
a display module, configured to display, on a display interface, a video containing a human hand region acquired by an imaging device;
a calibration module, configured to receive a calibration frame marked by a user on the video containing the human hand region;
a segmentation module, configured to extract a histogram of oriented gradients (HOG) feature of the region marked by the calibration frame, and to segment the region marked by the calibration frame according to the HOG feature to obtain a hand image; and
a tracking module, configured to track the hand image using a continuously adaptive mean shift operator, wherein the tracking the hand image using the continuously adaptive mean shift operator specifically comprises:
converting the color space of the hand image into the HSV color space and separating out the hue-component hand image; based on the hue-component hand image I(i, j) and the centroid position and size of an initialized search window, computing the centroid position (M10/M00, M01/M00) of the current search window and the current search window size s = 2·sqrt(M00/256), wherein M10 = Σ_i Σ_j i·I(i, j) and M01 = Σ_i Σ_j j·I(i, j) are the first-order moments of the current search window, M00 = Σ_i Σ_j I(i, j) is the zeroth-order moment of the current search window, i is the coordinate of pixel I(i, j) in the horizontal direction, and j is the coordinate of pixel I(i, j) in the vertical direction.
9. A terminal, characterized in that the terminal comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the rapid hand tracking method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the rapid hand tracking method according to any one of claims 1 to 7.
CN201810349972.XA 2018-04-18 2018-04-18 Rapid hand tracking method, device, terminal and storage medium Active CN108682021B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810349972.XA CN108682021B (en) 2018-04-18 2018-04-18 Rapid hand tracking method, device, terminal and storage medium
PCT/CN2018/100227 WO2019200785A1 (en) 2018-04-18 2018-08-13 Fast hand tracking method, device, terminal, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810349972.XA CN108682021B (en) 2018-04-18 2018-04-18 Rapid hand tracking method, device, terminal and storage medium

Publications (2)

Publication Number Publication Date
CN108682021A true CN108682021A (en) 2018-10-19
CN108682021B CN108682021B (en) 2021-03-05

Family

ID=63801123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810349972.XA Active CN108682021B (en) 2018-04-18 2018-04-18 Rapid hand tracking method, device, terminal and storage medium

Country Status (2)

Country Link
CN (1) CN108682021B (en)
WO (1) WO2019200785A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886928A (en) * 2019-01-24 2019-06-14 平安科技(深圳)有限公司 A kind of target cell labeling method, device, storage medium and terminal device
WO2021130549A1 (en) * 2019-12-23 2021-07-01 Sensetime International Pte. Ltd. Target tracking method and apparatus, electronic device, and storage medium
WO2023001039A1 (en) * 2021-07-19 2023-01-26 北京字跳网络技术有限公司 Image matching method and apparatus, and device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390168A (en) * 2013-07-18 2013-11-13 重庆邮电大学 Intelligent wheelchair dynamic gesture recognition method based on Kinect depth information
CN105678809A (en) * 2016-01-12 2016-06-15 湖南优象科技有限公司 Handheld automatic follow shot device and target tracking method thereof
CN106157308A (en) * 2016-06-30 2016-11-23 北京大学 Rectangular target object detecting method
US20170287139A1 (en) * 2009-10-07 2017-10-05 Microsoft Technology Licensing, Llc Methods and systems for determining and tracking extremities of a target
CN107240117A (en) * 2017-05-16 2017-10-10 上海体育学院 Tracking method and device for a moving target in video
WO2018031102A1 (en) * 2016-08-12 2018-02-15 Qualcomm Incorporated Methods and systems of performing content-adaptive object tracking in video analytics

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3120294A1 (en) * 2014-03-20 2017-01-25 Telecom Italia S.p.A. System and method for motion capture
CN105825524B (en) * 2016-03-10 2018-07-24 浙江生辉照明有限公司 Method for tracking target and device
CN105957107A (en) * 2016-04-27 2016-09-21 北京博瑞空间科技发展有限公司 Pedestrian detecting and tracking method and device

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170287139A1 (en) * 2009-10-07 2017-10-05 Microsoft Technology Licensing, Llc Methods and systems for determining and tracking extremities of a target
CN103390168A (en) * 2013-07-18 2013-11-13 重庆邮电大学 Intelligent wheelchair dynamic gesture recognition method based on Kinect depth information
CN105678809A (en) * 2016-01-12 2016-06-15 湖南优象科技有限公司 Handheld automatic follow shot device and target tracking method thereof
CN106157308A (en) * 2016-06-30 2016-11-23 北京大学 Rectangular target object detecting method
WO2018031102A1 (en) * 2016-08-12 2018-02-15 Qualcomm Incorporated Methods and systems of performing content-adaptive object tracking in video analytics
CN107240117A (en) * 2017-05-16 2017-10-10 上海体育学院 Tracking method and device for a moving target in video

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886928A (en) * 2019-01-24 2019-06-14 平安科技(深圳)有限公司 A kind of target cell labeling method, device, storage medium and terminal device
CN109886928B (en) * 2019-01-24 2023-07-14 平安科技(深圳)有限公司 Target cell marking method, device, storage medium and terminal equipment
WO2021130549A1 (en) * 2019-12-23 2021-07-01 Sensetime International Pte. Ltd. Target tracking method and apparatus, electronic device, and storage medium
US11244154B2 (en) 2019-12-23 2022-02-08 Sensetime International Pte. Ltd. Target hand tracking method and apparatus, electronic device, and storage medium
WO2023001039A1 (en) * 2021-07-19 2023-01-26 北京字跳网络技术有限公司 Image matching method and apparatus, and device and storage medium

Also Published As

Publication number Publication date
WO2019200785A1 (en) 2019-10-24
CN108682021B (en) 2021-03-05

Similar Documents

Publication Publication Date Title
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
CN108765278B (en) Image processing method, mobile terminal and computer readable storage medium
CN103927016B (en) Real-time three-dimensional double-hand gesture recognition method and system based on binocular vision
US8792722B2 (en) Hand gesture detection
CN104268583B (en) Pedestrian re-recognition method and system based on color area features
US8750573B2 (en) Hand gesture detection
CN109165538B (en) Bar code detection method and device based on deep neural network
Shivakumara et al. A new multi-modal approach to bib number/text detection and recognition in Marathon images
CN110413816A (en) Colored sketches picture search
CN110827312B (en) Learning method based on cooperative visual attention neural network
CN105405116B (en) A kind of solid matching method cut based on figure
WO2019071976A1 (en) Panoramic image saliency detection method based on regional growth and eye movement model
CN113240691A (en) Medical image segmentation method based on U-shaped network
CN112418216A (en) Method for detecting characters in complex natural scene image
CN110222572A (en) Tracking, device, electronic equipment and storage medium
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN108682021A (en) Rapid hand tracking, device, terminal and storage medium
CN111080670A (en) Image extraction method, device, equipment and storage medium
CN109558790B (en) Pedestrian target detection method, device and system
CN111753923A (en) Intelligent photo album clustering method, system, equipment and storage medium based on human face
CN114549557A (en) Portrait segmentation network training method, device, equipment and medium
Paul et al. Hand segmentation from complex background for gesture recognition
CN109740674A (en) A kind of image processing method, device, equipment and storage medium
Mussi et al. A novel ear elements segmentation algorithm on depth map images
CN112686122B (en) Human body and shadow detection method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant