CN109799905A - A kind of hand tracking and advertisement machine - Google Patents
Publication: CN109799905A (Application CN201811626864.9A)
Authority: CN (China)
Prior art keywords: hand, characteristic, advertisement machine, input picture, target
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Abstract
The embodiment of the present invention provides a hand tracking method and an advertisement machine. The method comprises: inputting an acquired input image into a pre-trained single-target multi-box network model for prediction, and calculating a target hand-feature position based on an output image from the predicted hand features, wherein the input image contains a hand and the single-target multi-box network model takes the hand as the prediction target; mapping the hand-feature position to the corresponding position on the advertisement machine interface, and updating the cursor-tool position in the advertisement machine interface based on the mapped position, wherein a preset mapping relationship exists between the output image and the advertisement machine interface. Because single-target multi-box prediction is performed on hand features, only the hand features need to be extracted and predicted as the target; there is no need to extract texture features of the image background to assist the hand-feature prediction. This reduces the dependence on the background, improves the stability of hand tracking, and facilitates the practical deployment of the advertisement machine.
Description
Technical field
The present invention relates to the field of artificial intelligence, and in particular to a hand tracking method and an advertisement machine.
Background technique
Currently, with the rapid development of deep-learning technology in the field of computer vision, new life has been brought to the commercial application of related electronic products. These electronic products can recognize the movements of a user, for example the user's hand movements, foot movements and facial movements. At present, during human-computer interaction, position tracking of a recognized part distinguishes the hand from the background by means of the texture features of the image background, which places high demands on the background conditions: once the background or the light source changes, the accuracy of the tracking calculation is affected and human-computer interaction becomes difficult. The only remedy is to redesign the algorithm after the background or light source changes, which is unfavorable to realizing human-computer interaction on an advertisement machine. Existing hand tracking algorithms therefore have poor stability, which leads to low universality of the advertisement machine and hinders practical deployment.
Summary of the invention
The embodiment of the present invention provides a hand tracking method and an advertisement machine, so as to improve the stability of the hand tracking algorithm and thereby solve the problem of the low universality of hand tracking algorithms in advertisement machines.
In a first aspect, an embodiment of the present invention provides a hand tracking method, comprising:
inputting an acquired input image into a pre-trained single-target multi-box network model for prediction, and calculating a target hand-feature position based on an output image from the predicted hand features, wherein the input image contains a hand and the single-target multi-box network model takes the hand as the prediction target;
mapping the hand-feature position to the corresponding position on the advertisement machine interface, and updating the cursor-tool position in the advertisement machine interface based on the mapped position, wherein a preset mapping relationship exists between the output image and the advertisement machine interface.
Optionally, the acquisition of the input image comprises:
scaling an initial image captured by a camera to a preset size to obtain the input image, wherein the initial image contains a hand.
Optionally, the training of the single-target multi-box network model comprises:
obtaining a hand-image data set, the hand-image data set comprising hand images captured under various environments and/or light sources, the hand images carrying cutout annotations of the hand;
training the single-target multi-box network model with the hand images that carry the hand cutout annotations, the training process for each hand image including learning the multi-scale relationships corresponding to the hand in that hand image.
Optionally, inputting the acquired input image into the pre-trained single-target multi-box network model for prediction and calculating the target hand-feature position based on the output image from the predicted hand features comprises:
performing multi-scale feature prediction on the input image through a plurality of convolutional layers in the single-target multi-box network model to obtain a plurality of hand features of different scales;
de-duplicating the plurality of hand features to obtain the target hand feature;
obtaining, from the target hand feature, the position of the target hand feature in the output image.
Optionally, each hand feature includes a confidence score, and de-duplicating the plurality of hand features comprises:
calculating the degree of overlap between the plurality of hand features;
among the hand features whose degree of overlap exceeds a preset overlap threshold, retaining the hand feature with the highest confidence and deleting the remaining hand features whose degree of overlap exceeds the preset threshold.
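The de-duplication step described above is, in effect, greedy non-maximum suppression over the predicted hand-feature boxes. A minimal sketch follows; the box format (x1, y1, x2, y2, confidence), the default threshold value and the helper names are illustrative assumptions, not taken from the patent.

```python
def iou(a, b):
    """Degree of overlap: intersection area / union area of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def dedup_hand_features(boxes, overlap_threshold=0.5):
    """Keep the highest-confidence box within each overlapping group.

    boxes: list of (x1, y1, x2, y2, confidence).
    """
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)  # highest-confidence box still in play
        kept.append(best)
        # delete every remaining box that overlaps the kept one beyond the threshold
        remaining = [b for b in remaining
                     if iou(best[:4], b[:4]) <= overlap_threshold]
    return kept
```

Two nearly coincident boxes collapse to the more confident one, while a distant box survives unchanged.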
Optionally, obtaining the position of the target hand feature in the output image from the target hand feature comprises:
obtaining the coordinates of any pair of diagonal corner points of the target hand feature on the output image;
determining the position of the hand feature in the output image from the pair of diagonal corner point coordinates.
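A sketch of recovering a box position from one pair of diagonal corner coordinates, as in the optional embodiment above. Returning the centre point is an assumption for illustration; the patent also allows any single point of the feature box, or a distribution of multiple points, to serve as the position.

```python
def box_from_diagonal(corner_a, corner_b):
    """Normalise any two diagonal corners into a canonical (x1, y1, x2, y2) box."""
    (xa, ya), (xb, yb) = corner_a, corner_b
    return (min(xa, xb), min(ya, yb), max(xa, xb), max(ya, yb))

def box_center(corner_a, corner_b):
    """One possible 'position' of the feature box: its centre point."""
    x1, y1, x2, y2 = box_from_diagonal(corner_a, corner_b)
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```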
Optionally, mapping the hand-feature position to the corresponding position on the advertisement machine interface comprises:
mapping the hand-feature position of the output image into the input image to obtain the hand position in the input image, wherein a preset mapping relationship exists between the output image and the input image;
mapping the hand position in the input image into the advertisement machine interface to obtain the hand position in the advertisement machine interface, wherein a preset mapping relationship exists between the input image and the advertisement machine interface.
In a second aspect, an embodiment of the present invention provides an advertisement machine, comprising:
a prediction module, configured to input an acquired input image into a pre-trained single-target multi-box network model for prediction and to calculate a target hand-feature position based on an output image from the predicted hand features, wherein the input image contains a hand and the single-target multi-box network model takes the hand as the prediction target;
a mapping module, configured to map the hand-feature position to the corresponding position on the advertisement machine interface and to update the cursor-tool position in the advertisement machine interface based on the mapped position, wherein a preset mapping relationship exists between the output image and the advertisement machine interface.
In a third aspect, an embodiment of the present invention provides an advertisement machine, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the hand tracking method provided by the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the hand tracking method provided by the embodiments of the present invention.
In the embodiments of the present invention, an acquired input image is input into a pre-trained single-target multi-box network model for prediction, and a target hand-feature position based on an output image is calculated from the predicted hand features, wherein the input image contains a hand and the single-target multi-box network model takes the hand as the prediction target; the hand-feature position is mapped to the corresponding position on the advertisement machine interface, and the cursor-tool position in the advertisement machine interface is updated based on the mapped position, wherein a preset mapping relationship exists between the output image and the advertisement machine interface. Because single-target multi-box prediction is performed on hand features, only the hand features need to be extracted and predicted as the target; there is no need to extract texture features of the image background to assist the hand-feature prediction. This reduces the dependence on the background, improves the stability of hand tracking, and facilitates the practical deployment of the advertisement machine.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flow diagram of a hand tracking method provided in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a single-target multi-box network model provided in an embodiment of the present invention;
Fig. 3 is a flow diagram of another hand tracking method provided in an embodiment of the present invention;
Fig. 4 is a structural schematic diagram of an advertisement machine provided in an embodiment of the present invention;
Fig. 5 is a structural schematic diagram of another advertisement machine provided in an embodiment of the present invention;
Fig. 6 is a structural schematic diagram of another advertisement machine provided in an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of another advertisement machine provided in an embodiment of the present invention;
Fig. 8 is a structural schematic diagram of another advertisement machine provided in an embodiment of the present invention;
Fig. 9 is a structural schematic diagram of another advertisement machine provided in an embodiment of the present invention;
Figure 10 is a structural schematic diagram of another advertisement machine provided in an embodiment of the present invention;
Figure 11 is a structural schematic diagram of an advertisement machine provided in an embodiment of the present invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Referring to Fig. 1, Fig. 1 is a flow diagram of a hand tracking method provided in an embodiment of the present invention. As shown in Fig. 1, the method comprises the following steps:
101. Input an acquired input image into a pre-trained single-target multi-box network model for prediction, and calculate a target hand-feature position based on an output image from the predicted hand features, wherein the input image contains a hand and the single-target multi-box network model takes the hand as the prediction target.
The input image may be an image actually captured by a camera, or an image obtained by processing such a captured image, the processing including adjusting the resolution or adjusting the size. There may be one or more input images; multiple input images may be consecutive, for example a video. Multiple input images may be fed in at a frame rate measured in fps (frames per second), e.g. fps = 10, 15 or 20, which may be preset according to the device configuration of the advertisement machine.
The single-target multi-box network model is pre-trained and can learn, from the input data, to detect targets of different scales in an image. The single-target multi-box network model may be a fully convolutional network model, i.e. a model containing only convolutional layers and no fully connected layers; in this way, the computation speed of the model can be improved. Specifically, the single-target multi-box network model may use six different feature maps to detect targets of different scales. As shown in Fig. 2, multiple convolutional layers perform convolution operations on different sizes of the same image to obtain features of multiple sizes. In this embodiment, multiple hand features of different sizes can be obtained through the single-target multi-box network model; a hand feature calculated by the model may be called a hand-feature box (bounding box). The convolutional layers in the single-target multi-box network model perform convolution operations on the input image containing the hand and can output hand-feature boxes of multiple sizes into a predictor for prediction. For example, in Fig. 2, one group of hand-feature boxes is obtained by Conv (convolution) 4_3 and input into the predictor; one group of hand-feature boxes is obtained by Conv7 and input into the predictor; and Conv8_2, Conv9_2, Conv10_2 and Conv11_2 respectively obtain corresponding hand-feature boxes that are input into the predictor for prediction. Because multiple feature-box predictions are performed on a single image, the recall and precision of hand-feature detection are improved.
The single-target multi-box network model may be trained with a pre-prepared hand-image data set. The hand-image data set contains cutout annotations of the hand under various environments; by training the single-target multi-box network model on images carrying hand cutout annotations, the model can be enabled to predict hand-feature positions. Since there is no need to extract the background texture in the image and only the hand features need to be extracted and predicted as the target, the stability of hand tracking is improved. In addition, since background texture need not be extracted, optical-flow information and depth information are not needed as a basis for extracting background-texture features, so efficiency is improved while the requirements on the device are reduced.
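As a rough illustration of the multi-scale layout described above, the grid size of each detection feature map can be derived from the effective stride of its convolutional layer relative to the input. The 300-pixel input size and the stride values below are assumptions chosen to reproduce the 38/19/10/5/3/1 grids of a typical single-shot multi-box detector; the patent itself does not state them.

```python
import math

# Hypothetical effective strides for the six detection layers
# (Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2, Conv11_2).
STRIDES = [8, 16, 32, 64, 100, 300]
INPUT_SIZE = 300  # assumed square input image

def grid_sizes(input_size=INPUT_SIZE, strides=STRIDES):
    """Side length of the feature-map grid at each detection layer."""
    return [math.ceil(input_size / s) for s in strides]
```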
The output image may be understood as the image annotated with hand-feature boxes; an output image containing hand-feature boxes may be understood as a visualization result image (a visualization of the feature boxes) produced while the model runs. It should be understood that the visualization result image refers to a visualization carried out while the model runs or is trained, to facilitate observation and training, rather than a visualization result image displayed on the advertisement machine. The hand-feature position may be understood as the position of a predicted hand-feature box on the output image, i.e. the hand-feature position can be determined from the hand-feature box marked on the output image. The hand-feature position may be a scale coordinate based on the input image, or a pixel coordinate based on the output image; it may also be the position (coordinate) of any single point in the hand-feature box, or a coordinate distribution of multiple points. In addition, that the input image contains a hand may be understood to mean that the input image is an image directed at a hand.
102. Map the hand-feature position to the corresponding position on the advertisement machine interface, and update the cursor-tool position in the advertisement machine interface based on the mapped position, wherein a preset mapping relationship exists between the output image and the advertisement machine interface.
Here, the hand-feature position is the hand-feature position based on the input image that was predicted in step 101, and the hand feature can be visualized in the output image as a hand-feature box. The mapping may be understood as follows: the output image is projected onto an image template (canvas) of the same size as the advertisement machine interface; correspondingly, the hand-feature box is also projected onto the image template, which yields the position of the hand-feature box in the image template. From the position of the hand feature in the image template, the position of the cursor tool is then obtained, i.e. the position of the hand-feature box in the image template is identical to the cursor-tool position in the advertisement machine interface. Updating the cursor-tool position may mean generating a new cursor tool at the corresponding position on the advertisement machine interface according to the position of the hand-feature box in the image template, and deleting the original cursor tool from the advertisement machine interface. The preset mapping relationship may be a size mapping or a pixel mapping. The size mapping may be performed by changing the resolution: for example, keeping the number of pixels constant, lowering the resolution makes the picture size larger, while raising the resolution makes the picture size smaller. The pixel mapping may set up pixel cells in the image template that correspond to those of the output image: for example, if the pixel cells of the output image are 300 units vertically and 300 units horizontally, 90000 pixel cells in total, the image template may likewise be set to 300 vertical and 300 horizontal cells, 90000 in total, with each cell corresponding to the pixel cell of the output image at the same coordinate. In one possible embodiment, through the above mapping relationship, the hand-feature box in the output image can be mapped into the image template, and the cursor tool of the advertisement machine is updated according to the position of the hand-feature box in the image template, thereby realizing human-computer interaction.
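A minimal sketch of the two-stage position mapping of steps 101-102: a box centre predicted on the output image is rescaled first to input-image coordinates and then to advertisement-interface coordinates. The proportional (x, y) rescaling and the concrete canvas sizes used below are illustrative assumptions.

```python
def map_point(point, src_size, dst_size):
    """Proportionally map an (x, y) point from one canvas size to another."""
    (x, y), (sw, sh), (dw, dh) = point, src_size, dst_size
    return (x * dw / sw, y * dh / sh)

def cursor_position(box_center, output_size, input_size, interface_size):
    """Output image -> input image -> advertisement machine interface."""
    on_input = map_point(box_center, output_size, input_size)
    return map_point(on_input, input_size, interface_size)
```

The centre of a 300x300 output image lands at the centre of the interface regardless of the intermediate input-image resolution, matching the "same coordinate cell" intuition in the text.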
In the above method, since only the hand features need to be extracted and predicted as the target, the background texture in the image does not need to be extracted, which improves the stability of hand tracking. In addition, since background texture need not be extracted, optical-flow information and depth information are not required either as a basis for extracting background-texture features, so the demands on the image are low; efficiency is improved while the requirements on the device are reduced, and images captured by a 2D camera can be processed.
It should be noted that the hand tracking method provided in the embodiments of the present invention can be applied to smart devices such as advertisement machines, mobile phones, intelligent terminals, computers, servers and tablet computers.
In the embodiments of the present invention, an acquired input image is input into a pre-trained single-target multi-box network model for prediction, and a target hand-feature position based on an output image is calculated from the predicted hand features, wherein the input image contains a hand; the hand-feature position is mapped to the corresponding position on the advertisement machine interface, and the cursor-tool position in the advertisement machine interface is updated based on the mapped position, wherein a preset mapping relationship exists between the output image and the advertisement machine interface. Because single-target multi-box prediction is performed on hand features, the texture features of the image background do not need to be extracted, which reduces the dependence on the background, improves the stability of hand tracking, and facilitates the practical deployment of the advertisement machine.
Referring to Fig. 3, Fig. 3 is a flow diagram of another hand tracking method provided in an embodiment of the present invention. As shown in Fig. 3, the method comprises the following steps:
301. Scale an initial image captured by a camera to a preset size to obtain an input image, wherein the initial image contains a hand.
The camera may be a 2D camera, or another image acquisition device with an image-depth acquisition function, such as a 3D camera. Since the single-target multi-box network model places low demands on the image and needs neither optical-flow information nor depth information, the requirements on the device can be reduced so as to save camera cost; a 2D camera is therefore preferred in the embodiments of the present invention. The camera may be built into the advertisement machine, or arranged outside the advertisement machine as a peripheral. There may be one or more cameras, and a camera may be angle-adjustable so as to obtain hand images of users at different angles. The camera may have an auto-focus lens and can, through auto-focusing, obtain an image in which the hand and the background have suitable sizes. The initial image refers to an image shot directly by the camera; there may be one image, or multiple consecutive images.
Scaling the initial image may be performed as follows: an input-image template is provided in advance, and the initial image is scaled according to the input-image template. For example, if the size of the initial image collected by the camera is larger than the input-image template, the initial image is reduced so that it matches the size of the input image; if the size of the initial image collected by the camera is smaller than the input-image template, the initial image is enlarged so that it matches the size of the input image. In this way, even if replacing the camera changes the size of the captured image, the size scaling ensures that the prediction of the model is not affected. In some possible embodiments, when the size of the initial image already matches the size of the input image, the initial image may be left unadjusted, which may also be understood as the adjustment applied to the initial image being 0. That the initial image contains a hand may be understood to mean that the initial image is an image directed at a hand. In one possible embodiment, if there is no hand in the initial image, then no hand-feature position data is predicted from the image, and the position of the cursor tool in the advertisement machine will not be updated.
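The scaling in step 301 can be sketched as computing per-axis scale factors that take the initial image to a preset template size. The 300x300 template and the simple per-axis (aspect-distorting) resize are assumptions for illustration; the patent only requires that the result match the preset size.

```python
TEMPLATE_SIZE = (300, 300)  # assumed preset input-image template

def scale_to_template(initial_size, template_size=TEMPLATE_SIZE):
    """Return (sx, sy) scale factors taking the initial image to the template.

    Factors below 1 reduce an image larger than the template; factors above 1
    enlarge a smaller one; (1.0, 1.0) means no adjustment is needed.
    """
    (w, h), (tw, th) = initial_size, template_size
    return (tw / w, th / h)
```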
302. Input the acquired input image into the pre-trained single-target multi-box network model for prediction, and calculate the target hand-feature position based on the output image from the predicted hand features, wherein the input image contains a hand.
303. Map the hand-feature position to the corresponding position on the advertisement machine interface, and update the cursor-tool position in the advertisement machine interface based on the mapped position, wherein a preset mapping relationship exists between the output image and the advertisement machine interface.
In step 301, since the initial image is scaled to a preset size, even if replacing the camera changes the size of the captured image, the prediction result of the model is not affected; this reinforces the robustness of the model and thereby broadens the scope of application of the advertisement machine.
It should be noted that step 301 is optional: in some possible scenarios, the initial image captured by the camera can be input directly into the prediction model as the input image, without needing to be scaled.
In an optional embodiment, the training of the single-target multi-box network model comprises:
obtaining a hand-image data set, the hand-image data set comprising hand images captured under various environments and/or light sources, the hand images carrying cutout annotations of the hand;
training the single-target multi-box network model with the hand images that carry the hand cutout annotations, the training process for each hand image including learning the multi-scale relationships corresponding to the hand in that hand image.
The hand-image data set may be collected and processed by the user himself: for example, the user obtains the scene where the advertisement machine is installed, performs image acquisition in that scene and makes cutout annotations of the hand. It may also be obtained online: for example, the hand-image data in the EgoHands data set can be downloaded as the hand-image data set for training. The various environments of the hand images may be environments suitable for the installation of the advertisement machine, such as an indoor venue or an outdoor site; the light source may be an electroluminescent lamp, an ordinary lamp, or a light source customized by the user. A hand image contains one or more hands together with cutout annotations corresponding one-to-one to the hands. Because the hand is annotated by cutout, the hand images may be hand images without optical-flow information and depth information, which reduces the data volume of the images and thereby reduces the equipment requirements for image acquisition and hand prediction. The multi-scale relationship of the hand should be understood to mean that one hand is predicted by feature boxes of different scales.
During training, the images in the hand-image data set are input into the single-target multi-box network model, and the single-target multi-box network model is trained through the annotated hand cutouts, enabling the single-target multi-box network model to learn to extract the hand features in hand images.
It should be noted that this training process is optional. For example, an advertisement machine that needs to perform hand tracking can receive a trained single-target multi-box network model sent by another device, or receive a trained single-target multi-box network model input by the user.
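An illustrative sketch of how the annotated data set described above might be represented in code: each entry pairs an image path with the hand cutouts it contains, one per hand, and images without a hand contribute nothing to training. The field names, the box representation of a cutout, and the paths are all hypothetical, not the patent's data format.

```python
def valid_entries(dataset):
    """Keep only entries that contain at least one annotated hand cutout."""
    return [e for e in dataset if e["hand_cutouts"]]

# Hypothetical data-set entries: image path plus one cutout box per hand.
dataset = [
    {"image": "indoor/img_001.jpg", "hand_cutouts": [(40, 60, 120, 160)]},
    {"image": "outdoor/img_002.jpg",
     "hand_cutouts": [(10, 10, 80, 90), (150, 40, 220, 130)]},
    {"image": "indoor/img_003.jpg", "hand_cutouts": []},  # no hand: unused
]
```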
In an optional embodiment, inputting the acquired input image into the pre-trained single-target multi-box network model for prediction and calculating the target hand-feature position based on the output image from the predicted hand features comprises:
performing multi-scale feature prediction on the input image through a plurality of convolutional layers in the single-target multi-box network model to obtain a plurality of hand features of different scales;
de-duplicating the plurality of hand features to obtain the target hand feature;
obtaining, from the target hand feature, the position of the target hand feature in the output image.
The plurality of convolutional layers may be seen in Fig. 2, and the multi-scale prediction can be carried out on different convolutional layers: for example, Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2 in Fig. 2 handle feature maps of different sizes and obtain the corresponding hand-feature boxes. By performing multi-scale feature prediction through these convolutional layers, (38*38*4 + 19*19*6 + 10*10*6 + 5*5*6 + 3*3*4 + 1*1*4) = 8732 hand-feature boxes can be obtained; de-duplicating these 8732 hand-feature boxes yields the target hand feature. The de-duplication may be by confidence, for example retaining the hand-feature box with the highest confidence under an overlap rule; or by intersection, for example choosing the intersection of multiple feature boxes, i.e. the common region (intersection), as the final target region; or by union, for example choosing the union of multiple rectangular boxes, i.e. the minimum bounding rectangle of all feature boxes, as the target region. Of course, it is not the case here that the union is taken directly whenever boxes intersect: intersecting boxes are merged only if their intersection covers a certain proportion (namely a preset threshold) of the area of the smaller box. The position of the target hand feature can be determined by calculating the coordinates of the target hand-feature box on the input image or the output image, such as the centre coordinate or a corner coordinate.
In this embodiment, performing multi-scale feature prediction through multiple convolutional layers can improve the precision of the prediction, and thereby the precision of hand tracking.
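The 8732-box total above follows directly from the grid size and the number of default boxes per cell at each of the six detection layers; a quick sketch of the arithmetic:

```python
# (grid side, default boxes per cell) for Conv4_3, Conv7, Conv8_2,
# Conv9_2, Conv10_2 and Conv11_2, as listed in the embodiment.
LAYERS = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]

def total_boxes(layers=LAYERS):
    """Total number of hand-feature boxes predicted per image."""
    return sum(side * side * per_cell for side, per_cell in layers)
```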
In an optional embodiment, each hand feature includes a confidence score, and de-duplicating the plurality of hand features comprises:
calculating the degree of overlap between the plurality of hand features;
among the hand features whose degree of overlap reaches a preset overlap threshold, retaining the hand feature with the highest confidence and deleting the remaining hand features whose degree of overlap reaches the preset threshold.
Here the confidence of a hand feature indicates how credible it is that the feature is a hand feature: the higher the confidence, the more likely the feature is a hand feature. The overlap can be understood as the overlap between multiple hand feature boxes; it may also be called the area intersection-over-union, namely the ratio of the intersection area of two feature boxes to their union area. For example, if box A with area 10 and box B with area 20 have an intersection area of 10, then A lies inside B, the union area is 20, and the overlap is 0.5; if instead the intersection of A and B is 4, the union area is 26 and the overlap is 0.154. There may be one or more overlap thresholds. For example, suppose a first result set contains all predicted feature boxes A, B, C, D, E, F, G, H, I, J, K with confidence increasing in that order (K highest, A lowest). The overlaps of K with A, B, C, D, E, F, G, H, I can be computed separately, giving ak, bk, ck, dk, ek, fk, gk, hk, ik, which are then compared against a preset first overlap threshold. Two outcomes are possible. In the first, the overlap of every feature box with K reaches the first overlap threshold, that is, ak, bk, ck, dk, ek, fk, gk, hk, ik all reach it; then A, B, C, D, E, F, G, H, I, J are deleted from the first result set, and K is the target hand feature. In the second, only some boxes' overlaps with K reach the first overlap threshold, say ck, dk, ek; then C, D, E are deleted from the first result set, A, B, F, G, H, I are retained, and K is placed into a second result set. In this second outcome, the remaining feature boxes in the result set are A, B, F, G, H, I, with I having the highest confidence; the overlaps of I with A, B, F, G, H are computed separately, giving ai, bi, fi, gi, hi, and again two outcomes are possible. In the first, all boxes' overlaps with I reach a second overlap threshold; then A, B, F, G, H are deleted from the first result set and I is placed into the second result set, after which the overlap of K and I is computed: if it reaches the second overlap threshold, I is deleted, and if it does not, K and I are identified as two distinct target hand features, i.e. two hands appear in the input image. In the second outcome, only some boxes' overlaps with I reach the second overlap threshold; then the deletion and comparison operations above are repeated until no feature boxes remain in the first result set, after which the feature boxes in the second result set are likewise compared and pruned, with retained results placed into a third result set, until no feature boxes remain in the second result set. It should be noted that in the above example the first overlap threshold is greater than the second overlap threshold. In this way, a single target hand feature can be de-duplicated via the first overlap threshold (one hand in the input image yields exactly one prediction result), and multiple target hand features can be separated via the second overlap threshold (multiple hands in the input image yield multiple one-to-one prediction results). In some possible embodiments, when there are multiple target hand features, palm-line recognition may be used to determine which hand feature's coordinates are ultimately used.
In this embodiment, because the prediction results are de-duplicated using a combination of overlap and thresholds, the predicted hand features are more accurate, which improves the precision of the hand position coordinates and thus the precision of hand tracking.
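The overlap-and-confidence de-duplication described above is essentially non-maximum suppression over the predicted boxes. A hedged sketch, assuming boxes are given as (x1, y1, x2, y2) tuples with a confidence score per box (the function names are illustrative, not from the patent):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def dedup(boxes, scores, threshold):
    """Keep the highest-confidence box among boxes whose mutual overlap
    reaches `threshold`; delete the rest (the de-duplication rule above)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) < threshold for k in kept):
            kept.append(i)
    return kept

# Two heavily overlapping boxes (one hand) and one distinct box (a second hand):
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.6, 0.8]
print(dedup(boxes, scores, threshold=0.5))  # [0, 2]
```

With a single threshold this yields one result per hand; the two-threshold scheme in the text additionally separates the surviving boxes into distinct hands.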
In an optional embodiment, obtaining the position of the target hand feature in the output image according to the target hand feature comprises:
obtaining the coordinates of any pair of diagonal corners of the target hand feature on the output image; and
determining the position of the hand feature in the output image according to that pair of diagonal corner coordinates.
Here the target hand feature position can be obtained, according to the target hand feature, from the diagonal corner coordinates of the target hand feature box in the output image. The target hand feature box is a rectangular box, and a pair of diagonal corners can be understood as the two endpoints of one of the rectangle's diagonals; therefore it is only necessary to obtain the coordinates of the two endpoints (i.e. the diagonal corners) of either diagonal of the target hand feature box to determine the position and size of the box in the output image. The coordinates may be scale (normalized) coordinates or pixel coordinates. In some possible embodiments, the center of the target hand feature box, i.e. the midpoint of the diagonal, can be computed from the diagonal corner positions, and the center of the target hand feature box taken as the position of the target hand feature.
In this embodiment, since the position of the hand feature in the output image can be determined by obtaining the diagonal corner coordinates of the hand feature in the output image, the accuracy of hand tracking is improved.
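The corner-to-center computation can be sketched as follows (the function name is illustrative; the coordinates may be pixel or normalized, as the text notes):

```python
def box_center(corner_a, corner_b):
    """Center of a rectangular feature box from the two endpoints of
    either diagonal: the midpoint of that diagonal."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

# Either diagonal of the same rectangle gives the same center:
print(box_center((2, 3), (10, 7)))   # (6.0, 5.0)
print(box_center((2, 7), (10, 3)))   # (6.0, 5.0)
```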
In an optional embodiment, mapping the hand feature position to the corresponding position on the advertisement machine interface comprises:
mapping the hand feature position of the output image into the input image to obtain the hand position in the input image, wherein there is a preset mapping relationship between the output image and the input image; and
mapping the hand position of the input image into the advertisement machine interface to obtain the hand position in the advertisement machine interface, wherein there is a preset mapping relationship between the input image and the advertisement machine interface.
Here an input image template may be used as the mapping relationship between the output image and the input image, through which the hand feature position in the output image is mapped into the input image; the input image template has the same size as the input image. It should be noted that the input image template can be generated according to the input-size hyperparameter of the single-target multibox network. Likewise, an image template of the same size as the advertisement machine interface can be used as the mapping relationship between the input image and the advertisement machine interface, through which the hand position is mapped onto the interface. In some possible embodiments, for example when the output image has the same size as the input image, the hand position in the output image can be mapped directly onto an image template of the same size as the advertisement machine interface.
In this embodiment, since the hand feature position in the output image is first mapped back into the input image and the hand feature position in the input image is then mapped to the advertisement machine interface, the input image is added as an intermediate mapping target, which reduces the distortion introduced when mapping to the advertisement machine interface and thus improves the accuracy of hand tracking.
It should be noted that this embodiment can be regarded as an optional implementation of mapping the hand feature position to the advertisement machine interface in step 303; in some optional scenarios, the hand feature position in the output image can be mapped directly onto an image template of the same size as the advertisement machine interface.
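When the preset mapping relationships are simple size ratios between same-aspect templates, the two-stage mapping above reduces to chained scaling. A minimal sketch under that assumption (the sizes and names are hypothetical; a real template may encode a more general transform):

```python
def scale_point(p, src_size, dst_size):
    """Map a point between two templates by their width/height ratio."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    return (p[0] * sx, p[1] * sy)

output_size = (300, 300)    # network output template (assumed input-size hyperparameter)
input_size = (640, 480)     # camera input image (assumed)
screen_size = (1920, 1080)  # advertisement machine interface (assumed)

hand_out = (150, 150)  # hand feature position in the output image
hand_in = scale_point(hand_out, output_size, input_size)    # stage 1: back to input image
cursor = scale_point(hand_in, input_size, screen_size)      # stage 2: onto the interface
print(hand_in, cursor)  # (320.0, 240.0) (960.0, 540.0)
```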
In an optional embodiment, the method further includes:
performing gesture recognition on the input image to obtain the corresponding gesture semantics, wherein the gesture semantics are preset; and
activating the cursor tool according to the gesture semantics so that the cursor tool performs the corresponding function on the advertisement machine interface.
Here a gesture recognition engine may be used to recognize the gesture in the input image, so as to obtain the corresponding gesture semantics. It should be noted that gesture recognition on the input image may be performed before hand position prediction, after it, or in parallel with hand tracking on different threads: for example, a first thread feeds the input image into the gesture recognition engine for gesture recognition while a second thread feeds the input image into the single-target multibox network model for prediction. The gesture semantics may correspond to activation instructions for the cursor tool, and the corresponding functions may be, for example: a gesture of an extended index finger may correspond to a move instruction of the cursor tool, realizing the cursor movement function; a gesture of two extended fingers may correspond to a drag instruction of the cursor tool, realizing a drag-to-select function; and a gesture of five extended fingers may correspond to a swipe instruction of the cursor tool, realizing a swipe selection (marquee selection) function. The specific correspondence rules can be preset according to the user and are not limited here.
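The gesture-to-function correspondence can be held in a simple lookup table; the entries below mirror the finger-count examples in the text, while the gesture labels and command names are hypothetical (the patent leaves the rules user-configurable):

```python
# Hypothetical mapping from recognized gesture semantics to cursor-tool
# commands, following the examples in the text.
GESTURE_COMMANDS = {
    "index_finger": "move",   # extended index finger -> move the cursor
    "two_fingers": "drag",    # two extended fingers  -> drag-to-select
    "five_fingers": "swipe",  # five extended fingers -> swipe / marquee select
}

def dispatch(gesture, position):
    """Activate the cursor tool for a recognized gesture, if any."""
    command = GESTURE_COMMANDS.get(gesture)
    if command is None:
        return None  # unknown gesture: do not update the cursor tool
    return (command, position)

print(dispatch("two_fingers", (960, 540)))  # ('drag', (960, 540))
print(dispatch("fist", (960, 540)))         # None
```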
By performing gesture recognition on the input image, it can be determined whether the hand tracking result is used to update the cursor tool, rather than tracking unconditionally in all cases; this improves the stability of hand tracking, and can also enrich the content of human-computer interaction and increase the appeal of the advertisement machine.
It should be noted that this embodiment is optional: in one possible scenario, hand tracking alone is sufficient to realize the human-computer interaction of the advertisement machine, such as the interaction of on-screen games like "cutting watermelon".
In the present embodiment, on the basis of the embodiment shown in Fig. 1, multiple optional embodiments are added, which can improve the stability of hand tracking in human-computer interaction.
Referring to Fig. 4, Fig. 4 is a structural schematic diagram of an advertisement machine provided by an embodiment of the present invention. As shown in Fig. 4, it comprises:
a prediction module 401, configured to input an acquired input image into a pre-trained single-target multibox network model for prediction and to calculate, according to the predicted hand features, the target hand feature position based on the output image, wherein the input image includes a hand, and the single-target multibox network model predicts with the hand as the target during prediction; and
a mapping module 402, configured to map the hand feature position to the corresponding position on the advertisement machine interface and to update the cursor tool position in the advertisement machine interface based on the mapped position, wherein there is a preset mapping relationship between the output image and the advertisement machine interface.
Optionally, as shown in Fig. 5, the advertisement machine further includes:
an acquisition module 403, configured to scale the initial image captured by the camera to a preset size to obtain the input image, wherein the initial image includes a hand.
Optionally, training the single-target multibox network model includes:
obtaining a hand image data set, wherein the hand image data set includes hand images under various environments and/or light sources, and the hand images include matting (cut-out) annotations of the hand; and
training the single-target multibox network model using the hand images containing the hand matting annotations, wherein the training process for each hand image includes learning the multi-scale relationships corresponding to the hand in that hand image.
Optionally, as shown in Fig. 6, the prediction module 401 includes:
a convolution submodule 4011, configured to perform multi-scale feature prediction on the input image through multiple convolutional layers in the single-target multibox network model to obtain multiple hand features of different scales;
a de-duplication submodule 4012, configured to de-duplicate the multiple hand features to obtain the target hand feature; and
a calculation submodule 4013, configured to obtain the position of the target hand feature in the output image according to the target hand feature.
Optionally, as shown in fig. 7, the hand-characteristic includes confidence level, the duplicate removal submodule 4012 includes:
First computing unit 40121, for calculating the degree of overlapping between the multiple hand-characteristic;
Duplicate removal unit 40122 is set for choosing in the hand-characteristic that degree of overlapping reaches pre-set degree of overlapping threshold value
The highest hand-characteristic of reliability is retained, remaining hand-characteristic that degree of overlapping reaches preparatory degree of overlapping threshold value is deleted.
Optionally, as shown in figure 8, the computational submodule 4013 includes:
Second computing unit 40131, for obtaining any one group on the output image of the target hand-characteristic
Angle steel joint coordinate;
Determination unit 40132, for determining the hand-characteristic described defeated according to any one group of angle steel joint coordinate
Position in image out.
Optionally, as shown in Fig. 9, the mapping module 402 includes:
a first mapping submodule 4021, configured to map the hand feature position of the output image into the initial image to obtain the hand position in the initial image, wherein there is a preset mapping relationship between the output image and the initial image; and
a second mapping submodule 4022, configured to map the hand position of the initial image into the advertisement machine interface to obtain the hand position in the advertisement machine interface, wherein there is a preset mapping relationship between the initial image and the advertisement machine interface.
Optionally, as shown in Fig. 10, the advertisement machine further includes:
a recognition module 404, configured to perform gesture recognition on the input image to obtain the corresponding gesture semantics, wherein the gesture semantics are preset; and
an activation module 405, configured to activate the cursor tool according to the gesture semantics so that the cursor tool performs the corresponding function on the advertisement machine interface.
The advertisement machine provided by the embodiment of the present invention can implement each embodiment in the method embodiments of Fig. 1 and Fig. 3, with the corresponding beneficial effects; to avoid repetition, details are not repeated here.
Referring to Fig. 11, Fig. 11 is a structural schematic diagram of an electronic device provided by an embodiment of the present invention. As shown in Fig. 11, it includes: a memory 1102, a processor 1101, and a computer program stored on the memory 1102 and runnable on the processor 1101, wherein:
the processor 1101 is configured to call the computer program stored in the memory 1102 to perform the following steps:
inputting the acquired input image into a pre-trained single-target multibox network model for prediction, and calculating the target hand feature position based on the output image according to the predicted hand features, wherein the input image includes a hand, and the single-target multibox network model predicts with the hand as the target during prediction; and
mapping the hand feature position to the corresponding position on the advertisement machine interface, and updating the cursor tool position in the advertisement machine interface based on the mapped position, wherein there is a preset mapping relationship between the output image and the advertisement machine interface.
Optionally, the processor 1101 is further configured to perform the acquisition of the input image, comprising:
scaling the initial image captured by the camera to a preset size to obtain the input image, wherein the initial image includes a hand.
Optionally, inputting the acquired input image into the pre-trained single-target multibox network model for prediction and calculating the target hand feature position based on the output image according to the predicted hand features, as performed by the processor 1101, comprises:
performing multi-scale feature prediction on the input image through multiple convolutional layers in the single-target multibox network model to obtain multiple hand features of different scales;
de-duplicating the multiple hand features to obtain the target hand feature; and
obtaining the position of the target hand feature in the output image according to the target hand feature.
Optionally, the hand features processed by the processor 1101 include confidence values, and de-duplicating the multiple hand features comprises:
calculating the overlap between the multiple hand features; and
among the hand features whose overlap reaches a preset overlap threshold, retaining the hand feature with the highest confidence and deleting the remaining hand features whose overlap reaches the preset overlap threshold.
Optionally, obtaining the position of the target hand feature in the output image according to the target hand feature, as performed by the processor 1101, comprises:
obtaining the coordinates of any pair of diagonal corners of the target hand feature on the output image; and
determining the position of the hand feature in the output image according to that pair of diagonal corner coordinates.
Optionally, mapping the hand feature position to the corresponding position on the advertisement machine interface, as performed by the processor 1101, comprises:
mapping the hand feature position of the output image into the initial image to obtain the hand position in the initial image, wherein there is a preset mapping relationship between the output image and the initial image; and
mapping the hand position of the initial image into the advertisement machine interface to obtain the hand position in the advertisement machine interface, wherein there is a preset mapping relationship between the initial image and the advertisement machine interface.
Optionally, the processor 1101 is further configured to perform the following steps:
performing gesture recognition on the input image to obtain the corresponding gesture semantics, wherein the gesture semantics are preset; and
activating the cursor tool according to the gesture semantics so that the cursor tool performs the corresponding function on the advertisement machine interface.
The electronic device provided by the embodiment of the present invention can implement each embodiment in the method embodiments of Fig. 1 and Fig. 3, with the corresponding beneficial effects; to avoid repetition, details are not repeated here.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements each process of the hand tracking method embodiments provided by the embodiments of the present invention and achieves the same technical effects; to avoid repetition, details are not repeated here.
Those of ordinary skill in the art will appreciate that all or part of the processes in the above embodiment methods can be completed by a computer program instructing the relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above disclosure describes only preferred embodiments of the present invention and certainly cannot limit the scope of the rights of the present invention; therefore, equivalent changes made in accordance with the claims of the present invention remain within the scope of the present invention.
Claims (10)
1. A hand tracking method for advertisement machine interface interaction, characterized by comprising:
inputting an acquired input image into a pre-trained single-target multibox network model for prediction, and calculating a target hand feature position based on an output image according to the predicted hand features, wherein the input image includes a hand, and the single-target multibox network model predicts with the hand as the target during prediction; and
mapping the hand feature position to a corresponding position on an advertisement machine interface, and updating a cursor tool position in the advertisement machine interface based on the mapped position, wherein there is a preset mapping relationship between the output image and the advertisement machine interface.
2. The method according to claim 1, characterized in that the acquisition of the input image comprises:
scaling an initial image captured by a camera to a preset size to obtain the input image, wherein the initial image includes a hand.
3. The method according to claim 2, characterized in that training the single-target multibox network model comprises:
obtaining a hand image data set, wherein the hand image data set includes hand images under various environments and/or light sources, and the hand images include matting annotations of the hand; and
training the single-target multibox network model using the hand images containing the hand matting annotations, wherein the training process for each hand image includes learning the multi-scale relationships corresponding to the hand in that hand image.
4. The method according to claim 3, characterized in that inputting the acquired input image into the pre-trained single-target multibox network model for prediction and calculating the target hand feature position based on the output image according to the predicted hand features comprises:
performing multi-scale feature prediction on the input image through multiple convolutional layers in the single-target multibox network model to obtain multiple hand features of different scales;
de-duplicating the multiple hand features to obtain a target hand feature; and
obtaining a position of the target hand feature in the output image according to the target hand feature.
5. The method according to claim 4, characterized in that the hand features include confidence values, and de-duplicating the multiple hand features comprises:
calculating the overlap between the multiple hand features; and
among the hand features whose overlap reaches a preset overlap threshold, retaining the hand feature with the highest confidence and deleting the remaining hand features whose overlap reaches the preset overlap threshold.
6. The method according to claim 5, characterized in that obtaining the position of the target hand feature in the output image according to the target hand feature comprises:
obtaining the coordinates of any pair of diagonal corners of the target hand feature on the output image; and
determining the position of the hand feature in the output image according to that pair of diagonal corner coordinates.
7. The method according to any one of claims 1 to 6, characterized in that mapping the hand feature position to the corresponding position on the advertisement machine interface comprises:
mapping the hand feature position of the output image into the input image to obtain a hand position of the input image, wherein there is a preset mapping relationship between the output image and the input image; and
mapping the hand position of the input image into the advertisement machine interface to obtain a hand position in the advertisement machine interface, wherein there is a preset mapping relationship between the input image and the advertisement machine interface.
8. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
performing gesture recognition on the input image to obtain corresponding gesture semantics, wherein the gesture semantics are preset; and
activating the cursor tool according to the gesture semantics so that the cursor tool performs a corresponding function on the advertisement machine interface.
9. An advertisement machine, characterized by comprising:
a prediction module, configured to input an acquired input image into a pre-trained single-target multibox network model for prediction and to calculate a target hand feature position based on an output image according to the predicted hand features, wherein the input image includes a hand, and the single-target multibox network model predicts with the hand as the target during prediction; and
a mapping module, configured to map the hand feature position to a corresponding position on an advertisement machine interface and to update a cursor tool position in the advertisement machine interface based on the mapped position, wherein there is a preset mapping relationship between the output image and the advertisement machine interface.
10. An advertisement machine, characterized by comprising: a memory, a processor, and a computer program stored on the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the hand tracking method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811626864.9A CN109799905B (en) | 2018-12-28 | 2018-12-28 | Hand tracking method and advertising machine |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109799905A true CN109799905A (en) | 2019-05-24 |
CN109799905B CN109799905B (en) | 2022-05-17 |
Family
ID=66558095
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111523435A (en) * | 2020-04-20 | 2020-08-11 | 安徽中科首脑智能医疗研究院有限公司 | Finger detection method, system and storage medium based on target detection SSD |
CN112712392A (en) * | 2020-12-31 | 2021-04-27 | 京东数字科技控股股份有限公司 | Message pushing method and device, electronic equipment and storage medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105389556A (en) * | 2015-11-10 | 2016-03-09 | 中南大学 | High-resolution-remote-sensing-image vehicle detection method considering shadow region |
CN107168527A (en) * | 2017-04-25 | 2017-09-15 | 华南理工大学 | The first visual angle gesture identification and exchange method based on region convolutional neural networks |
CN107341436A (en) * | 2016-08-19 | 2017-11-10 | 北京市商汤科技开发有限公司 | Gestures detection network training, gestures detection and control method, system and terminal |
CN107340852A (en) * | 2016-08-19 | 2017-11-10 | 北京市商汤科技开发有限公司 | Gestural control method, device and terminal device |
CN107871134A (en) * | 2016-09-23 | 2018-04-03 | 北京眼神科技有限公司 | A kind of method for detecting human face and device |
CN107871106A (en) * | 2016-09-26 | 2018-04-03 | 北京眼神科技有限公司 | Face detection method and device |
CN107886387A (en) * | 2016-09-30 | 2018-04-06 | 阿里巴巴集团控股有限公司 | The implementation method and its device of palm decoration virtual image are provided using augmented reality |
CN108564598A (en) * | 2018-03-30 | 2018-09-21 | 西安电子科技大学 | A kind of improved online Boosting method for tracking target |
CN108596092A (en) * | 2018-04-24 | 2018-09-28 | 亮风台(上海)信息科技有限公司 | Gesture identification method, device, equipment and storage medium |
CN108875481A (en) * | 2017-08-31 | 2018-11-23 | 北京旷视科技有限公司 | Method, apparatus, system and storage medium for pedestrian detection |
Also Published As
Publication number | Publication date |
---|---|
CN109799905B (en) | 2022-05-17 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN106096577B (en) | A kind of target tracking method in camera distribution map | |
CN105830062B (en) | System, method and apparatus for coded object formation | |
CN105493078B (en) | Colored sketches picture search | |
WO2016109027A2 (en) | Dynamic video summarization | |
CN109977782A (en) | Across shop operation detection method based on target position information reasoning | |
CN110378945A (en) | Depth map processing method, device and electronic equipment | |
CN111160291B (en) | Human eye detection method based on depth information and CNN | |
CN107209853A (en) | Positioning and map constructing method | |
US11113571B2 (en) | Target object position prediction and motion tracking | |
WO2020236949A1 (en) | Forensic video exploitation and analysis tools | |
CN110428449A (en) | Target detection tracking method, device, equipment and storage medium | |
CN105469427B (en) | One kind is for method for tracking target in video | |
CN109727275A (en) | Object detection method, device, system and computer readable storage medium | |
CN109102530A (en) | Motion profile method for drafting, device, equipment and storage medium | |
CN112241969A (en) | Target detection tracking method and device based on traffic monitoring video and storage medium | |
CN110533694A (en) | Image processing method, device, terminal and storage medium | |
WO2016190783A1 (en) | Entity visualization method | |
CN109711267A (en) | A kind of pedestrian identifies again, pedestrian movement's orbit generation method and device | |
CN106960175A (en) | The first visual angle dynamic gesture detection method based on depth convolutional neural networks | |
CN104133565B (en) | Real-time laser point tracking man-machine interaction system realized by utilizing structured light technology | |
CN108471497A (en) | A kind of ship target real-time detection method based on monopod video camera | |
CN115115672B (en) | Dynamic vision SLAM method based on target detection and feature point speed constraint | |
JP2019185787A (en) | Remote determination of containers in geographical region | |
CN109697385A (en) | A kind of method for tracking target and device | |
CN109799905A (en) | A kind of hand tracking and advertisement machine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||