CN110232311B

CN110232311B - Method and device for segmenting hand image and computer equipment

Info

Publication number: CN110232311B
Application number: CN201910345761.3A
Authority: CN
Inventors: 侯丽; 王福晴
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2023-11-14
Anticipated expiration: 2039-04-26
Also published as: CN110232311A; WO2020215565A1

Abstract

The application discloses a method and a device for segmenting hand images and computer equipment, relates to the technical field of computers, and can effectively avoid the problem that skin color confusion, illumination and deformation influence hand image detection when the hand images are segmented. The method comprises the following steps: collecting a sample image containing a complete hand image; labeling the coordinate position of the hand region in the sample image; taking the sample image marked with the coordinate position as a training set, and training based on a fast R-CNN algorithm to obtain a hand recognition model with a training result meeting a preset standard; detecting whether the image to be identified contains a hand image or not by using the hand identification model; and outputting a hand image segmentation result of the image to be identified according to the detection result. The method is suitable for detecting and segmenting the hand image in the picture.

Description

Method and device for segmenting hand image and computer equipment

Technical Field

The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for segmenting a hand image, and a computer device.

Background

Gestures are a very human-machine interaction mode without an intermediate medium. Gesture recognition has become an important content of man-machine interaction and a research hotspot. Vision-based gesture recognition systems typically include processes of hand segmentation, gesture modeling, gesture shape feature extraction, and the like. The purpose of hand segmentation is to divide hands from captured gesture images, which is the first step in the vision-based gesture recognition process and is also the key step. The accuracy and the real-time performance of the segmentation directly influence the later recognition effect and the performance of the whole interactive system, so that the segmentation effect and the segmentation speed of the hand segmentation are improved by performing deep study on the machine learning of the hand segmentation, and the method has important significance.

The most common method at present is to use skin color to segment hands, and the skin color model to segment hands can not consider the changeable geometric characteristics of the hands, but can introduce the interference of illumination components, meanwhile, the complexity of the background and the difference of detected human species can influence the detection result, and how to overcome the two problems becomes one of the main research directions of the hand segmentation at present.

Disclosure of Invention

In view of this, the application provides a method, a device and a computer device for segmenting hand images, which mainly aims to solve the problem that skin color confusion, illumination and deformation interfere with hand image detection when segmenting hand images, so that the segmentation result is inaccurate.

According to an aspect of the present application, there is provided a method of segmenting a hand image, the method comprising:

collecting a sample image containing a complete hand image;

labeling the coordinate position of the hand region in the sample image;

taking the sample image marked with the coordinate position as a training set, and training based on a fast R-CNN algorithm to obtain a hand recognition model with a training result meeting a preset standard;

detecting whether the image to be identified contains a hand image or not by using the hand identification model;

And outputting a hand image segmentation result of the image to be identified according to the detection result.

According to another aspect of the present application, there is provided a hand image segmentation apparatus, the apparatus comprising:

the acquisition module is used for acquiring a sample image containing a complete hand image;

the labeling module is used for labeling the coordinate positions of the hand areas in the sample image;

the training module is used for training the sample image marked with the coordinate position as a training set based on the Faster R-CNN algorithm to obtain a hand recognition model with a training result meeting a preset standard;

the detection module is used for detecting whether the hand image is contained in the image to be identified or not by utilizing the hand identification model;

and the output module is used for outputting the hand image segmentation result of the image to be identified according to the detection result.

According to still another aspect of the present application, there is provided a non-volatile readable storage medium having stored thereon a computer program which when executed by a processor implements the above-described hand image segmentation method.

According to still another aspect of the present application, there is provided a computer device comprising a non-volatile readable storage medium, a processor and a computer program stored on the non-volatile readable storage medium and executable on the processor, the processor implementing the above method of segmenting hand images when executing the program.

By means of the technical scheme, compared with the existing method for segmenting the hand image by using the skin color model, the hand image segmentation method, device and computer equipment provided by the application have the advantages that the hand recognition model is built, the fast R-CNN algorithm is used for training the hand recognition model, and continuous machine learning and correction are carried out on judgment of the hand image position in the sample image, so that the training result can meet the preset standard, finally, whether the hand image is contained in the image to be recognized is judged by using the hand recognition model which is successfully trained, and the corresponding hand image segmentation result is output.

The foregoing description is only an overview of the present application, and is intended to be implemented in accordance with the teachings of the present application, as it is to be understood that the following detailed description of the application is intended to provide a better understanding of the present application, as it is intended to provide further understanding of the present application, as it is claimed, along with the additional objects, features and advantages of the present application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the present application. In the drawings:

fig. 1 is a schematic flow chart of a method for segmenting a hand image according to an embodiment of the present application;

fig. 2 is a flow chart illustrating another method for segmenting hand images according to an embodiment of the present application;

fig. 3 is a schematic diagram of capturing a rectangular palm area in an optimal detection image according to an embodiment of the present application;

fig. 4 is a schematic diagram of a portion of a cut-away rectangular palm area overlapping with an elliptical to-be-cut area according to an embodiment of the present application;

fig. 5 shows a schematic structural diagram of a device for segmenting a hand image according to an embodiment of the present application;

fig. 6 is a schematic structural diagram of another hand image segmentation apparatus according to an embodiment of the present application.

Detailed Description

The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.

Aiming at the problem that skin color confusion, illumination and deformation can interfere with hand image detection when the hand image is segmented at present, so that a segmentation result is not accurate enough, the embodiment provides a hand image segmentation method, as shown in fig. 1, which comprises the following steps:

101. a sample image is acquired that contains a complete hand image.

In a specific application scenario, in order to better achieve the purpose of machine learning, so that the analysis results are more uniform and have pertinence, when a sample image is selected, the precondition that the image contains a complete hand image needs to be satisfied.

102. The coordinate position of the hand region in the sample image is noted.

In a specific application scene, because the sample image may contain complex backgrounds such as hands, faces, arms and surrounding environments, in order to eliminate irrelevant interference, the positions of the hands in the complex image need to be marked, so that the identification area is more prominent and the size numerical deviation is conveniently calculated, and further, the positions of the hands can be accurately extracted.

103. And taking the sample image marked with the coordinate position as a training set, and training based on the Faster R-CNN algorithm to obtain a hand recognition model with a training result meeting a preset standard.

The hand recognition model can be used for detecting and positioning, namely whether hands appear in the image or not is determined, and the area where the hands are located is found; the hand recognition model can be used for hand segmentation, and hand areas are extracted from the picture, so that the interference of the background is removed.

104. And detecting whether the image to be identified contains the hand image or not by using the hand identification model.

In a specific application scenario, after the hand recognition model reaches the training standard, detection of an input image can be performed, when a key which triggers palm print detection by a user, namely, a gesture operation which carries out palm print detection by the user is detected, a camera is started to acquire image data in an image recognition area, and whether the image to be recognized contains a hand image or not is determined by utilizing the hand recognition model.

105. And outputting a hand image segmentation result of the image to be identified according to the detection result.

The hand image segmentation result may include two cases, one is that the hand image is detected to be included in the image to be identified, and the segmented hand image may be output as the hand image segmentation result; and the other method can correspondingly output corresponding prompt contents when the detected image to be identified does not contain the hand image, and takes the prompt contents of the detected image without the hand as a final output segmentation result.

According to the method for segmenting the hand image, a sample image containing a complete hand image can be utilized, in order to simulate and identify the hand image with better training accuracy in the model training process, the coordinate position of a hand region in the sample image is required to be marked, meanwhile, the hand identification model is trained based on the Faster R-CNN algorithm, after the hand identification model is judged to meet the preset standard, the hand image to be identified can be put into use, whether the hand image is contained in the image to be identified can be detected, and the hand image segmentation result of the image to be identified is output according to the detection result.

Further, as a refinement and extension of the specific implementation of the foregoing embodiment, for fully explaining the specific implementation process in this embodiment, another method for segmenting a hand image is provided, as shown in fig. 2, where the method includes:

201. A sample image is acquired that contains a complete hand image.

In a specific application scene, in order to avoid the situation that the training results have contingency or the situation that the data analysis is difficult due to too few selected sample images or too many selected sample images, the actual situation should be considered when the sample images are selected, and the corresponding preset threshold value should be set according to the specific application scene, so that the number of the selected sample images can better meet the analysis requirements of users.

202. The coordinate position of the hand region in the sample image is noted.

In an alternative embodiment, step 202 may specifically include: creating an image coordinate system by taking the upper left corner of the sample image as an origin; in an image coordinate system, four-point coordinate positions of the finger tips of the thumb, the finger tips of the little finger, the finger tips of the middle finger and the root of the palm of the hand are determined; determining a rectangular frame where the hand is located according to the abscissa of the thumb tip and the little finger tip and the ordinate of the middle finger tip and the palm root; and marking the coordinate positions of the upper left corner and the lower right corner of the rectangular frame.

For example, according to development requirements, 1000 images are screened out from a picture database, pictures which are clear and contain complete hand image data are taken as sample images, four-point coordinate positions of the finger tips of the hands, the fingertips of the little fingers and the fingertips of the middle fingers and the root parts of the palms in the 1000 images are respectively determined, the width of a rectangular frame where the hands are located is respectively determined according to the horizontal coordinates of the fingertips of the thumb and the fingertips of the little fingers, the height of the rectangular frame where the hands are located is determined according to the vertical coordinates of the fingertips of the middle fingers and the root parts of the palms, the specific positions of the rectangular frames of the respective hand images in the 1000 images can be determined, and finally, the coordinate positions of the left upper corner and the right lower corner of each rectangular frame are marked for position recording, identification and correction.

203. And inputting the sample image marked with the coordinate position into an initial hand recognition model which is created based on the Faster R-CNN algorithm in advance.

The initial hand recognition model is created in advance according to design requirements, and the difference between the initial hand recognition model and the hand recognition model is as follows: the initial hand recognition model is only initially created, model training is not passed, and preset standard is not met, the hand data recognition model is a recognition model which reaches the preset standard and can be applied to detection of an image to be recognized through model training, the creation principle of the model is based on a Faster-R-CNN algorithm, the hand recognition model is composed of two main modules, namely a PRN candidate frame extraction module and a Faster-R-CNN detection module, the RPN is a full convolution neural network and is used for extracting a rectangular detection frame; the fast R-CNN detects and identifies targets in the proposal based on the proposal extracted by the RPN, the input of the hand identification model in the training process is a sample image marked with coordinate positions, and the output data is a preset number of hand image suggestion windows containing evaluation scores.

204. And extracting hand image features of the sample image by using a depth convolution neural network CNN of the initial hand recognition model.

In a specific application scenario, the whole picture needs to be input into a CNN convolutional neural network to perform repeated machine learning of hand feature images.

205. Based on the hand image features, the regional advice network RPN generates a predetermined number of advice windows proposals.

In a specific application scenario, a regional suggestion network (Region Proposal Network, RPN) is required to generate suggestion windows (proposals), a preset number of suggestion windows are generated for each picture, a convolution network is creatively adopted by the fast-R-CNN to generate suggestion frames by itself, and the convolution network is shared with an object detection network to generate 300 suggestion frames, wherein the number of the suggestion frames is 300, the input of the RPN is an image and is output as a series of rectangular detection frames (proposals), each proposals corresponds to a respective object score, the confidence score is 100% and represents the confidence degree of whether the center of the patch contains an object, the confidence score is 100% and the confidence interval represents the probability degree that a hand image appears in the suggestion frame, and the higher the object score is, the more complete the hand image contained in the suggestion window is represented.

206. Each proposed window is mapped onto the last layer convolution feature map of the CNN to generate a fixed size.

In a specific application scene, each RoI Pooling layer can generate a feature map with a fixed size through the RoI Pooling layer, wherein the RoI Pooling layer is responsible for collecting all candidate frames, calculating a feature map of each candidate frame and then sending the feature map to a subsequent network; the image size input after the network training must be a fixed value, while the network output is also a fixed size. The ROI Pooling is to pool feature maps with different sizes into feature maps with the same size, which is beneficial to output to the next layer network.

207. And training and correcting the detection images in each suggestion window by using the detection classification probability Softmax Loss and the detection frame regression Smooth L1 Loss so as to enable the initial hand recognition model to meet the preset standard.

The detection classification probability is the probability that the system correctly detects the target when the target exists, and the calculation method can be as follows: detection classification probability = number of correctly detected target images/number of total sample images involved in detection, in this scheme, the detection classification probability is the probability that a hand recognition model judges that a hand image appears when sample image data with a complete hand image is input to the model; the detection frame regression is used for fine-tuning the images in the candidate frames when the images in the candidate frames are not positioned accurately (overlapping degree (Intersection overUnion, ioU) < preset value), so that the fine-tuned window is closer to the annotated image data, and the positioning is more accurate, wherein the IOU is used for evaluating the positioning accuracy and represents the overlapping degree of two bounding boxes, and the calculation formula is as follows: IOU= (A.u.B)/(A.u.B), i.e. the overlapping area of rectangular box A, B is the proportion of the area of A, B union.

In an alternative embodiment, step 207 may specifically include: matching the coincidence degree of the detected image and the hand image in the rectangular frame marked with the coordinate position; determining the value of the coincidence degree as a corresponding evaluation score; if the number of the assessment scores which are larger than or equal to the first preset threshold value in the suggestion window meets the first preset number condition, determining that the training result of the initial hand recognition model reaches a preset standard; and if the number of the assessment scores which are larger than or equal to the first preset threshold value in the advice window does not meet the first preset number condition, correcting the advice window with the assessment score lower than the first preset threshold value according to the coordinate position of the hand image which is actually marked.

The first preset threshold is a numerical comparison basis for judging whether the identification result of the single suggestion window can meet the preset standard, and when the evaluation score of the suggestion window is larger than or equal to the first preset threshold, the identification result of the window meets the expected standard, otherwise, the identification result of the suggestion window does not meet the expected standard. The first preset number of conditions is a judging standard for judging whether the hand recognition model training can pass the verification, the first preset number of conditions is a minimum score duty ratio, when the number of the recommended windows and the total number of the windows which are larger than or equal to a first preset threshold value are larger than or equal to the minimum score duty ratio, the first preset number of conditions are met, and otherwise, the first preset number of conditions are not met. The higher the minimum score ratio set in the first preset number of conditions is, the more strictly the representative model is trained, the more accurate the model analysis conclusion is, when the number of the recommended windows meeting the expected standard in all the recommended windows is judged to meet the first preset number of conditions, the hand recognition model is proved to pass the training, otherwise, the hand recognition model does not pass the training, and the hand image still needs to be further recognized and corrected.

For example, if the set second preset threshold is 95%, and 200 detection probabilities are higher than 95% in the 300 recommended windows, and if the minimum score ratio in the second preset number of conditions for judging whether the hand image is included in the image to be identified is 1/2, the ratio of the number of recommended windows meeting the expected standard to the total recommended windows can be calculated first: 200/300=2/3, and if the numerical comparison determines that 2/3 is greater than 1/2, it can be further determined that the hand recognition model is trained.

In a specific application scenario, when fine tuning is performed by using detection frame regression, four-dimensional vectors can be used to represent (x, y, w, h) for a rectangular detection frame of a suggestion window, and the four-dimensional vectors respectively correspond to the coordinates of the center point, the width and the height of a target frame, for example, an original foreground anchor is represented by a, a group trunk of a target is represented by G, and the target is to find a relationship, so that an input original Anchor is mapped to a regression window G' closer to a real frame G, namely:

given: a= (Ax, ay, aw, ah), g= (Gx, gy, gw, gh);

finding a transformation F such that F (Ax, ay, aw, ah) = (Gx ', gy', gw ', gh'), where (Gx, gy, gw, gh) ≡ (Gx ', gy', gw ', gh');

f (a) =g' can be achieved by panning and scaling:

Translation:

Gx′＝Ax+Aw*dx(A)

Gy′＝Ay+Ah*dy(A)

scaling:

Gw′＝Aw*exp(dw(A))

Gh′＝Ah*exp(dh(A))

in the above formula, dx (a), dy (a), dw (a), dh (a), wherein (aw×dx (a), aw×dy (a)) represents the offset of the center distance of two frames. When the input anchor is less different from G, the transformation can be considered a linear transformation, and linear regression can be used to model the target box for fine tuning.

208. And uploading the image to be identified into the hand identification model.

When the hand recognition model reaches a preset standard, the hand recognition model can be put into the application of hand image detection of the images to be recognized, and any images to be recognized, which are unknown whether the images contain the hand images, are uploaded into the hand recognition model, so that the hand images can be detected and recognized.

209. And determining a preset number of suggestion windows corresponding to the image to be identified and detection probabilities corresponding to the suggestion windows by using a fast R-CNN algorithm.

The detection probability is judging data for judging whether the recommended window contains the hand image, and the higher the detection probability is, the larger the probability that the corresponding recommended window contains the hand image is.

210. And if the number of the recommended windows with the detection probability being greater than or equal to the second preset threshold value meets the second preset number condition, determining that the image to be identified contains the hand image.

The second preset threshold is the minimum detection probability that each advice window contains the hand image; the second preset number of conditions is a criterion for judging whether the image to be identified contains the hand image, namely, a minimum score ratio is set, when the ratio of the number of all the recommended windows with the detection probability larger than the second preset threshold to the total recommended windows is larger than or equal to the minimum score ratio in the second preset number of conditions, the image to be identified contains the hand image, otherwise, the image to be identified does not contain the hand image.

For example, if the set second preset threshold is 90%, and 100 detection probability is determined to be greater than or equal to 90% in 300 recommended windows, and the minimum score ratio in the preset second preset number of conditions is 1/3, the ratio of the number of recommended windows with detection probability greater than or equal to the second preset threshold to the total recommended windows may be calculated first: 100/300=1/3, and the calculated ratio of 1/3 to the second predetermined number is determined by comparing the values, so that the hand image included in the image to be identified can be further determined.

211. If the number of the recommended windows with the detection probability being greater than or equal to the second preset threshold value does not meet the second preset number condition, determining that the image to be identified does not contain the hand image.

For example, if the set second preset threshold is 90%, and 100 detection probability is determined to be greater than or equal to 90% in 300 recommended windows, and the minimum score ratio in the preset second preset number of conditions is 1/2, the ratio of the number of recommended windows with detection probability greater than or equal to the second preset threshold to the total recommended windows may be calculated first: 100/300=1/3, and if the value comparison determines that 1/3 is smaller than 1/2, it can be further determined that the image to be identified does not contain a hand image.

212. And if the image to be identified contains the hand image, selecting a suggestion window with the highest detection score as an optimal detection image.

In a specific application scenario, after the image to be identified is determined to include the hand image, the detection probability value corresponding to the suggestion window with the highest detection score is selected as an optimal detection image, namely, the most complete hand detection image is correspondingly identified by further comparing the detection probability value corresponding to the suggestion window with the second preset threshold value in 300 suggestion windows.

213. And outputting the optimal detection image as a hand image segmentation result.

In a specific application scene, the most complete hand optimal detection image is finally determined to be the last segmented hand image, and the segmented hand image is further displayed on a display page.

214. If the image to be identified does not contain the hand image, outputting prompt information of the hand image which is not detected on the display page.

The prompt information can comprise character prompt information, picture prompt information, audio prompt information, video prompt information, lamplight prompt information, vibration prompt information and the like of the display page.

In a specific application scenario, in order to better implement application to the hand segmentation image, as an optional application scenario, the embodiment may further include: extracting the positions of fingertips, finger roots and palms of the hands in the optimal detection image by using a key point detection algorithm; intercepting a rectangular palm area of the optimal detection image according to the positions of the finger root and the palm center; determining an elliptical to-be-cut area of the thumb part by utilizing the fingertip position; cutting off the coincident part of the rectangular palm area and the elliptical area to be cut; comparing the palm print similarity of the rectangular palm area after cutting and the pre-stored user palm print image; and judging the user identity corresponding to the image to be identified according to the user identity corresponding to the user palm print image with the palm print similarity being greater than or equal to the preset threshold value.

In a specific application scenario, the method for capturing the rectangular palm area of the optimal detection image according to the finger root and palm center positions may be (as shown in fig. 3): and determining an included angle between a connecting line of the index finger root A and the little finger root B and an intersecting horizontal line as a deviation angle, and carrying out rotation adjustment on the palm area according to the angle so as to enable the AB to rotate to the horizontal direction. Selecting a midpoint E point and a palm center point C point of an AB line segment, enabling the length of CE to be equal to 1/2 of the length of AB, enabling CE to be perpendicular to AB, establishing an image coordinate system, determining the C point in a rotated palm as the midpoint (x, y) of a rectangular palm area, and intercepting the rectangular palm area by using the rectangular areas with the points p1 (x-length, y-length), p2 (x+length, y-length), p3 (x-length, y+length) and p4 (x+length, y+length) as vertexes.

Accordingly, the method for determining the elliptical to-be-cut area of the thumb part by utilizing the fingertip position can be as follows (as shown in fig. 4): and determining the left-right hand attribute of the hand according to the detection result of the key points. Specifically, the positions of a finger root point B of the little finger and a finger tip point T of the middle finger are determined according to the key point detection, and the left and right attributes of the hand can be determined according to the point B and the point T. In the image coordinate system, if the ordinate Ty of the T point is greater than the ordinate By of the B, the hand points downwards, the is_up mark is False, and if not, the hand points upwards, the is_up mark is True. I.e. the up-down direction of the hand is first determined by the ordinate of the two points. If is_up=true and the abscissa Tx < B of T-point is/up=false and the abscissa Tx > B of T-point is/up=false, then this hand is the left hand, otherwise the right hand. And determining the direction of the thumb according to the detection result of the left and right hand attributes. The origin and major axis of the ellipse are determined on the rectangular border on the thumb side of the rectangular palm area. The elliptic long half shaft takes 2/5 of the height of the rectangular palm area to be segmented, and the elliptic short half shaft takes 1/4 of the height of the rectangular palm area to be segmented. The position of the center point of the ellipse and the long and short axes determine the position of the thumb part, which is the position of the cut image, so the positions of the thumbs of the left hand and the right hand are different according to the row and column number of the palm image, and the positions of the center points of the ellipse are also different. If the hand is right hand, the x coordinate of the ellipse center point takes the number of columns of the image, and the y coordinate takes 4/5 of the number of rows of the image; in the case of the left hand, the x-coordinate of the center point of the ellipse is taken as 0, and the coordinate is taken as 4/5 of the line number of the image.

In a specific application scenario, after determining an elliptical image corresponding to a thumb, in order to cut out a portion with disorder lines in a rectangular palm area, a method for cutting out a portion, which coincides with an elliptical to-be-cut area, in the rectangular palm area may be as follows: the pixel at the intersection of this ellipse and the rectangular palm area is set to 0, i.e., the thumb portion is cut off.

Correspondingly, according to the user identity corresponding to the user palm print image with the palm print similarity larger than or equal to the preset threshold, the method for judging the user identity corresponding to the image to be identified can be as follows: and acquiring the characteristics of the palm print image after the thumb is cut off through machine learning, and comparing the characteristics with the palm print image recorded by a pre-stored user. And matching the characteristics of the two palmprint images, and extracting the palmprint characteristics by using MobileNet. The MobileNet extracts the characteristic vector of the palm print, judges whether to match by calculating the cosine similarity of the characteristic vectors of two palm print images to be compared, can judge the distance between the two palm print images by using the Euclidean distance, and matches if the cosine similarity of the two characteristic vectors reaches a preset threshold value, otherwise does not match, thereby realizing the verification of the user identity.

Wherein, the basic unit of MobileNet is depth level separable convolution (depthwise separable convolution- - -DSC), depthwise separable convolution total calculated amount is: d (D) _K *D _K *M*D _F *D _F +M*N*D _F *D _F Wherein D is _F *D _F * M is the size of the input feature map, D _F *D _F * N is the output signature size, M, N refers to the number of channels; the cosine similarity of the feature vectors of the two palmprint images is calculated, namely the similarity degree of the two palmprint images is judged by calculating the cosine value of the vector, and the closer the cosine value is to 1, the higher the similarity of the two palmprint images is, and the calculation formula of the cosine value is as follows:wherein x and y are respectively the characteristic vector of the palm print image after the thumb is cut off and the characteristic vector of the palm print image input by the userN is the number of all feature vectors correspondingly contained in the two palmprint images; the similarity of two palmprint images can also be achieved by comparing Euclidean distances between a plurality of two points, for example, coordinates of two points on a two-dimensional plane corresponding to the palmprint images are a (x 1, y 1) and b (x 2, y 2), and a formula for calculating Euclidean distance between the two points a and b can be as follows:

according to the method for segmenting the hand image, the hand recognition model can be created based on the Faster R-CNN algorithm, in the process of model training by using the sample image with the marked coordinate position, whether the recognition result of the single suggestion window can meet the preset standard or not is judged by using the first preset threshold, whether the hand recognition model training can pass verification is further judged by using the first preset number of conditions, the model training process is more accurate by using the method of double threshold limitation, when the model training does not reach the expected standard, the suggestion window with the assessment score lower than the first preset threshold is corrected according to the coordinate position of the hand image with the actual mark, the hand recognition model can meet the requirements of a user, in addition, the mode of limiting the second preset threshold and the second preset number of conditions is still adopted for the detection result when the hand recognition model is put into use, the conclusion that whether the finally judged image to be recognized contains the hand image is more persuasive and accurate, and finally the segmentation result of the hand image is intuitively displayed, and the hand image can be accurately detected and accurately in a position under the condition that the hand image is detected and inclined under the non-extreme conditions, and the image is accurately detected.

Further, as an embodiment of the method shown in fig. 1 and fig. 2, an embodiment of the present application provides a device for segmenting a hand image, as shown in fig. 5, where the device includes: the system comprises an acquisition module 31, a labeling module 32, a training module 33, a detection module 34 and an output module 35.

An acquisition module 31 operable to acquire a sample image containing a complete hand image;

the labeling module 32 is configured to label coordinate positions of hand regions in the sample image;

the training module 33 is configured to train the sample image with the marked coordinate position as a training set based on the fast R-CNN algorithm to obtain a hand recognition model with a training result meeting a preset standard;

the detection module 34 may be configured to detect whether the image to be identified includes a hand image by using the hand recognition model;

the output module 35 may be configured to output a hand image segmentation result of the image to be identified according to the detection result.

In a specific application scenario, in order to accurately label the coordinate position of the hand region in the sample image, the labeling module 32 may be further configured to create an image coordinate system with the upper left corner of the sample image as the origin; in an image coordinate system, four-point coordinate positions of the finger tips of the thumb, the finger tips of the little finger, the finger tips of the middle finger and the root of the palm of the hand are determined; determining a rectangular frame where the hand is located according to the abscissa of the thumb tip and the little finger tip and the ordinate of the middle finger tip and the palm root; and marking the coordinate positions of the upper left corner and the lower right corner of the rectangular frame.

Correspondingly, in order to train to obtain a hand recognition model with training results meeting preset standards, the training module 33 may be further configured to input a sample image with a labeled coordinate position into an initial hand recognition model created in advance based on the fast R-CNN algorithm; extracting hand image features of the sample image by using a depth convolutional neural network CNN of the initial hand recognition model; generating a preset number of advice windows proposals by the regional advice network RPN according to the hand image characteristics; mapping each advice window to the last layer of convolution feature map of the CNN to generate a fixed size; and training and correcting the detection images in each suggestion window by using the detection classification probability Softmax Loss and the detection frame regression Smooth L1 Loss so as to enable the initial hand recognition model to meet the preset standard.

In a specific application scenario, in order to implement training correction on the detected images in each suggestion window, the training module 33 may be further configured to perform overlap ratio matching on the detected images and the hand images in the rectangular frame marked with the coordinate positions; determining the value of the coincidence degree as a corresponding evaluation score; if the number of the assessment scores which are larger than or equal to the first preset threshold value in the suggestion window meets a first preset number ratio, determining that the training result of the initial hand recognition model reaches a preset standard; and if the number of the assessment scores in the suggestion window which is larger than or equal to the first preset threshold value does not meet the first preset number ratio, correcting the suggestion window with the assessment score lower than the first preset threshold value according to the coordinate position of the hand image which is actually marked.

Correspondingly, in order to utilize the trained hand recognition model to detect whether the hand image is included in the image to be recognized, the detection module 34 may be further configured to upload the image to be recognized into the hand recognition model; determining a preset number of suggestion windows corresponding to the image to be identified and detection probabilities corresponding to the suggestion windows by using a Faster R-CNN algorithm, wherein the detection probabilities are judging data for judging whether the suggestion windows contain hand images or not; if the number of the recommended windows with the detection probability being greater than or equal to a second preset threshold value meets a second preset number ratio, determining that the image to be identified contains the hand image; if the number of the recommended windows with the detection probability being greater than or equal to the second preset threshold value does not meet the second preset number ratio, determining that the image to be identified does not contain the hand image.

In a specific application scenario, in order to output a hand image segmentation result of an image to be identified, the output module 35 may be further configured to select a suggestion window with the highest detection score as an optimal detection image if it is determined that the image to be identified includes a hand image; outputting the optimal detection image as a hand image segmentation result; if the image to be identified does not contain the hand image, outputting prompt information of the hand image which is not detected on the display page.

In a specific application scenario, in order to implement application of the segmented hand image, as shown in fig. 6, the present apparatus further includes: the device comprises an extraction module 36, an interception module 37, a determination module 38, a cutting module 39, a comparison module 310 and a judgment module 311.

The extracting module 36 is configured to extract the fingertip, the finger root and the palm center positions of the hand in the optimal detection image by using a key point detection algorithm;

the intercepting module 37 is used for intercepting a rectangular palm area of the optimal detection image according to the positions of the finger root and the palm center;

a determination module 38 operable to determine an elliptical to-be-cut region of the thumb portion using the fingertip position;

a cutting module 39, which is used for cutting off the overlapping part of the rectangular palm area and the elliptical area to be cut;

the comparison module 310 may be configured to compare the palm print similarity of the resected rectangular palm area with a pre-stored user palm print image;

the determining module 311 may be configured to determine, according to a user identity corresponding to a user palm print image with a palm print similarity greater than or equal to a preset threshold, a user identity corresponding to an image to be identified.

It should be noted that, for other corresponding descriptions of each functional unit related to the hand image segmentation apparatus provided in the present embodiment, reference may be made to corresponding descriptions in fig. 1 to fig. 2, and detailed descriptions thereof are omitted herein.

Based on the above-mentioned methods shown in fig. 1 and 2, correspondingly, the embodiment of the present application further provides a storage medium, on which a computer program is stored, which when executed by a processor, implements the above-mentioned hand image segmentation method shown in fig. 1 and 2.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to execute the method of each implementation scenario of the present application.

Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiments shown in fig. 5 and fig. 6, in order to achieve the above objects, the embodiments of the present application further provide a computer device, which may specifically be a personal computer, a server, a network device, etc., where the entity device includes a storage medium and a processor; a storage medium storing a computer program; a processor for executing a computer program to implement the above-described hand image segmentation method as shown in fig. 1 and 2.

Optionally, the computer device may also include a user interface, a network interface, a camera, radio Frequency (RF) circuitry, sensors, audio circuitry, WI-FI modules, and the like. The user interface may include a Display screen (Display), an input unit such as a Keyboard (Keyboard), etc., and the optional user interface may also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., bluetooth interface, WI-FI interface), etc.

It will be appreciated by those skilled in the art that the computer device structure provided in this embodiment is not limited to this physical device, and may include more or fewer components, or may combine certain components, or may be arranged in different components.

The non-volatile readable storage medium may also include an operating system, a network communication module, etc. The operating system is a program that manages the physical device hardware and software resources of the marketing campaign information creation method, supporting the execution of information handling programs and other software and/or programs. The network communication module is used for realizing communication among components in the nonvolatile readable storage medium and communication with other hardware and software in the entity device.

From the above description of the embodiments, it will be apparent to those skilled in the art that the present application may be implemented by means of software plus necessary general hardware platforms, or may be implemented by hardware. Compared with the prior art, the method and the device have the advantages that the hand recognition model can be created based on the Faster R-CNN algorithm, in the process of model training by using sample images of marked coordinate positions, whether the recognition result of a single suggestion window can meet preset standards or not is judged by using a first preset threshold, whether the hand recognition model training can pass verification is further judged by using a first preset number of conditions, the model training process is more accurate by using a double threshold limiting method, when the model training can not reach the expected standards, the suggestion window with the evaluation score lower than the first preset threshold is corrected according to the coordinate position of an actually marked hand image, the user requirement can be met, in addition, in the process of using the hand recognition model, the mode of limiting the second preset threshold and the second preset number of conditions is still adopted for the detection result, whether the finally judged image to be recognized contains the hand image or not is more uniform and accurate, finally the hand image segmentation result is displayed, the whole technical scheme is that the image is detected and the image is accurately and accurately positioned under the condition that the image is not inclined and accurately detected, and the image is accurately and accurately positioned in the left hand region is not accurately detected.

Those skilled in the art will appreciate that the drawing is merely a schematic illustration of a preferred implementation scenario and that the modules or flows in the drawing are not necessarily required to practice the application. Those skilled in the art will appreciate that modules in an apparatus in an implementation scenario may be distributed in an apparatus in an implementation scenario according to an implementation scenario description, or that corresponding changes may be located in one or more apparatuses different from the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The above-mentioned inventive sequence numbers are merely for description and do not represent advantages or disadvantages of the implementation scenario. The foregoing disclosure is merely illustrative of some embodiments of the application, and the application is not limited thereto, as modifications may be made by those skilled in the art without departing from the scope of the application.

Claims

1. A method of segmenting a hand image, comprising:

collecting a sample image containing a complete hand image;

labeling the coordinate position of the hand region in the sample image;

outputting a hand image segmentation result of the image to be identified according to the detection result;

extracting the positions of fingertips, finger roots and palms of the hands in the optimal detection image by using a key point detection algorithm;

intercepting a rectangular palm area in the optimal detection image according to the finger root and the palm center position;

determining an elliptical to-be-cut area of the thumb part by utilizing the fingertip position;

cutting off the coincident part of the rectangular palm area and the elliptical to-be-cut area;

performing palm print similarity comparison on the rectangular palm area after cutting off and a pre-stored user palm print image;

judging the user identity corresponding to the image to be identified according to the user identity corresponding to the user palm print image with the palm print similarity being greater than or equal to a preset threshold value;

outputting a hand image segmentation result of the image to be identified according to the detection result, wherein the hand image segmentation result comprises: and if the image to be identified contains the hand image, selecting a suggestion window with highest detection probability as the optimal detection image, and outputting the optimal detection image as a hand image segmentation result.

2. The method according to claim 1, wherein the labeling of the coordinate position of the hand region in the sample image specifically comprises:

creating an image coordinate system with the upper left corner of the sample image as an origin;

in the image coordinate system, four-point coordinate positions of the thumb fingertip, the little finger fingertip, the middle finger fingertip and the palm root of the stretched hand are determined;

determining a rectangular frame where a hand is located according to the abscissa of the thumb tip and the little finger tip and the ordinate of the middle finger tip and the palm root;

and marking the coordinate positions of the upper left corner and the lower right corner of the rectangular frame.

3. The method according to claim 2, wherein the training the sample image marked with the coordinate position as a training set based on the fast R-CNN algorithm to obtain a hand recognition model with training results meeting a preset standard, specifically includes:

inputting the sample image marked with the coordinate position into an initial hand recognition model which is created based on a fast R-CNN algorithm in advance;

extracting hand image features of the sample image by using a deep convolutional neural network CNN of the initial hand recognition model;

Generating a preset number of advice windows proposals by the regional advice network RPN according to the hand image characteristics;

mapping each suggestion window to the last layer convolution feature map of CNN to generate fixed size;

and training and correcting the detection images in each suggestion window by using the detection classification probability Softmax Loss and the detection frame regression Smooth L1 Loss so as to enable the initial hand recognition model to meet the preset standard.

4. The method of claim 3, wherein the training and correcting the detected images in each of the suggestion windows by using a detection classification probability Softmax Loss and a detection frame regression smoth L1 Loss to make the initial hand recognition model meet the preset standard specifically includes:

matching the coincidence degree of the detection image and the hand image in the rectangular frame marked with the coordinate position;

determining the value of the coincidence degree as a corresponding evaluation score;

if the number of the evaluation scores greater than or equal to a first preset threshold in the suggestion window meets a first preset number condition, determining that the training result of the initial hand recognition model reaches a preset standard;

And if the number of the evaluation scores greater than or equal to the first preset threshold value in the suggestion window does not meet the first preset number condition, correcting the suggestion window with the evaluation score lower than the first preset threshold value according to the coordinate position of the hand image which is actually marked.

5. The method according to claim 4, wherein detecting whether the image to be recognized contains the hand image by using the hand recognition model, specifically comprises:

uploading the image to be identified into the hand identification model;

determining a preset number of suggestion windows corresponding to the image to be identified and detection probabilities corresponding to the suggestion windows by using a fast R-CNN algorithm, wherein the detection probabilities are judging data for judging whether the suggestion windows contain hand images or not;

if the number of the suggested windows with the detection probability being greater than or equal to a second preset threshold meets a second preset number condition, determining that the image to be identified contains a hand image;

and if the number of the recommended windows with the detection probability being greater than or equal to the second preset threshold value does not meet the second preset number condition, determining that the image to be identified does not contain a hand image.

6. The method according to claim 5, wherein the outputting the hand image segmentation result of the image to be identified according to the detection result specifically comprises:

and if the hand image is not contained in the image to be identified, outputting prompt information of not detecting the hand image on a display page.

7. A hand image segmentation apparatus, comprising:

the output module is used for outputting a hand image segmentation result of the image to be identified according to the detection result;

the extraction module is used for extracting the positions of the fingertips, the finger roots and the palm centers of the hands in the optimal detection image by utilizing a key point detection algorithm;

the intercepting module is used for intercepting a rectangular palm area in the optimal detection image according to the finger root and the palm center position;

The determining module is used for determining an elliptical to-be-cut area of the thumb part by utilizing the fingertip position;

the cutting module is used for cutting off the overlapping part of the rectangular palm area and the elliptical area to be cut;

the comparison module is used for comparing the palm print similarity of the rectangular palm area after the cutting is completed with a pre-stored user palm print image;

the judging module is used for judging the user identity corresponding to the image to be identified according to the user identity corresponding to the user palm print image with the palm print similarity being greater than or equal to a preset threshold value;

8. A non-transitory readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the hand image segmentation method of any one of claims 1 to 6.

9. A computer device comprising a non-volatile readable storage medium, a processor and a computer program stored on the non-volatile readable storage medium and executable on the processor, characterized in that the processor implements the method of segmentation of hand images according to any one of claims 1 to 6 when executing the program.