CN109583331B

CN109583331B - Deep learning-based method for accurately locating the wrist pulse point of a human hand

Info

Publication number: CN109583331B
Application number: CN201811356765.3A
Authority: CN
Inventors: 路红; 汪子健; 甘中学; 罗静静; 印文杰; 商慧亮; 张文强; 杨友钊
Original assignee: Fudan University
Current assignee: Fudan University
Priority date: 2018-11-15
Filing date: 2018-11-15
Publication date: 2023-05-02
Anticipated expiration: 2038-11-15
Also published as: CN109583331A

Abstract

The invention belongs to the technical field of computer image processing, and particularly relates to a deep learning-based method for accurately locating the wrist pulse point of a human hand. The basic steps are as follows: hand pictures of multiple persons are sampled, and the sample pictures are processed to obtain hand binary images and pulse point coordinates, which serve as training data for a deep learning model; supervised learning is then performed on the training set to obtain a deep learning model that generalizes; a picture to be predicted is then white-balanced, converted to the HSV color space, color-clustered with the Mean Shift algorithm, and thresholded by skin color to extract a binary image, which is further processed to obtain the hand contour; finally, the processed binary image is fed into the pre-trained deep learning model to obtain the coordinates of the wrist pulse point in the image. The method locates the pulse point on a person's wrist with relatively high precision and provides real-time visual positioning for a robot performing traditional Chinese medicine pulse diagnosis.

Description

Deep learning-based method for accurately locating the wrist pulse point of a human hand

Technical Field

The invention belongs to the technical field of computer image processing, and particularly relates to a method for accurately positioning pulse points by deep learning.

Background

Pulse diagnosis refers to a physician feeling a patient's pulse with the fingers and diagnosing disease from the perceived pulse condition. The pulse theory of traditional Chinese medicine is profound, while the pulse-taking procedure itself is simple, which makes it a diagnostic method unique to traditional Chinese medicine. Traditional pulse-taking relies on the sense of touch of the physician's fingers. With the rapid development of science and technology, interdisciplinary work is becoming increasingly important, and the combination of traditional Chinese medicine and artificial intelligence is attracting growing attention.

A robot with artificial intelligence can likewise perform the four diagnostic methods of traditional Chinese medicine (inspection, listening and smelling, inquiry, and pulse-taking) on a patient. For pulse diagnosis, the robot generally relies on a "hand-eye" system, namely a mechanical arm, a camera and a pressure sensor, so accurate positioning of the pulse point is very important.

As deep learning matures, the accuracy of deep learning-based object detection has far exceeded that of traditional methods, and with hardware updates and algorithmic optimization the detection speed keeps increasing, with many algorithms now running in real time. Accurate positioning of the pulse point can therefore be accomplished entirely with deep learning.

Disclosure of Invention

The invention aims to provide a method capable of accurately locating the wrist pulse point of a human hand.

The method for accurately locating the wrist pulse point is based on deep learning. The basic steps are as follows: hand pictures of multiple persons are sampled, and the sample pictures are processed to obtain hand binary images and pulse point coordinates, which serve as training data for a deep learning model; supervised learning is then performed on the training set to obtain a deep learning model that generalizes; a picture to be predicted is then white-balanced, converted to the HSV color space, color-clustered with the Mean Shift algorithm, and thresholded by skin color to extract a binary image, which is further processed to obtain the hand contour; finally, the processed binary image is fed into the pre-trained deep learning model to obtain the coordinates of the wrist pulse point in the image.

The invention provides a deep learning-based method for accurately locating the wrist pulse point of a human hand, which comprises the following specific steps:

(1) Acquiring training data of deep learning;

(2) Training a deep learning model;

(3) Predicting the pulse point position in an acquired hand picture by using the trained model.

The specific training data acquisition process in the step (1) is as follows:

(11) Hand pictures of multiple persons are sampled, and the pictures are cropped and scaled to the input shape required by the deep learning model. At least 20 pictures are taken per person; the hand is photographed in a variety of poses, with different rotations and heights, and the whole hand and wrist must be contained in each picture. A marker (a small card or disc) whose color differs from the skin color is attached at the wrist pulse point; the background color must also be distinguishable from both the skin color and the marker color;

(12) A binary image of the hand is extracted. The picture is first white-balanced to restore it to something close to the true color temperature of the scene; it is then converted to the HSV color space, where H is hue, S is saturation, and V is value (brightness);

(13) After conversion to the HSV color space, the picture is first blurred with a suitable Gaussian kernel to remove noise and then color-clustered with the Mean Shift algorithm. A binary image of the skin color is obtained by restricting the range in HSV color space. Contour detection is run on the binary image to eliminate possible interfering blobs: the contour with the largest area is found and stored as the final hand contour;

(14) The contour of the marker, i.e. the position of the pulse point, is then extracted in the same way as the hand contour, except that the H range is changed to the color of the marker. The extracted marker contour is enclosed in a rectangular box, and the diagonal corner coordinates of the box are returned as the label for training.
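Steps (12) to (14) can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the text does not name a white balance algorithm, so the gray-world method is assumed here, the HSV bounds passed to `mask_from_image` are hypothetical values that would have to be tuned to the actual skin and marker colors, and all function names are illustrative.

```python
import colorsys

def gray_world_balance(pixels):
    """Gray-world white balance (an assumed choice): scale each RGB channel
    so that its mean matches the mean over all three channels."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3
    gains = [gray / m if m > 0 else 1.0 for m in means]
    return [tuple(min(255, round(p[c] * gains[c])) for c in range(3))
            for p in pixels]

def in_hsv_range(rgb, lo, hi):
    """True if the pixel's HSV triple (each component in 0..1) lies in [lo, hi]."""
    h, s, v = colorsys.rgb_to_hsv(*(c / 255 for c in rgb))
    return all(l <= x <= u for x, l, u in zip((h, s, v), lo, hi))

def mask_from_image(image, lo, hi):
    """image: list of rows of RGB tuples -> binary image as rows of 0/1."""
    return [[1 if in_hsv_range(px, lo, hi) else 0 for px in row]
            for row in image]

def bounding_box(mask):
    """Diagonal corners (x_min, y_min, x_max, y_max) of the nonzero pixels,
    or None if the mask is empty."""
    pts = [(x, y) for y, row in enumerate(mask)
           for x, v in enumerate(row) if v]
    if not pts:
        return None
    xs, ys = zip(*pts)
    return (min(xs), min(ys), max(xs), max(ys))
```

Calling `mask_from_image` once with a skin-color range and once with the marker-color range gives the two binary images; `bounding_box` on the marker mask yields the diagonal coordinates used as the training label. Hue is treated as a plain interval here, so a color that straddles the red wrap-around would need two ranges.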

In step (2), the deep learning model is trained on the data using the YOLOv3 model (Redmon J, Farhadi A. YOLOv3: An Incremental Improvement [J]. 2018.):

(21) YOLOv3 uses Darknet-53 as its backbone network;

(22) When the data set is small, data augmentation is applied (increasing the amount of data through rotation, cropping, scaling and color jitter). The collected data are then divided into a training set, a test set and a validation set in a ratio of 7:2:1, and the model's preset hyperparameters are adjusted for training.

The main hyperparameters of the model are:

(221) Batch size (batch size): a larger batch generally makes the network converge more easily, but too large a batch can exhaust memory; when GPU memory is limited, it can be set to 4 or 8;

(222) Momentum parameter (momentum): this value affects how quickly gradient descent approaches the optimum;

(223) Weight decay term (decay): the larger the weight decay regularization coefficient, the more strongly overfitting is suppressed;

(224) Learning rate (learning rate): the learning rate determines how fast the weights are updated; if it is set too large the updates overshoot the optimum, and if it is too small the descent is too slow;

(225) Iteration number (steps): the total number of training iterations, generally set to more than 50,000;

(23) Once the hyperparameters are set, training can begin. The training loss and validation loss should be monitored during training: while both decrease, the predictive ability of the network is improving; when neither decreases any further, the network has converged and training can be ended;

(24) The hyperparameters (221)-(225) are adjusted several times (e.g., the momentum within 0.9 to 0.99, the weight decay term within 10^-5 to 10^-3, and the learning rate initialized to 10^-3 and then gradually decreased as the number of iterations increases) to obtain the model with the best generalization performance.
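The 7:2:1 division of step (22) can be sketched as follows. The shuffling, the fixed seed and the function name `split_dataset` are assumptions made for illustration; the text only fixes the ratio.

```python
import random

def split_dataset(items, ratios=(7, 2, 1), seed=0):
    """Shuffle the samples and split them into training, test and
    validation sets in the given ratio (7:2:1 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)   # seeded so the split is reproducible
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_test = len(items) * ratios[1] // total
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    val = items[n_train + n_test:]       # the remainder goes to validation
    return train, test, val
```

If augmented copies of a picture are generated before splitting, keeping all copies of one original in the same subset avoids leaking near-duplicates between training and evaluation.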

In step (3), the trained model is used to predict on new data:

(31) The picture in which the pulse point is to be predicted is first preprocessed: it is white-balanced, converted to the HSV color space, and color-clustered with the mean shift algorithm; the skin color is then extracted to obtain a binary image, and the contour with the largest area is taken as the hand contour;

(32) The binary image of the hand is fed into the model as input, and the resulting prediction is the final pulse point position.
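Keeping only the largest skin-colored region, as in steps (13) and (31), can be sketched without any image library. Note the substitution: the text detects contours and compares contour areas, whereas this sketch counts the pixels of each 4-connected region, which serves the same purpose of discarding small interfering blobs. The function name is illustrative.

```python
from collections import deque

def largest_region(mask):
    """Return a binary image that keeps only the largest 4-connected
    region of 1-pixels in `mask` (a list of rows of 0/1)."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    best = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            region = []
            queue = deque([(x, y)])
            seen[y][x] = True
            while queue:
                cx, cy = queue.popleft()
                region.append((cx, cy))
                for nx, ny in ((cx + 1, cy), (cx - 1, cy),
                               (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < w and 0 <= ny < h
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if len(region) > len(best):
                best = region
    out = [[0] * w for _ in range(h)]
    for x, y in best:
        out[y][x] = 1
    return out
```

The returned image is what would be passed to the model in step (32).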

In step (13), the color clustering by the mean shift algorithm proceeds as follows:

(131) An unlabeled pixel in the picture is randomly selected as the center;

(132) All pixels within a given radius of the center are found and recorded as a set M; these points are considered to belong to cluster c, and the probability that each point in the set belongs to this cluster is set to 1;

(133) The vectors from the center to each element of the set M are computed and summed to obtain the drift vector; the vector calculation formula is as follows:

(134) The center c is moved along the drift vector: the new center is the old center plus the drift vector, so the distance moved equals the magnitude of the drift vector;

(135) Steps (132), (133) and (134) are repeated until the iteration converges, and the center at that moment is recorded; all points encountered during the iteration are assigned to cluster c;

(136) If, at convergence, the distance between the center of the current cluster c and the center of an existing cluster is smaller than a threshold, the two clusters are merged; otherwise, c is taken as a new cluster;

(137) Steps (131)-(136) are repeated until all points have been labeled as visited.
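The procedure of steps (131) to (137) can be sketched as below, under several assumptions. The drift vector is taken as the mean offset from the center to the window members (the usual flat-kernel form), so that moving the center by the drift vector as in step (134) lands on the window mean; this normalization is not stated in the text. Starting points are taken in order rather than at random, each point is finally assigned to the cluster whose windows covered it most often, and `radius`, `merge_dist`, `tol` and `max_iter` are hypothetical parameters.

```python
import math

def mean_shift(points, radius, merge_dist, tol=1e-3, max_iter=100):
    """Flat-kernel mean shift over a list of equal-length float tuples.
    Returns (centers, labels)."""
    centers = []                       # converged cluster centers
    votes = []                         # votes[k][i]: times point i fell in cluster k's window
    visited = [False] * len(points)
    for start in range(len(points)):
        if visited[start]:
            continue                   # (131) pick a point not yet visited
        center = points[start]
        hits = [0] * len(points)
        for _ in range(max_iter):
            # (132) collect the points within `radius` of the center
            members = [i for i, p in enumerate(points)
                       if math.dist(p, center) <= radius]
            if not members:
                break
            for i in members:
                hits[i] += 1
                visited[i] = True
            # (133) drift vector: mean offset from the center to the members
            drift = tuple(
                sum(points[i][d] - center[d] for i in members) / len(members)
                for d in range(len(center)))
            # (134) move the center along the drift vector
            center = tuple(c + s for c, s in zip(center, drift))
            if math.hypot(*drift) < tol:
                break                  # (135) converged
        # (136) merge with an existing cluster if the centers are close
        for k, existing in enumerate(centers):
            if math.dist(existing, center) < merge_dist:
                votes[k] = [a + b for a, b in zip(votes[k], hits)]
                break
        else:
            centers.append(center)
            votes.append(hits)
    labels = [max(range(len(centers)), key=lambda k: votes[k][i])
              for i in range(len(points))]
    return centers, labels
```

For color clustering, each point would be a pixel's color triple. The sketch compares every point with every center, so it is only practical for small inputs.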

Compared with the prior art, the invention has the beneficial effects that:

1. The user only needs to provide a hand picture for the position of the pulse point to be located accurately;

2. A visual positioning method is provided for robots performing traditional Chinese medicine pulse diagnosis.

Drawings

Fig. 1 is a general flow diagram of the present invention.

Fig. 2 illustrates the white balance and conversion to HSV color space of step (12) in Fig. 1, where (a) is the white-balanced picture and (b) is the picture converted to HSV color space.

Fig. 3 illustrates step (13) in Fig. 1: color clustering of the HSV picture with the mean shift algorithm, followed by extraction of the hand binary image according to skin color. (a) The HSV picture after blur denoising and mean shift color clustering. (b) The binary image extracted according to skin color. (c) The hand binary image obtained by finding the contour with the largest area.

Fig. 4 illustrates step (14) in Fig. 1: a binary image is extracted according to the color of the marker card, and the minimum upright bounding rectangle of the card contour, i.e. the position of the pulse point, is obtained. (a) The binary image extracted according to the card color from the color-clustered HSV picture. (b) The card contour obtained by screening the contours. (c) The minimum upright bounding rectangle of the card contour.

Fig. 5 illustrates the preprocessing of the picture to be predicted, as described in step (31) in Fig. 1, before it is fed into the pre-trained model to obtain the predicted pulse point coordinates. (a) The white-balanced picture. (b) The picture converted to HSV color space. (c) The HSV picture after blur denoising and mean shift color clustering. (d) The binary image obtained according to skin color. (e) The hand binary image obtained by finding the contour with the largest area.

Fig. 6 shows the pulse point coordinates predicted by regression when the processed hand binary image is fed into the pre-trained model, as described in step (31) in Fig. 1; the predicted pulse point position is marked.

Fig. 7 shows the various hand poses used during data sampling.

Fig. 8 shows how the validation loss and the training loss change during training.

Fig. 9 shows the test results.

Detailed Description

1. Data acquisition and data enhancement

Hand pictures are sampled from multiple persons, with at least 20 pictures per person; the hand is photographed in a variety of poses, with different rotations and heights, and the whole hand and wrist must be contained in each picture. See Fig. 7. When the number of samples is insufficient, data augmentation can be applied: flipping, rotation, color jitter, scaling and similar operations are performed on the samples to improve the generalization ability of the network.
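When a picture is flipped or rotated for augmentation, the pulse point label has to be transformed with it. The following sketch shows this for a horizontal flip and a 90-degree clockwise rotation; the image is a list of pixel rows, the box is the diagonal-coordinate label `(x_min, y_min, x_max, y_max)`, and the function names are illustrative rather than taken from the text.

```python
def flip_horizontal(image, box):
    """Mirror the image left to right and mirror the label box with it."""
    width = len(image[0])
    flipped = [row[::-1] for row in image]
    x_min, y_min, x_max, y_max = box
    # Column x moves to column (width - 1 - x), so the x bounds swap roles.
    return flipped, (width - 1 - x_max, y_min, width - 1 - x_min, y_max)

def rotate_90_clockwise(image, box):
    """Rotate the image 90 degrees clockwise and rotate the label box with it."""
    height = len(image)
    rotated = [list(col) for col in zip(*image[::-1])]
    x_min, y_min, x_max, y_max = box
    # Pixel (x, y) moves to (height - 1 - y, x).
    return rotated, (height - 1 - y_max, x_min, height - 1 - y_min, x_max)
```

Color jitter leaves the label unchanged, while scaling multiplies all four coordinates by the scale factor.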

2. Parameter setting

The parameters are adjusted over different training runs to obtain the model with the best prediction performance. The batch size is set to 4, the momentum parameter to 0.95, the number of iterations to 100,000, and the weight decay term to 2×10^-4; the learning rate is initialized to 10^-3.
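The settings of this embodiment can be collected in one place, together with a learning rate that decreases as training proceeds. The dictionary keys are illustrative names, and the decay schedule is an assumption: the text only says the learning rate is gradually decreased, so the step milestones (80,000 and 90,000) and the factor 0.1 are hypothetical values.

```python
# Values from the embodiment; the key names are illustrative.
HYPERPARAMS = {
    "batch_size": 4,
    "momentum": 0.95,
    "weight_decay": 2e-4,
    "learning_rate": 1e-3,
    "max_iterations": 100_000,
}

def learning_rate_at(step, base_lr=HYPERPARAMS["learning_rate"],
                     milestones=(80_000, 90_000), factor=0.1):
    """Step decay (assumed schedule): multiply the learning rate by
    `factor` each time `step` passes one of the milestones."""
    lr = base_lr
    for milestone in milestones:
        if step >= milestone:
            lr *= factor
    return lr
```

With these defaults the rate stays at 10^-3 for the first 80,000 iterations and is divided by ten at each milestone.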

3. Training process

Referring to Fig. 8, the ordinate is the loss value and the abscissa is the number of iterations; the upper (blue) curve is the validation loss and the lower (orange) curve is the training loss. A simultaneous decrease of both indicates that the predictive ability of the network is improving; when neither decreases any further, the network has converged and training can be ended. The loss hardly decreases once the number of iterations exceeds 50,000.
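The stopping rule described here, ending training once neither loss decreases any further, can be turned into a simple check. The window length and the minimum improvement are hypothetical thresholds; the text judges convergence from the loss curves and gives no numeric criterion.

```python
def has_converged(train_losses, val_losses, window=5, min_improvement=1e-3):
    """True when neither the training loss nor the validation loss has
    improved by at least `min_improvement` over the last `window` records."""
    def stalled(losses):
        if len(losses) <= window:
            return False               # not enough history to judge
        best_before = min(losses[:-window])
        best_recent = min(losses[-window:])
        return best_before - best_recent < min_improvement
    return stalled(train_losses) and stalled(val_losses)
```

The function would be called each time the losses are logged, with the full history of both curves.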

4. Results testing

Referring to Fig. 9, after 50,000 training iterations the predicted result is close to the true value.
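How close a prediction is to the true value can be quantified in several ways; the text does not say which measure was used. As one possible sketch, the distance between box centers and the intersection over union (IoU) of the predicted and labeled boxes can be computed from their diagonal coordinates `(x_min, y_min, x_max, y_max)`. Both metrics and the function names are assumptions chosen for illustration.

```python
import math

def box_center(box):
    """Center point of a box given by its diagonal corners."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)

def center_error(pred, truth):
    """Distance in pixels between the predicted and labeled box centers."""
    return math.dist(box_center(pred), box_center(truth))

def iou(a, b):
    """Intersection over union of two boxes given by diagonal corners."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For a robot arm that has to touch the pulse point, the center error is probably the more directly useful of the two.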

Claims

1. A deep learning-based method for accurately locating the wrist pulse point of a human hand, characterized by comprising the following specific steps:

(1) Acquiring training data of deep learning;

(2) Training a deep learning model;

(3) Predicting the pulse position of the acquired hand picture by using the trained model;

the specific training data acquisition process in the step (1) is as follows:

(13) After conversion to the HSV color space, the picture is first blurred with a suitable Gaussian kernel to remove noise, then color-clustered with the Mean Shift algorithm, and a binary image of the skin color is then obtained by restricting the range in HSV color space; contour detection is performed on the binary image, and the contour with the largest area is found and stored as the final hand contour;

(14) The contour of the marker, i.e. the position of the pulse point, is then extracted in the same way as the hand contour, except that the H range is changed to the color of the marker; the extracted marker contour is enclosed in a rectangular box, and the diagonal coordinates of the box are returned as the label for training;

in the training of the deep learning model in step (2), a YOLOv3 model is used to train on the data, and the process is as follows:

(21) YOLOv3 uses Darknet-53;

(22) When the amount of data is small, data augmentation is applied to the data, the augmentation including increasing the amount of data by rotation, cropping, scaling and color jitter; the collected data are divided into a training set, a test set and a validation set in a ratio of 7:2:1, and the hyperparameters preset by the model are adjusted for training;

(23) After the hyperparameters are set, training is started and the training loss and validation loss are monitored; while both decrease, the predictive ability of the network is gradually improving, and when neither decreases any further, the network has converged and training is ended;

(24) The hyperparameters are adjusted several times to obtain the model with the best generalization performance;

the step (3) predicts the pulse position of the collected hand picture by using a trained model, and the specific flow is as follows:

2. The deep learning-based method for accurately locating the wrist pulse point of a human hand according to claim 1, wherein in step (31) the color clustering is performed with the mean shift algorithm, and the specific flow is as follows:

(131) An unlabeled pixel in the picture is randomly selected as the center;

(137) Steps (131)-(136) are repeated until all points have been labeled as visited.

3. The deep learning-based method for accurately locating the wrist pulse point of a human hand according to claim 2, wherein the hyperparameters preset by the model in step (2) are:

(221) Batch size: set to 4 or 8;

(222) Momentum parameter: this value affects how quickly gradient descent approaches the optimum;

(223) Weight decay term: the larger the weight decay regularization coefficient, the more strongly overfitting is suppressed;

(224) Learning rate: the learning rate determines how fast the weights are updated;

(225) Iteration number: the total number of training iterations is set to more than 50,000;

adjusting the hyperparameters comprises: adjusting the momentum parameter within the range of 0.9 to 0.99; adjusting the weight decay term within 10^-5 to 10^-3; and initializing the learning rate to 10^-3 and then gradually decreasing it as the number of iterations increases.