CN116844189A - Detection method and application of anchor frame and acupoint site of human body part


Info

Publication number
CN116844189A
CN116844189A (application CN202310937306.9A)
Authority
CN
China
Prior art keywords
human body
point
body part
robot
anchor frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310937306.9A
Other languages
Chinese (zh)
Inventor
张华
尹鹏
熊根良
张晓庆
浦泽洋
李泽光
吕咸吉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai University of Engineering Science
Original Assignee
Shanghai University of Engineering Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai University of Engineering Science filed Critical Shanghai University of Engineering Science
Priority claimed from CN202310937306.9A
Publication of CN116844189A
Legal status: Pending

Classifications

    • A61H39/02 Devices for locating specific reflex points of the body for physical therapy, e.g. acupuncture points
    • A61H23/008 Percussion or vibration massage using shock waves
    • A61H39/04 Devices for pressing such points, e.g. Shiatsu or Acupressure
    • A61H39/06 Devices for heating or cooling such points within cell-life limits
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • G06T7/0012 Biomedical image inspection
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. edges, contours, corners; connectivity analysis
    • G06V10/806 Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V40/10 Human or animal bodies; body parts, e.g. hands
    • A61H2201/50 Control means thereof
    • A61H2201/5007 Computer-controlled control means
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30004 Biomedical image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Rehabilitation Therapy (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Animal Behavior & Ethology (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Software Systems (AREA)
  • Pain & Pain Management (AREA)
  • Epidemiology (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Radiology & Medical Imaging (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a robot path planning method that combines detection of human body part anchor frames and acupoint sites. Human body images and depth point cloud information are collected; the images are preprocessed and fed into a detection network, which outputs the anchor frame and the image coordinates of the acupoint sites. The point cloud information is combined with external environment information and the anchor frame coordinates and, after filtering, downsampling and segmentation, only the point cloud data of the region of interest is retained. The acupoint coordinates at which the robot needs to operate are then combined, intermediate path points are searched at equal distances, and vectors are calculated to generate the robot motion path parameters. The detection network uses MobileNetV2 combined with a feature pyramid to extract image features; a feature map for prediction is obtained through region candidates and region-of-interest extraction, the position anchor frame is predicted in a fully connected manner, and the acupoint coordinates are regressed in a fully convolutional manner. Compared with traditional methods, the precision and speed are significantly improved, and the robot path planning process can be completed fully autonomously.

Description

Detection method and application of anchor frame and acupoint site of human body part
Technical Field
The invention belongs to the medical and computer fields, and relates to a detection method and application of anchor frames and acupoint sites of human body parts.
Background
With economic and social development, the pace of modern life keeps accelerating and people face greater life and work pressure, while also paying more attention to their physical condition. On the one hand, heavy work and life pressure bring more health problems, such as cervical spondylosis and scapulohumeral periarthritis. On the other hand, as living standards improve, people pay more attention to their health, and demand for medical treatment and rehabilitation physiotherapy grows.
Meanwhile, with the rapid development of robotics, robot applications have shifted from traditional industrial fields to people-centered service robots. Shock wave physiotherapy robots, massage robots, moxibustion robots and the like in the health care field are therefore developing rapidly; these robots mainly perform physiotherapy, massage or moxibustion on acupuncture points or meridians of the human body.
To realize fully autonomous intelligent operation of such robots and further reduce the investment of human resources, intelligent recognition of the human body part anchor frame and the acupuncture points is particularly important. Patent CN115634147A discloses a hand acupoint recognition method based on a lightweight deep learning network, which segments the palm in advance by image processing and uses MobileNetV3 combined with ResNet50 to recognize 16 acupoint positions on the palm. It needs to acquire the human body part anchor frame first and then acquire the acupoint positions on that basis: on the one hand, the detection of the anchor frame and of the acupoint positions is performed step by step, so the flow is complex; on the other hand, the anchor frame is detected by a skin color segmentation method, which requires a predetermined skin color segmentation threshold and has poor anti-interference capability.
After the human body part anchor frame and acupoint sites are intelligently recognized, the motion path of the robot needs to be planned automatically from the recognition results. Patent CN115741732A discloses an interactive path planning method that uses a B-spline curve to fit a trajectory in point cloud data, based on a machine vision system and a 2D massage trajectory planned by a doctor. This direct fitting approach has a large computational load, does not determine the attitude of the robot end, and needs an additional sensor to determine the attitude.
Disclosure of Invention
The invention aims to solve the problems existing in the prior art and provides a detection method and application of anchor frames and acupoint sites of human body parts.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the invention provides a method for detecting anchor frame and acupoint position of human body part, after collecting human body image, inputting the human body image into a trained deep learning key point detection model, outputting the upper left vertex coordinate (x) lu ,y lu ) And lower right vertex coordinates (x rd ,y rd ) Position coordinates of the hole site;
the workflow of the deep learning key point detection model is as follows:
(1) Data cutting and normalization pretreatment;
The human body image f is processed to obtain f_1 of size N×3×224×224, where N is the number of images input in one training batch;
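As a concrete illustration of this step, the following is a minimal PyTorch sketch (the resize policy and the helper name make_batch are assumptions; the text does not specify how the crop to 224×224 is performed):

```python
import torch
from torchvision import transforms

# Crop/resize to a uniform 224x224, convert to a tensor, scale [0,255] -> [0,1].
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),   # assumed: plain resize rather than crop
    transforms.ToTensor(),           # HWC uint8 [0,255] -> CHW float [0,1]
])

def make_batch(pil_images):
    # Stack N preprocessed images into the N x 3 x 224 x 224 tensor f_1.
    return torch.stack([preprocess(img) for img in pil_images])
```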
(2) Extracting features;
will f 1 Input into mobiletv 2, output s of downsampled 4 times and size 64×56×56 in mobiletv 2 is extracted 1 Downsampling an output s of 8 times and of size 160×28×28 2 Downsampling an output s of 32 times and of size 1280×7×7 3
For s 1 Performing convolution operation with convolution kernel size of 1×1 and output channel number of 256 to obtain intermediate feature mapFor s 2 Performing convolution operation with convolution kernel size of 1×1 and output channel number of 256 to obtain intermediate feature map +.>For s 3 Performing convolution operation with convolution kernel size of 1×1 and output channel number of 256 to obtain intermediate feature map +.>
For a pair ofPerforming convolution operation with convolution kernel size of 3×3 and output channel number of 256 to obtain feature map +.>
For a pair ofPerforming maximum pooling with step length of 2 to obtain characteristic diagram +.>
For a pair ofAfter upsampling, and->Fusion to obtain intermediate feature map->For->Performing convolution operation with convolution kernel size of 3×3 and output channel number of 256 to obtain feature map +.>
For a pair ofAfter upsampling, and->Fusion to obtain intermediate feature map->For->Performing convolution operation with convolution kernel size of 3×3 and output channel number of 256 to obtain feature map +.>
The number of the channels is 256, the sizes are in a multiple relation, and f is obtained after combining the four 2
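The fusion just described is a standard feature pyramid network (FPN) over three MobileNetV2 taps. The following is a minimal PyTorch sketch of the fusion alone, with the channel counts taken from the sizes quoted above; nearest-neighbour upsampling and pooling the smoothed map p_3 (rather than m_3) are assumptions the text leaves open:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Lateral 1x1 convs to 256 channels, top-down upsample-and-add fusion,
    3x3 smoothing convs, and a stride-2 max pool for the extra level p_4."""
    def __init__(self, in_channels=(64, 160, 1280), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(
            [nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList(
            [nn.Conv2d(out_channels, out_channels, 3, padding=1)
             for _ in in_channels])

    def forward(self, s1, s2, s3):            # 56x56, 28x28, 7x7 inputs
        m3 = self.lateral[2](s3)
        m2 = self.lateral[1](s2) + F.interpolate(m3, scale_factor=4)  # 7 -> 28
        m1 = self.lateral[0](s1) + F.interpolate(m2, scale_factor=2)  # 28 -> 56
        p3 = self.smooth[2](m3)
        p2 = self.smooth[1](m2)
        p1 = self.smooth[0](m1)
        p4 = F.max_pool2d(p3, kernel_size=1, stride=2)  # extra coarse level
        return p1, p2, p3, p4   # f_2 = {p_1, p_2, p_3, p_4}, all 256 channels
```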
During feature extraction the invention adopts an improved KeypointRCNN. Relative to the original KeypointRCNN (whose framework is shown in FIG. 8), the improvement is that MobileNetV2 combined with the FPN structure serves as the feature extraction network, which reduces the parameter count while guaranteeing the feature extraction performance and facilitates deploying the integrated algorithm on an embedded platform;
MobileNetV2 is a lightweight feature extraction network consisting of a series of convolutions with ReLU6 activation functions; a human body image passed through MobileNetV2 is output as a feature map, which is a high-dimensional array; the feature pyramid structure is added to increase the feature extraction capability of MobileNetV2;
(3) Region candidates;
A series of region-of-interest anchor frames is generated on each of p_1, p_2, p_3 and p_4 in f_2 in a sliding-window manner; on this basis, a series of convolution and activation operations is applied to f_2, coarse classification is performed with a Softmax function, and non-maximum suppression is used to retain k anchor frames and their parameters, giving {x_center, y_center, w, h}_k and {Δx_center, Δy_center, s_w, s_h}_k, where x_center is the center abscissa of the human body part anchor frame, y_center is the center ordinate, w is the width, h is the height, Δx_center is the offset of x_center, Δy_center is the offset of y_center, s_w is the correction parameter of w, and s_h is the correction parameter of h; the loss is calculated;
(4) Region of interest extraction (ROI alignment);
Using bilinear interpolation, f_2 is sampled in combination with {x_center, y_center, w, h}_k and {Δx_center, Δy_center, s_w, s_h}_k to obtain a feature map f_14×14 of size 14×14 and a feature map f_7×7 of size 7×7;
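ROI Align with bilinear interpolation is available directly in torchvision; a sketch under that assumption follows (spatial_scale must match the stride of the chosen pyramid level, and sampling_ratio=2 is illustrative):

```python
import torch
from torchvision.ops import roi_align

def pool_rois(feature_level, boxes_xyxy, spatial_scale):
    # boxes_xyxy: Tensor[k, 4] corner boxes derived from {x_center, y_center,
    # w, h}_k after applying the corrections {dx_center, dy_center, s_w, s_h}_k.
    rois = [boxes_xyxy]          # one Tensor of boxes per image in the batch
    f14 = roi_align(feature_level, rois, output_size=(14, 14),
                    spatial_scale=spatial_scale, sampling_ratio=2)
    f7 = roi_align(feature_level, rois, output_size=(7, 7),
                   spatial_scale=spatial_scale, sampling_ratio=2)
    return f14, f7               # keypoint-head input and box-head input
```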
(5) Detecting acupuncture points and detecting anchor frames of human body parts;
f_14×14 is passed through a fully convolutional network (FCN) and a transposed convolution to output an acupoint prediction heatmap, argmax is used to obtain the position coordinates of the acupoints, and the loss is calculated; f_7×7 is passed through two fully connected layers to output the image coordinates of the human body part anchor frame, the upper-left and lower-right vertex coordinates of the anchor frame are obtained by conversion, and the loss is calculated;
(6) And calculating the Loss of the deep learning key point detection model.
As a preferable technical scheme:
the specific process of the step (1) is as follows: firstly, the human body image f is cut into 224 multiplied by 224 with uniform size, then is converted into tensor data, and finally the three channels of the normalized image RGB [0,255 ]]Pixel value to [0,1 ]]To obtain f with the size of Nx3×224×224 1
The specific process of the step (3) is as follows:
For p_1, p_2, p_3 and p_4 (feature maps with 256 channels, each treated as one image), a human body part anchor frame controlled by aspect ratio and size is generated centered on every pixel in a sliding-window manner;
A convolution with kernel size 3×3 and 256 output channels followed by a ReLU activation is applied to f_2 to obtain the intermediate feature map f_rpn;
A convolution with kernel size 1×1 and as many output channels as human body part anchor frames is applied to f_rpn; the anchor frame classes are then classified with a Softmax function, a confidence score of whether each anchor frame is background is output, and the binary cross-entropy loss Loss_obj is calculated in combination with the annotations;
A convolution with kernel size 1×1 and four times as many output channels as human body part anchor frames is applied to f_rpn to obtain the translation and scaling factors Δx_center, Δy_center, s_w, s_h of the anchor frames, and the Smooth L1 loss Loss_rpn_box_reg is calculated; non-maximum suppression (NMS) with an overlap threshold of 0.7 is then used to remove redundant human body part anchor frames, reducing their number to k and giving the k anchor frame control parameters {x_center, y_center, w, h}_k and k anchor frame correction parameters {Δx_center, Δy_center, s_w, s_h}_k.
The specific process of the step (5) is as follows:
f_14×14 is input into a fully convolutional network formed by eight 3×3 convolutions with ReLU activation functions, which outputs a feature map of size 28×28 with 256 channels; transposed-convolution upsampling of this feature map gives an acupoint prediction heatmap of size 56×56 with as many output channels as there are acupoints; an argmax operation on the heatmap regresses the position coordinates of the acupoints, and the cross-entropy loss Loss_kpt is calculated in combination with the acupoint annotations.
f_7×7 is passed through a fully connected layer to obtain a one-dimensional vector of length 1024; this vector is passed through a fully connected layer to obtain a vector of size 2 (the confidence score of the human body part anchor frame), and the cross-entropy classification loss Loss_cls is calculated; at the same time, the length-1024 vector is passed through another fully connected layer to obtain a vector of size 8, in groups of four giving x_center, y_center, h and w; the upper-left vertex coordinates (x_lu, y_lu) and lower-right vertex coordinates (x_rd, y_rd) are calculated as x_lu = x_center - w/2, y_lu = y_center - h/2, x_rd = x_center + w/2, y_rd = y_center + h/2, and the human body part anchor frame regression loss Loss_box_reg is calculated at the same time.
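The keypoint branch described here closely follows the stock KeypointRCNN head. A minimal PyTorch sketch; where the text is ambiguous about the point at which the 14×14 map grows to 28×28, the sketch follows KeypointRCNN, letting the eight 3×3 convolutions preserve 14×14 and the transposed convolution plus a 2× interpolation produce the 56×56 heatmap:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AcupointHead(nn.Module):
    """Eight 3x3 conv + ReLU layers, a transposed convolution and a 2x
    interpolation to a 56x56 heatmap per acupoint; argmax -> coordinates."""
    def __init__(self, num_acupoints, in_channels=256):
        super().__init__()
        layers = []
        for _ in range(8):
            layers += [nn.Conv2d(in_channels, 256, 3, padding=1),
                       nn.ReLU(inplace=True)]
            in_channels = 256
        self.convs = nn.Sequential(*layers)
        self.deconv = nn.ConvTranspose2d(256, num_acupoints, 4,
                                         stride=2, padding=1)   # 14 -> 28

    def forward(self, f14):                       # f14: [k, 256, 14, 14]
        h = self.deconv(self.convs(f14))          # [k, num_acupoints, 28, 28]
        h = F.interpolate(h, scale_factor=2)      # 56x56 prediction heatmap
        idx = h.flatten(2).argmax(-1)             # argmax over each heatmap
        y = torch.div(idx, 56, rounding_mode="floor")
        x = idx % 56                              # heatmap grid coordinates
        return h, torch.stack([x, y], dim=-1)
```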
In the above detection method for the human body part anchor frame and acupoint sites, the Loss in step (6) is calculated as: Loss = Loss_cls + Loss_box_reg + Loss_kpt + Loss_obj + Loss_rpn_box_reg.
In the above detection method for the human body part anchor frame and acupoint sites, the deep learning key point detection model is trained by the following steps:
(1) Creating a data set;
after the human body image is acquired, firstly, a human body part anchor frame is marked in the human body part anchor frame (the human body is divided into four parts of chest and abdomen, upper limbs, lower limbs and back, the human body part anchor frame is marked by the human body part anchor frame), and then, the position coordinates of human body point positions are marked in each human body part anchor frame by combining the concept of the bone degree classification of the traditional Chinese medicine point theory, so that the data set is manufactured;
(2) Constructing the dynamic computation graph of the deep learning key point detection model based on the PyTorch framework;
(3) First, the official pre-training weights of the deep learning key point detection model are loaded (the pre-training weights are the trained convolution kernel parameters of the model; if they are not loaded, the initial weights are all 0), and the model is deployed to a CUDA platform to accelerate computation;
then, the data set is fed into the model;
finally, the deep learning key point detection model calculates each loss through forward transmission, and the parameter weights are updated through backward propagation until the optimal weight of the deep learning key point detection model is obtained;
forward transmission calculation each loss, namely, each layer of convolution calculation of the input image through the deep learning key point detection model, outputting a result, and calculating the loss (error) of the label and the output result;
Back-propagation updates each parameter weight: the gradients (derivatives) of the loss are calculated in reverse order to obtain the parameter update gradients. The specific update method is stochastic gradient descent (SGD), with the following learning rate schedule: the base learning rate is 0.001; training starts at 0.000006 and increases linearly to 0.001 to warm up the deep learning key point detection model, after which the learning rate is multiplied by 0.3 every 5 epochs;
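A minimal PyTorch sketch of this training loop and schedule; model is assumed to be a torchvision-style detection model returning a dict of the five losses, loader and num_epochs are assumed to exist, and the momentum value and 500-iteration warm-up length are illustrative (the text gives neither):

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.000006 / 0.001, total_iters=500)  # warm-up
decay = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.3)

for epoch in range(num_epochs):
    for images, targets in loader:
        loss_dict = model(images, targets)   # forward pass computes each loss
        loss = sum(loss_dict.values())       # Loss = sum of the five terms
        optimizer.zero_grad()
        loss.backward()                      # back-propagation
        optimizer.step()
        if epoch == 0:
            warmup.step()                    # linear warm-up, per iteration
    decay.step()                             # x0.3 every 5 epochs, per epoch
```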
(4) And saving the optimal weight of the deep learning key point detection model.
The invention also provides an automatic path planning method for a vision robot, comprising the following steps:
(1) Image acquisition and processing;
starting a robot carrying a depth camera, moving to an image acquisition position, and starting the depth camera to acquire a color human body image and a depth human body image of the same frame;
registering the color human body image and the depth human body image of the same frame by using a hardware registration tool of the depth camera to obtain the depth value of each point in the color human body image;
the detection method of the human body part anchor frame and acupoint sites described above is used to obtain, from the color human body image, the upper-left vertex coordinates (x_lu, y_lu) and lower-right vertex coordinates (x_rd, y_rd) of the human body part anchor frame and the position coordinates of the acupoint sites;
Converting the depth human body image into an initial point cloud c;
(2) Filtering and downsampling by a point cloud;
Because of the limitations of the depth camera and of the point cloud segmentation algorithm, sparse noise points exist in the point cloud; in the invention, the initial point cloud c is denoised with a K-nearest-neighbor outlier filtering algorithm using a KD-tree index, giving the filtered point cloud c_1;
Because the directly acquired point cloud is dense, its data volume is large and unfavorable to later point cloud processing; downsampling is therefore implemented with a voxel grid method, i.e. the number of points is reduced to reduce the point cloud data volume: a three-dimensional voxel grid is created over point cloud c_1 and the centroid of all points within each voxel is calculated to approximately represent them, so that all points in a voxel are finally represented by one centroid point, giving the downsampled point cloud c_2;
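A sketch of both operations with Open3D, whose statistical outlier removal is the closest library equivalent of the K-nearest-neighbor filter described (it likewise uses a KD-tree for the neighbour queries); the file name and all numeric parameters are illustrative:

```python
import open3d as o3d

c = o3d.io.read_point_cloud("scene.ply")          # initial point cloud c

# KNN-based (statistical) outlier removal over a KD-tree neighbourhood.
c1, _ = c.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Voxel-grid downsampling: every occupied voxel is replaced by the centroid
# of the points it contains, so one point finally represents each voxel.
c2 = c1.voxel_down_sample(voxel_size=0.01)        # 1 cm voxels (illustrative)
```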
(3) Dividing point cloud;
(3.1) Point cloud c_2 is relatively simple and shows the corresponding target features, but it contains many regions that are not of interest. To eliminate the influence of other factors and reduce the computational difficulty, the invention provides a point cloud segmentation method based on color features and the human body part anchor frame. Because the region outside the human body edge is covered by blue medical sterile cloth, the percentage of the blue component of each point is calculated and points whose percentage exceeds 0.4 are removed as background, obtaining the new point cloud c_3 from c_2 as c_3 = {(x_i, y_i, z_i) ∈ c_2 | PB_i ≤ 0.4}, with PB_i = B_i / (R_i + G_i + B_i), where (x_i, y_i, z_i) is the i-th point of c_2, R_i, G_i, B_i are the values (range 0-255) of the three RGB channels of the corresponding point in the color human body image, and PB_i is the blue proportion of that point;
(3.2) The coordinates (x_1, y_1, z_1) of the upper-left vertex of the human body part anchor frame in the camera coordinate system and the coordinates (x_2, y_2, z_2) of the lower-right vertex in the camera coordinate system are calculated with the pinhole model:
x_1 = (x_lu - c_x)·d_1/f_x, y_1 = (y_lu - c_y)·d_1/f_y, z_1 = d_1;
x_2 = (x_rd - c_x)·d_2/f_x, y_2 = (y_rd - c_y)·d_2/f_y, z_2 = d_2;
where d_1, d_2 are the depth values of the two vertices acquired by the depth camera, and c_x, c_y, f_x, f_y are intrinsic parameters of the depth camera;
The new point cloud c_4 is then obtained from c_3 as c_4 = {(x_i, y_i, z_i) ∈ c_3 | x_1 ≤ x_i ≤ x_2, y_1 ≤ y_i ≤ y_2}, where (x_i, y_i, z_i) is the i-th point of c_3;
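A NumPy sketch of both segmentation steps under the formulas above; the helper name segment_roi and the array layout (pts of shape [N, 3] in the camera frame with matching rgb values, range 0-255, from the registered color image) are assumptions for illustration:

```python
import numpy as np

def segment_roi(pts, rgb, box_img, depths, K):
    # (3.1) remove the blue sterile-cloth background: keep points with PB <= 0.4
    pb = rgb[:, 2] / rgb.sum(axis=1).clip(min=1)   # blue proportion per point
    pts, rgb = pts[pb <= 0.4], rgb[pb <= 0.4]      # point cloud c_3

    # (3.2) back-project the anchor frame corners with the pinhole model
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    (u1, v1), (u2, v2) = box_img                   # (x_lu, y_lu), (x_rd, y_rd)
    d1, d2 = depths                                # depth at the two corners
    x1, y1 = (u1 - cx) * d1 / fx, (v1 - cy) * d1 / fy
    x2, y2 = (u2 - cx) * d2 / fx, (v2 - cy) * d2 / fy

    # keep only points inside the anchor frame region -> point cloud c_4
    inside = ((pts[:, 0] >= x1) & (pts[:, 0] <= x2) &
              (pts[:, 1] >= y1) & (pts[:, 1] <= y2))
    return pts[inside], rgb[inside]
```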
(4) Searching point cloud;
The starting path point p_1 in point cloud c_4 corresponding to the start point in the color human body image is determined, and the terminating path point p_2 in c_4 corresponding to the end point in the color human body image is determined; intermediate path points are then searched in c_4, each search looking, along the direction from p_1 to p_2, for a point at distance r from the current starting point, with r taken as 0.5-5 cm (adjustable as required; the smaller r is, the higher the accuracy); at the beginning of the search p_1 is the starting point, and each intermediate path point found becomes the new starting point; the search terminates when the distance between the last intermediate path point along the p_1-to-p_2 direction and p_2 is not more than r;
All the path points are connected along the direction from p_1 to p_2 to form a complete robot working path;
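A NumPy sketch of the equidistant search; interpreting "a point at distance r along the direction" as the real cloud point nearest to the ideal target is an assumption, and max_steps is only a safety guard:

```python
import numpy as np

def search_path(points, p1, p2, r=0.02, max_steps=1000):
    direction = (p2 - p1) / np.linalg.norm(p2 - p1)   # unit vector p1 -> p2
    path, cur = [p1], p1
    for _ in range(max_steps):
        if np.linalg.norm(p2 - cur) <= r:             # termination condition
            break
        target = cur + r * direction                  # ideal next point
        cur = points[np.argmin(np.linalg.norm(points - target, axis=1))]
        path.append(cur)                              # nearest real point
    path.append(p2)                                   # terminating path point
    return np.array(path)                             # complete working path
```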
(5) Calculating the path point attitude;
Each path point is converted one by one into a pose array and sent to the robot controller in server mode; the conversion process is as follows:
First, given the coordinates (p_x, p_y, p_z) of a path point in the camera coordinate system, the nearest m points around the path point are selected in point cloud c_4, with m taken as 5-15 (adjustable); the tangent plane is fitted by least squares and its normal vector n pointing toward the human body is calculated; a direction vector v along the path from p_1 to p_2 is defined; the cross product k = n × v is calculated, and the cross product m = n × k is calculated;
Second, a rotation matrix T is constructed from n, k and m:
T = [[n_1, k_1, m_1], [n_2, k_2, m_2], [n_3, k_3, m_3]],
where n_1, n_2, n_3 are the components of the unitized n (calculated as n/|n|), k_1, k_2, k_3 are the components of the unitized k, and m_1, m_2, m_3 are the components of the unitized m;
Then, the homogeneous path-point pose matrix T_cam_point is constructed as T_cam_point = [[T, p], [0, 1]], where p = (p_x, p_y, p_z)^T; combining the hand-eye matrix T_end_cam and the end-effector-to-robot-base pose matrix T_base_end, the path point pose in the camera coordinate system is converted to the robot base coordinate system as T_base_point = T_base_end · T_end_cam · T_cam_point;
Finally, an API function of the robot converts the homogeneous matrix T_base_point into the axis-angle pose array [p'_x, p'_y, p'_z, α_1, α_2, α_3], where p'_x, p'_y, p'_z is the position of the path point in the robot base coordinates and α_1, α_2, α_3 is the axis-angle representation of the attitude of the path point in the robot base coordinate system.
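A NumPy sketch of the conversion for one path point; the SVD-based least-squares plane fit, the sign convention used to point the normal toward the body, and taking the p_1-to-p_2 direction as the direction vector v are assumptions consistent with, but not dictated by, the text:

```python
import numpy as np

def waypoint_pose(p, neighbors, v, T_end_cam, T_base_end):
    # fit the tangent plane to the m nearest neighbours by least squares:
    # the singular vector of least variance of the centred set is the normal
    q = neighbors - neighbors.mean(axis=0)
    n = np.linalg.svd(q)[2][-1]
    if n[2] < 0:                      # assumed sign flip so the normal points
        n = -n                        # toward the human body (+z in camera)
    k = np.cross(n, v); k /= np.linalg.norm(k)
    m = np.cross(n, k); m /= np.linalg.norm(m)

    T_cam_point = np.eye(4)
    T_cam_point[:3, :3] = np.column_stack([n, k, m])  # rotation matrix T
    T_cam_point[:3, 3] = p                            # path point position
    # convert from the camera frame to the robot base frame
    return T_base_end @ T_end_cam @ T_cam_point
```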
As a preferable technical scheme:
In the above automatic vision robot path planning method, the robot has at least three-axis motion capability, millimeter-level position accuracy so that the end-effector pose can be acquired accurately, and a complete control module accessible to external algorithms, such as the Siling seven-degree-of-freedom robot.
In the above method, the depth camera uses structured-light imaging to provide the RGB image input of the detection model and to obtain the depth information of the detection target; considering the working mode of the robot and the cost of RGBD cameras, the Orbbec Gemini 2 binocular structured-light camera is selected.
In the above method, the depth camera is mounted on the robot in the eye-in-hand configuration, and the calibration steps are as follows:
(i) The ArUco calibration board is fixed at a position that does not change relative to the robot; the robot is moved so that the depth camera shoots the ArUco board at a 45° inclination from a circle directly above the board, with the board imaged clearly and occupying at least one third of the field of view;
(ii) The current end pose of the robot is recorded while the pose of the ArUco board in the camera coordinate system is solved; the operation is repeated so that the board is shot from a different position on the circle each time, the smaller the relative movement of the robot end between shots the better, and ten shots are taken;
(iii) Using the recorded robot end poses and ArUco board poses, the AX = XB hand-eye equation is solved with the Tsai algorithm; the depth camera intrinsic matrix K and the hand-eye matrix T_end_cam are written to a configuration file for later use, with K expressed as:
K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]].
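OpenCV (4.1 and later) ships Tsai's AX = XB solver, so step (iii) can be sketched as below; R_gripper2base, t_gripper2base, R_target2cam and t_target2cam are assumed to be the lists of rotations and translations recorded over the ten shots:

```python
import cv2
import numpy as np

R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
    R_gripper2base, t_gripper2base,   # recorded robot end poses
    R_target2cam, t_target2cam,       # ArUco board poses in the camera frame
    method=cv2.CALIB_HAND_EYE_TSAI)   # Tsai's AX = XB solution

T_end_cam = np.eye(4)                 # hand-eye matrix: camera -> end effector
T_end_cam[:3, :3] = R_cam2gripper
T_end_cam[:3, 3] = t_cam2gripper.ravel()
```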
the principle of the invention is as follows:
in the acupoint recognition process, the two-stage multi-task processing deep learning network is used, the anchor frame of the human body part and the coordinates of the point image can be detected end to end, and compared with the traditional image processing method, the method has good robustness and anti-interference capability due to the strong characteristic expression capability of the deep learning network.
In the path planning process, when the point cloud data are processed, the environmental prior knowledge and the human body part anchor frame are fully used to extract the point cloud data of the region of interest, greatly reducing the computation; the robot motion path is generated by searching real point cloud data, which avoids the error introduced by direct fitting in traditional methods and directly determines the motion pose of the robot end.
The beneficial effects of the invention are as follows:
(1) KeypointRCNN is a two-stage detection algorithm and a multi-task model; it can detect the human body part anchor frame and acupoint sites end to end from a human body image, recognizing the acupoints on the basis of recognizing the body part (human body part anchor frame) they belong to, which reduces the difficulty of network training;
(2) The invention uses the MobileNetV2 and FPN structure as the feature extraction network, which guarantees accuracy while keeping the detection time short and the parameter count small, so the algorithm can be integrated into an embedded computing platform (Jetson Nano);
(3) The invention can recognize acupoints on different parts of the human body (such as the back, abdomen and hands) and obtain the positions of those parts at the same time; for example, when back acupoints are recognized, the positions of the acupoints within the back anchor frame and the back human body part anchor frame in the image are recognized simultaneously;
(4) The path planning method makes full use of the environment information and the detection network results to simplify the point cloud data, greatly reducing the computation; intermediate path points are searched at equal intervals, and path-point control attitudes are calculated to define the motion attitude of the robot;
(5) By combining a deep learning algorithm, machine vision and point cloud processing, the robot can complete acupoint positioning autonomously once the acupoints to be acted on are input: it searches path points from the point cloud data and calculates vectors to generate the path. This greatly improves acupoint recognition efficiency, accuracy and the degree of robot automation, facilitating the automated integration and popularization of acupoint positioning in robot health care fields such as shock wave physiotherapy robots, massage robots and moxibustion robots.
Drawings
FIG. 1 is a basic framework diagram of a deep learning keypoint detection model of the present invention;
FIG. 2 is a schematic diagram of the basic structure of a feature extractor of the present invention;
FIG. 3 is a schematic diagram of a region candidate network structure according to the present invention;
FIG. 4 is a schematic view of a detecting head according to the present invention;
FIG. 5 is a visualization of the acupoint annotations on the human back according to the present invention;
FIG. 6 is a flow chart of an automatic path planning method for a vision robot of the present invention;
FIG. 7 is a diagram of a vision robot hardware system in accordance with the present invention;
FIG. 8 is a basic frame diagram of a KeypointRCNN model;
FIG. 9 is a schematic diagram of an initial point cloud c of a depth human body image according to the present invention;
FIG. 10 is a schematic diagram of the point cloud c_3 of the depth human body image after color threshold segmentation;
FIG. 11 is a schematic diagram of the back point cloud c_4 of the depth human body image of the present application;
FIG. 12 is a schematic diagram of the working path of the robot after the point cloud search according to the present application.
Detailed Description
The application is further described below in conjunction with specific embodiments. It should be understood that these examples are only illustrative of the application and are not intended to limit its scope. Furthermore, it should be understood that after reading the teachings of the application, those skilled in the art can make various changes and modifications, and such equivalents likewise fall within the scope defined by the appended claims.
Example 1
The method for detecting the human body part anchor frame and acupoint sites comprises the following steps:
(1) Establishing a deep learning key point detection model;
As shown in FIG. 1, the workflow of the deep learning key point detection model is as follows:
(1.1) data clipping and normalization pretreatment;
The human body image f is first cropped to a uniform size of 224×224, then converted into tensor data, and finally the RGB pixel values of the three image channels are normalized from [0, 255] to [0, 1], giving f_1 of size N×3×224×224, where N is the number of images input in one training batch;
(1.2) feature extraction;
As shown in FIG. 2, f_1 is input into MobileNetV2, and three outputs are extracted: s_1, downsampled 4 times with size 64×56×56; s_2, downsampled 8 times with size 160×28×28; and s_3, downsampled 32 times with size 1280×7×7.
A convolution with kernel size 1×1 and 256 output channels is applied to s_1 to obtain the intermediate feature map m_1, to s_2 to obtain the intermediate feature map m_2, and to s_3 to obtain the intermediate feature map m_3.
A convolution with kernel size 3×3 and 256 output channels is applied to m_3 to obtain the feature map p_3.
Max pooling with stride 2 is applied to p_3 to obtain the feature map p_4.
m_3 is upsampled and fused with m_2 to obtain an intermediate feature map, to which a convolution with kernel size 3×3 and 256 output channels is applied to obtain the feature map p_2.
That fused map is upsampled and fused with m_1 to obtain another intermediate feature map, to which a convolution with kernel size 3×3 and 256 output channels is applied to obtain the feature map p_1.
p_1, p_2, p_3 and p_4 all have 256 channels and their sizes are in a multiple relation; combining the four gives f_2.
(1.3) region candidates;
As shown in FIG. 3, for p_1, p_2, p_3 and p_4, a human body part anchor frame controlled by aspect ratio and size is generated centered on every pixel in a sliding-window manner;
A convolution with kernel size 3×3 and 256 output channels followed by a ReLU activation is applied to f_2 to obtain the intermediate feature map f_rpn;
A convolution with kernel size 1×1 and as many output channels as human body part anchor frames is applied to f_rpn; the anchor frame classes are then classified with a Softmax function, a confidence score of whether each anchor frame is background is output, and the binary cross-entropy loss Loss_obj is calculated in combination with the annotations;
A convolution with kernel size 1×1 and four times as many output channels as human body part anchor frames is applied to f_rpn to obtain the translation and scaling factors Δx_center, Δy_center, s_w, s_h of the anchor frames, and the Smooth L1 loss Loss_rpn_box_reg is calculated; non-maximum suppression (NMS) with an overlap threshold of 0.7 is then used to remove redundant human body part anchor frames, reducing their number to k and giving the k anchor frame control parameters {x_center, y_center, w, h}_k and k anchor frame correction parameters {Δx_center, Δy_center, s_w, s_h}_k, where x_center is the center abscissa of the human body part anchor frame, y_center is the center ordinate, w is the width, h is the height, Δx_center is the offset of x_center, Δy_center is the offset of y_center, s_w is the correction parameter of w, and s_h is the correction parameter of h;
(1.4) region of interest extraction (ROI alignment);
As shown in FIG. 4, using bilinear interpolation, f_2 is sampled in combination with {x_center, y_center, w, h}_k and {Δx_center, Δy_center, s_w, s_h}_k to obtain a feature map f_14×14 of size 14×14 and a feature map f_7×7 of size 7×7;
(1.5) detecting acupuncture points and detecting anchor frames of human body parts;
As shown in FIG. 4, f_14×14 is input into a fully convolutional network formed by eight 3×3 convolutions with ReLU activation functions, which outputs a feature map of size 28×28 with 256 channels; transposed-convolution upsampling of this feature map gives an acupoint prediction heatmap of size 56×56 with as many output channels as there are acupoints; an argmax operation on the heatmap regresses the position coordinates of the acupoints, and the cross-entropy loss Loss_kpt is calculated in combination with the acupoint annotations; f_7×7 is passed through a fully connected layer to obtain a one-dimensional vector of length 1024; this vector is passed through a fully connected layer to obtain a vector of size 2 (the confidence score of the human body part anchor frame), and the cross-entropy classification loss Loss_cls is calculated; at the same time, the length-1024 vector is passed through another fully connected layer to obtain a vector of size 8, in groups of four giving x_center, y_center, h and w; the upper-left vertex coordinates (x_lu, y_lu) and lower-right vertex coordinates (x_rd, y_rd) are calculated as x_lu = x_center - w/2, y_lu = y_center - h/2, x_rd = x_center + w/2, y_rd = y_center + h/2, and the human body part anchor frame regression loss Loss_box_reg is calculated at the same time;
(1.6) The deep learning key point detection model loss is calculated as:
Loss = Loss_cls + Loss_box_reg + Loss_kpt + Loss_obj + Loss_rpn_box_reg;
(2) Training a deep learning key point detection model;
(2.1) data set preparation;
After the human body images are acquired, the chest-abdomen, upper limbs, lower limbs and back are annotated as human body part anchor frames; then, combining the bone proportional measurement concept of traditional Chinese medicine acupoint theory, the position coordinates of the human acupoint sites are annotated within each anchor frame, producing the data set;
For example, the human back anchor frame is annotated first, and then the position coordinates of each acupoint site, such as the Dazhui, Jianliao, Jianzhen and other points, are annotated; the visualization of the annotation effect is shown in FIG. 5, where black is the selected region of the human back anchor frame and blue is the visualization of the annotated acupoint positions;
all the labeling data are stored in a COCO data set labeling format;
(2.2) The dynamic computation graph of the deep learning key point detection model is constructed based on the PyTorch framework;
(2.3) The official pre-training weights of the deep learning key point detection model are loaded, and the model is deployed to a CUDA platform to accelerate computation; then, the data set is fed into the model;
Finally, the deep learning key point detection model calculates each loss through forward transmission, and the parameter weights are updated through backward propagation until the optimal weight of the deep learning key point detection model is obtained;
Forward propagation calculates each loss: the input image passes through each convolution layer of the deep learning key point detection model, a result is output, and the loss between the annotation and the output result is calculated;
Back-propagation updates each parameter weight: the gradients of the loss are calculated in reverse order to obtain the parameter update gradients; the specific update method is stochastic gradient descent (SGD), with the following learning rate schedule: the base learning rate is 0.001; training starts at 0.000006 and increases linearly to 0.001 to warm up the deep learning key point detection model, after which the learning rate is multiplied by 0.3 every 5 epochs;
(2.4) storing the optimal weight of the deep learning key point detection model;
When the deep learning key point detection model is trained with this method, a self-built data set is used, containing 1800 pictures of human backs, each annotated with the human back region and 43 acupoint sites on the back;
Using the training method described above, the model was deployed to a Jetson Nano 4G development board for testing; the test results are shown in Table 1:
TABLE 1
Data set: Self-built dataset
Method: MobileNetV2_FPN
Part anchor frame AP/%: 72.2
Acupoint AP/%: 90.1
FPS (frames/s): 2.03
AP (Average Precision) is a common evaluation index for object detection and key point detection, calculated from the Precision-Recall curve, where Precision is the prediction accuracy and Recall is the detection completeness; the area enclosed under the Precision-Recall curve over recall thresholds from 0 to 1 is recorded as the AP. FPS is the processing speed, i.e. the number of pictures processed per second, with an input picture size of 1280×720. Compared with traditional algorithms, the deep learning network has better robustness;
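A NumPy sketch of the AP computation described, using the usual monotone (all-points) interpolation of the Precision-Recall curve; the matching rule that yields the recall and precision arrays (IoU for anchor frames, a distance threshold for acupoints) is not specified above:

```python
import numpy as np

def average_precision(recall, precision):
    # recall/precision: cumulative values over detections sorted by confidence
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]   # monotone PR interpolation
    idx = np.where(r[1:] != r[:-1])[0]
    return np.sum((r[idx + 1] - r[idx]) * p[idx + 1])  # area under the curve
```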
(3) After a human body image is acquired, it is input into the trained deep learning key point detection model, which outputs the upper-left vertex coordinates (x_lu, y_lu) and lower-right vertex coordinates (x_rd, y_rd) of the human body part anchor frame and the position coordinates of the acupoint sites.
Example 2
The automatic vision robot path planning method, as shown in FIGS. 6-7, comprises the following steps:
(1) Selection of the device:
Robot: a Siling seven-degree-of-freedom robot;
Depth camera: Gemini 2 binocular structured-light camera, Orbbec Technology Group Co., Ltd.;
(2) Installing a depth camera and a robot;
The depth camera is mounted on the robot in the eye-in-hand configuration, and the calibration steps are as follows:
(2.1) The ArUco calibration board is fixed at a position that does not change relative to the robot; the robot is moved so that the depth camera shoots the ArUco board at a 45° inclination from a circle directly above the board, with the board imaged clearly and occupying at least one third of the field of view;
(2.2) The current end pose of the robot is recorded while the pose of the ArUco board in the camera coordinate system is solved; the operation is repeated so that the board is shot from a different position on the circle each time, the smaller the relative movement of the robot end between shots the better, and ten shots are taken;
(2.3) Using the recorded robot end poses and ArUco board poses, the AX = XB hand-eye equation is solved with the Tsai algorithm; the depth camera intrinsic matrix K and the hand-eye matrix T_end_cam are written to a configuration file for later use, with K expressed as:
K = [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]];
(3) Image acquisition and processing;
starting a robot carrying a depth camera, moving to an image acquisition position, and starting the depth camera to acquire a color human body image and a depth human body image of the same frame;
registering the color human body image and the depth human body image of the same frame by using a hardware registration tool of the depth camera to obtain the depth value of each point in the color human body image;
The upper-left vertex coordinates (x_lu, y_lu) and lower-right vertex coordinates (x_rd, y_rd) of the human body part anchor frame and the position coordinates of the acupoint sites are obtained from the color human body image using the detection method of the human body part anchor frame and acupoint sites of Example 1 above;
The depth human body image is converted into the initial point cloud c, as shown in FIG. 9;
(4) Filtering and downsampling by a point cloud;
First, the initial point cloud c is denoised with a K-nearest-neighbor outlier filtering algorithm using a KD-tree index, giving the filtered point cloud c_1;
Then a three-dimensional voxel grid is created over point cloud c_1, and the centroid of all points within each voxel is calculated to approximately represent them, so that all points in a voxel are finally represented by one centroid point, giving the downsampled point cloud c_2;
(5) Dividing point cloud;
(5.1) As shown in FIG. 10, the new point cloud c_3 is obtained from point cloud c_2 as c_3 = {(x_i, y_i, z_i) ∈ c_2 | PB_i ≤ 0.4}, with PB_i = B_i / (R_i + G_i + B_i), where (x_i, y_i, z_i) is the i-th point of c_2, R_i, G_i, B_i are the values (range 0-255) of the three RGB channels of the corresponding point in the color human body image, and PB_i is the blue proportion of that point;
(5.2) The coordinates (x_1, y_1, z_1) of the upper-left vertex of the human body part anchor frame in the camera coordinate system and the coordinates (x_2, y_2, z_2) of the lower-right vertex in the camera coordinate system are calculated with the pinhole model:
x_1 = (x_lu - c_x)·d_1/f_x, y_1 = (y_lu - c_y)·d_1/f_y, z_1 = d_1;
x_2 = (x_rd - c_x)·d_2/f_x, y_2 = (y_rd - c_y)·d_2/f_y, z_2 = d_2;
where d_1, d_2 are the depth values of the two vertices acquired by the depth camera, and c_x, c_y, f_x, f_y are intrinsic parameters of the depth camera;
As shown in FIG. 11, the new point cloud c_4 is obtained from c_3 as c_4 = {(x_i, y_i, z_i) ∈ c_3 | x_1 ≤ x_i ≤ x_2, y_1 ≤ y_i ≤ y_2}, where (x_i, y_i, z_i) is the i-th point of c_3;
Comparing FIG. 9 and FIG. 11, the unreduced point cloud c has 34000 points while the reduced point cloud c_4 has 2300 points; the point cloud data volume is thus reduced more than tenfold, greatly reducing the computation of the subsequent path planning and normal vector calculation;
(6) Searching point cloud;
The starting path point p_1 in point cloud c_4 corresponding to the start point in the color human body image is determined, and the terminating path point p_2 in c_4 corresponding to the end point in the color human body image is determined; intermediate path points are then searched in c_4, each search looking, along the direction from p_1 to p_2, for a point at distance r from the current starting point, with r being 0.5-5 cm; at the beginning of the search p_1 is the starting point, and each intermediate path point found becomes the new starting point; the search terminates when the distance between the last intermediate path point along the p_1-to-p_2 direction and p_2 is not more than r;
As shown in FIG. 12, all the path points are connected along the direction from p_1 to p_2 to form a complete robot working path;
(7) Calculating the path point attitude;
Each path point is converted one by one into a pose array and sent to the robot controller in server mode; the conversion process is as follows:
(7.1) Given the coordinates (p_x, p_y, p_z) of a path point in the camera coordinate system, the nearest m points around the path point are selected in point cloud c_4, with m taken as 5-15; the tangent plane is fitted by least squares and its normal vector n pointing toward the human body is calculated; a direction vector v along the path from p_1 to p_2 is defined; the cross product k = n × v is calculated, and the cross product m = n × k is calculated;
(7.2) A rotation matrix T is constructed from n, k and m: T = [[n_1, k_1, m_1], [n_2, k_2, m_2], [n_3, k_3, m_3]], where n_1, n_2, n_3 are the components of the unitized n (calculated as n/|n|), k_1, k_2, k_3 are the components of the unitized k, and m_1, m_2, m_3 are the components of the unitized m;
(7.3) The homogeneous path-point pose matrix T_cam_point is constructed as T_cam_point = [[T, p], [0, 1]], where p = (p_x, p_y, p_z)^T; combining the hand-eye matrix T_end_cam and the end-effector-to-robot-base pose matrix T_base_end, the path point pose in the camera coordinate system is converted to the robot base coordinate system as T_base_point = T_base_end · T_end_cam · T_cam_point;
(7.4) An API function of the robot converts the homogeneous matrix T_base_point into the axis-angle pose array [p'_x, p'_y, p'_z, α_1, α_2, α_3], where p'_x, p'_y, p'_z is the position of the path point in the robot base coordinates and α_1, α_2, α_3 is the axis-angle representation of the attitude of the path point in the robot base coordinate system.

Claims (10)

1. A method for detecting the anchor frame and acupoint sites of a human body part, characterized in that after a human body image is acquired, it is input into a trained deep learning key point detection model, which outputs the upper-left vertex coordinates (x_lu, y_lu) and lower-right vertex coordinates (x_rd, y_rd) of the human body part anchor frame and the position coordinates of the acupoint sites; the workflow of the deep learning key point detection model is as follows:
(1) Data cutting and normalization pretreatment;
Processing the human body image f to obtain f_1 of size N×3×224×224, where N is the number of images input in one training batch;
(2) Extracting features;
will f 1 Input into mobiletv 2, output s of downsampled 4 times and size 64×56×56 in mobiletv 2 is extracted 1 Downsampling an output s of 8 times and of size 160×28×28 2 Downsampling an output s of 32 times and of size 1280×7×7 3
For s 1 Performing convolution operation with convolution kernel size of 1×1 and output channel number of 256 to obtain intermediate feature mapFor s 2 Performing convolution operation with convolution kernel size of 1×1 and output channel number of 256 to obtain intermediate feature map +.>For s 3 Performing convolution operation with convolution kernel size of 1×1 and output channel number of 256 to obtain intermediate feature map +. >
For a pair ofPerforming convolution operation with convolution kernel size of 3×3 and output channel number of 256 to obtain feature map +.>
For a pair ofPerforming maximum pooling with step length of 2 to obtain characteristic diagram +.>
Upsample $t_3$ and fuse it with $t_2$ to obtain the intermediate feature map $t'_2$; apply to $t'_2$ a convolution with kernel size $3 \times 3$ and 256 output channels to obtain the feature map $o_2$;
Upsample $t'_2$ and fuse it with $t_1$ to obtain the intermediate feature map $t'_1$; apply to $t'_1$ a convolution with kernel size $3 \times 3$ and 256 output channels to obtain the feature map $o_1$;
The four feature maps $o_1$, $o_2$, $o_3$, $o_4$ all have 256 channels, and their sizes are in multiple relation; combining the four yields $f_2$;
(3) Region candidates;
For each feature map in $f_2$, generate a series of region-of-interest anchor frames in a sliding-window manner; on this basis, perform a series of convolution and activation operations on $f_2$, coarsely classify using a Softmax function, and retain $k$ anchor frames and their parameters using non-maximum suppression to obtain $\{x_{center}, y_{center}, w, h\}_k$ and $\{\Delta x_{center}, \Delta y_{center}, s_w, s_h\}_k$, where $x_{center}$ is the center abscissa of a human body part anchor frame, $y_{center}$ the center ordinate, $w$ the width, $h$ the height, $\Delta x_{center}$ the offset of $x_{center}$, $\Delta y_{center}$ the offset of $y_{center}$, $s_w$ the correction parameter of $w$, and $s_h$ the correction parameter of $h$; the loss is also calculated;
(4) Extracting a region of interest;
Using bilinear interpolation, sample $f_2$ in combination with $\{x_{center}, y_{center}, w, h\}_k$ and $\{\Delta x_{center}, \Delta y_{center}, s_w, s_h\}_k$ to obtain a $14 \times 14$ feature map $f_{14\times14}$ and a $7 \times 7$ feature map $f_{7\times7}$;
(5) Acupoint detection and human body part anchor frame detection;
Pass $f_{14\times14}$ through a fully convolutional network (FCN) and a transposed convolution to output an acupoint prediction heatmap, obtain the acupoint position coordinates using argmax, and calculate the loss; pass $f_{7\times7}$ through two fully connected layers to output the image coordinates of the human body part anchor frame, convert them to obtain the upper-left and lower-right vertex coordinates of the anchor frame, and calculate the loss;
(6) Calculate the total Loss of the deep learning keypoint detection model.
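As a concrete reading of step (2) of claim 1, the sketch below implements the pyramid fusion in PyTorch; the module name, the intermediate map names $t_i$/$o_i$, and the nearest-neighbor upsampling mode are assumptions, while the channel counts and map sizes follow the claim.

```python
# Feature-pyramid fusion sketch for step (2); s1/s2/s3 are taken as given
# with the shapes stated in the claim (64x56x56, 160x28x28, 1280x7x7).
import torch.nn as nn
import torch.nn.functional as F

class PartFPN(nn.Module):
    def __init__(self):
        super().__init__()
        self.lat1 = nn.Conv2d(64, 256, 1)       # 1x1 lateral convolutions
        self.lat2 = nn.Conv2d(160, 256, 1)
        self.lat3 = nn.Conv2d(1280, 256, 1)
        self.out1 = nn.Conv2d(256, 256, 3, padding=1)   # 3x3 output convolutions
        self.out2 = nn.Conv2d(256, 256, 3, padding=1)
        self.out3 = nn.Conv2d(256, 256, 3, padding=1)

    def forward(self, s1, s2, s3):
        t1, t2, t3 = self.lat1(s1), self.lat2(s2), self.lat3(s3)
        o3 = self.out3(t3)                                  # 256 x 7 x 7
        o4 = F.max_pool2d(o3, 1, stride=2)                  # coarsest extra level
        t2 = t2 + F.interpolate(t3, size=t2.shape[-2:])     # top-down fusion
        o2 = self.out2(t2)                                  # 256 x 28 x 28
        t1 = t1 + F.interpolate(t2, size=t1.shape[-2:])
        o1 = self.out1(t1)                                  # 256 x 56 x 56
        return [o1, o2, o3, o4]                             # the four levels of f2
```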
2. The method for detecting a human body part anchor frame and acupoint sites according to claim 1, wherein the specific process of step (1) is as follows: first crop the human body image $f$ to a uniform size of $224 \times 224$, then convert it into tensor data, and finally normalize the RGB three-channel pixel values of the image from $[0, 255]$ to $[0, 1]$, obtaining $f_1$ of size $N \times 3 \times 224 \times 224$.
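A minimal torchvision sketch of the claim-2 preprocessing follows; whether the 224x224 "crop" is a center crop or some other crop policy is an assumption.

```python
# Claim-2 preprocessing sketch: crop to 224x224, convert to tensor, scale
# RGB [0, 255] to [0, 1] (ToTensor already performs the scaling).
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.CenterCrop(224),   # assumed crop policy for the uniform 224x224 cut
    transforms.ToTensor(),        # tensor conversion + [0, 255] -> [0, 1]
])
# Stacking N preprocessed images yields f1 of shape N x 3 x 224 x 224.
```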
3. The method for detecting a human body part anchor frame and acupoint sites according to claim 2, wherein the specific process of step (3) is as follows:
For each feature map in $f_2$, generate human body part anchor frames controlled by aspect ratio and size, centered on each pixel, in a sliding-window manner;
Apply to $f_2$ a convolution with kernel size $3 \times 3$ and 256 output channels followed by a ReLU activation to obtain the intermediate feature map $f_{rpn}$;
Apply to $f_{rpn}$ a convolution with kernel size $1 \times 1$ whose number of output channels equals the number of human body part anchor frames, then classify the anchor frame types using a Softmax function, output a confidence score for whether each anchor frame is background, and calculate the binary cross-entropy $Loss_{obj}$ against the annotations;
Apply to $f_{rpn}$ a convolution with kernel size $1 \times 1$ whose number of output channels is four times the number of human body part anchor frames to obtain the anchor-frame translation and scaling factors $\Delta x_{center}, \Delta y_{center}, s_w, s_h$, and calculate $Loss_{rpn\_box\_reg}$ with the Smooth L1 loss; use non-maximum suppression to remove human body part anchor frames with overlap ratio smaller than 0.7, reducing their number to $k$ and obtaining $k$ anchor-frame control parameters $\{x_{center}, y_{center}, w, h\}_k$ and $k$ anchor-frame correction parameters $\{\Delta x_{center}, \Delta y_{center}, s_w, s_h\}_k$.
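The two 1x1 branches of claim 3 mirror a standard RPN head; the sketch below is one possible PyTorch form, with `num_anchors` (anchor shapes per location) and the layer names as assumptions.

```python
# RPN head sketch for claim 3: a shared 3x3 conv + ReLU producing f_rpn,
# then a 1x1 objectness branch and a 1x1 box-refinement branch.
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, num_anchors):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(256, 256, 3, padding=1),   # 3x3 conv, 256 channels
            nn.ReLU(inplace=True),               # -> intermediate map f_rpn
        )
        self.cls = nn.Conv2d(256, num_anchors, 1)      # background/object score
        self.reg = nn.Conv2d(256, num_anchors * 4, 1)  # dx, dy, sw, sh per anchor

    def forward(self, f2_level):
        f_rpn = self.conv(f2_level)
        return self.cls(f_rpn), self.reg(f_rpn)
```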
4. The method for detecting a human body part anchor frame and acupoint sites according to claim 3, wherein the specific process of step (5) is as follows:
Input $f_{14\times14}$ into a fully convolutional network composed of eight convolutions with kernel size $3 \times 3$ and ReLU activation functions, outputting a feature map of size $28 \times 28$ with 256 output channels; apply transposed-convolution upsampling to this feature map to obtain an acupoint prediction heatmap of size $56 \times 56$ whose number of output channels equals the number of acupoints; apply an argmax operation to the heatmap to regress the acupoint position coordinates, and calculate the cross-entropy $Loss_{kpt}$ against the acupoint annotations;
Pass $f_{7\times7}$ through a fully connected layer to obtain a one-dimensional vector of length 1024; pass this vector through a fully connected layer to obtain a one-dimensional vector of size 2, and calculate the cross-entropy classification $Loss_{cls}$; simultaneously pass the length-1024 vector through a fully connected layer to obtain a one-dimensional vector of size 8, in which each group of four values is respectively $x_{center}$, $y_{center}$, $h$, $w$; calculate the upper-left vertex coordinates $(x_{lu}, y_{lu})$ and lower-right vertex coordinates $(x_{rd}, y_{rd})$ as $x_{lu} = x_{center} - w/2$, $y_{lu} = y_{center} - h/2$, $x_{rd} = x_{center} + w/2$, $y_{rd} = y_{center} + h/2$, and simultaneously calculate the human body part anchor frame regression $Loss_{box\_reg}$.
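A PyTorch sketch of the two claim-4 heads follows, under stated assumptions: the exact placement of the two upsampling stages (14 to 28 to 56) and the layer names are inferred, while the depth (eight 3x3 convolutions), channel counts, and output sizes follow the claim. Vertex coordinates then follow from the center form as given above.

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Eight 3x3 conv+ReLU blocks, then upsampling to a 56x56 acupoint heatmap."""
    def __init__(self, num_acupoints):
        super().__init__()
        layers = []
        for _ in range(8):
            layers += [nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True)]
        self.fcn = nn.Sequential(*layers)
        self.up1 = nn.ConvTranspose2d(256, 256, 4, stride=2, padding=1)           # 14 -> 28
        self.up2 = nn.ConvTranspose2d(256, num_acupoints, 4, stride=2, padding=1) # 28 -> 56

    def forward(self, f14):                          # f14: k x 256 x 14 x 14
        heat = self.up2(self.up1(self.fcn(f14)))     # acupoint prediction heatmap
        flat = heat.flatten(2).argmax(dim=2)         # argmax over each 56x56 map
        xy = torch.stack([flat % 56, flat // 56], dim=2)  # (x, y) per acupoint
        return heat, xy

class BoxHead(nn.Module):
    """FC to 1024, then a 2-way class score and an 8-value box vector."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Sequential(nn.Flatten(),
                                nn.Linear(256 * 7 * 7, 1024), nn.ReLU(inplace=True))
        self.cls = nn.Linear(1024, 2)    # background / human body part
        self.box = nn.Linear(1024, 8)    # two groups of (x_center, y_center, h, w)

    def forward(self, f7):                           # f7: k x 256 x 7 x 7
        h = self.fc(f7)
        return self.cls(h), self.box(h)
```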
5. The method for detecting a human body part anchor frame and acupoint sites according to claim 4, wherein in step (6) the calculation formula of the Loss is: $Loss = Loss_{cls} + Loss_{box\_reg} + Loss_{kpt} + Loss_{obj} + Loss_{rpn\_box\_reg}$.
6. The method for detecting a human body part anchor frame and acupoint sites according to claim 5, wherein the training steps of the deep learning keypoint detection model are as follows:
(1) Creating a data set;
After the human body images are acquired, first annotate the human body part anchor frames in each image, then annotate the position coordinates of the human acupoints within each anchor frame, completing the data set;
(2) Construct the dynamic computation graph of the deep learning keypoint detection model based on the PyTorch framework;
(3) First, load the official pre-training weights of the deep learning keypoint detection model and deploy the model to a CUDA platform to accelerate computation;
Then, pass the data set into the model;
Finally, the deep learning keypoint detection model calculates each loss through forward propagation and updates the parameter weights through backpropagation, until the optimal weights of the model are obtained;
Calculating each loss through forward propagation means that the input image passes through the convolution calculations of each layer of the model, a result is output, and the loss between the annotations and the output result is calculated;
Updating each parameter weight through backpropagation means that the parameter-update gradients are calculated layer by layer in reverse from the loss; the specific update method is stochastic gradient descent, and the learning-rate schedule is: starting from a warmup learning rate of 0.000006, linearly increase to the base learning rate of 0.001 to warm up the deep learning keypoint detection model, then multiply the learning rate by 0.3 every 5 training epochs;
(4) Save the optimal weights of the deep learning keypoint detection model.
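The claim-6 schedule (linear warmup from 0.000006 to 0.001, then x0.3 every 5 epochs) maps naturally onto a PyTorch LambdaLR; in the sketch below, the one-epoch warmup length and the placeholder model are assumptions, since the claim does not state them.

```python
# Claim-6 learning-rate schedule sketch with SGD.
import torch
import torch.nn as nn

model = nn.Linear(1, 1)   # placeholder for the keypoint detection model
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)   # base lr = 0.001

def lr_factor(epoch, warmup_epochs=1):
    start = 0.000006 / 0.001                       # warmup start as a fraction of base lr
    if epoch < warmup_epochs:                      # linear warmup toward the base lr
        return start + (1.0 - start) * epoch / warmup_epochs
    return 0.3 ** ((epoch - warmup_epochs) // 5)   # x0.3 every 5 epochs thereafter

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_factor)
for epoch in range(30):
    # ... one training pass: forward, loss, backward, optimizer.step() ...
    scheduler.step()
```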
7. An automatic path planning method for a vision robot, characterized by comprising the following steps:
(1) Image acquisition and processing;
starting a robot carrying a depth camera, moving to an image acquisition position, and starting the depth camera to acquire a color human body image and a depth human body image of the same frame;
Registering the color human body image and the depth human body image of the same frame by using a hardware registration tool of the depth camera to obtain the depth value of each point in the color human body image;
Using the method for detecting a human body part anchor frame and acupoint sites according to any one of claims 1 to 6, obtain the upper-left vertex coordinates $(x_{lu}, y_{lu})$ and lower-right vertex coordinates $(x_{rd}, y_{rd})$ of the human body part anchor frame and the position coordinates of the acupoint sites;
converting the depth human body image into an initial point cloud c;
(2) Point cloud filtering and downsampling;
Denoise the initial point cloud $c$ with a K-nearest-neighbor outlier filtering algorithm, using a KD-tree index, to obtain the filtered point cloud $c_1$;
Create a three-dimensional voxel grid over point cloud $c_1$, and compute the centroid of all points in each voxel to approximately represent them, so that all points within a voxel are finally represented by one centroid point, obtaining the downsampled point cloud $c_2$;
(3) Point cloud segmentation;
(3.1) Obtain a new point cloud $c_3$ from point cloud $c_2$ by screening each point according to its blue proportion $PB_i = \frac{B_i}{R_i + G_i + B_i}$, where $(x_i, y_i, z_i)$ is the $i$-th point of $c_2$, $R_i, G_i, B_i$ are the values of the RGB three channels of the corresponding point in the color human body image, and $PB_i$ is the blue proportion of that point in the color human body image;
(3.2) Calculate the coordinates $(x_1, y_1, z_1)$ of the anchor frame's upper-left vertex in the camera coordinate system and the coordinates $(x_2, y_2, z_2)$ of the lower-right vertex in the camera coordinate system by the pinhole back-projection formulas $x_1 = (x_{lu} - c_x)\,d_1 / f_x$, $y_1 = (y_{lu} - c_y)\,d_1 / f_y$, $z_1 = d_1$ and $x_2 = (x_{rd} - c_x)\,d_2 / f_x$, $y_2 = (y_{rd} - c_y)\,d_2 / f_y$, $z_2 = d_2$, where $d_1, d_2$ are the depth values of the two vertices acquired by the depth camera and $c_x, c_y, f_x, f_y$ are the intrinsic parameters of the depth camera;
Obtain a new point cloud $c_4$ from point cloud $c_3$ by keeping the points inside the anchor frame's extent, $c_4 = \{(x_i, y_i, z_i) \in c_3 : x_1 \le x_i \le x_2,\ y_1 \le y_i \le y_2\}$, where $(x_i, y_i, z_i)$ is the $i$-th point in $c_3$;
(4) Searching point cloud;
Determine the start path point $p_1$ in point cloud $c_4$ corresponding to the start point in the color human body image, and the end path point $p_2$ in $c_4$ corresponding to the end point in the color human body image. Search $c_4$ for intermediate path points: each intermediate path point lies along the direction from $p_1$ to $p_2$ at a distance $r$ from the current starting point, $r$ being 0.5-5 cm. At the beginning of the search, $p_1$ is the starting point; each time an intermediate path point is found, it becomes the new starting point. The search termination condition is: the distance from the last intermediate path point to $p_2$ along the $p_1$-to-$p_2$ direction is not more than $r$;
All path points are connected along the direction from $p_1$ to $p_2$ to form the complete robot working path;
(5) Calculating the pose of each path point;
Each path point is converted into a pose array one by one, and the pose arrays are sent to the robot controller in server mode; the conversion process is as follows:
First, obtain the coordinates $(p_x, p_y, p_z)$ of a path point in the camera coordinate system. In point cloud $c_4$, select the $m$ nearest points around the path point, $m$ being 5-15, fit a tangent plane by the least squares method, and compute the tangent-plane normal vector $\vec{n}$ pointing toward the human body. Define the direction vector $\vec{d}$ along the working path from $p_1$ toward $p_2$; compute the cross product $\vec{k} = \vec{n} \times \vec{d}$, then compute the cross product $\vec{m} = \vec{n} \times \vec{k}$.
Second, based on $\vec{n}$, $\vec{k}$ and $\vec{m}$, construct the rotation matrix $T$, whose expression is $T = \begin{bmatrix} n_1 & k_1 & m_1 \\ n_2 & k_2 & m_2 \\ n_3 & k_3 & m_3 \end{bmatrix}$, where $n_1, n_2, n_3$ are the components of the unitized $\vec{n}$, $k_1, k_2, k_3$ are the components of the unitized $\vec{k}$, and $m_1, m_2, m_3$ are the components of the unitized $\vec{m}$;
Then, construct the path-point pose homogeneous matrix ${}^{c}T_{p}$, whose expression is ${}^{c}T_{p} = \begin{bmatrix} T & p \\ 0 & 1 \end{bmatrix}$ with $p = (p_x, p_y, p_z)^{\mathsf{T}}$. Combining the hand-eye matrix ${}^{e}T_{c}$ and the end-effector-to-robot-base pose matrix ${}^{b}T_{e}$, convert the path-point pose from the camera coordinate system to the robot base coordinate system to obtain ${}^{b}T_{p}$; the calculation is ${}^{b}T_{p} = {}^{b}T_{e} \, {}^{e}T_{c} \, {}^{c}T_{p}$.
Finally, using the robot's homogeneous-matrix-to-axis-angle API function, convert ${}^{b}T_{p}$ into a pose array $[p'_x, p'_y, p'_z, \alpha_1, \alpha_2, \alpha_3]$ in the robot's axis-angle representation, where $p'_x, p'_y, p'_z$ is the position of the path point in robot base coordinates and $\alpha_1, \alpha_2, \alpha_3$ is the axis-angle representation of the path-point pose in the robot base coordinate system.
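Steps (2)-(3) of claim 7 reduce to a KNN outlier filter, a voxel-centroid downsample, and two crops (blue proportion, then the back-projected anchor frame). A NumPy/SciPy sketch of the filter and the step-(3.2) geometry follows; the neighbour count, the outlier factor, and the function names are assumptions.

```python
# Sketch of the claim-7 point cloud steps.
import numpy as np
from scipy.spatial import cKDTree

def knn_outlier_filter(cloud, k=8, factor=2.0):
    """Step (2): drop points whose mean distance to k neighbours is anomalous."""
    dists, _ = cKDTree(cloud).query(cloud, k=k + 1)   # column 0 is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    return cloud[mean_d < mean_d.mean() + factor * mean_d.std()]   # c -> c1

def backproject(u, v, d, fx, fy, cx, cy):
    """Step (3.2): lift pixel (u, v) with depth d into the camera frame."""
    return np.array([(u - cx) * d / fx, (v - cy) * d / fy, d])

def crop_to_anchor(cloud, v1, v2):
    """Keep the points of c3 inside the anchor frame's x/y extent -> c4."""
    keep = ((cloud[:, 0] >= v1[0]) & (cloud[:, 0] <= v2[0]) &
            (cloud[:, 1] >= v1[1]) & (cloud[:, 1] <= v2[1]))
    return cloud[keep]
```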
8. The automatic path planning method for a vision robot according to claim 7, wherein the robot has at least three-axis motion capability, millimeter-scale position accuracy, and a complete control module.
9. The automatic path planning method for a vision robot according to claim 8, wherein the depth camera is an Orbbec Gemini 2 binocular structured-light camera.
10. The automatic path planning method for a vision robot according to claim 7, wherein the depth camera is mounted on the robot in an eye-in-hand configuration, and the calibration steps are as follows:
(i) Fix the ArUco calibration board at a position that does not change relative to the robot; move the robot so that the depth camera photographs the ArUco board at a 45-degree tilt angle from points on a circle directly above the board, with the board occupying at least one third of the image's field of view;
(ii) Record the robot's current end-effector pose while solving for the ArUco board's pose in the camera coordinate system; repeat this operation so that the ArUco board is photographed from a different position on the circle each time, keeping the relative displacement of the robot end-effector as small as possible, for a total of ten captures;
(iii) Using the recorded robot end-effector poses and ArUco board poses, solve the $AX = XB$ hand-eye equation with the Tsai algorithm, and write the depth camera intrinsic matrix $K$ and the hand-eye matrix ${}^{e}T_{c}$ into a configuration file for subsequent use, where the expression of $K$ is: $K = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$.
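OpenCV ships a Tsai solver for the $AX = XB$ problem, so the claim-10 calibration step can be sketched as below; the pose-list arguments and the configuration file name are assumptions.

```python
# Claim-10 hand-eye calibration sketch using OpenCV's Tsai method; inputs are
# the ten recorded pose pairs (rotation lists and translation lists).
import cv2
import numpy as np

def solve_hand_eye(R_g2b, t_g2b, R_t2c, t_t2c, K, path="handeye_config.npz"):
    """Solve AX = XB (Tsai) and save K plus the 4x4 hand-eye matrix."""
    R, t = cv2.calibrateHandEye(R_g2b, t_g2b, R_t2c, t_t2c,
                                method=cv2.CALIB_HAND_EYE_TSAI)
    T = np.eye(4)                      # hand-eye matrix eTc
    T[:3, :3], T[:3, 3] = R, t.ravel()
    np.savez(path, K=K, T=T)           # written for subsequent use
    return T
```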

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117974740A (en) * 2024-04-01 2024-05-03 Nanjing Normal University Acupoint positioning method and robot based on aggregation type window self-attention mechanism



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination