CN117372476A - Multi-camera tracking and monitoring method for lower limb rehabilitation training - Google Patents

Multi-camera tracking and monitoring method for lower limb rehabilitation training

Info

Publication number
CN117372476A
CN117372476A
Authority
CN
China
Prior art keywords
patient
camera
under
target
rehabilitation training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311322634.4A
Other languages
Chinese (zh)
Inventor
陈博
罗傅宜
胡明南
周袁
周京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT filed Critical Zhejiang University of Technology ZJUT
Priority to CN202311322634.4A
Publication of CN117372476A

Classifications

    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/292 Multi-camera tracking
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/62 Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
    • G06V 10/82 Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30196 Human being; person
    • G06T 2207/30232 Surveillance
    • G06T 2207/30241 Trajectory
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention provides a multi-camera tracking and monitoring method for lower limb rehabilitation training, which comprises the following steps. A YOLO model performs pedestrian detection on the images captured by each camera to obtain pedestrian positions and bounding-box information. Features are extracted within the bounding-box region for the patient in each detection box. Appearance features that vary across cameras are learned, and the distances between feature vectors are calculated. The distances are ranked across the multiple camera videos to match the same patient across cameras. Pose estimation is performed on the targets successfully tracked under a single view, and keypoint coordinates are extracted to obtain local trajectories. The local trajectories obtained under each single view are fused to obtain the global trajectory of each patient under the multi-camera view. By adopting multi-camera multi-target tracking, the invention realizes effective one-to-many monitoring and rehabilitation training evaluation during rehabilitation training.

Description

Multi-camera tracking and monitoring method for lower limb rehabilitation training
Technical Field
The invention relates to the field of tracking and monitoring for lower limb rehabilitation training, and in particular to a multi-camera tracking and monitoring method for lower limb rehabilitation training.
Background
With the continued aging of the population, the muscle performance of the elderly gradually declines, limiting their mobility to the point where basic daily activities cannot be carried out independently, and the demand for rehabilitation keeps growing. However, specialized rehabilitation professionals are currently in short supply, and the number of rehabilitation hospitals nationwide is insufficient. Lower limb rehabilitation robots offer a feasible solution to this problem. Such robots mainly assist patients in performing simple, highly repetitive training tasks and are expected to lighten the burden on the medical system. Rehabilitation robots are of great importance in assisting and supervising the daily walking of patients with lower limb dysfunction, especially during the stage in which these patients recover the ability to walk independently.
However, these patients still need health assessment and safety supervision during daily walking, and long-term manual assistance and supervision are costly because of caregiver shortages and high care costs. It is therefore critical for rehabilitation robots to provide safety guidance, supervision, and assistance during indoor activities. Because the perception capability of the rehabilitation robot is mismatched with the patient's imperfectly controlled limbs during rehabilitation, motion assessment and accident care still require full-time monitoring by medical staff, and fully autonomous rehabilitation training cannot yet be realized. To address this problem, a multi-camera tracking and monitoring technique for lower limb rehabilitation training is proposed. The technique can integrate and display multi-view images of patients throughout the rehabilitation process in real time, so that one rehabilitation engineer can monitor several patients simultaneously; it can also acquire the training route and real-time position of each patient, enabling better patient management and a quick response when accidents occur. Research on multi-camera tracking is therefore of great significance for the feasibility of an indoor rehabilitation assistance system: the technique can track the safety of patients walking independently during rehabilitation training and supervise their condition, thereby alleviating the problem of autonomous rehabilitation under a shortage of medical resources.
Disclosure of Invention
In order to monitor and evaluate rehabilitation patients during lower limb rehabilitation training, the invention provides a multi-camera tracking and monitoring method for lower limb rehabilitation training, which enables one-to-many monitoring by a rehabilitation engineer and evaluation of each patient's rehabilitation status, so as to alleviate the strain on medical resources.
A multi-camera tracking and monitoring method for lower limb rehabilitation training comprises the following steps:
step 1: acquiring the input video of lower limb rehabilitation training from a single camera, detecting the patient in the lower limb rehabilitation training input video by using a YOLO model, and acquiring the bounding-box information of the patient;
step 2: determining the target patients to be tracked in the bounding-box information acquired under each camera, setting a label for each target patient, extracting the appearance features of each target patient within the bounding-box area using a metric-learning-based pedestrian re-identification method, and converting the appearance features into fixed-length feature vectors with a deep learning model;
step 3: mapping the fixed-length feature vectors under a single camera to a new space, setting a distance threshold, calculating the distance between feature vectors, and judging whether the distance between feature vectors is smaller than the threshold; if so, the feature vectors are considered to belong to the same target patient, and if not, they are not;
step 4: acquiring the input videos of lower limb rehabilitation training from different cameras, repeating steps 1, 2, and 3 to obtain fixed-length feature vectors under the different cameras, calculating the distances between the fixed-length feature vectors under the different cameras, sorting according to the distance between the fixed-length feature vectors, and returning the target patient labels corresponding to the fixed-length feature vectors, so as to realize matching of the same target patient across cameras;
step 5: performing pose estimation with OpenPose on the target patients successfully matched under a single view in step 3, and extracting the keypoint coordinates in the skeleton information of each target patient to obtain the local motion trajectory of each target patient under the single camera view;
step 6: repeating step 5 to obtain the local motion trajectories of the target patients under each camera view, and fusing the local trajectories of the target patients obtained under each single view to obtain the global trajectory of each target patient under the multi-camera view.
In step 1, the YOLO target detection algorithm is applied to the input video of lower limb rehabilitation training; a locally unique bounding box is obtained through a convolutional neural network and non-maximum suppression, and the bounding-box information is output.
In step 2, the appearance features of the target patient to be tracked are extracted using the convolutional layers of a neural network, and the output of the convolutional layers is converted into a fixed-length feature vector through a pooling operation.
In step 3, a fully connected layer is added on the basis of step 2, and the fixed-length feature vector obtained after pooling is mapped to a new space. A contrastive loss function is defined according to the metric-learning pedestrian re-identification method:

L_c = y d(I_a, I_b)^2 + (1 - y) max(0, α - d(I_a, I_b))^2

where I_a and I_b represent two input training pictures, and y is the label of the training pair: y = 1 indicates that the two pictures belong to the same patient, and y = 0 indicates that they belong to different patients. α is a threshold parameter designed according to actual requirements, and d(I_a, I_b) = ||f(I_a) - f(I_b)||_2 is the distance between the feature vectors of the two pictures input to the network. When the two input pictures show the same patient, d(I_a, I_b) becomes smaller; conversely, when the two input pictures show different patients, d(I_a, I_b) becomes larger. In the invention the threshold is set to 1.5 according to the actual rehabilitation training scene and the number of patients; by minimizing the contrastive loss L_c, the distance between pictures of the same patient is reduced and the distance between pictures of different patients is enlarged.
In step 4, steps 1, 2, and 3 are repeated to obtain the target patient labels corresponding to the fixed-length feature vectors under the different cameras.
In step 5, pose estimation is performed with OpenPose on the target patients successfully matched under a single view in step 3. For the bounding box of each target patient, a feed-forward neural network predicts the set L of part affinity fields of the human keypoints; the vectors of these fields represent the positions and orientations of the limbs. Let x_{j1,k} and x_{j2,k} be the ground-truth positions of the keypoints j_1 and j_2 at the two ends of limb c of the k-th person, and let v be the unit direction vector from keypoint j_1 to keypoint j_2; for an arbitrary point p, the pixel at position p is v when p lies on the limb, and 0 when p does not lie on the limb:

L_{c,k}(p) = v, if p lies on limb c of person k; otherwise L_{c,k}(p) = 0.

After the positions and orientations of the limbs are determined, complete human skeleton information can be obtained. The position information of the points p falling on the limbs, i.e., the keypoint information of each target patient within the bounding box, is then predicted through a greedy inference algorithm. The keypoint coordinates of the target patients are recorded as the per-frame position information, and on the basis of step 4 the position information of the patient with the same label is continuously extracted as the local motion trajectory of each target patient.
In step 6, the camera positions in the monitoring system are fixed and the patients move in the same space, so the ground can be selected as the reference plane for computing the transformation between camera views. Let P_w = (x_w, y_w, z_w) be any keypoint of a target patient on the reference plane in three-dimensional space that is observed by two cameras simultaneously, and let p_1 = (u_1, v_1) and p_2 = (u_2, v_2) be its pixel coordinates in the images acquired by the two cameras; the transformation between P_w and p_1, p_2 can then be obtained. Since P_w lies on the reference plane, P_w can be written as (x_w, y_w, 0). With scale factors s_1 and s_2, the projection of P_w onto each image plane becomes:

s_1 [u_1, v_1, 1]^T = M'_1 [x_w, y_w, 1]^T,  s_2 [u_2, v_2, 1]^T = M'_2 [x_w, y_w, 1]^T

where M'_1 and M'_2 are both 3 x 3 matrices. The conversion between the two image points p_1 and p_2 is established via the reference plane as:

s [u_2, v_2, 1]^T = H [u_1, v_1, 1]^T

where H = M'_2 M'_1^{-1} is a 3 x 3 homography matrix with 8 degrees of freedom; the homography matrix H represents the conversion relationship established by the two camera views according to the reference plane:

H = [ h_11 h_12 h_13
      h_21 h_22 h_23
      h_31 h_32 h_33 ]

where h_ij (i = 1, 2, 3; j = 1, 2, 3) are the matrix elements: h_11, h_12, h_13 control the mapping and translation of the abscissa, h_21, h_22, h_23 control the mapping and translation of the ordinate, and h_31, h_32, h_33 control the perspective change and scaling. Through the pairwise correspondences between views and homography transformations, the local trajectories under multiple cameras can be unified on a global reference plane, so that the global position information of each target patient is obtained.
Compared with existing methods, the invention has the following main beneficial effects:
The invention provides a multi-camera tracking and monitoring method for lower limb rehabilitation training. The input video of each camera is acquired, and pedestrian detection is performed on the captured images with a YOLO model to obtain pedestrian positions and bounding-box information. The target patients are determined from the detection results, and features are extracted within the bounding-box region for the patient in each detection box. Appearance features that vary across cameras are learned, and the distances between feature vectors are calculated. The distances are ranked across the multiple camera videos to match the same patient across cameras. Pose estimation is performed on the targets successfully tracked under a single view, and keypoint coordinates are extracted to obtain local trajectories. The local trajectories obtained under each single view are fused to obtain the global trajectory of each patient under the multi-camera view, realizing the monitoring task for multi-camera lower limb rehabilitation patients. Because the invention adopts multi-camera multi-target tracking, it can replace the previous one-to-one rehabilitation training mode. Experiments show that the invention achieves effective one-to-many monitoring and rehabilitation training evaluation during lower limb rehabilitation training.
Drawings
FIG. 1 is a frame construction diagram of the present invention;
FIG. 2 is a block diagram of an algorithm embodying the present invention.
Detailed Description
In order to make the purposes, design ideas and technical schemes of the embodiments of the invention clearer, the invention is further described below with reference to the accompanying drawings.
Referring to fig. 1, the invention provides a multi-camera tracking and monitoring method for lower limb rehabilitation training, which comprises the following steps:
Step 1: the input video of each camera is acquired, and pedestrian detection is performed on the image captured by each camera using a YOLO model to obtain pedestrian positions and bounding-box information.
Before step 1, the object detection task requires a large number of pictures and labels. A large number of videos of pedestrians walking while wearing the exoskeleton robot, together with static in-place pictures, were captured, and frames were extracted from the video streams. To guarantee picture quality, all pictures were screened manually: blurred walking shots, heavily occluded pictures, and pictures in which the pedestrian was lost were removed and the scenes re-shot. Through these steps the collected pictures were consolidated so that their number and quality meet the standards of the object detection task. The pedestrian pictures were divided into training, validation, and test sets in proportion. Finally, the pedestrians in the images were annotated, and the keypoint coordinates, center-point coordinates, and the length and width of the pedestrian skeleton were stored as ground truth.
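The following sketch illustrates one possible realization of this data preparation step; the 7:2:1 split ratio and the flat image directory are assumptions, since the text says only that the pictures are divided in proportion.

```python
# A minimal sketch of the train/val/test split (the 7:2:1 ratio and the
# directory layout are assumptions; the text only says "in proportion").
import random
from pathlib import Path

images = sorted(Path("screened_images").glob("*.jpg"))  # hypothetical folder
random.seed(0)
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.7 * n)],
    "val":   images[int(0.7 * n): int(0.9 * n)],
    "test":  images[int(0.9 * n):],
}
for name, files in splits.items():
    out = Path(name) / "images"
    out.mkdir(parents=True, exist_ok=True)
    for f in files:
        (out / f.name).write_bytes(f.read_bytes())  # copy into the split folder
```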
As shown in fig. 2, the YOLO target detection algorithm divides an image into a grid of small cells and places two bounding boxes in each cell in advance; the coordinates, class, and confidence of each bounding box are predicted by a convolutional neural network, and a locally unique prediction box is then obtained through non-maximum suppression. A classification label is set to designate the class to be detected, and the required targets are output.
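As an illustration, the detection of step 1 can be sketched as follows; the ultralytics package and the yolov8n weights are assumed implementation choices, since the text specifies only a YOLO model with a convolutional network and non-maximum suppression.

```python
# A minimal sketch of per-frame patient detection with a YOLO model
# (ultralytics and the yolov8n.pt weights are assumptions, not the
# patent's prescribed implementation).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")

def detect_patients(frame):
    """Return [x1, y1, x2, y2, confidence] boxes for the 'person' class."""
    results = model(frame, classes=[0])  # class 0 is 'person' in COCO
    boxes = []
    for box in results[0].boxes:
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        boxes.append([x1, y1, x2, y2, float(box.conf[0])])
    return boxes
```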
Step 2: the target patient to be tracked is determined from the target detection results, the features within the bounding-box area of the patient in each detection box are extracted using a metric-learning-based pedestrian re-identification method, and the features are mapped into a fixed-length feature vector with a deep learning model.
The metric-learning-based pedestrian re-identification method learns the similarity of two pictures through a network: the similarity between different pictures of the same pedestrian should be greater than the similarity between pictures of different pedestrians. The network's loss function finally makes the distance between pictures of the same pedestrian as small as possible and the distance between pictures of different pedestrians large.
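One way to realize this extractor is sketched below; the ResNet-18 backbone and the 128-dimensional output are assumptions, as the text requires only convolutional feature extraction, pooling to a fixed length, and (in step 3) a fully connected mapping to a new space.

```python
# A minimal sketch of the step-2/step-3 embedding network: convolutional
# feature extraction, pooling to a fixed length, and a fully connected
# projection (backbone and output dimension are assumptions).
import torch.nn as nn
import torchvision.models as models

class ReIDEmbedder(nn.Module):
    def __init__(self, dim=128):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.features = nn.Sequential(*list(backbone.children())[:-2])  # conv layers
        self.pool = nn.AdaptiveAvgPool2d(1)  # pooling -> fixed-length output
        self.fc = nn.Linear(512, dim)        # fully connected map to the new space

    def forward(self, crop):                 # crop: (B, 3, H, W) patient image patch
        f = self.pool(self.features(crop)).flatten(1)
        return self.fc(f)                    # (B, dim) fixed-length feature vector
```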
Step 3: the features that handle patient appearance changes under different cameras are learned, the learned features are mapped to a new space, the distances between feature vectors are calculated, and a distance threshold is set; if the distance between two feature vectors is smaller than the threshold, the two detections are judged to be the same person.
The contrastive loss is used to train a Siamese (twin) network whose inputs are two pictures I_a and I_b; the two pictures may show the same pedestrian or different pedestrians, and each training pair has a label y, where y = 1 indicates that the two pictures belong to the same pedestrian (a positive pair) and y = 0 indicates that they belong to different pedestrians (a negative pair). The contrastive loss function is written as:

L_c = y d(I_a, I_b)^2 + (1 - y) max(0, α - d(I_a, I_b))^2

where d(I_a, I_b) = ||f(I_a) - f(I_b)||_2 is the distance between the feature vectors of I_a and I_b, and α is a threshold parameter designed according to actual requirements; in the invention the threshold is set to 1.5 according to the actual rehabilitation training scene and the number of patients. When the network is given a positive pair, d(I_a, I_b) gradually decreases, i.e., pictures of the same ID gradually form clusters in the feature space; conversely, when the network is given a negative pair, d(I_a, I_b) gradually increases until it exceeds the set α. By minimizing the contrastive loss, the distance between positive pairs is gradually reduced and the distance between negative pairs is gradually enlarged, which satisfies the requirement of the pedestrian re-identification task.
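Written directly in PyTorch, the loss above takes the following form; this is a sketch of the standard margin formulation, with α = 1.5 taken from the text.

```python
# A minimal sketch of the contrastive loss L_c described above
# (alpha = 1.5 follows the threshold stated in the text).
import torch.nn.functional as F

def contrastive_loss(f_a, f_b, y, alpha=1.5):
    """f_a, f_b: (B, dim) embeddings; y: (B,) with 1 = same patient, 0 = different."""
    d = F.pairwise_distance(f_a, f_b)         # Euclidean distance d(I_a, I_b)
    pos = y * d.pow(2)                        # pull same-patient pairs together
    neg = (1 - y) * F.relu(alpha - d).pow(2)  # push different pairs beyond alpha
    return (pos + neg).mean()
```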
Step 4: the images in the videos of the cameras are searched and sorted according to the distance between their feature vectors, and the retrieval result of the corresponding patient is returned, realizing matching of the same patient across cameras.
Step 3 is repeated under each camera, and metric learning is performed on the target pictures from the multiple cameras, so that the same pedestrian receives the same ID label under every camera.
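A sketch of the cross-camera matching follows; nearest-neighbour assignment under a distance threshold is an assumed concrete rule (the text specifies only distance computation and sorting), and reusing 1.5 as the threshold is likewise an assumption.

```python
# A minimal sketch of cross-camera ID matching by distance ranking
# (nearest neighbour under a threshold; both choices are assumptions).
import numpy as np

def match_across_cameras(feats_cam1, ids_cam1, feats_cam2, thresh=1.5):
    """For each detection in camera 2, return the matched camera-1 ID or None."""
    matches = []
    for f in feats_cam2:
        d = np.linalg.norm(feats_cam1 - f, axis=1)  # distances to all known IDs
        order = np.argsort(d)                       # sort by distance
        best = order[0]
        matches.append(ids_cam1[best] if d[best] < thresh else None)
    return matches
```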
Step 5: pose estimation is performed with OpenPose on the targets successfully tracked under a single view, and their keypoint coordinates are extracted to obtain the local motion trajectory of each target under the single camera view.
The invention uses OpenPose for pose estimation. OpenPose is a bottom-up human pose estimation algorithm. Earlier top-down methods are sensitive to the object detector: keypoints cannot be extracted for pedestrians the detector misses, occlusion is handled poorly, and detection slows down as the number of people increases. The bottom-up approach improves on these aspects by first detecting the joint parts and then assembling them into skeletons.
On the basis of step 2, the detection targets are determined and detection boxes are obtained; for each detection box, a feed-forward neural network predicts the set L of part affinity fields of the human keypoints. Finally, the confidence maps and affinity fields are used to derive the keypoints of each person in the detection box through a greedy inference algorithm.
To guide the network structure to iteratively predict the confidence maps and affinity fields, a loss function is applied after each stage. An L_2 loss is used between the prediction results and the ground-truth labels. Spatial weighting is applied to the loss function to address the problem that some data are not fully labeled.
The affinity field represents the position and orientation of a limb with a vector. Let x_{j1,k} and x_{j2,k} be the ground-truth positions of the keypoints j_1 and j_2 at the two ends of limb c of the k-th person. Given the unit direction vector v from j_1 to j_2, for an arbitrary point p the value of the field (the pixel at position p) is v when p lies on the limb and 0 when it does not:

L_{c,k}(p) = v, if p lies on limb c of person k; otherwise L_{c,k}(p) = 0

where the unit direction vector v from keypoint j_1 to keypoint j_2 is defined by:

v = (x_{j2,k} - x_{j1,k}) / ||x_{j2,k} - x_{j1,k}||_2
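The field value defined above can be evaluated pointwise as in the following sketch; the limb half-width parameter is an assumption (OpenPose applies a comparable on-limb test).

```python
# A minimal sketch of the part affinity field value L_{c,k}(p) for one limb
# (2-D points; the limb_width parameter is an assumed constant).
import numpy as np

def paf_value(p, x_j1, x_j2, limb_width=5.0):
    """Return the unit vector v if point p lies on the limb j1 -> j2, else (0, 0)."""
    p, x_j1, x_j2 = map(np.asarray, (p, x_j1, x_j2))
    limb = x_j2 - x_j1
    length = np.linalg.norm(limb)
    v = limb / length                        # unit direction vector v
    d = p - x_j1
    along = np.dot(d, v)                     # projection along the limb axis
    across = abs(d[0] * v[1] - d[1] * v[0])  # perpendicular distance from the axis
    on_limb = (0.0 <= along <= length) and (across <= limb_width)
    return v if on_limb else np.zeros(2)
```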
through the estimation of the gesture, the skeleton information of each patient can be obtained, the coordinates of the foot points of the patient are recorded and used as the position information of each frame of patient, and on the basis of the step 4, the motion trail of the patient with the same ID is extracted from continuous frames to obtain the local trail.
Step 6: the local trajectories obtained under each single view are fused to obtain the global trajectory of each patient under the multi-camera view, realizing the monitoring task for multi-camera lower limb rehabilitation patients.
The application scenario of the invention is an indoor rehabilitation training room. In the monitoring system the camera positions are fixed and the patients move in the same space, so the ground can be selected as the reference plane for computing the transformation between camera views. Let P_w = (x_w, y_w, z_w) be any keypoint of a target patient on the reference plane in three-dimensional space that is observed by two cameras simultaneously, and let p_1 = (u_1, v_1) and p_2 = (u_2, v_2) be its pixel coordinates in the images acquired by the two cameras; the transformation between P_w and p_1, p_2 can then be obtained.
Since P_w lies on the reference plane, P_w can be written as (x_w, y_w, 0), and its coordinates on the reference plane can be rewritten as P' = (x_w, y_w). With scale factors s_1 and s_2, the projection of P_w onto each image plane becomes:

s_1 [u_1, v_1, 1]^T = M'_1 [x_w, y_w, 1]^T,  s_2 [u_2, v_2, 1]^T = M'_2 [x_w, y_w, 1]^T

where M'_1 and M'_2 are both 3 x 3 matrices. The conversion between the two image points p_1 and p_2 is established via the reference plane as:

s [u_2, v_2, 1]^T = H [u_1, v_1, 1]^T

where H = M'_2 M'_1^{-1} is a 3 x 3 homography matrix with 8 degrees of freedom, representing the conversion relationship established by the two camera views according to the reference plane.
Through the pairwise correspondences between views and homography transformations, the local trajectories under multiple cameras can be unified on a global reference plane, thereby obtaining the global position information of each patient.
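This fusion step can be sketched with OpenCV as follows; the ground-plane correspondence points are assumed to be measured in advance, since the text states only that the homography is established from the reference plane.

```python
# A minimal sketch of step 6: estimate the ground-plane homography from point
# correspondences and map a per-camera foot trajectory onto the reference
# plane (the correspondence coordinates below are illustrative assumptions).
import cv2
import numpy as np

# Pixel positions of four ground-plane points in camera 1 and their
# positions on the reference plane (at least 4 pairs fix the 8 DoF of H).
pts_cam1 = np.array([[100, 400], [500, 410], [480, 200], [120, 210]], np.float32)
pts_ref  = np.array([[0, 0], [4, 0], [4, 3], [0, 3]], np.float32)  # metres

H, _ = cv2.findHomography(pts_cam1, pts_ref)

def to_reference_plane(track_px):
    """Map an (N, 2) pixel trajectory from camera 1 onto the reference plane."""
    pts = np.asarray(track_px, np.float32).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(pts, H).reshape(-1, 2)
```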

Claims (6)

1. A multi-camera tracking and monitoring method for lower limb rehabilitation training, characterized by comprising the following steps:
step 1: acquiring a lower limb rehabilitation training input video under a single camera, detecting the patient in the lower limb rehabilitation training input video by using a YOLO target detection algorithm, and acquiring bounding-box information of the patient;
step 2: determining the target patient to be tracked in the acquired bounding-box information, setting a target patient label, extracting appearance features of the target patient to be tracked within the bounding-box area, and converting the appearance features into fixed-length feature vectors by using a deep learning model;
step 3: mapping the fixed-length feature vectors obtained in step 2 to a new space, setting a distance threshold, calculating the distance between feature vectors, and judging whether the distance between feature vectors is smaller than the threshold; if so, the feature vectors are considered to belong to the same target patient, and if not, they are not;
step 4: acquiring lower limb rehabilitation training input videos of different cameras, repeating steps 1 to 2 to obtain fixed-length feature vectors under the different cameras, calculating the distances between the fixed-length feature vectors under the different cameras, sorting according to the distance between the fixed-length feature vectors, and returning the target patient labels corresponding to the fixed-length feature vectors, so as to realize matching of the same target patient across cameras;
step 5: performing pose estimation with OpenPose on the target patients successfully matched under the single camera in step 3, and extracting the keypoint coordinates in the skeleton information of each target patient to obtain the local motion trajectory of each target patient under the single camera view;
step 6: acquiring lower limb rehabilitation training input videos of different cameras, repeating steps 1, 2, 3, and 5 to obtain the local motion trajectories of the target patients under different camera views, and fusing the local motion trajectories of the target patients under different camera views to obtain the global trajectory of each target patient under the multi-camera view.
2. The multi-camera tracking and monitoring method for lower limb rehabilitation training according to claim 1, wherein in step 1, a YOLO target detection algorithm is used to detect a patient in a lower limb rehabilitation training input video, and specifically comprises:
and (3) using a YOLO target detection algorithm to the lower limb rehabilitation training input video, obtaining a locally unique boundary box through convolution neural network and non-maximum suppression, and outputting boundary box information.
3. The multi-camera tracking and monitoring method for lower limb rehabilitation training according to claim 1, wherein in step 2, the appearance characteristics of the target patient to be tracked in the boundary box area are extracted, and the method specifically comprises the following steps:
the method comprises the steps of extracting appearance characteristics of a target patient to be tracked under each single camera by using a convolution layer in a neural network, and converting the output of the convolution layer into a characteristic vector with fixed length through pooling operation.
4. The multi-camera tracking and monitoring method for lower limb rehabilitation training according to claim 1, wherein in step 3, the feature vector with a fixed length obtained in step 2 is mapped to a new space, and specifically includes:
mapping the fixed-length feature vector obtained in step 2 to a new space through a fully connected layer, and defining a contrastive loss function:

L_c = y d(I_a, I_b)^2 + (1 - y) max(0, α - d(I_a, I_b))^2

wherein I_a and I_b represent two input training pictures, y is the label of the training pair, y = 1 indicating that the two pictures belong to the same patient and y = 0 indicating that they belong to different patients, α is a threshold parameter designed according to actual requirements, and d(I_a, I_b) = ||f(I_a) - f(I_b)||_2 is the distance between the feature vectors of I_a and I_b; when the two pictures input to the network show the same patient, d(I_a, I_b) becomes smaller, and conversely, when the two pictures input to the network show different patients, d(I_a, I_b) becomes larger.
5. The multi-camera tracking and monitoring method for lower limb rehabilitation training according to claim 1, wherein in step 5, pose estimation is performed with OpenPose on the target patients successfully matched under the single camera in step 3, keypoint coordinates in the skeleton information of each target patient are extracted, and the local motion trajectory of each target patient under the single camera view is obtained, specifically comprising:
performing pose estimation with OpenPose on the target patients successfully matched under the single camera in step 3; for the bounding box of each target patient, predicting the set L of part affinity fields of the human keypoints with a feed-forward neural network, the vectors of the affinity fields representing the positions and orientations of the limbs; letting x_{j1,k} and x_{j2,k} be the ground-truth positions of the keypoints j_1 and j_2 at the two ends of limb c of the k-th person, and v the unit direction vector from keypoint j_1 to keypoint j_2, for an arbitrary point p the pixel at position p is v when p lies on the limb and 0 when p does not lie on the limb:

L_{c,k}(p) = v, if p lies on limb c of person k; otherwise L_{c,k}(p) = 0;

after the positions and orientations of the limbs are determined, complete human skeleton information is obtained; the position information of the points p falling on the limbs, i.e., the keypoint information of each target patient within the bounding box, is then predicted through a greedy inference algorithm; the keypoint coordinates of the target patients are recorded as the keypoint coordinates in the skeleton information of each target patient, and the position information of the patient with the same label is continuously extracted as the local motion trajectory of each target patient under the single camera view.
6. The multi-camera tracking and monitoring method for lower limb rehabilitation training according to claim 1, wherein in step 6, the local motion trajectories of the target patients under different camera angles are fused to obtain global trajectories of each target patient under the multi-camera angles, and the method specifically comprises the following steps:
selecting the ground as the reference plane for computing the transformation between different camera views; letting P_w = (x_w, y_w, z_w) be any keypoint of a target patient on the reference plane in three-dimensional space observed by two cameras simultaneously, with pixel coordinates p_1 = (u_1, v_1) and p_2 = (u_2, v_2) in the images acquired by the two cameras, obtaining the transformation between P_w and p_1, p_2; since P_w lies on the reference plane, P_w is written as (x_w, y_w, 0) and its coordinates on the reference plane are rewritten as P' = (x_w, y_w); with scale factors s_1 and s_2, the projection of P_w onto each image plane becomes:

s_1 [u_1, v_1, 1]^T = M'_1 [x_w, y_w, 1]^T,  s_2 [u_2, v_2, 1]^T = M'_2 [x_w, y_w, 1]^T

wherein M'_1 and M'_2 are both 3 x 3 matrices; the conversion between the two image points p_1 and p_2 is established via the reference plane as:

s [u_2, v_2, 1]^T = H [u_1, v_1, 1]^T

wherein H = M'_2 M'_1^{-1} is a 3 x 3 homography matrix with 8 degrees of freedom, the homography matrix H representing the conversion relationship established by the two camera views according to the reference plane;
and unifying the local trajectories under the plurality of cameras onto a global reference plane through the conversion relationship established by the two camera views according to the reference plane, thereby obtaining the global trajectory of each target patient under the multi-camera view.
CN202311322634.4A 2023-10-13 2023-10-13 Multi-camera tracking and monitoring method for lower limb rehabilitation training Pending CN117372476A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311322634.4A CN117372476A (en) 2023-10-13 2023-10-13 Multi-camera tracking and monitoring method for lower limb rehabilitation training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311322634.4A CN117372476A (en) 2023-10-13 2023-10-13 Multi-camera tracking and monitoring method for lower limb rehabilitation training

Publications (1)

Publication Number Publication Date
CN117372476A 2024-01-09

Family

ID=89407082

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311322634.4A Pending CN117372476A (en) 2023-10-13 2023-10-13 Multi-camera tracking and monitoring method for lower limb rehabilitation training

Country Status (1)

Country Link
CN (1) CN117372476A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117726821A (en) * 2024-02-05 2024-03-19 武汉理工大学 Medical behavior identification method for region shielding in medical video
CN117726821B (en) * 2024-02-05 2024-05-10 武汉理工大学 Medical behavior identification method for region shielding in medical video

Similar Documents

Publication Publication Date Title
US11482048B1 (en) Methods and apparatus for human pose estimation from images using dynamic multi-headed convolutional attention
Jalal et al. Depth maps-based human segmentation and action recognition using full-body plus body color cues via recognizer engine
Rout A survey on object detection and tracking algorithms
Liu et al. Using unsupervised deep learning technique for monocular visual odometry
CN117372476A (en) Multi-camera tracking and monitoring method for lower limb rehabilitation training
CN117671738B (en) Human body posture recognition system based on artificial intelligence
Noe et al. Automatic detection and tracking of mounting behavior in cattle using a deep learning-based instance segmentation model
CN115900710A (en) Dynamic environment navigation method based on visual information
CN114550027A (en) Vision-based motion video fine analysis method and device
CN112966628A (en) Visual angle self-adaptive multi-target tumble detection method based on graph convolution neural network
Shi et al. Fuzzy dynamic obstacle avoidance algorithm for basketball robot based on multi-sensor data fusion technology
CN115661856A (en) User-defined rehabilitation training monitoring and evaluating method based on Lite-HRNet
Rungsarityotin et al. Finding location using omnidirectional video on a wearable computing platform
CN113408435B (en) Security monitoring method, device, equipment and storage medium
CN112785564A (en) Pedestrian detection tracking system and method based on mechanical arm
CN108577849A (en) A kind of physiological function detection method based on mist computation model
Henning et al. Bodyslam++: Fast and tightly-coupled visual-inertial camera and human motion tracking
Moolan-Feroze et al. Predicting out-of-view feature points for model-based camera pose estimation
Singh et al. Autonomous Multiple Gesture Recognition system for disabled people
Sarkar et al. Action-conditioned deep visual prediction with roam, a new indoor human motion dataset for autonomous robots
Dubuisson-Jolly et al. Tracking deformable templates using a shortest path algorithm
Xu et al. Human motion prediction based on imus and metaformer
Wong et al. Markerless motion capture using appearance and inertial data
Thati et al. A High Level CNN-based App to Guide the People of Visually Diminished
Li et al. Improved SLAM and Motor Imagery Based Navigation Control of a Mobile Robot

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination