CN117497129A - Game rehabilitation scene participation degree behavior recognition method based on vision - Google Patents
- Publication number
- CN117497129A CN117497129A CN202310634168.7A CN202310634168A CN117497129A CN 117497129 A CN117497129 A CN 117497129A CN 202310634168 A CN202310634168 A CN 202310634168A CN 117497129 A CN117497129 A CN 117497129A
- Authority
- CN
- China
- Prior art keywords
- feature
- behavior recognition
- participation degree
- extraction unit
- feature extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
- G16H20/30—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/197—Matching; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/70—Multimodal biometrics, e.g. combining information from different biometric modalities
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a vision-based participation degree behavior recognition method for game rehabilitation scenes. The method comprises the following steps: constructing a vision-based participation degree behavior recognition model for game rehabilitation scenes, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit; training the participation degree behavior recognition model: using a server, the model is trained on a training sample video data set collected in a game rehabilitation scene, and its parameters are optimized by reducing a network loss function until the model converges, so as to obtain a trained participation degree behavior recognition model; and identifying participation degree behavior in a new game rehabilitation scene by using the trained participation degree behavior recognition model. The method is simple, effective and highly accurate, and can provide real-time supervision of and feedback on a patient's rehabilitation training.
Description
Technical Field
The invention relates to the fields of rehabilitation medicine and computer vision, in particular to a game rehabilitation scene participation degree behavior identification method based on vision.
Background
Surveys have shown that stroke is the leading cause of disability in adults. Numerous clinical studies show that rehabilitation training is an effective way to improve the motor function of stroke patients and to promote recovery. Rehabilitation participation is defined as a state in which the patient is motivated and actively strives to take part in rehabilitation training. It reflects the patient's attitude towards rehabilitation, understanding of the task requirements, and the effectiveness of the entire training process. Previous studies have shown that high patient involvement is an important factor in promoting neural reorganization. Even if a patient is unable to perform an action, the willingness to exercise actively is necessary for rehabilitation. However, the repetitive nature of rehabilitation exercises tends to make patients bored and tired. In practice, patients often exhibit low levels of participation during rehabilitation training, resulting in poor rehabilitation outcomes and safety hazards. In traditional rehabilitation training, supervision and correction by therapists can reduce the impact of low patient engagement. However, rehabilitation physicians are in short supply, and patients often perform training tasks without supervision or correction. Evaluating a patient's participation in rehabilitation training is therefore very important, as it benefits the assessment of rehabilitation outcomes and the adjustment of training tasks.
Researchers have developed a number of methods for assessing patient engagement, mainly including score-based scales and physiological-signal-based methods. Scale-based evaluation requires a rehabilitation physician to observe the patient during rehabilitation training and score participation indicators. This approach inevitably introduces the physician's subjective judgment and increases the physician's workload. Physiological signals generated during rehabilitation training are also often used to assess patient engagement.
In Chinese published patent application CN105054927A, Zhang Jinhua et al. provide a method for quantitatively evaluating the degree of active participation in a lower limb rehabilitation system by detecting the patient's EEG and EMG signals in real time. The drawback of this method is that the acquisition of physiological signals depends on wearable sensors, which are inconvenient to wear and liable to cause the patient discomfort during rehabilitation training.
The low-engagement behavior of stroke patients in virtual-game rehabilitation training is often accompanied by characteristic facial and postural behaviors. Inspired by this finding, the present invention seeks to use vision techniques to capture changes in facial behavior, such as eyelid movement, pupil movement, degree of eye opening, head posture and fatigued facial expressions, in order to intuitively detect the patient's subjective participation. In contrast to physiological-signal-based methods, vision-based methods evaluate in a non-invasive manner and do not interfere with the patient's rehabilitation training. However, no previous study has applied vision techniques to detect the engagement of stroke patients.
The matters in this background section are only intended to aid understanding of the invention and do not necessarily constitute prior art in the field.
Disclosure of Invention
To address the technical shortcomings of the prior art, the invention provides a vision-based participation degree behavior recognition method for game rehabilitation scenes, which applies vision techniques to detect the participation of stroke patients and uses a deep learning model to extract participation-related features from visual images and recognize participation degree behaviors.
The object of the invention is achieved by at least one of the following technical solutions.
A game rehabilitation scene participation degree behavior recognition method based on vision comprises the following steps:
s1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit;
the eye feature extraction unit is used for obtaining eye feature vectors according to the intercepted eye images in the original video frames; the gesture feature extraction unit is used for extracting feature points related to the head gesture from the original image and carrying out normalization operation so as to obtain a head gesture feature vector; in the time sequence feature extraction unit, eye feature vectors and head gesture feature vectors extracted from each video frame are spliced into fusion feature vectors, and a fusion feature vector group of one video is input into a time sequence neural network to obtain the classification of participation degree;
s2, training a participation behavior recognition model: training the participation degree behavior recognition model by utilizing a server in a training sample video data set collected in a game rehabilitation scene, and optimizing parameters of the participation degree behavior recognition model by reducing a network loss function until the participation degree behavior recognition model converges to obtain a trained participation degree behavior recognition model;
s3, identifying the participation degree behavior in the new game rehabilitation scene by using the participation degree behavior identification model.
Further, in order to unify the input to the eye feature extraction unit, the eye images cropped from the original video frames are resized to a uniform size.
Further, the eye feature extraction unit comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer and a full connection layer which are connected in sequence;
the method comprises the steps of intercepting an eye image from an original video frame to serve as input, and extracting features through a first convolution layer to obtain a first feature map;
the first feature map output by the first convolution layer is input into the first pooling layer for feature dimension reduction, and a second feature map is obtained;
the second feature map output by the first pooling layer is input into a second convolution layer for further feature extraction to obtain a third feature map;
the third feature map output by the second convolution layer is input into the second pooling layer for feature dimension reduction, and a fourth feature map is obtained;
the fourth feature map output by the second pooling layer is input into a third convolution layer for further feature extraction to obtain a fifth feature map;
and inputting the fifth feature map into the full-connection layer for feature dimension reduction to obtain an output vector.
Further, in the gesture feature extraction unit, feature points related to the head gesture are extracted from the original image; the feature points comprise a left shoulder point, a right shoulder point, a trunk vertex, a nose tip point, a left eye positioning point and a right eye positioning point, each given as two-dimensional coordinates;
the trunk top point is respectively connected with the left shoulder point, the right shoulder point and the nose tip point, and the nose tip point is respectively connected with the left eye positioning point and the right eye positioning point.
Further, in the gesture feature extraction unit, the normalization of the 6 head gesture feature points proceeds as follows: first, the two-dimensional coordinates of the trunk vertex are subtracted from the two-dimensional coordinates of each of the 6 head gesture feature points to obtain relative coordinates, and the resulting 6 relative coordinates are then subjected to normal-distribution normalization.
Further, in the time sequence feature extraction unit, the eye feature vector and the head gesture feature vector extracted from each video frame are spliced into a fusion feature vector, and a frame-level feature representation of one video is constructed.
Further, in the time sequence feature extraction unit, the fusion feature vectors serve as the inputs of the time steps of the TCN time sequence neural network, and the output of the last time step of the TCN is passed to a full connection layer and a softmax function to obtain the class prediction Y′_V of the query video V and the loss function L.
Further, the step S2 specifically includes the following steps:
s2.1, constructing a training video sample library in a server, and selecting a sample video fragment from the training video sample library as input of a participation degree behavior recognition model;
s2.2, executing the eye feature extraction unit on the server: the training sample video segment is processed frame by frame through the eye feature extraction unit to extract eye feature vectors, and the output eye feature vectors are stored in a feature library;
s2.3, executing the gesture feature extraction unit on the server: the training sample video segment is processed frame by frame through the gesture feature extraction unit to extract head gesture feature vectors, and the output head gesture feature vectors are stored in a feature library;
s2.4, executing the time sequence feature extraction unit on the server: the eye feature vector and the head gesture feature vector of each video frame are spliced and used as input to the TCN time sequence neural network for time sequence analysis and class prediction, obtaining the class prediction Y′_V of the query video V and the loss function L;
s2.5, performing end-to-end network training on the server; the participation degree behavior recognition task loss function L measures the distance between the predicted value and the true value, i.e. between the predicted class of the query video and its true class, and is minimized via a standard cross-entropy loss;
s2.6, optimizing the objective function on the server: using the loss function L of step S2.5 as the objective function, locally optimal network parameters are obtained as the network weights of the participation degree behavior recognition model.
Further, the step S3 specifically includes the following steps:
s3.1, executing the test sample video segment generation unit on the server: all frames of a test sample video segment are uniformly sampled, and the resulting set of video frames is used as input;
s3.2, performing participation behavior classification on the obtained video frame set by using the participation behavior recognition model.
Compared with the prior art, the invention has the following advantages and technical effects:
the game rehabilitation scene participation degree behavior recognition method based on vision is provided, eye features and gesture features are fused to realize recognition of participation degree behaviors: an eye feature extraction unit is provided to automatically extract advanced spatial features of an eye image; a pose feature extraction unit is proposed to extract head space pose features; a TCN time sequence feature extraction unit is introduced, time sequence modeling is carried out based on the extracted eye and gesture space features, the identification of participation degree behaviors is realized, and excellent identification accuracy is obtained; the vision-based participation behavior detection method is convenient and quick, does not need wearing articles and complex experimental settings, and is favorable for better supervision and feedback for rehabilitation training of patients.
Drawings
FIG. 1 is a diagram of an eye feature extraction unit constructed in an embodiment of the invention;
FIG. 2 is a schematic view of extracted head pose feature points according to an embodiment of the present invention;
fig. 3 is an algorithm frame diagram of a vision-based game rehabilitation scene participation degree behavior recognition method in an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
To address the problems and shortcomings of the prior art, the invention provides a vision-based participation degree behavior recognition method for game rehabilitation scenes, which mainly comprises five stages: constructing an eye feature extraction unit, constructing a gesture feature extraction unit, constructing a time sequence feature extraction unit, model training, and model inference.
Examples:
a game rehabilitation scene participation degree behavior recognition method based on vision comprises the following steps:
s1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model;
s2, training a participation degree behavior recognition model;
s3, identifying the participation degree behavior in a new game rehabilitation scene by using the participation degree behavior identification model;
each step is described in detail below.
S1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit;
the eye feature extraction unit obtains an eye feature vector from an eye image cropped out of an original video frame. Fig. 1 shows the eye feature extraction unit constructed by the invention: an eye image is cropped from the original video frame using the dlib facial-keypoint detection tool, and a convolutional-neural-network-based eye feature extraction model is designed to obtain an 84-dimensional eye feature vector. The gesture feature extraction unit extracts 6 head-pose-related feature points from the original image using the OpenPose human-body keypoint detection toolbox and performs a normalization operation to obtain a 12-dimensional head gesture feature vector. In the time sequence feature extraction unit, the eye feature vector and head gesture feature vector extracted from each video frame are spliced into a fusion feature vector, and the group of fusion feature vectors of one video is input into a time sequence neural network to obtain the participation degree classification. The details are as follows:
in one embodiment, to unify the input to the eye feature extraction unit, the eye images cropped from the original video frames are resized to a uniform size of 32×32×3.
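The cropping and resizing step can be sketched as follows. This is a minimal numpy-only illustration, not the patent's implementation: the landmark coordinates are hypothetical stand-ins for the output of a face-landmark detector such as dlib, and the resize uses simple nearest-neighbour indexing.

```python
import numpy as np

def crop_and_resize_eye(frame, landmarks, out_size=32, margin=4):
    """Crop a bounding box around eye landmarks and resize it to out_size x out_size.

    frame:     H x W x 3 image array
    landmarks: (N, 2) array of (x, y) eye keypoints (hypothetical values here)
    """
    xs, ys = landmarks[:, 0], landmarks[:, 1]
    x0 = max(int(xs.min()) - margin, 0)
    x1 = min(int(xs.max()) + margin, frame.shape[1])
    y0 = max(int(ys.min()) - margin, 0)
    y1 = min(int(ys.max()) + margin, frame.shape[0])
    patch = frame[y0:y1, x0:x1]
    # Nearest-neighbour resize to the unit's fixed input size.
    ri = np.arange(out_size) * patch.shape[0] // out_size
    ci = np.arange(out_size) * patch.shape[1] // out_size
    return patch[ri][:, ci]

frame = np.random.rand(240, 320, 3)  # synthetic stand-in for a video frame
eye_pts = np.array([[100, 80], [130, 78], [160, 82],
                    [100, 95], [130, 98], [160, 96]])  # hypothetical landmarks
eye_img = crop_and_resize_eye(frame, eye_pts)
print(eye_img.shape)  # (32, 32, 3)
```

In a real pipeline the landmark coordinates would come from the keypoint detector per frame; only the fixed 32×32×3 output size is taken from the text.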
The eye feature extraction unit comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer and a full connection layer which are sequentially connected;
an eye image is cropped from the original video frame and input into the first convolution layer for feature extraction; the first convolution layer uses 32 convolution kernels of size 5×5×3, with no zero-padding and a stride of 1; each kernel produces feature values through the convolution operation;
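A "valid" convolution of this kind (no padding, stride 1) can be sketched in numpy as below. The kernel weights are random and purely illustrative; as is usual for CNNs, the operation is implemented as cross-correlation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, kernels):
    """'Valid' convolution (cross-correlation), stride 1, no zero-padding.
    x: H x W x C input; kernels: K x k x k x C filter bank."""
    k = kernels.shape[1]
    # Windows of shape (H-k+1, W-k+1, 1, k, k, C); drop the singleton axis.
    windows = sliding_window_view(x, (k, k, x.shape[2])).squeeze(2)
    # Contract each k x k x C window with each of the K kernels.
    return np.tensordot(windows, kernels, axes=([2, 3, 4], [1, 2, 3]))

x = np.random.rand(32, 32, 3)           # unified 32x32x3 eye-image input
w = np.random.rand(32, 5, 5, 3) * 0.01  # 32 random 5x5x3 kernels (illustrative)
fmap1 = conv2d_valid(x, w)
print(fmap1.shape)  # (28, 28, 32)
```

The output spatial size 28 = 32 − 5 + 1 matches the first feature map described next.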
in one embodiment, the 28×28×32 first feature map output by the first convolution layer is input into the first pooling layer for feature dimension reduction; the first pooling layer uses a 2×2 pooling kernel with a stride of 2: all elements in each 2×2 region of the input first feature map are summed, the sum is multiplied by a trainable first coefficient w_1, a first bias term b_1 is added, and the result is passed through a sigmoid function to obtain the output of the first pooling layer; the sigmoid is the nonlinear activation function s(x) = 1/(1 + e^(−x));
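This LeNet-style pooling (sum over each 2×2 region, scaled by a trainable coefficient, shifted by a bias, then squashed by a sigmoid) can be sketched as follows; the coefficient and bias values are illustrative, not trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sum_pool_2x2(x, w, b):
    """2x2 sum pooling with stride 2, scaled by trainable coefficient w,
    shifted by bias b, then passed through a sigmoid."""
    h, wd, c = x.shape
    pooled = x.reshape(h // 2, 2, wd // 2, 2, c).sum(axis=(1, 3))
    return sigmoid(w * pooled + b)

fmap1 = np.random.rand(28, 28, 32)  # stand-in for the first feature map
out = sum_pool_2x2(fmap1, w=0.25, b=0.0)
print(out.shape)  # (14, 14, 32)
```

The halved spatial size (28 → 14) matches the second feature map described next, and the sigmoid keeps every output strictly between 0 and 1.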
in one embodiment, the 14×14×32 second feature map output by the first pooling layer is input into the second convolution layer for further feature extraction; the second convolution layer uses 64 convolution kernels of size 5×5×32, with no zero-padding and a stride of 1, and the convolution yields a 10×10×64 third feature map;
in one embodiment, the third feature map is input into the second pooling layer for feature dimension reduction; the second pooling layer uses a 2×2 pooling kernel with a stride of 2: all elements in each 2×2 region of the input third feature map are summed, the sum is multiplied by a trainable second coefficient w_2, a second bias term b_2 is added, and the output of the second pooling layer is finally obtained through a sigmoid function;
in one embodiment, the 5×5×64 fourth feature map output by the second pooling layer is input into the third convolution layer for further feature extraction; the third convolution layer uses 128 convolution kernels of size 5×5×64, with no zero-padding and a stride of 1, and the convolution yields a 1×1×128 fifth feature map;
in one embodiment, the fifth feature map is input into the fully connected layer for feature dimension reduction; the fully connected layer contains 84 nodes: the input vector is multiplied by a trainable third weight w_3, a third bias term b_3 is added, and an 84-dimensional output is finally obtained through a sigmoid function.
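The spatial sizes quoted through this stack follow directly from the valid-convolution formula (n − k)/stride + 1; a few lines of arithmetic confirm the chain 32 → 28 → 14 → 10 → 5 → 1:

```python
def conv_out(n, k, stride=1):
    """Spatial output size of a 'valid' convolution or pooling layer."""
    return (n - k) // stride + 1

sizes = [32]                             # 32x32x3 eye-image input
sizes.append(conv_out(sizes[-1], 5))     # conv1: 5x5, stride 1 -> 28
sizes.append(conv_out(sizes[-1], 2, 2))  # pool1: 2x2, stride 2 -> 14
sizes.append(conv_out(sizes[-1], 5))     # conv2: 5x5, stride 1 -> 10
sizes.append(conv_out(sizes[-1], 2, 2))  # pool2: 2x2, stride 2 -> 5
sizes.append(conv_out(sizes[-1], 5))     # conv3: 5x5, stride 1 -> 1
print(sizes)  # [32, 28, 14, 10, 5, 1]
```

The final 1×1×128 map is flattened to a 128-dimensional vector, which the 84-node fully connected layer maps to the 84-dimensional eye feature vector.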
In one embodiment, in the gesture feature extraction unit, as shown in fig. 2, the schematic diagrams of 6 head gesture feature points extracted from the original image are respectively: the left shoulder point 1, the right shoulder point 2, the trunk vertex 3, the nose tip point 4, the left eye positioning point 5 and the right eye positioning point 6 are two-dimensional coordinates;
the trunk vertex 3 is respectively connected with the left shoulder point 1, the right shoulder point 2 and the nose tip point 4, and the nose tip point 4 is respectively connected with the left eye positioning point 5 and the right eye positioning point 6.
In one embodiment, in the gesture feature extraction unit, for the normalization operation of 6 head gesture feature points, first, two-dimensional coordinates of the torso vertex 3 are subtracted from two-dimensional coordinates of the 6 head gesture feature points to obtain relative coordinates, and the obtained 6 relative coordinates are subjected to normal distribution normalization processing, which specifically includes:
wherein μ and σ are the mean and standard deviation of the data, respectively; n is the number of data samples; p (P) k Is the original input vector of the kth data sample; p'. k Is the normalized input vector for the kth data sample.
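The relative-coordinate subtraction and z-score normalization described here can be sketched in numpy as follows; the keypoint coordinates are hypothetical, and the statistics are computed over all relative coordinate values, which is one reasonable reading of the description.

```python
import numpy as np

# Hypothetical 2-D coordinates of the 6 head-pose feature points
# (left shoulder, right shoulder, trunk vertex, nose tip, left eye, right eye).
points = np.array([[210., 260.], [350., 262.], [280., 255.],
                   [282., 180.], [262., 160.], [300., 158.]])

trunk_vertex = points[2]
relative = points - trunk_vertex      # coordinates relative to the trunk vertex

mu = relative.mean()
sigma = relative.std()
normalized = (relative - mu) / sigma  # normal-distribution (z-score) normalization

pose_vector = normalized.flatten()    # 12-dimensional head gesture feature vector
print(pose_vector.shape)  # (12,)
```

After normalization the 6 relative points flatten into the 12-dimensional pose vector with zero mean and unit standard deviation.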
In the time sequence feature extraction unit, 84-dimensional eye feature vectors and 12-dimensional head gesture feature vectors extracted from each video frame are spliced into 96-dimensional fusion feature vectors, and a frame-level feature representation of one video is constructed.
In the time sequence feature extraction unit, the fusion feature vectors serve as the inputs of the time steps of the TCN time sequence neural network, and the output of the last time step of the TCN is passed to a full connection layer and a softmax function to obtain the class prediction Y′_V of the query video V and the loss function L.
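The fusion, softmax class prediction and cross-entropy loss can be sketched as below. The TCN itself is replaced by a random linear layer acting as a stand-in for its last-time-step output, and the two-class setup (e.g. engaged / not engaged) is an assumption; the patent does not state the number of classes.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

def cross_entropy(probs, true_class):
    return -np.log(probs[true_class])

eye_vec = np.random.rand(84)   # per-frame eye feature vector
pose_vec = np.random.rand(12)  # per-frame head gesture feature vector
fused = np.concatenate([eye_vec, pose_vec])  # 96-dim fusion feature vector

# Stand-in for the TCN's last-time-step output followed by the full
# connection layer; weights are random, purely illustrative.
n_classes = 2  # assumed number of participation classes
W = np.random.rand(n_classes, 96) * 0.01
b = np.zeros(n_classes)
logits = W @ fused + b
probs = softmax(logits)                    # class prediction Y'_V
loss = cross_entropy(probs, true_class=1)  # loss L against the true class
print(fused.shape, probs.sum())
```

During training, minimizing this cross-entropy loss pulls the predicted class distribution toward the ground-truth class, as described in step S2.5.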
Step S2, training the participation degree behavior recognition model: using a server, the model is trained on a training sample video data set collected in a game rehabilitation scene, and its parameters are optimized by reducing a network loss function until the model converges, so as to obtain a trained participation degree behavior recognition model. In one embodiment, as shown in fig. 3, the algorithm framework of the vision-based game rehabilitation scene participation degree behavior recognition method is implemented as follows:
s2.1, constructing a training video sample library in a server, and selecting a sample video fragment from the training video sample library as input of a participation degree behavior recognition model;
s2.2, executing the eye feature extraction unit on the server: the training sample video segment is processed frame by frame through the eye feature extraction unit to extract eye feature vectors, and the output eye feature vectors are stored in a feature library;
s2.3, executing the gesture feature extraction unit on the server: the training sample video segment is processed frame by frame through the gesture feature extraction unit to extract head gesture feature vectors, and the output head gesture feature vectors are stored in a feature library;
s2.4, executing the time sequence feature extraction unit on the server: the eye feature vector and the head gesture feature vector of each video frame are spliced and fed as input to the TCN temporal neural network for time sequence analysis and category prediction, obtaining the category prediction Y'_V of the query video V and the loss function L;
s2.5, performing end-to-end network training on the server; the participation degree behavior recognition task loss function L is the distance between the predicted value and the true value, i.e., the distance from the predicted category of the query video to its true category, which is minimized through a standard cross entropy loss function;
s2.6, optimizing the objective function on the server: taking the loss function L of step S2.5 as the objective function, a locally optimal set of network parameters is obtained as the network weights of the participation degree behavior recognition model.
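For a one-hot true label, the standard cross entropy loss of step S2.5 reduces to the negative log-probability assigned to the true category. A minimal sketch (the probability vectors below are made-up examples, not values from the patent):

```python
import math

def cross_entropy(pred_probs, true_idx):
    # L = -log p(true class): small when the predicted distribution
    # concentrates on the true category, large otherwise.
    return -math.log(pred_probs[true_idx])

confident_loss = cross_entropy([0.9, 0.05, 0.05], 0)  # good prediction
mistaken_loss = cross_entropy([0.1, 0.1, 0.8], 0)     # wrong prediction
```

Minimizing this loss over the training sample library is exactly the "distance from predicted category to true category" reduction described in step S2.5.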
Step S3, identifying the participation degree behavior in a new game rehabilitation scene by using the trained participation degree behavior recognition model. The specific implementation process is as follows:
s3.1, executing the test sample video segment generating unit on the server: frames are uniformly sampled from one test sample video segment, and the obtained video frame set is taken as input;
s3.2, classifying the participation degree behavior of the obtained video frame set by using the participation degree behavior recognition model.
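The uniform sampling of step S3.1 can be sketched as picking evenly spaced frame indices; the index formula below is an assumption, since the patent does not fix one:

```python
def uniform_sample_indices(num_frames, num_samples):
    # Choose num_samples frame indices evenly spaced across the segment;
    # if the segment is shorter than that, keep every frame.
    if num_samples >= num_frames:
        return list(range(num_frames))
    step = num_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

indices = uniform_sample_indices(100, 10)  # e.g. a 100-frame test segment
```

The resulting frame set is then pushed through the same per-frame feature extraction and TCN classification pipeline used in training.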
The preferred embodiment of the invention provides a vision-based game rehabilitation scene participation degree behavior recognition method that requires no complex wearable equipment or experimental setup, offers both usability and accuracy, and can help rehabilitation physicians obtain timely feedback and provide suitable training prescriptions for patients.
While the invention has been described with reference to specific embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It is to be understood that the features described in the different dependent claims and in the invention may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.
Claims (10)
1. The game rehabilitation scene participation degree behavior recognition method based on vision is characterized by comprising the following steps of:
s1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit;
s2, training the participation degree behavior recognition model: using a server, the participation degree behavior recognition model is trained on a training sample video data set collected in a game rehabilitation scene, and its parameters are optimized by reducing the network loss function until the model converges, yielding a trained participation degree behavior recognition model;
s3, identifying the participation degree behavior in the new game rehabilitation scene by using the participation degree behavior identification model.
2. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the participation degree behavior recognition model, an eye feature extraction unit is used for obtaining eye feature vectors according to an eye image intercepted from an original video frame; the gesture feature extraction unit is used for extracting feature points related to the head gesture from the original image and carrying out normalization operation so as to obtain a head gesture feature vector; in the time sequence feature extraction unit, eye feature vectors and head gesture feature vectors extracted from each video frame are spliced into fusion feature vectors, and a fusion feature vector group of one video is input into a time sequence neural network to obtain the classification of participation degree.
3. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein, to unify the input of the eye feature extraction unit, the eye images cut out from the original video frames are unified in size.
4. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein the eye feature extraction unit comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer and a full connection layer which are sequentially connected;
the method comprises the steps of intercepting an eye image from an original video frame to serve as input, and extracting features through a first convolution layer to obtain a first feature map;
the first feature map output by the first convolution layer is input into the first pooling layer for feature dimension reduction, and a second feature map is obtained;
the second feature map output by the first pooling layer is input into a second convolution layer for further feature extraction to obtain a third feature map;
the third feature map output by the second convolution layer is input into the second pooling layer for feature dimension reduction, and a fourth feature map is obtained;
the fourth feature map output by the second pooling layer is input into a third convolution layer for further feature extraction to obtain a fifth feature map;
and inputting the fifth feature map into the full-connection layer for feature dimension reduction to obtain an output vector.
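Claim 4 fixes only the layer order (convolution, pooling, convolution, pooling, convolution, full connection), not kernel sizes, strides, or input resolution. Assuming common 3x3 convolutions with padding 1 and 2x2 stride-2 pooling on a hypothetical 32x32 eye crop (all assumptions), the spatial sizes of the five feature maps work out as:

```python
def out_size(size, kernel, stride=1, pad=0):
    # Standard output-size formula for a convolution or pooling layer.
    return (size + 2 * pad - kernel) // stride + 1

s = 32                        # assumed eye-crop side length
s = out_size(s, 3, pad=1)     # first convolution layer  -> first feature map, 32
s = out_size(s, 2, stride=2)  # first pooling layer      -> second feature map, 16
s = out_size(s, 3, pad=1)     # second convolution layer -> third feature map, 16
s = out_size(s, 2, stride=2)  # second pooling layer     -> fourth feature map, 8
s = out_size(s, 3, pad=1)     # third convolution layer  -> fifth feature map, 8
# The full connection layer then flattens this map and reduces it to the
# 84-dim eye feature vector described in the specification.
```

Any other kernel/stride choice consistent with the claimed layer order would work equally well; the sketch only shows how the two pooling layers perform the claimed feature dimension reduction.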
5. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the gesture feature extraction unit, feature points related to head gestures extracted from an original image comprise a left shoulder point (1), a right shoulder point (2), a trunk vertex (3), a nose tip point (4), a left eye positioning point (5) and a right eye positioning point (6), which are two-dimensional coordinates;
the trunk vertex (3) is respectively connected with the left shoulder point (1), the right shoulder point (2) and the nose tip point (4), and the nose tip point (4) is respectively connected with the left eye positioning point (5) and the right eye positioning point (6).
6. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the gesture feature extraction unit, normalization operation is performed on 6 head gesture feature points, first two-dimensional coordinates of the 6 head gesture feature points are respectively subtracted from two-dimensional coordinates of a trunk vertex (3) to obtain relative coordinates, and the obtained 6 relative coordinates are subjected to normal distribution normalization processing.
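A minimal sketch of the claim-6 normalization. The point ordering (trunk vertex at index 2, per the numbering in claim 5) and the z-score variant of "normal distribution normalization" are assumptions; the claim specifies only trunk-vertex-relative coordinates followed by normal-distribution processing, and the sample coordinates are invented for illustration:

```python
import math

def normalize_head_points(points):
    # points: six (x, y) head gesture feature points, with the trunk
    # vertex assumed at index 2 (claim-5 numbering).
    tx, ty = points[2]
    # Step 1: coordinates relative to the trunk vertex.
    rel = [(x - tx, y - ty) for x, y in points]
    # Step 2: z-score normalization over all 12 coordinate values.
    vals = [v for p in rel for v in p]
    mean = sum(vals) / len(vals)
    std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
    return [((x - mean) / std, (y - mean) / std) for x, y in rel]

pts = [(1.0, 2.0), (3.0, 2.0), (2.0, 3.0),   # shoulders, trunk vertex
       (2.0, 1.5), (1.5, 1.0), (2.5, 1.0)]   # nose tip, eye points
norm = normalize_head_points(pts)
```

The six normalized (x, y) pairs flatten to the 12-dim head gesture feature vector consumed by the time sequence feature extraction unit.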
7. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the time sequence feature extraction unit, an eye feature vector and a head posture feature vector extracted from each video frame are spliced into a fusion feature vector, and a frame-level feature representation of one video is constructed.
8. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 7, wherein in the time sequence feature extraction unit, the fusion feature vector is used as the input of a time step of the TCN temporal neural network, and the output of the last time step of the TCN temporal neural network is passed to a full connection layer and a softmax function to obtain the category prediction Y'_V of the query video V and a loss function L.
9. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein the step S2 specifically comprises the following steps:
s2.1, constructing a training video sample library in the server, and selecting a sample video segment from the training video sample library as the input of the participation degree behavior recognition model;
s2.2, executing the eye feature extraction unit on the server: the training sample video segment is processed frame by frame through the eye feature extraction unit to extract eye feature vectors, and the output eye feature vectors are stored in a feature library;
s2.3, executing the gesture feature extraction unit on the server: the training sample video segment is processed frame by frame through the gesture feature extraction unit to extract head gesture feature vectors, and the output head gesture feature vectors are stored in the feature library;
s2.4, executing the time sequence feature extraction unit on the server: the eye feature vector and the head gesture feature vector of each video frame are spliced and fed as input to the TCN temporal neural network for time sequence analysis and category prediction, obtaining the category prediction Y'_V of the query video V and the loss function L;
s2.5, performing end-to-end network training on the server; the participation degree behavior recognition task loss function L is the distance between the predicted value and the true value, i.e., the distance from the predicted category of the query video to its true category, which is minimized through a standard cross entropy loss function;
s2.6, optimizing the objective function on the server: taking the loss function L of step S2.5 as the objective function, a locally optimal set of network parameters is obtained as the network weights of the participation degree behavior recognition model.
10. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein the step S3 specifically comprises the following steps:
s3.1, executing the test sample video segment generating unit on the server: frames are uniformly sampled from one test sample video segment, and the obtained video frame set is taken as input;
s3.2, classifying the participation degree behavior of the obtained video frame set by using the participation degree behavior recognition model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310634168.7A CN117497129A (en) | 2023-05-31 | 2023-05-31 | Game rehabilitation scene participation degree behavior recognition method based on vision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117497129A true CN117497129A (en) | 2024-02-02 |
Family
ID=89671339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310634168.7A Pending CN117497129A (en) | 2023-05-31 | 2023-05-31 | Game rehabilitation scene participation degree behavior recognition method based on vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117497129A (en) |
- 2023-05-31 CN CN202310634168.7A patent/CN117497129A/en active Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021143353A1 (en) | Gesture information processing method and apparatus, electronic device, and storage medium | |
Yadav et al. | Real-time Yoga recognition using deep learning | |
EP4101371A1 (en) | Electroencephalogram signal classifying method and apparatus, electroencephalogram signal classifying model training method and apparatus, and medium | |
CN112861624A (en) | Human body posture detection method, system, storage medium, equipment and terminal | |
de San Roman et al. | Saliency Driven Object recognition in egocentric videos with deep CNN: toward application in assistance to Neuroprostheses | |
Loureiro et al. | Using a skeleton gait energy image for pathological gait classification | |
US20050105768A1 (en) | Manipulation of image data | |
CN111176447A (en) | Augmented reality eye movement interaction method fusing depth network and geometric model | |
CN112420141A (en) | Traditional Chinese medicine health assessment system and application thereof | |
CN110503636B (en) | Parameter adjustment method, focus prediction method, parameter adjustment device and electronic equipment | |
Wang et al. | A deep learning approach using attention mechanism and transfer learning for electromyographic hand gesture estimation | |
CN114424941A (en) | Fatigue detection model construction method, fatigue detection method, device and equipment | |
CN114420299A (en) | Eye movement test-based cognitive function screening method, system, equipment and medium | |
CN113974612A (en) | Automatic assessment method and system for upper limb movement function of stroke patient | |
CN110192860A (en) | A kind of the Brian Imaging intelligent test analyzing method and system of network-oriented information cognition | |
Wang | Simulation of sports movement training based on machine learning and brain-computer interface | |
CN115154828B (en) | Brain function remodeling method, system and equipment based on brain-computer interface technology | |
CN116645346A (en) | Processing method of rotator cuff scanning image, electronic equipment and storage medium | |
CN116543455A (en) | Method, equipment and medium for establishing parkinsonism gait damage assessment model and using same | |
CN117497129A (en) | Game rehabilitation scene participation degree behavior recognition method based on vision | |
CN115067934A (en) | Hand motion function analysis system based on machine intelligence | |
Wang et al. | Rehabilitation system for children with cerebral palsy based on body vector analysis and GMFM-66 standard | |
Ni et al. | A remote free-head pupillometry based on deep learning and binocular system | |
Zhao et al. | A Tongue Color Classification Method in TCM Based on Transfer Learning | |
Zheng et al. | Sports Biology Seminar of Three-dimensional Movement Characteristics of Yoga Standing Based on Image Recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |