CN117497129A - Game rehabilitation scene participation degree behavior recognition method based on vision - Google Patents

Game rehabilitation scene participation degree behavior recognition method based on vision

Info

Publication number
CN117497129A
Authority
CN
China
Prior art keywords
feature
behavior recognition
participation degree
extraction unit
feature extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310634168.7A
Other languages
Chinese (zh)
Inventor
谢龙汉
林旭杰
陈彦
谢沛民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202310634168.7A priority Critical patent/CN117497129A/en
Publication of CN117497129A publication Critical patent/CN117497129A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/30 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to physical therapies or activities, e.g. physiotherapy, acupressure or exercising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/197 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/70 Multimodal biometrics, e.g. combining information from different biometric modalities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Physical Education & Sports Medicine (AREA)
  • Ophthalmology & Optometry (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a vision-based game rehabilitation scene participation degree behavior recognition method. The method comprises the following steps: constructing a vision-based game rehabilitation scene participation degree behavior recognition model, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit; training the participation degree behavior recognition model: the model is trained on a training sample video data set collected in a game rehabilitation scene by using a server, and its parameters are optimized by reducing a network loss function until the model converges, to obtain a trained participation degree behavior recognition model; and identifying the participation degree behavior in a new game rehabilitation scene by using the trained participation degree behavior recognition model. The method is simple, effective and highly accurate, and can provide real-time supervision of and feedback on patients' rehabilitation training.

Description

Game rehabilitation scene participation degree behavior recognition method based on vision
Technical Field
The invention relates to the fields of rehabilitation medicine and computer vision, in particular to a vision-based game rehabilitation scene participation degree behavior recognition method.
Background
Surveys show that stroke is the leading cause of disability in adults. Numerous clinical studies have shown that rehabilitation training is an effective way to improve the motor function of stroke patients and promote recovery. Rehabilitation participation is defined as a state in which the patient is stimulated and actively strives to take part in rehabilitation training. It reflects the patient's attitude toward rehabilitation, understanding of the task requirements, and the effectiveness of the entire training process. Previous studies have shown that high patient involvement is an important factor in promoting neural reorganization. Even if a patient does not yet have the ability to perform an action, the willingness to exercise actively is necessary for rehabilitation. However, the repetitive nature of rehabilitation exercises tends to make training boring and tiring for the patient. In practice, patients often exhibit low levels of participation during rehabilitation training, resulting in poor rehabilitation outcomes and safety hazards. In traditional rehabilitation training, supervision and correction by therapists can reduce the impact of low patient engagement. However, rehabilitation physicians are in short supply, and patients often perform training tasks without supervision and correction. It is therefore very important to evaluate the patient's participation in rehabilitation training, which benefits both the evaluation of rehabilitation outcomes and the adjustment of training tasks.
Researchers have developed a number of methods for assessing patient engagement, mainly scale-based scoring methods and physiological-signal-based methods. Scale-based evaluation requires a rehabilitation physician to observe the patient during rehabilitation training and score participation indicators. This approach inevitably introduces the physician's subjective judgment and increases the rehabilitation physician's workload. Physiological signals generated during rehabilitation training are also often used to assess patient engagement.
In Chinese published patent application CN105054927A, Zhang Jinhua et al. provide a method for quantitatively evaluating the degree of active participation in a lower limb rehabilitation system, which detects the patient's EEG and EMG signals in real time to calculate the degree of active participation. Its disadvantage is that the acquisition of physiological signals depends on wearable sensors, which are inconvenient to wear and liable to cause the patient discomfort during rehabilitation training.
The low-engagement behavior of stroke patients in virtual game rehabilitation training is often accompanied by characteristic facial and postural behaviors. Inspired by this finding, the present invention seeks to use visual techniques to capture changes in facial and postural behavior, such as eyelid movement, pupil movement, eye opening, head posture and facial fatigue expressions, so as to intuitively detect the patient's subjective participation. In contrast to physiological-signal-based methods, vision-based methods evaluate in a non-invasive manner and do not interfere with the patient's rehabilitation training process. However, there has been no related study applying vision techniques to detect engagement in stroke patients.
The matters described in this background section are merely intended to aid understanding of the background of the invention and do not necessarily constitute prior art in the field.
Disclosure of Invention
Aiming at the technical defects in the prior art, the invention provides a vision-based game rehabilitation scene participation degree behavior recognition method, which applies vision technology to detect the participation degree of stroke patients and uses a deep learning model to extract participation-related features from visual images and recognize participation degree behaviors.
The object of the invention is achieved by at least one of the following technical solutions.
A game rehabilitation scene participation degree behavior recognition method based on vision comprises the following steps:
s1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit;
the eye feature extraction unit is used for obtaining eye feature vectors according to the intercepted eye images in the original video frames; the gesture feature extraction unit is used for extracting feature points related to the head gesture from the original image and carrying out normalization operation so as to obtain a head gesture feature vector; in the time sequence feature extraction unit, eye feature vectors and head gesture feature vectors extracted from each video frame are spliced into fusion feature vectors, and a fusion feature vector group of one video is input into a time sequence neural network to obtain the classification of participation degree;
s2, training the participation degree behavior recognition model: the participation degree behavior recognition model is trained on a training sample video data set collected in a game rehabilitation scene by using a server, and its parameters are optimized by reducing a network loss function until the model converges, to obtain a trained participation degree behavior recognition model;
s3, identifying the participation degree behavior in the new game rehabilitation scene by using the participation degree behavior recognition model.
Further, in order to unify the input of the eye feature extraction unit, the eye images cut out from the original video frames are unified in size.
Further, the eye feature extraction unit comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer and a full connection layer which are connected in sequence;
the method comprises the steps of intercepting an eye image from an original video frame to serve as input, and extracting features through a first convolution layer to obtain a first feature map;
the first feature map output by the first convolution layer is input into the first pooling layer for feature dimension reduction, and a second feature map is obtained;
the second feature map output by the first pooling layer is input into a second convolution layer for further feature extraction to obtain a third feature map;
the third feature map output by the second convolution layer is input into the second pooling layer for feature dimension reduction, and a fourth feature map is obtained;
the fourth feature map output by the second pooling layer is input into a third convolution layer for further feature extraction to obtain a fifth feature map;
and inputting the fifth feature map into the full-connection layer for feature dimension reduction to obtain an output vector.
Further, in the gesture feature extraction unit, feature points related to the head gesture are extracted from the original image, wherein the feature points comprise a left shoulder point, a right shoulder point, a trunk vertex, a nose tip point, a left eye positioning point and a right eye positioning point which are two-dimensional coordinates;
the trunk top point is respectively connected with the left shoulder point, the right shoulder point and the nose tip point, and the nose tip point is respectively connected with the left eye positioning point and the right eye positioning point.
Further, in the gesture feature extraction unit, for the normalization operation of the 6 head gesture feature points, first, the two-dimensional coordinates of the trunk vertex are subtracted from the two-dimensional coordinates of the 6 head gesture feature points to obtain relative coordinates, and the obtained 6 relative coordinates are subjected to normal distribution normalization processing.
Further, in the time sequence feature extraction unit, the eye feature vector and the head gesture feature vector extracted from each video frame are spliced into a fusion feature vector, and a frame-level feature representation of one video is constructed.
Further, in the time sequence feature extraction unit, the fusion feature vector is used as the input of the corresponding time step of the TCN time sequence neural network, and the output of the last time step of the TCN time sequence neural network is passed to a full connection layer and a softmax function to obtain the category prediction Y'_V of the query video V and a loss function L.
Further, the step S2 specifically includes the following steps:
s2.1, constructing a training video sample library in the server, and selecting a sample video segment from the training video sample library as the input of the participation degree behavior recognition model;
s2.2, executing the eye feature extraction unit by using the server: the eye feature extraction unit extracts an eye feature vector from the training sample video segment frame by frame, and the output eye feature vectors are stored in a feature library;
s2.3, executing the gesture feature extraction unit by using the server: the gesture feature extraction unit extracts a head gesture feature vector from the training sample video segment frame by frame, and the output head gesture feature vectors are stored in the feature library;
s2.4, executing the time sequence feature extraction unit by using the server: the eye feature vector and the head gesture feature vector of each video frame are spliced and used as the input of the TCN time sequence neural network for time sequence analysis and category prediction, obtaining the category prediction Y'_V of the query video V and the loss function L;
s2.5, performing end-to-end network training by using the server: the participation degree behavior recognition task loss function L is the distance between the predicted value and the true value, i.e. the distance from the predicted category of the query video to its true category, which is minimized through a standard cross-entropy loss;
s2.6, optimizing the objective function by using the server: taking the loss function L of step S2.5 as the objective function, locally optimal network parameters are obtained as the network weights of the participation degree behavior recognition model.
Further, the step S3 specifically includes the following steps:
s3.1, executing the test sample video segment generating unit by using the server: the frames of a test sample video segment are uniformly sampled, and the obtained video frame set is used as input;
s3.2, performing participation degree behavior classification on the obtained video frame set by using the participation degree behavior recognition model.
Compared with the prior art, the invention has the following advantages and technical effects:
the game rehabilitation scene participation degree behavior recognition method based on vision is provided, eye features and gesture features are fused to realize recognition of participation degree behaviors: an eye feature extraction unit is provided to automatically extract advanced spatial features of an eye image; a pose feature extraction unit is proposed to extract head space pose features; a TCN time sequence feature extraction unit is introduced, time sequence modeling is carried out based on the extracted eye and gesture space features, the identification of participation degree behaviors is realized, and excellent identification accuracy is obtained; the vision-based participation behavior detection method is convenient and quick, does not need wearing articles and complex experimental settings, and is favorable for better supervision and feedback for rehabilitation training of patients.
Drawings
FIG. 1 is a diagram of an eye feature extraction unit constructed in an embodiment of the invention;
FIG. 2 is a schematic view of extracted head pose feature points according to an embodiment of the present invention;
fig. 3 is an algorithm frame diagram of a vision-based game rehabilitation scene participation degree behavior recognition method in an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Aiming at the problems and shortcomings in the prior art, the invention provides a vision-based game rehabilitation scene participation degree behavior recognition method, which mainly comprises five stages: constructing an eye feature extraction unit, constructing a gesture feature extraction unit, constructing a time sequence feature extraction unit, model training, and model inference.
Examples:
a game rehabilitation scene participation degree behavior recognition method based on vision comprises the following steps:
s1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model;
s2, training a participation degree behavior recognition model;
s3, identifying the participation degree behavior in a new game rehabilitation scene by using the participation degree behavior recognition model;
each step is described in detail below.
S1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit;
The eye feature extraction unit is used to obtain an eye feature vector from an eye image cropped from the original video frame. As shown in fig. 1, which is a diagram of the eye feature extraction unit constructed in the invention, an eye image is cropped from the original video frame by using the dlib face key point detection tool, and a convolutional-neural-network-based eye feature extraction model is designed to obtain an 84-dimensional eye feature vector. The gesture feature extraction unit extracts 6 head-pose-related feature points from the original image using the OpenPose human body key point detection toolbox and performs a normalization operation to obtain a 12-dimensional head gesture feature vector. In the time sequence feature extraction unit, the eye feature vector and the head gesture feature vector extracted from each video frame are spliced into a fusion feature vector, and the group of fusion feature vectors of one video is input into a time sequence neural network to obtain the participation degree classification. The details are as follows:
In one embodiment, to unify the input of the eye feature extraction unit, the eye images cropped from the original video frames are resized to a unified size of 32×32×3.
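A purely illustrative sketch of this cropping and resizing step (not part of the original disclosure) is given below; it uses dlib's 68-point facial landmark model to locate the eye region, and the landmark-model path, the padding margin, and the single-face assumption are all assumptions of this sketch:

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
# Path to the standard 68-point landmark model is an assumption of this sketch.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_eye_region(frame, pad=5):
    """Return a 32x32x3 eye patch cropped from a BGR video frame, or None if no face is found."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    if not faces:
        return None
    shape = predictor(gray, faces[0])
    # In the 68-point scheme, indices 36-47 cover the left and right eyes.
    xs = [shape.part(i).x for i in range(36, 48)]
    ys = [shape.part(i).y for i in range(36, 48)]
    x1, y1 = max(min(xs) - pad, 0), max(min(ys) - pad, 0)
    x2, y2 = max(xs) + pad, max(ys) + pad
    return cv2.resize(frame[y1:y2, x1:x2], (32, 32))  # unified 32x32x3 eye image
```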
The eye feature extraction unit comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer and a full connection layer which are sequentially connected;
An eye image is cropped from the original video frame and input into the first convolution layer for feature extraction; the first convolution layer adopts 32 convolution kernels of size 5×5×3, without zero padding and with a stride of 1; each convolution kernel produces feature values through the convolution operation;
In one embodiment, the 28×28×32 first feature map output by the first convolution layer is input into the first pooling layer for feature dimension reduction; the first pooling layer adopts a 2×2 pooling kernel with a stride of 2; in the first pooling layer, all elements in each 2×2 region of the input first feature map are summed, the sum is multiplied by a trainable first coefficient w1 and a first bias term b1 is added, and the output of the first pooling layer is finally obtained through a sigmoid function, where the sigmoid function is the nonlinear activation function sigmoid(x) = 1 / (1 + e^(-x));
In one embodiment, the 14×14×32 second feature map output by the first pooling layer is input into the second convolution layer for further feature extraction; the second convolution layer adopts 64 convolution kernels of size 5×5×32, without zero padding and with a stride of 1, and a 10×10×64 third feature map is obtained through the convolution calculation;
In one embodiment, the third feature map is input into the second pooling layer for feature dimension reduction; the second pooling layer adopts a 2×2 pooling kernel with a stride of 2; in the second pooling layer, all elements in each 2×2 region of the input third feature map are summed, the sum is multiplied by a trainable second coefficient w2 and a second bias term b2 is added, and the output of the second pooling layer is finally obtained through the sigmoid function;
In one embodiment, the 5×5×64 fourth feature map output by the second pooling layer is input into the third convolution layer for further feature extraction; the third convolution layer adopts 128 convolution kernels of size 5×5×64, without zero padding and with a stride of 1, and a 1×1×128 fifth feature map is obtained through the convolution calculation;
In one embodiment, the fifth feature map is input into the fully connected layer for feature dimension reduction; the fully connected layer comprises 84 nodes; in the fully connected layer, the input vector is multiplied by a trainable third coefficient w3 and a third bias term b3 is added, and an 84-dimensional output is finally obtained through the sigmoid function.
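A minimal sketch of an eye feature extraction network with the layer sizes described above (not part of the original disclosure), assuming PyTorch; the trainable sum-pooling (sum of each 2×2 region times a coefficient plus a bias, then sigmoid) is emulated with average pooling scaled by 4 plus a learnable per-layer scale and bias:

```python
import torch
import torch.nn as nn

class EyeFeatureExtractor(nn.Module):
    """Eye feature CNN sketch: conv -> pool -> conv -> pool -> conv -> fully connected."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=5)    # 32x32x3  -> 28x28x32
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5)   # 14x14x32 -> 10x10x64
        self.conv3 = nn.Conv2d(64, 128, kernel_size=5)  # 5x5x64   -> 1x1x128
        self.pool = nn.AvgPool2d(2, stride=2)
        # Trainable pooling coefficients and biases (w1, b1, w2, b2 in the text).
        self.w1 = nn.Parameter(torch.ones(1)); self.b1 = nn.Parameter(torch.zeros(1))
        self.w2 = nn.Parameter(torch.ones(1)); self.b2 = nn.Parameter(torch.zeros(1))
        self.fc = nn.Linear(128, 84)                    # 84-dimensional eye feature vector

    def forward(self, x):                               # x: (B, 3, 32, 32)
        x = self.conv1(x)
        x = torch.sigmoid(4 * self.pool(x) * self.w1 + self.b1)  # 4 * average == sum over 2x2
        x = self.conv2(x)
        x = torch.sigmoid(4 * self.pool(x) * self.w2 + self.b2)
        x = self.conv3(x)
        return torch.sigmoid(self.fc(x.flatten(1)))     # (B, 84)
```

Feeding one 32×32 eye patch as a (1, 3, 32, 32) tensor yields one 84-dimensional eye feature vector per frame.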
In one embodiment, in the gesture feature extraction unit, as shown in fig. 2, the 6 head gesture feature points extracted from the original image are: the left shoulder point 1, the right shoulder point 2, the trunk vertex 3, the nose tip point 4, the left eye positioning point 5 and the right eye positioning point 6, each given as two-dimensional coordinates;
the trunk vertex 3 is respectively connected with the left shoulder point 1, the right shoulder point 2 and the nose tip point 4, and the nose tip point 4 is respectively connected with the left eye positioning point 5 and the right eye positioning point 6.
In one embodiment, in the gesture feature extraction unit, the normalization of the 6 head gesture feature points is performed as follows: first, the two-dimensional coordinates of the trunk vertex 3 are subtracted from the two-dimensional coordinates of each of the 6 head gesture feature points to obtain relative coordinates, and the resulting 6 relative coordinates are then subjected to normal distribution normalization: P'k = (Pk - μ) / σ, with μ = (1/N) Σ Pk and σ = sqrt((1/N) Σ (Pk - μ)²);
wherein μ and σ are respectively the mean and standard deviation of the data; N is the number of data samples; Pk is the original input vector of the k-th data sample; and P'k is the normalized input vector of the k-th data sample.
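An illustrative sketch of this normalization step (not part of the original disclosure); whether the mean and standard deviation are computed over all 12 coordinate values or per axis is not specified, and this sketch assumes all values:

```python
import numpy as np

# Keypoint order assumed: left shoulder, right shoulder, trunk vertex,
# nose tip, left eye point, right eye point -- each a 2-D (x, y) coordinate.
def head_pose_feature(points):
    points = np.asarray(points, dtype=np.float32).reshape(6, 2)
    relative = points - points[2]                  # subtract the trunk vertex coordinates
    mu, sigma = relative.mean(), relative.std()
    normalized = (relative - mu) / (sigma + 1e-8)  # normal-distribution (z-score) normalization
    return normalized.reshape(-1)                  # 12-dimensional head gesture feature vector
```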
In the time sequence feature extraction unit, 84-dimensional eye feature vectors and 12-dimensional head gesture feature vectors extracted from each video frame are spliced into 96-dimensional fusion feature vectors, and a frame-level feature representation of one video is constructed.
In the time sequence feature extraction unit, the fusion feature vector is used as the input of the corresponding time step of the TCN time sequence neural network, and the output of the last time step of the TCN time sequence neural network is passed to a full connection layer and a softmax function to obtain the category prediction Y'_V of the query video V and a loss function L.
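A minimal sketch of such a temporal classification head (not part of the original disclosure), assuming PyTorch; the number of engagement classes, hidden width, number of dilated blocks and kernel size are assumptions, and the model returns raw logits so that softmax(logits) gives the category prediction Y'_V while the logits can feed a standard cross-entropy loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalBlock(nn.Module):
    """One dilated causal 1-D convolution block (kernel size 3) in the spirit of a TCN."""
    def __init__(self, c_in, c_out, dilation):
        super().__init__()
        self.left_pad = 2 * dilation
        self.conv = nn.Conv1d(c_in, c_out, kernel_size=3, dilation=dilation)

    def forward(self, x):                          # x: (B, C, T)
        x = F.pad(x, (self.left_pad, 0))           # pad on the left only, so no future leakage
        return F.relu(self.conv(x))

class EngagementTCN(nn.Module):
    """Per-frame 96-dim fused features -> dilated causal convolutions -> last time step -> FC."""
    def __init__(self, in_dim=96, hidden=64, num_classes=2):
        super().__init__()
        self.blocks = nn.Sequential(CausalBlock(in_dim, hidden, 1),
                                    CausalBlock(hidden, hidden, 2),
                                    CausalBlock(hidden, hidden, 4))
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, fused):                      # fused: (B, T, 96)
        h = self.blocks(fused.transpose(1, 2))     # (B, hidden, T)
        return self.fc(h[:, :, -1])                # logits; softmax gives the category prediction
```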
Step S2, training the participation degree behavior recognition model: the participation degree behavior recognition model is trained on a training sample video data set collected in a game rehabilitation scene by using a server, and its parameters are optimized by reducing the network loss function until the model converges, to obtain a trained participation degree behavior recognition model. In one embodiment, fig. 3 shows the algorithm framework of the vision-based game rehabilitation scene participation degree behavior recognition method; the specific implementation process is as follows:
s2.1, constructing a training video sample library in the server, and selecting a sample video segment from the training video sample library as the input of the participation degree behavior recognition model;
s2.2, executing the eye feature extraction unit by using the server: the eye feature extraction unit extracts an eye feature vector from the training sample video segment frame by frame, and the output eye feature vectors are stored in a feature library;
s2.3, executing the gesture feature extraction unit by using the server: the gesture feature extraction unit extracts a head gesture feature vector from the training sample video segment frame by frame, and the output head gesture feature vectors are stored in the feature library;
s2.4, executing the time sequence feature extraction unit by using the server: the eye feature vector and the head gesture feature vector of each video frame are spliced and used as the input of the TCN time sequence neural network for time sequence analysis and category prediction, obtaining the category prediction Y'_V of the query video V and the loss function L;
s2.5, performing end-to-end network training by using the server: the participation degree behavior recognition task loss function L is the distance between the predicted value and the true value, i.e. the distance from the predicted category of the query video to its true category, which is minimized through a standard cross-entropy loss;
s2.6, optimizing the objective function by using the server: taking the loss function L of step S2.5 as the objective function, locally optimal network parameters are obtained as the network weights of the participation degree behavior recognition model.
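A minimal end-to-end training loop for steps s2.5 and s2.6 (not part of the original disclosure), assuming PyTorch, the EngagementTCN sketched above, and a hypothetical data loader that yields (fused_features, label) pairs with fused_features of shape (B, T, 96); the optimizer choice and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

def train_engagement_model(model, loader, epochs=50, lr=1e-3, device="cpu"):
    """Minimize the standard cross-entropy between predicted and true categories (loss function L)."""
    model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for fused, labels in loader:
            fused, labels = fused.to(device), labels.to(device)
            logits = model(fused)                 # category logits for each query video
            loss = criterion(logits, labels)      # distance between prediction and ground truth
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model                                  # converged weights serve as the network weights
```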
S3, identifying the participation degree behavior in the new game rehabilitation scene by using the participation degree behavior recognition model. The specific implementation process is as follows:
s3.1, executing the test sample video segment generating unit by using the server: the frames of a test sample video segment are uniformly sampled, and the obtained video frame set is used as input;
s3.2, performing participation degree behavior classification on the obtained video frame set by using the participation degree behavior recognition model.
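An illustrative inference sketch for steps s3.1 and s3.2 (not part of the original disclosure), assuming OpenCV for frame sampling, the crop_eye_region and head_pose_feature helpers sketched earlier, a hypothetical pose_keypoints(frame) function returning the 6 keypoints, and trained eye_net and tcn networks:

```python
import cv2
import torch

def predict_engagement(video_path, eye_net, tcn, pose_keypoints, num_frames=32):
    """Uniformly sample frames from a test clip, fuse per-frame features, and classify the clip."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = [int(i * total / num_frames) for i in range(num_frames)]   # uniform sampling
    fused = []
    with torch.no_grad():
        for idx in indices:
            cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
            ok, frame = cap.read()
            if not ok:
                continue
            eye = crop_eye_region(frame)                                 # 32x32x3 eye patch
            if eye is None:
                continue
            eye_t = torch.from_numpy(eye).permute(2, 0, 1).float().unsqueeze(0) / 255.0
            eye_vec = eye_net(eye_t).squeeze(0)                          # 84-dim eye feature
            pose_vec = torch.from_numpy(head_pose_feature(pose_keypoints(frame))).float()
            fused.append(torch.cat([eye_vec, pose_vec]))                 # 96-dim fused feature
        cap.release()
        logits = tcn(torch.stack(fused).unsqueeze(0))                    # (1, T, 96) -> logits
    return int(logits.argmax(dim=-1))                                    # predicted engagement class
```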
The preferred embodiment of the invention provides a vision-based game rehabilitation scene participation degree behavior recognition method that requires no complex wearable equipment or experimental setup, is easy to use and accurate, and can help rehabilitation physicians obtain timely feedback and provide suitable training prescriptions for patients.
While the invention has been described with reference to specific embodiments, it is to be understood that these embodiments are merely illustrative of the principles and applications of the present invention. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and that other arrangements may be devised without departing from the spirit and scope of the present invention as defined by the appended claims. It is to be understood that the features described in the different dependent claims and in the invention may be combined in ways different from those described in the original claims. It is also to be understood that features described in connection with separate embodiments may be used in other described embodiments.

Claims (10)

1. The game rehabilitation scene participation degree behavior recognition method based on vision is characterized by comprising the following steps of:
s1, constructing a vision-based game rehabilitation scene participation degree behavior recognition model, wherein the participation degree behavior recognition model comprises an eye feature extraction unit, a gesture feature extraction unit and a time sequence feature extraction unit;
s2, training the participation degree behavior recognition model: the participation degree behavior recognition model is trained on a training sample video data set collected in a game rehabilitation scene by using a server, and its parameters are optimized by reducing a network loss function until the model converges, to obtain a trained participation degree behavior recognition model;
s3, identifying the participation degree behavior in the new game rehabilitation scene by using the participation degree behavior recognition model.
2. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the participation degree behavior recognition model, an eye feature extraction unit is used for obtaining eye feature vectors according to an eye image intercepted from an original video frame; the gesture feature extraction unit is used for extracting feature points related to the head gesture from the original image and carrying out normalization operation so as to obtain a head gesture feature vector; in the time sequence feature extraction unit, eye feature vectors and head gesture feature vectors extracted from each video frame are spliced into fusion feature vectors, and a fusion feature vector group of one video is input into a time sequence neural network to obtain the classification of participation degree.
3. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in order to unify the input of the eye feature extraction unit, the eye images cut out from the original video frames are unified in size.
4. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein the eye feature extraction unit comprises a first convolution layer, a first pooling layer, a second convolution layer, a second pooling layer, a third convolution layer and a full connection layer which are sequentially connected;
the method comprises the steps of intercepting an eye image from an original video frame to serve as input, and extracting features through a first convolution layer to obtain a first feature map;
the first feature map output by the first convolution layer is input into the first pooling layer for feature dimension reduction, and a second feature map is obtained;
the second feature map output by the first pooling layer is input into a second convolution layer for further feature extraction to obtain a third feature map;
the third feature map output by the second convolution layer is input into the second pooling layer for feature dimension reduction, and a fourth feature map is obtained;
the fourth feature map output by the second pooling layer is input into a third convolution layer for further feature extraction to obtain a fifth feature map;
and inputting the fifth feature map into the full-connection layer for feature dimension reduction to obtain an output vector.
5. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the gesture feature extraction unit, feature points related to head gestures extracted from an original image comprise a left shoulder point (1), a right shoulder point (2), a trunk vertex (3), a nose tip point (4), a left eye positioning point (5) and a right eye positioning point (6), which are two-dimensional coordinates;
the trunk vertex (3) is respectively connected with the left shoulder point (1), the right shoulder point (2) and the nose tip point (4), and the nose tip point (4) is respectively connected with the left eye positioning point (5) and the right eye positioning point (6).
6. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the gesture feature extraction unit, the normalization operation of the 6 head gesture feature points is performed as follows: first, the two-dimensional coordinates of the trunk vertex (3) are subtracted from the two-dimensional coordinates of each of the 6 head gesture feature points to obtain relative coordinates, and the obtained 6 relative coordinates are subjected to normal distribution normalization processing.
7. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein in the time sequence feature extraction unit, an eye feature vector and a head posture feature vector extracted from each video frame are spliced into a fusion feature vector, and a frame-level feature representation of one video is constructed.
8. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 7, wherein in the time sequence feature extraction unit, the fusion feature vector is used as the input of the corresponding time step of the TCN time sequence neural network, and the output of the last time step of the TCN time sequence neural network is passed to a full connection layer and a softmax function to obtain the category prediction Y'_V of the query video V and a loss function L.
9. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein the step S2 specifically comprises the following steps:
s2.1, constructing a training video sample library in the server, and selecting a sample video segment from the training video sample library as the input of the participation degree behavior recognition model;
s2.2, executing the eye feature extraction unit by using the server: the eye feature extraction unit extracts an eye feature vector from the training sample video segment frame by frame, and the output eye feature vectors are stored in a feature library;
s2.3, executing the gesture feature extraction unit by using the server: the gesture feature extraction unit extracts a head gesture feature vector from the training sample video segment frame by frame, and the output head gesture feature vectors are stored in the feature library;
s2.4, executing the time sequence feature extraction unit by using the server: the eye feature vector and the head gesture feature vector of each video frame are spliced and used as the input of the TCN time sequence neural network for time sequence analysis and category prediction, obtaining the category prediction Y'_V of the query video V and the loss function L;
s2.5, performing end-to-end network training by using the server: the participation degree behavior recognition task loss function L is the distance between the predicted value and the true value, i.e. the distance from the predicted category of the query video to its true category, which is minimized through a standard cross-entropy loss;
s2.6, optimizing the objective function by using the server: taking the loss function L of step S2.5 as the objective function, locally optimal network parameters are obtained as the network weights of the participation degree behavior recognition model.
10. The vision-based game rehabilitation scene participation degree behavior recognition method according to claim 1, wherein the step S3 specifically comprises the following steps:
s3.1, executing the test sample video segment generating unit by using the server: the frames of a test sample video segment are uniformly sampled, and the obtained video frame set is used as input;
s3.2, performing participation degree behavior classification on the obtained video frame set by using the participation degree behavior recognition model.
CN202310634168.7A 2023-05-31 2023-05-31 Game rehabilitation scene participation degree behavior recognition method based on vision Pending CN117497129A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310634168.7A CN117497129A (en) 2023-05-31 2023-05-31 Game rehabilitation scene participation degree behavior recognition method based on vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310634168.7A CN117497129A (en) 2023-05-31 2023-05-31 Game rehabilitation scene participation degree behavior recognition method based on vision

Publications (1)

Publication Number Publication Date
CN117497129A true CN117497129A (en) 2024-02-02

Family

ID=89671339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310634168.7A Pending CN117497129A (en) 2023-05-31 2023-05-31 Game rehabilitation scene participation degree behavior recognition method based on vision

Country Status (1)

Country Link
CN (1) CN117497129A (en)

Similar Documents

Publication Publication Date Title
WO2021143353A1 (en) Gesture information processing method and apparatus, electronic device, and storage medium
Yadav et al. Real-time Yoga recognition using deep learning
EP4101371A1 (en) Electroencephalogram signal classifying method and apparatus, electroencephalogram signal classifying model training method and apparatus, and medium
CN112861624A (en) Human body posture detection method, system, storage medium, equipment and terminal
de San Roman et al. Saliency Driven Object recognition in egocentric videos with deep CNN: toward application in assistance to Neuroprostheses
Loureiro et al. Using a skeleton gait energy image for pathological gait classification
US20050105768A1 (en) Manipulation of image data
CN111176447A (en) Augmented reality eye movement interaction method fusing depth network and geometric model
CN112420141A (en) Traditional Chinese medicine health assessment system and application thereof
CN110503636B (en) Parameter adjustment method, focus prediction method, parameter adjustment device and electronic equipment
Wang et al. A deep learning approach using attention mechanism and transfer learning for electromyographic hand gesture estimation
CN114424941A (en) Fatigue detection model construction method, fatigue detection method, device and equipment
CN114420299A (en) Eye movement test-based cognitive function screening method, system, equipment and medium
CN113974612A (en) Automatic assessment method and system for upper limb movement function of stroke patient
CN110192860A (en) A kind of the Brian Imaging intelligent test analyzing method and system of network-oriented information cognition
Wang Simulation of sports movement training based on machine learning and brain-computer interface
CN115154828B (en) Brain function remodeling method, system and equipment based on brain-computer interface technology
CN116645346A (en) Processing method of rotator cuff scanning image, electronic equipment and storage medium
CN116543455A (en) Method, equipment and medium for establishing parkinsonism gait damage assessment model and using same
CN117497129A (en) Game rehabilitation scene participation degree behavior recognition method based on vision
CN115067934A (en) Hand motion function analysis system based on machine intelligence
Wang et al. Rehabilitation system for children with cerebral palsy based on body vector analysis and GMFM-66 standard
Ni et al. A remote free-head pupillometry based on deep learning and binocular system
Zhao et al. A Tongue Color Classification Method in TCM Based on Transfer Learning
Zheng et al. Sports Biology Seminar of Three-dimensional Movement Characteristics of Yoga Standing Based on Image Recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination