CN117542121A - Computer vision-based intelligent training and checking system and method - Google Patents

Computer vision-based intelligent training and checking system and method

Info

Publication number
CN117542121A
Authority
CN
China
Prior art keywords
action
training
semantic
feature
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311663184.5A
Other languages
Chinese (zh)
Inventor
黄冀周
王亚利
李佳音
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hebei Shuangxue Education Technology Co., Ltd.
Original Assignee
Hebei Shuangxue Education Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hebei Shuangxue Education Technology Co., Ltd.
Priority to CN202311663184.5A
Publication of CN117542121A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/23 Recognition of whole body movements, e.g. for sport training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/20 Education
    • G06Q50/205 Education administration or guidance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Tourism & Hospitality (AREA)
  • Educational Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Strategic Management (AREA)
  • General Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • General Business, Economics & Management (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent training and checking system and method based on computer vision. A training action state monitoring image, captured by a camera, of a first training action of an object to be checked is acquired; a reference action image of the first training action is acquired; action semantic features of the training action state monitoring image and the reference action image are extracted to obtain a training action semantic feature map and a reference action semantic feature map; semantic comparison features between the training action semantic feature map and the reference action semantic feature map are constructed to obtain a global semantic comparison feature vector; and whether the first training action of the object to be checked is standard is determined based on the global semantic comparison feature vector. In this way, the action type and state of the trainer can be recognized, and whether the current training action of the object to be checked is standard can be judged intelligently.

Description

Computer vision-based intelligent training and checking system and method
Technical Field
The invention relates to the technical field of intelligent training and checking, in particular to an intelligent training and checking system and method based on computer vision.
Background
In fields such as physical training and fitness teaching, accurately evaluating the actions of a trainer is important. Accurate action assessment helps trainers understand their own action quality, strengths and weaknesses, thereby improving training results and avoiding injury. It also helps coaches formulate appropriate training plans and coaching methods to accommodate the needs and levels of different trainers. However, conventional assessment methods often require human involvement, are time-consuming and labor-intensive, and are susceptible to subjective factors.
Computer vision is a discipline that studies how machines can obtain high-level semantic information from images or videos; it involves multiple fields such as image processing, pattern recognition and artificial intelligence. Computer vision plays an important role in many application scenarios, such as face recognition, autonomous driving and medical image analysis. The development and application of computer vision technology provide a new idea for constructing a training assessment and evaluation method.
Therefore, an intelligent training and assessment system and method based on computer vision are desired.
Disclosure of Invention
The embodiment of the invention provides an intelligent training and checking system and method based on computer vision. A training action state monitoring image, captured by a camera, of a first training action of an object to be checked is acquired; a reference action image of the first training action is acquired; action semantic features of the training action state monitoring image and the reference action image are extracted to obtain a training action semantic feature map and a reference action semantic feature map; semantic comparison features between the training action semantic feature map and the reference action semantic feature map are constructed to obtain a global semantic comparison feature vector; and whether the first training action of the object to be checked is standard is determined based on the global semantic comparison feature vector. In this way, the action type and state of the trainer can be recognized, and whether the current training action of the object to be checked is standard can be judged intelligently.
The embodiment of the invention also provides an intelligent training and checking method based on computer vision, which comprises the following steps:
acquiring a training action state monitoring image of a first training action of an object to be checked, which is acquired by a camera;
acquiring a reference action image of the first training action;
extracting action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map;
constructing semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and
and determining whether the first training action of the object to be checked is standard or not based on the global semantic comparison feature vector.
The embodiment of the invention also provides an intelligent training and checking system based on computer vision, which comprises:
the training action state monitoring image acquisition module is used for acquiring a training action state monitoring image of a first training action of the object to be checked, which is acquired by the camera;
the reference action image acquisition module is used for acquiring a reference action image of the first training action;
the action semantic feature extraction module is used for extracting action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map;
the semantic comparison feature construction module is used for constructing semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and
and the first training action judging module is used for determining whether the first training action of the object to be checked is standard or not based on the global semantic comparison feature vector.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
fig. 1 is a flowchart of an intelligent training and checking method based on computer vision provided in an embodiment of the present invention.
Fig. 2 is a schematic diagram of a system architecture of an intelligent training and checking method based on computer vision according to an embodiment of the present invention.
Fig. 3 is a block diagram of an intelligent training and checking system based on computer vision according to an embodiment of the present invention.
Fig. 4 is an application scenario diagram of an intelligent training and checking method based on computer vision provided in an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
Unless defined otherwise, all technical and scientific terms used in the examples of this application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In the description of the embodiments of the present application, unless otherwise indicated and defined, the term "connected" should be construed broadly: for example, it may be an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediary; those skilled in the art will understand the specific meaning of the term according to the specific circumstances.
It should be noted that the terms "first", "second" and "third" in the embodiments of the present application are merely used to distinguish similar objects and do not imply a specific order. Where permitted, "first", "second" and "third" may be interchanged in a specific order or sequence, so that the embodiments of the present application described herein can be implemented in sequences other than those illustrated or described herein.
Accurate action assessment refers to objective, comprehensive and accurate assessment of the posture, skill and manner of execution of a trainer while performing various exercises and actions. Through action evaluation, a trainer and a coach can know action quality, advantages and disadvantages of the trainer, and accordingly, a corresponding training plan and a guiding method are formulated. The motion assessment can be applied to various sports and training fields including sports, fitness training, rehabilitation therapy, and the like.
Action evaluation can help the trainer and the coach know whether the quality of an action meets the correct technical requirements. By evaluating the performance of an action in aspects such as posture, stability and strength output, whether the action is correctly executed can be determined, and incorrect ways of executing the action can be found and corrected in time.
Action evaluation can also help trainers understand their strengths and weaknesses across different actions. By evaluating various aspects of an action, such as flexibility, strength and coordination, it can be found that a trainer performs excellently in some aspects but has deficiencies in others, which helps formulate a targeted training plan that further develops the trainer's strengths and improves the weaknesses.
Accurate motion assessment can help trainers and coaches find potential athletic risks and bad motion habits. By evaluating performance in terms of stability of motion, joint angle, body posture, etc., problems that may lead to injury can be identified and measures can be taken in time to make adjustments and improvements to reduce the risk of injury.
The action evaluation can provide basis for the coach to develop a personalized training plan. By knowing the action capability and characteristics of the trainers, the trainer can formulate corresponding training plans and guiding methods according to the requirements and levels of different trainers so as to achieve the optimal training effect.
Conventional action assessment methods generally require manual participation, relying mainly on observation and judgment by trainers and coaches, and have certain limitations: they are time-consuming and labor-intensive, highly subjective, and influenced by the experience and ability of the evaluators. Observational assessment is one of the most common methods of action assessment: a coach or evaluator assesses the quality and skill of an action by directly observing the trainer performing it. The observer needs to pay attention to the trainer's posture, fluency of movement, strength output and so on, and evaluates the accuracy and effect of the movement according to experience and expertise. However, this method is susceptible to subjective factors; different evaluators may have different observation and judgment criteria, resulting in inconsistent evaluation results.
Video analysis records the process of a trainer performing an action using a camera device and then evaluates the quality of the action by playing back and analyzing the video. Video analysis can provide more detail and more viewing angles, enabling an evaluator to observe and analyze the performance of actions more carefully. However, this approach still requires subjective judgment by an evaluator and may take a lot of time and effort to analyze and compare video data.
Some conventional motion assessment methods use sensor devices to measure and record motion data of a trainer, such as joint angles, force outputs, etc., which may be Inertial Measurement Units (IMUs), pressure sensors, resistance sensors, etc. By collecting and analyzing sensor data, an evaluator may obtain more objective action evaluation results. However, this approach requires specialized equipment and technical support, and in practice there may be some limitations such as accuracy and wearability of the sensor.
Computer vision is a discipline that studies how machines can understand and interpret the content of images or videos. It combines techniques and methods from many areas, such as image processing, pattern recognition, machine learning and artificial intelligence, to enable machines to extract meaningful high-level semantic information from visual data. The development and application of computer vision technology play an important role in many areas, including but not limited to the following. Computer vision can help machines automatically detect and identify specific target objects in images or videos, such as faces, vehicles and other objects, which has wide application in fields such as face recognition, intelligent monitoring and autonomous driving. Computer vision can also divide an image into different regions and analyze them semantically to understand the different parts of an image and their meanings, which is very important for tasks such as medical image analysis, image understanding and scene understanding.
Computer vision can help machines recognize and analyze human posture and motion, thereby realizing action evaluation and action recognition, which has potential application value in fields such as physical training, fitness guidance and sports medicine. Computer vision can also use generative models and enhancement techniques to generate realistic images, or to enhance and repair existing images, which is meaningful for applications such as image synthesis, virtual reality and image enhancement.
The development of computer vision technology provides a new idea for constructing training assessment and evaluation methods. By using computer vision technology, image or video data can be compared with a previously established model, and the key characteristics of an action can be automatically extracted and analyzed, so that accurate assessment of action quality is realized.
In one embodiment of the present invention, fig. 1 is a flowchart of an intelligent training and checking method based on computer vision provided in the embodiment of the present invention. Fig. 2 is a schematic diagram of a system architecture of an intelligent training and checking method based on computer vision according to an embodiment of the present invention. As shown in fig. 1 and fig. 2, the intelligent training and checking method based on computer vision according to the embodiment of the invention includes: 110, acquiring a training action state monitoring image, captured by a camera, of a first training action of an object to be checked; 120, acquiring a reference action image of the first training action; 130, extracting action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map; 140, constructing semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and 150, determining whether the first training action of the object to be checked is standard based on the global semantic comparison feature vector.
In the step 110, a training action state monitoring image, captured by a camera, of a first training action of the object to be checked is acquired. The camera can accurately capture image data of the object to be checked performing the first training action while maintaining the clarity and stability of the image. By acquiring the training action state monitoring image, the visual performance of the object to be checked when executing the action can be obtained, providing image data for subsequent action evaluation.
In the step 120, a reference action image of the first training action is acquired. When acquiring the reference action image, a professional action demonstrator or a pre-recorded standard action video can be used as the reference. The reference action image provides a demonstration of standard action execution and serves as an evaluation standard for comparing and evaluating the actions of the object to be checked.
In the step 130, the action semantic features of the training action state monitoring image and the reference action image are extracted to obtain a training action semantic feature map and a reference action semantic feature map. The extraction of action semantic features may use computer vision techniques, such as deep learning models or feature extraction algorithms, to capture and represent the key features of an action from an image. By extracting the action semantic features of the training action and the reference action, the image data can be converted into feature representations that are more expressive and comparable, providing a basis for subsequent action evaluation.
In the step 140, semantic comparison features between the training action semantic feature map and the reference action semantic feature map are constructed to obtain a global semantic comparison feature vector. The construction of semantic comparison features may use feature matching, similarity calculation or other comparison methods to measure the similarity or variability between the training action and the reference action. By constructing the global semantic comparison feature vector, the degree of difference between the training action and the reference action can be quantified, providing a basis for subsequent evaluation of action standardization.
In the step 150, whether the first training action of the object to be checked is standard is determined based on the global semantic comparison feature vector. According to specific evaluation criteria and thresholds, the global semantic comparison feature vector is used to judge whether the first training action of the object to be checked meets the specification. Based on this evaluation result, whether the first training action is standard can be determined rapidly and objectively, providing feedback and guidance for subsequent training and adjustment.
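As an illustrative aid (not part of the claimed invention), the following minimal sketch shows how steps 110 to 150 might be wired together in practice, assuming a PyTorch implementation; the twin model, attention layer, metric function and classifier named here are hypothetical stand-ins sketched later in this description.

```python
# Minimal sketch of steps 110-150; assumes PyTorch and OpenCV, and all
# module/function names are illustrative assumptions, not the patented design.
import cv2
import torch

def to_tensor(image) -> torch.Tensor:
    """Resize a BGR image to 224x224 and convert it to a (1, 3, 224, 224) tensor."""
    resized = cv2.resize(image, (224, 224))
    return torch.from_numpy(resized).permute(2, 0, 1).float().unsqueeze(0) / 255.0

@torch.no_grad()
def assess_first_training_action(monitor_img, reference_img,
                                 twin_model, attention, classifier) -> bool:
    """Return True when the monitored action is classified as standard."""
    x_train = to_tensor(monitor_img)             # step 110: monitoring image
    x_ref = to_tensor(reference_img)             # step 120: reference image
    f_train, f_ref = twin_model(x_train, x_ref)  # step 130: semantic feature maps
    a_train, a_ref = attention(f_train), attention(f_ref)
    v = local_feature_metric(a_train, a_ref)     # step 140: global comparison vector
    probs = classifier(v)                        # step 150: Softmax classification
    return bool(probs.argmax(dim=-1).item() == 1)  # class 1 = standard (assumed)
```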
Aiming at the above technical problems, the technical conception of the method is to analyze and compare the training action state monitoring image of the object to be checked with the reference action image of the corresponding training action by utilizing computer vision technology and an intelligent algorithm, thereby recognizing the action type and state of the trainer and intelligently judging whether the current training action of the object to be checked is standard.
Based on this, in the technical scheme of the application, firstly, a training action state monitoring image, captured by a camera, of a first training action of an object to be checked is acquired, and a reference action image of the first training action is acquired. The training action state monitoring image can reflect the action process and posture information actually executed by the object to be checked. The reference action image of the first training action is a standard, normative reference used to help the model learn the posture and execution mode of the standard action.
Then, the training action state monitoring image and the reference action image are passed through a double-coupling twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain a training action semantic feature map and a reference action semantic feature map, wherein the first-stage action feature extractor and the second-stage action feature extractor have the same network structure. That is, the action semantic features in the training action state monitoring image and the reference action image are extracted using the double-coupling twin detection model comprising the first-stage action feature extractor and the second-stage action feature extractor. These action semantic features may include important feature information such as posture information, key point locations and motion trajectories, which are used to describe and represent actions.
In a specific embodiment of the present application, extracting the motion semantic features of the training motion state monitoring image and the reference motion image to obtain a training motion semantic feature map and a reference motion semantic feature map includes: and the training action state monitoring image and the reference action image are subjected to a double-coupling twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain the training action semantic feature map and the reference action semantic feature map.
In particular, in the double-coupling twin detection model, the first-stage action feature extractor and the second-stage action feature extractor have the same network structure and process the training action state monitoring image and the reference action image, respectively. This design ensures that the training action and the reference action undergo the same processing procedure in the feature extraction stage, so that the extracted features are comparable and consistent, avoiding action-semantic interference information caused by differences between the models.
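The twin structure can be sketched as follows, assuming a PyTorch implementation. The backbone architecture is not fixed by the text, so the convolutional stack below is illustrative; sharing a single backbone instance is one simple way to guarantee that both branches have the same network structure (whether the two extractors also share weights is not specified here and is an assumption).

```python
import torch
import torch.nn as nn

class ActionFeatureExtractor(nn.Module):
    """Small convolutional backbone; the actual architecture is unspecified
    in the text, so this stack is purely illustrative."""
    def __init__(self, out_channels: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)  # (B, 3, H, W) -> (B, C, H/4, W/4)

class DualCoupledTwinDetector(nn.Module):
    """Twin branches with identical structure: the monitoring image and the
    reference image pass through the same processing pipeline, so the
    extracted features stay directly comparable."""
    def __init__(self):
        super().__init__()
        self.extractor = ActionFeatureExtractor()

    def forward(self, monitor_img, reference_img):
        return self.extractor(monitor_img), self.extractor(reference_img)
```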
In one embodiment of the present application, constructing semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector includes: performing feature distribution optimization on the training action semantic feature map and the reference action semantic feature map to obtain an optimized training action semantic feature map and an optimized reference action semantic feature map; passing the optimized training action semantic feature map and the optimized reference action semantic feature map through a spatial attention layer, respectively, to obtain a spatial visualization training action semantic feature map and a spatial visualization reference action semantic feature map; and calculating the local feature metric coefficients between each group of corresponding feature matrices along the channel dimension between the spatial visualization training action semantic feature map and the spatial visualization reference action semantic feature map to obtain a global semantic comparison feature vector composed of a plurality of local feature metric coefficients.
Then, the optimized training action semantic feature map and the optimized reference action semantic feature map are respectively passed through a spatial attention layer to obtain a spatial visualization training action semantic feature map and a spatial visualization reference action semantic feature map. Here, the operation of the spatial attention layer can enhance the feature representation capability of different regions in the image. For the training action and the reference action, some regions may be particularly important to the performance and effect of the action, such as the hands and legs. Through the spatial attention layer, the representational power of these regions can be improved, so that the training action and the reference action are better differentiated over the critical regions.
In a specific embodiment of the present application, passing the optimized training action semantic feature map and the optimized reference action semantic feature map through a spatial attention layer to obtain a spatial visualization training action semantic feature map and a spatial visualization reference action semantic feature map, respectively, includes: performing depth convolution coding on the optimized training action semantic feature map and the optimized reference action semantic feature map by using a convolution coding part of the spatial attention layer to obtain an optimized training action convolution feature map and an optimized reference action convolution feature map; inputting the optimized training action convolution feature map and the optimized reference action convolution feature map into a spatial attention portion of the spatial attention layer to obtain an optimized training action spatial attention map and an optimized reference action spatial attention map; activating the optimized training action spatial attention map and the optimized reference action spatial attention map through a Softmax activation function to obtain an optimized training action spatial attention feature map and an optimized reference action spatial attention feature map; calculating the position-wise point multiplication of the optimized training action spatial attention feature map and the optimized training action convolution feature map to obtain the spatial visualization training action semantic feature map; and calculating the position-wise point multiplication of the optimized reference action spatial attention feature map and the optimized reference action convolution feature map to obtain the spatial visualization reference action semantic feature map.
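A minimal sketch of this spatial attention layer, under the same PyTorch assumption, is given below; the kernel sizes and the single-channel attention head are assumptions, since the text does not fix them. The Softmax is taken over all spatial positions, so the attention weights of each map sum to one.

```python
import torch
import torch.nn as nn

class SpatialAttentionLayer(nn.Module):
    """Convolution coding part + spatial attention portion + Softmax +
    position-wise point multiplication, as described above."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv_encode = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.attn_head = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feat = self.conv_encode(x)                  # convolution feature map
        b, _, h, w = feat.shape
        scores = self.attn_head(feat).view(b, 1, h * w)
        attn = torch.softmax(scores, dim=-1).view(b, 1, h, w)  # Softmax activation
        return feat * attn                          # position-wise point multiplication
```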
Further, the local feature metric coefficients between each group of corresponding feature matrices along the channel dimension between the spatial visualization training action semantic feature map and the spatial visualization reference action semantic feature map are calculated to obtain a global semantic comparison feature vector composed of a plurality of local feature metric coefficients. By calculating these coefficients, the difference between each pair of corresponding feature matrices can be measured, so as to capture the subtle differences between the training action of the object to be checked and the reference action.
In a specific example of the application, calculating the local feature metric coefficients between each group of corresponding feature matrices along the channel dimension between the spatial visualization training action semantic feature map and the spatial visualization reference action semantic feature map to obtain the global semantic comparison feature vector composed of a plurality of local feature metric coefficients is implemented by using the following local feature metric formula:

$$c_k=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\left[F_{1,(i,j)}^{k}\log_2\frac{F_{1,(i,j)}^{k}}{F_{2,(i,j)}^{k}}+F_{2,(i,j)}^{k}\log_2\frac{F_{2,(i,j)}^{k}}{F_{1,(i,j)}^{k}}\right]$$

wherein $F_{1,(i,j)}^{k}$ is the feature value at the $(i,j)$-th position of the $k$-th feature matrix of the spatial visualization training action semantic feature map, $F_{2,(i,j)}^{k}$ is the feature value at the $(i,j)$-th position of the $k$-th feature matrix of the spatial visualization reference action semantic feature map, $H$ and $W$ are the height and width of the feature matrix, $c_k$ is the $k$-th local feature metric coefficient, and $\log_2$ denotes a logarithmic function operation with a base of 2.
Here, the cross difference between each group of corresponding feature matrices along the channel dimension of the spatial visualization training action semantic feature map and the spatial visualization reference action semantic feature map is represented by a local feature metric coefficient, namely the difference of the feature matrix of the spatial visualization training action semantic feature map relative to that of the spatial visualization reference action semantic feature map, together with the difference of the feature matrix of the spatial visualization reference action semantic feature map relative to that of the spatial visualization training action semantic feature map. This cross difference can extract, in the feature space, the difference between the training action state monitoring image of the first training action of the object to be checked acquired by the camera and the reference action image of the first training action, thereby representing the high-dimensional implicit difference feature distribution between the training action semantic feature distribution and the reference action semantic feature distribution.
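The following sketch computes one such coefficient per channel. It implements the symmetric log-ratio form reconstructed above; because the original formula is only partially legible, that exact form is an assumption, and a small epsilon guards the logarithms against zero feature values.

```python
import torch

def local_feature_metric(f_train: torch.Tensor, f_ref: torch.Tensor,
                         eps: float = 1e-6) -> torch.Tensor:
    """Cross difference between each pair of corresponding feature matrices
    along the channel dimension of two (B, C, H, W) feature maps; returns a
    (B, C) global semantic comparison feature vector, one coefficient per
    channel."""
    p = f_train.clamp_min(eps)
    q = f_ref.clamp_min(eps)
    # Difference of p relative to q plus difference of q relative to p.
    cross = p * torch.log2(p / q) + q * torch.log2(q / p)
    return cross.mean(dim=(2, 3))  # average over each H x W feature matrix
```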
And then, the global semantic comparison feature vector passes through a classifier to obtain a classification result, wherein the classification result is used for indicating whether a first training action of the object to be checked is standard or not.
In a specific embodiment of the present application, determining, based on the global semantic comparison feature vector, whether the first training action of the object to be checked is standard includes: passing the global semantic comparison feature vector through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the first training action of the object to be checked is standard.
Specifically, passing the global semantic comparison feature vector through a classifier to obtain a classification result, where the classification result is used to represent whether the first training action of the object to be checked is standard, includes: performing full-connection coding on the global semantic comparison feature vector by using a plurality of fully connected layers of the classifier to obtain a coding classification feature vector; and passing the coding classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
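A minimal version of such a classifier might look as follows; the layer widths and the two-class output are assumptions, as the text fixes only the fully connected encoding and the Softmax classification function.

```python
import torch
import torch.nn as nn

class NormativityClassifier(nn.Module):
    """Fully connected encoding of the global semantic comparison feature
    vector followed by Softmax over {non-standard, standard}."""
    def __init__(self, in_dim: int = 64, hidden_dim: int = 128):
        super().__init__()
        self.fc_layers = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(inplace=True),
            nn.Linear(hidden_dim, 2),
        )

    def forward(self, v: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.fc_layers(v), dim=-1)  # (B, 2) class probabilities
```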
In the above technical solution, the training action semantic feature map and the reference action semantic feature map respectively express the image semantic features of the training action state monitoring image and the reference action image; that is, they follow the spatial distribution of image semantic features in the feature matrix dimension, and follow the channel distribution of the convolutional neural network of the action feature extractor in the channel dimension. Therefore, when the training action semantic feature map and the reference action semantic feature map are respectively passed through the spatial attention layer, strengthening the local spatial distribution of the image semantic features makes the spatial information attribute corresponding to that spatial distribution more remarkable in the overall distribution dimension. In this way, if the spatial information expression effect of the training action semantic feature map and the reference action semantic feature map, which serve as high-dimensional features, can be improved, their expression effect is improved, and the expression effect of the global semantic comparison feature vector is further improved.
Based on this, the applicant of the present application optimizes the training action semantic feature map and the reference action semantic feature map, expressed as: performing feature distribution optimization on the training action semantic feature map and the reference action semantic feature map by using the following optimization formula to obtain an optimized training action semantic feature map and an optimized reference action semantic feature map; wherein the optimization formula is:

$$\hat{f}_{(i,j)}=f_{(i,j)}+\frac{\lambda}{(2s+1)^{2}}\sum_{p=-s}^{s}\sum_{q=-s}^{s}f_{(i+p,j+q)}$$

wherein $f_{(i,j)}$ is the feature value at the $(i,j)$-th position of the training action semantic feature map or the reference action semantic feature map, $\lambda$ is the local spatial partition coefficient, $s$ is the scale of the local neighborhood, and $\hat{f}_{(i,j)}$ is the feature value at the $(i,j)$-th position of the optimized training action semantic feature map or the optimized reference action semantic feature map.
Specifically, after each feature map is unfolded into a local segmentation space within the Hilbert space, local integration of the curved surface is performed on the feature manifold of each feature map in the high-dimensional feature space. This integration-function-based local integration processing corrects the phase transition discontinuity points of the feature manifold expressed by the non-stationary data sequence after local spatial unfolding, thereby recovering a finer structure and the geometric features of the feature manifold. This improves the spatial information expression effect of each feature map in the high-dimensional feature space, which in turn improves the expression effect of the global semantic comparison feature vector and the accuracy of the classification result obtained by the classifier.
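Stripped of the manifold terminology, the optimization amounts to blending each feature value with a local-neighborhood integral of its surroundings. The sketch below realizes the formula as reconstructed above and is likewise an assumption rather than the definitive implementation.

```python
import torch
import torch.nn.functional as F

def feature_distribution_optimize(fmap: torch.Tensor, s: int = 1,
                                  lam: float = 1.0) -> torch.Tensor:
    """Local integration over a (2s+1) x (2s+1) neighborhood, added back to
    each feature value with weight lam (the local spatial partition
    coefficient). Smooths discontinuities in a (B, C, H, W) feature map."""
    local_mean = F.avg_pool2d(fmap, kernel_size=2 * s + 1, stride=1, padding=s)
    return fmap + lam * local_mean
```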
In summary, the computer vision-based intelligent training and checking method has been explained. It utilizes computer vision technology and an intelligent algorithm to analyze and compare the training action state monitoring image of the object to be checked with the reference action image of the corresponding training action, thereby recognizing the action type and state of the trainer and intelligently judging whether the current training action of the object to be checked is standard.
Fig. 3 is a block diagram of an intelligent training and checking system based on computer vision according to an embodiment of the present invention. As shown in fig. 3, the intelligent training and checking system 200 based on computer vision includes: a training action state monitoring image acquisition module 210, configured to acquire a training action state monitoring image, captured by a camera, of a first training action of an object to be checked; a reference action image acquisition module 220, configured to acquire a reference action image of the first training action; an action semantic feature extraction module 230, configured to extract action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map; a semantic comparison feature construction module 240, configured to construct semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and a first training action judging module 250, configured to determine whether the first training action of the object to be checked is standard based on the global semantic comparison feature vector.
In the computer vision-based intelligent training and checking system, the action semantic feature extraction module is used for: and the training action state monitoring image and the reference action image are subjected to a double-coupling twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain the training action semantic feature map and the reference action semantic feature map.
It will be appreciated by those skilled in the art that the specific operations of the steps in the above computer vision-based intelligent training and checking system have been described in detail in the above description of the computer vision-based intelligent training and checking method with reference to figs. 1 and 2, and repetitive descriptions thereof are therefore omitted.
As described above, the computer vision-based intelligent training and checking system 200 according to the embodiment of the present invention may be implemented in various terminal devices, for example, a server for computer vision-based intelligent training and checking. In one example, the computer vision-based intelligent training and checking system 200 according to an embodiment of the present invention may be integrated into a terminal device as a software module and/or a hardware module. For example, the system 200 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, it can also be one of the many hardware modules of the terminal device.
Alternatively, in another example, the computer vision-based intelligent training and checking system 200 and the terminal device may be separate devices, and the system 200 may be connected to the terminal device via a wired and/or wireless network and transmit interaction information in an agreed data format.
Fig. 4 is an application scenario diagram of an intelligent training and checking method based on computer vision provided in an embodiment of the present invention. As shown in fig. 4, in this application scenario, first, a training action state monitoring image of a first training action of an object to be checked is acquired by a camera (e.g., C1 as illustrated in fig. 4), and a reference action image of the first training action is acquired (e.g., C2 as illustrated in fig. 4); the acquired training action state monitoring image and reference action image are then input into a server (e.g., S as illustrated in fig. 4) deployed with a computer vision-based intelligent training and checking algorithm, where the server processes the training action state monitoring image and the reference action image based on the algorithm to determine whether the first training action of the object to be checked is standard.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the invention and is not intended to limit the scope of the invention; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (10)

1. An intelligent training and checking method based on computer vision is characterized by comprising the following steps:
acquiring a training action state monitoring image of a first training action of an object to be checked, which is acquired by a camera;
acquiring a reference action image of the first training action;
extracting action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map;
constructing semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and
and determining whether the first training action of the object to be checked is standard or not based on the global semantic comparison feature vector.
2. The computer vision-based intelligent training assessment method according to claim 1, wherein extracting the motion semantic features of the training motion state monitoring image and the reference motion image to obtain a training motion semantic feature map and a reference motion semantic feature map comprises:
and the training action state monitoring image and the reference action image are subjected to a double-coupling twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain the training action semantic feature map and the reference action semantic feature map.
3. The computer vision-based intelligent training assessment method of claim 2, wherein the first-stage action feature extractor and the second-stage action feature extractor have the same network structure.
4. The computer vision-based intelligent training assessment method according to claim 3, wherein constructing semantic contrast features between the training action semantic feature map and the reference action semantic feature map to obtain global semantic contrast feature vectors comprises:
performing feature distribution optimization on the training action semantic feature map and the reference action semantic feature map to obtain an optimized training action semantic feature map and an optimized reference action semantic feature map;
the optimized training action semantic feature map and the optimized reference action semantic feature map are respectively passed through a spatial attention layer to obtain a spatial visualization training action semantic feature map and a spatial visualization reference action semantic feature map; and
and calculating local feature metric coefficients between each group of corresponding feature matrixes along the channel dimension between the space visualization training action semantic feature graphs and the space visualization reference action semantic feature graphs to obtain a global semantic comparison feature vector composed of a plurality of local feature metric coefficients.
5. The computer vision-based intelligent training assessment method according to claim 4, wherein the step of passing the optimized training action semantic feature map and the optimized reference action semantic feature map through a spatial attention layer to obtain a spatial visualization training action semantic feature map and a spatial visualization reference action semantic feature map, respectively, comprises:
performing depth convolution coding on the optimized training action semantic feature map and the optimized reference action semantic feature map by using a convolution coding part of the spatial attention layer to obtain an optimized training action convolution feature map and an optimized reference action convolution feature map;
inputting the optimized training action convolution feature map and the optimized reference action convolution feature map into a spatial attention portion of the spatial attention layer to obtain an optimized training action spatial attention map and an optimized reference action spatial attention map;
activating the optimized training action spatial attention map and the optimized reference action spatial attention map through a Softmax activation function to obtain an optimized training action spatial attention feature map and an optimized reference action spatial attention feature map; and
calculating the position-wise point multiplication of the optimized training action spatial attention feature map and the optimized training action convolution feature map to obtain the spatial visualization training action semantic feature map; and calculating the position-wise point multiplication of the optimized reference action spatial attention feature map and the optimized reference action convolution feature map to obtain the spatial visualization reference action semantic feature map.
6. The computer vision-based intelligent training assessment method of claim 5, wherein computing local feature metric coefficients between respective sets of corresponding feature matrices along a channel dimension between the spatially-visualized training action semantic feature map and the spatially-visualized reference action semantic feature map to obtain a global semantic contrast feature vector consisting of a plurality of local feature metric coefficients, comprises:
calculating local feature metric coefficients between each set of corresponding feature matrices along a channel dimension between the spatial visualization training action semantic feature map and the spatial visualization reference action semantic feature map by using the following local feature metric formula to obtain a global semantic comparison feature vector consisting of a plurality of local feature metric coefficients;
the local characteristic measurement formula is as follows:wherein (1)>Training the feature matrix of the action semantic feature map for the spatial visualization>Characteristic value of the location->In the feature matrix of the semantic feature map of the reference motion for the spatial visualization +.>Characteristic value of the location->And->For the height and width of the feature matrix, < > and>is->Local feature metric coefficient->A logarithmic function operation with a base of 2 is represented.
7. The computer vision-based intelligent training assessment method of claim 6, wherein determining whether the first training action of the object to be checked is standard based on the global semantic comparison feature vector comprises:
and the global semantic comparison feature vector passes through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the first training action of the object to be checked is standard or not.
8. The computer vision-based intelligent training assessment method according to claim 7, wherein passing the global semantic comparison feature vector through a classifier to obtain a classification result, the classification result being used to represent whether the first training action of the object to be checked is standard, comprises:
performing full-connection coding on the global semantic comparison feature vector by using a plurality of full-connection layers of the classifier to obtain a coding classification feature vector; and
and the coding classification feature vector is passed through a Softmax classification function of the classifier to obtain the classification result.
9. An intelligent training and checking system based on computer vision, which is characterized by comprising:
the training action state monitoring image acquisition module is used for acquiring a training action state monitoring image of a first training action of the object to be checked, which is acquired by the camera;
the reference action image acquisition module is used for acquiring a reference action image of the first training action;
the action semantic feature extraction module is used for extracting action semantic features of the training action state monitoring image and the reference action image to obtain a training action semantic feature map and a reference action semantic feature map;
the semantic comparison feature construction module is used for constructing semantic comparison features between the training action semantic feature map and the reference action semantic feature map to obtain a global semantic comparison feature vector; and
and the first training action judging module is used for determining whether the first training action of the object to be checked is standard or not based on the global semantic comparison feature vector.
10. The computer vision-based intelligent training assessment system of claim 9, wherein the action semantic feature extraction module is configured to:
and the training action state monitoring image and the reference action image are subjected to a double-coupling twin detection model comprising a first-stage action feature extractor and a second-stage action feature extractor to obtain the training action semantic feature map and the reference action semantic feature map.
CN202311663184.5A 2023-12-06 2023-12-06 Computer vision-based intelligent training and checking system and method Pending CN117542121A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311663184.5A CN117542121A (en) 2023-12-06 2023-12-06 Computer vision-based intelligent training and checking system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311663184.5A CN117542121A (en) 2023-12-06 2023-12-06 Computer vision-based intelligent training and checking system and method

Publications (1)

Publication Number Publication Date
CN117542121A (en)

Family

ID=89787946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311663184.5A Pending CN117542121A (en) 2023-12-06 2023-12-06 Computer vision-based intelligent training and checking system and method

Country Status (1)

Country Link
CN (1) CN117542121A (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH02291600A (en) * 1989-04-28 1990-12-03 Nec Corp Pattern recognition device and standard pattern generating device
US9600717B1 (en) * 2016-02-25 2017-03-21 Zepp Labs, Inc. Real-time single-view action recognition based on key pose analysis for sports videos
US20190146590A1 (en) * 2017-11-15 2019-05-16 Institute For Information Industry Action evaluation model building apparatus and action evaluation model building method thereof
CN111437583A (en) * 2020-04-10 2020-07-24 哈尔滨工业大学 Badminton basic action auxiliary training system based on Kinect
WO2020244287A1 (en) * 2019-06-03 2020-12-10 中国矿业大学 Method for generating image semantic description
CN113343974A (en) * 2021-07-06 2021-09-03 国网天津市电力公司 Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement
CN115116087A (en) * 2022-05-23 2022-09-27 北京体育大学 Action assessment method, system, storage medium and electronic equipment
CN115546360A (en) * 2021-06-29 2022-12-30 阿里巴巴新加坡控股有限公司 Action result identification method and device
CN115761900A (en) * 2022-12-06 2023-03-07 深圳信息职业技术学院 Internet of things cloud platform for practical training base management
CN116259108A (en) * 2023-02-20 2023-06-13 光控特斯联(重庆)信息技术有限公司 Action quality assessment method and device and action quality assessment model training method
CN116416678A (en) * 2023-03-07 2023-07-11 华中科技大学同济医学院附属协和医院 Method for realizing motion capture and intelligent judgment by using artificial intelligence technology
CN116503785A (en) * 2023-05-06 2023-07-28 吉林体育学院 Natatorium supervision system and natatorium supervision method
CN116740364A (en) * 2023-08-16 2023-09-12 长春大学 Image semantic segmentation method based on reference mechanism
WO2023225858A1 (en) * 2022-05-24 2023-11-30 中山大学 Reading type examination question generation system and method based on commonsense reasoning

Similar Documents

Publication Publication Date Title
US10916158B2 (en) Classroom teaching cognitive load measurement system
Jiang et al. Seeing invisible poses: Estimating 3d body pose from egocentric video
CN102422324B (en) Age estimation device and method
CN113762133A (en) Self-weight fitness auxiliary coaching system, method and terminal based on human body posture recognition
WO2021068781A1 (en) Fatigue state identification method, apparatus and device
CN114550027A (en) Vision-based motion video fine analysis method and device
CN113052138A (en) Intelligent contrast correction method for dance and movement actions
CN113239849B (en) Body-building action quality assessment method, body-building action quality assessment system, terminal equipment and storage medium
CN117133057A (en) Physical exercise counting and illegal action distinguishing method based on human body gesture recognition
Sameki et al. ICORD: Intelligent Collection of Redundant Data-A Dynamic System for Crowdsourcing Cell Segmentations Accurately and Efficiently.
CN114639168B (en) Method and system for recognizing running gesture
CN117542121A (en) Computer vision-based intelligent training and checking system and method
CN106446837B (en) A kind of detection method of waving based on motion history image
CN115019396A (en) Learning state monitoring method, device, equipment and medium
CN114202565A (en) Intelligent learning intervention system based on learning process emotion real-time analysis
Du The computer vision simulation of athlete’s wrong actions recognition model based on artificial intelligence
Huang et al. Appearance-independent pose-based posture classification in infants
CN117726992B (en) Nursing skill training auxiliary system and method
Xie Intelligent Analysis Method of Sports Training Posture Based on Artificial Intelligence
CN116824459B (en) Intelligent monitoring and evaluating method, system and storage medium for real-time examination
Shen Learn from Teachers: A Teacher Learner Yoga Pose Classification and Scoring Network
Yan et al. Application of light sensors based on reinforcement learning in martial arts action optimization and image defect detection
Lu et al. Face quality assessment based on local gradient
XIEa et al. Sports Teaching Action Recognition Based on Hybrid CNN-HMM
CN117576781A (en) Training intensity monitoring system and method based on behavior recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination