CN114677765A - Interactive video motion comprehensive identification and evaluation system and method - Google Patents

Interactive video motion comprehensive identification and evaluation system and method

Info

Publication number
CN114677765A
Authority
CN
China
Prior art keywords
action
component
video
algorithm
recognition
Prior art date
Legal status
Pending
Application number
CN202210448232.8A
Other languages
Chinese (zh)
Inventor
罗明宇
易秋晨
Current Assignee
Dongyun Ruilian Wuhan Computing Technology Co ltd
Original Assignee
Dongyun Ruilian Wuhan Computing Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Dongyun Ruilian Wuhan Computing Technology Co ltd filed Critical Dongyun Ruilian Wuhan Computing Technology Co ltd
Priority to CN202210448232.8A
Publication of CN114677765A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an interactive video motion comprehensive identification and evaluation system and method. Compared with the human-skeleton modeling algorithms of the prior art, the invention expands the scope of action description: it can model actions performed by a single person, by interaction between a person and an object, and by interaction among multiple persons. It provides a rich system of comprehensive evaluation indexes and forms a general solution for video action recognition and evaluation that can describe the difference between an action to be evaluated and the standard on multiple qualitative and quantitative levels, and that can incorporate the latest data, data processing techniques and state-of-the-art algorithm models as the technology develops, so as to achieve the best action recognition and evaluation effect.

Description

Interactive video motion comprehensive identification and evaluation system and method
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to an interactive video motion comprehensive identification and evaluation system and method.
Background
Recognition and evaluation of video actions are widely applied in industries such as security and sports. Typically, a static or moving camera collects video of people moving in a specific scene; artificial-intelligence algorithms then recognize and analyze limb angles, action categories and the like, and compare them with a set standard to judge whether an abnormal situation exists or to evaluate how well an action conforms to the norm.
In the prior art, as regards algorithms for video action recognition and evaluation, researchers have proposed a number of expert rules and artificial-intelligence algorithms for specific application scenarios.
One existing video action recognition method uses a Kinect sensor to collect the coordinates of human joint points, calculates the angles between joints with a series of manually designed expert rules, and compares the calculated angles with preset standard values to judge whether a sit-up is performed correctly.
Another approach is an artificial-intelligence-based action recognition method that mainly uses a classic human skeleton sequence feature classification model consisting of three sub-models, responsible respectively for single-frame human skeleton feature extraction, temporal feature encoding and action classification.
A third video action recognition method considers the features of the target object and of the background area at the same time: the influence value of the background area is determined through a series of steps and fused with the target-object features before action classification. Since human actions are often related to the background environment, this method extends video action recognition to information beyond the human body and uses the richer features to improve the recognition accuracy of the algorithm.
However, the definition of action in the above three prior-art techniques is still one-sided. They mostly consider only actions reflected by the human skeleton itself and its temporal changes (e.g. walking, running, sit-ups), or use loosely defined appearance cues (such as the background-area influence value mentioned above) merely to improve recognition accuracy. A broader definition of action includes not only actions reflected by the shape and changes of the human skeleton, but also actions reflected by local shapes and changes unrelated to the skeleton (such as facial expression and gaze direction), actions reflected by joint changes of the human body and external things (such as drinking water or raising a flag, whose meaning is uncertain if external objects such as the cup or the flag are ignored), and actions involving multiple people (such as fighting). Such scenes are very common in fields like sports and security, but the existing methods cannot handle them. In addition, existing action evaluation methods generally give only a similarity score or similarity level based on expert rules or an overall similarity calculation; the result is vague and cannot reveal detailed differences or directions for correction (such as action delay).
These defects mean that existing methods can only be elaborately designed for specific tasks and generalize poorly, so they cannot serve as a general solution for video action recognition and evaluation. Moreover, as business data keep accumulating and algorithm research keeps advancing, model accuracy can be improved and action recognition requirements may expand or change, which the existing methods cannot accommodate well.
The above is only for the purpose of assisting understanding of the technical solution of the present invention, and does not represent an admission that the above is the prior art.
Disclosure of Invention
In order to solve the above technical problems mentioned in the background art, the present invention provides an interactive video motion comprehensive identification and evaluation system and method, wherein the system comprises:
the system includes a data collection component 100, a data annotation component 200, an action recognition component 300, and an action evaluation component 400:
the data acquisition component 100 is used for acquiring original video data;
the action recognition component 300 is used for receiving a request of a user for adding an action recognition algorithm model component and adding the action recognition algorithm model component into an action recognition algorithm model library;
the action evaluation component 400 is used for receiving a request of a user for adding an action evaluation algorithm model component and adding the action evaluation algorithm model component into an action evaluation algorithm model library;
the action recognition component 300 is used for receiving the action recognition categories and the algorithm model combination configuration set by a user to form a video action comprehensive recognition method; for the algorithms that the action recognition component needs to train, it entrusts the data annotation component 200 to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate first annotation data; the action recognition component 300 uses the first annotation data to train the algorithms to be trained in the video action comprehensive recognition method, and obtains and stores the corresponding recognition models;
the action evaluation component 400 receives the action evaluation indexes and the algorithm model combination configuration set by the user to form a video action comprehensive evaluation method; for the algorithms that the action evaluation component 400 needs to train, it entrusts the data annotation component 200 to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate second annotation data; the action evaluation component 400 uses the second annotation data to train the algorithms to be trained in the video action comprehensive evaluation method, and obtains and stores the corresponding evaluation models;
the motion recognition component 300 is configured to perform inference on the video data acquired by the data acquisition component in real time based on the video motion comprehensive recognition method, and output a video motion comprehensive feature recognition result;
the action evaluation component 400 executes reasoning on the video data collected by the data collection component in real time and the recognition result of the video action comprehensive characteristic output by the action recognition component based on the video action comprehensive evaluation method, and outputs a video action comprehensive evaluation result.
Accordingly, the video motion comprehensive feature recognition result output by the motion recognition component 300 includes at least features of the human body itself and features of external objects whose changes are correlated with those of the human body.
In addition, in order to achieve the above object, the present invention further provides an interactive video motion comprehensive identification and evaluation method, including the following steps:
calling a data acquisition component to acquire original video data;
the action recognition component receives a user's request to add an action recognition algorithm model component, and adds the component to the action recognition algorithm model library;
the action evaluation component receives a user's request to add an action evaluation algorithm model component, and adds the component to the action evaluation algorithm model library;
the action recognition component receives the action recognition categories and the algorithm model combination configuration set by the user to form a video action comprehensive recognition method; for the algorithms that the action recognition component needs to train, it entrusts the data annotation component to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate first annotation data; the action recognition component uses the first annotation data to train the algorithms to be trained in the video action comprehensive recognition method, and obtains and stores the corresponding recognition models;
the action evaluation component receives the action evaluation indexes and the algorithm model combination configuration set by the user to form a video action comprehensive evaluation method; for the algorithms that the action evaluation component needs to train, it entrusts the data annotation component to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate second annotation data; the action evaluation component uses the second annotation data to train the algorithms to be trained in the video action comprehensive evaluation method, and obtains and stores the corresponding evaluation models;
the action recognition component executes reasoning on the video data collected by the data collection component in real time based on the video action comprehensive recognition method and outputs a video action comprehensive characteristic recognition result;
the action evaluation component executes reasoning on the video data acquired by the data acquisition component in real time and the identification result of the video action comprehensive characteristic output by the action identification component based on the video action comprehensive evaluation method, and outputs a video action comprehensive evaluation result.
Correspondingly, the video action comprehensive feature recognition result output by the action recognition component includes at least features of the human body itself and features of external objects whose changes are correlated with those of the human body.
Preferably, the step of the motion recognition component performing inference on the video data collected by the data collection component in real time based on the video motion comprehensive recognition method and outputting a recognition result of the video motion comprehensive characteristics includes:
the data acquisition component inputs acquired video data into the action recognition component, the action recognition component calls the video action comprehensive recognition method to perform reasoning to obtain a video frame pool and an action characteristic pool, and simultaneously outputs a recognition result;
correspondingly, the step of the action evaluation component executing reasoning on the video data collected by the data collection component in real time and the video action comprehensive feature recognition result output by the action recognition component based on the video action comprehensive evaluation method, and outputting a video action comprehensive evaluation result, comprises the following steps:
the action evaluation component analyzes the video frame pool and the action characteristic pool, carries out action evaluation algorithm reasoning based on the video action comprehensive evaluation method and combined with a preset standard action video, and outputs an evaluation result.
Preferably, the action recognition categories and algorithm model combination configuration set by the user, and the action evaluation categories and algorithm model combination configuration set by the user, are recorded in a project configuration file; the project configuration file contains the meta-information configuration file paths of all the algorithms or models in the algorithm model library that the user desires to run, together with the runtime configuration file paths corresponding to those algorithms or models.
Preferably, the video motion comprehensive recognition method uses, on the basis of single-frame images or video time sequences in the video, multiple feature detection algorithms and models to describe action definitions and feature depictions formed by the human body itself and by its interaction with the environment;
wherein the action definitions and feature types include, but are not limited to:
image features coded by various coding modes in a single-frame or multi-frame sequence;
the coordinate of key points of a single person in a single frame and a coordinate sequence formed in multiple frames;
coordinates of key points of a plurality of human bodies in a single frame and a coordinate sequence formed in a plurality of frames;
object attributes, such as category, number and color, of objects of interest in a single frame, their position information such as bounding-box coordinates and boundary-point coordinates, and the sequences formed by these features over multiple frames;
a composite attribute defined by characteristics of a human body and various things in a single frame;
comprehensive attributes defined by characteristic sequences of human bodies and various things in multiple frames;
the characteristics and comprehensive properties of the human body and various things which are possibly appeared in the future are determined by the characteristic sequence of the human body and various things at present.
Preferably, the algorithms adopted by the video motion comprehensive recognition method may include a target detection algorithm, a human key point detection algorithm, a target tracking algorithm, a skeleton modeling algorithm and a sequence classification algorithm, with combinations and nesting among the algorithms, used to describe action definitions and feature depictions formed by the interaction of the human body's own features and external object features in single frames and in video sequences.
Preferably, the action recognition algorithm model library and the action evaluation algorithm model library are collections of the code files, model files and other related files of the action recognition algorithms and models and of the action evaluation algorithms and models; each algorithm or model must include a meta-information configuration file, a runtime configuration file and an inference script, and each trainable algorithm must also include a training start script.
Preferably, the meta-information configuration file of an algorithm or model specifically includes: the algorithm or model name; the algorithm or model type; the training start script path and inference start script path of the algorithm or model; the data types of the algorithm or model's inference inputs; and the data types of the algorithm or model's inference outputs;
the data types of the inference inputs and outputs cover the following: the real type of the object in memory, the parameters describing the object's attributes, and the nesting structure between them. The runtime configuration file of an algorithm or model configures all the parameters that the training process or inference process of the algorithm or model involves or depends on.
The invention has the following beneficial effects: the interactive video motion comprehensive identification and evaluation system and method expand the scope of action description compared with the human-skeleton modeling algorithms of the prior art, and can model actions performed by a single person, by human-object interaction and by multi-person interaction; they provide a rich system of comprehensive evaluation indexes and form a general solution for video action recognition and evaluation, which can describe the difference between an action to be evaluated and the standard on multiple qualitative and quantitative levels and can incorporate the latest data, data processing techniques and state-of-the-art algorithm models as the technology develops, so as to achieve the best action recognition and evaluation effect.
Drawings
FIG. 1 is a schematic diagram of components of an interactive video motion integrated recognition and evaluation system according to an embodiment of the invention;
FIG. 2 is a schematic diagram of a training process of a motion recognition model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an inference flow of a motion recognition model according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a training flow of an action evaluation model according to an embodiment of the invention;
FIG. 5 is a schematic diagram of an inference flow of an action evaluation model according to an embodiment of the invention;
fig. 6 is a schematic diagram of a physical deployment of an interactive video motion integrated recognition evaluation system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Interactive video motion comprehensive identification evaluation system and method embodiment
First, an interactive video motion integrated recognition evaluation system proposed according to an embodiment of the present invention will be described with reference to the accompanying drawings. Fig. 1 is a diagram illustrating an interactive video motion comprehensive recognition and evaluation system according to an embodiment of the present invention.
As shown in fig. 1, the system 10 includes: a data collection component 100, a data annotation component 200, an action recognition component 300, and an action evaluation component 400.
The data acquisition component 100 is used for acquiring original video data; in the model training phase, the original video data acquired by the data acquisition component 100 serve as the input of the data annotation component for generating annotation data, and in the inference phase they serve as the original video data to be recognized and evaluated.
The action recognition component 300 is used for receiving a request of a user for adding an action recognition algorithm model component and adding the action recognition algorithm model component into an action recognition algorithm model library;
the action evaluation component 400 is used for receiving a request of a user for adding an action evaluation algorithm model component and adding the action evaluation algorithm model component into an action evaluation algorithm model library;
It should be noted that a request to add an action recognition algorithm model component has a potential association with the original video data acquired by the data acquisition component 100, and likewise a request to add an action evaluation algorithm model component has a potential association with that original video data.
That is, if the system needs to recognize people in the video data collected by the data acquisition component 100, the action recognition component 300 needs an algorithm model capable of detecting people. However, this embodiment does not verify the relationship between a newly added action recognition or action evaluation algorithm model component request and the collected original video data, because the same algorithm model can be used for many different data types, and the same data type can also be handled by different algorithm models; therefore, the action recognition and action evaluation algorithm model components added in this embodiment can be used for many different data types. In a concrete implementation, take the video action recognition of railway signal semaphores as an example: the recognition task needs to detect all human bodies in a real-time video, recognize whether the actions they make are railway signal semaphores, and classify the specific semaphore category. The original video data are then the railway-semaphore videos, and the action recognition and action evaluation algorithm model components that the user requests to add are both potentially associated with those videos, i.e. the added algorithms for action recognition and action evaluation are potentially associated with railway-semaphore actions. The action recognition algorithm model components and action evaluation algorithm model components added by the user are collections of the algorithm model's code files and other related files; these files are added to the respective algorithm model libraries by the action recognition component or the action evaluation component.
The action recognition component 300 is used for receiving the action recognition categories and the algorithm model combination configuration set by a user to form a video action comprehensive recognition method; for the algorithms that the action recognition component needs to train, it entrusts the data annotation component 200 to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate first annotation data; the action recognition component uses the first annotation data to train the algorithms to be trained in the video action comprehensive recognition method, and obtains and stores the corresponding recognition models.
The algorithms adopted by the video action comprehensive recognition method of this embodiment may include a target detection algorithm, a human key point detection algorithm, a target tracking algorithm, a skeleton modeling algorithm and a sequence classification algorithm, used to describe action definitions and feature depictions formed by the interaction of the human body's own features and external object features in single frames and in video sequences.
It can be understood that the combined configuration of action recognition categories and algorithm models set by the user, and the combined configuration of action evaluation categories and algorithm models set by the user, refer to a project configuration file. This file contains the meta-information configuration file paths of all the algorithms/models in the algorithm model library that the user desires to run, together with the runtime configuration file paths corresponding to those algorithms/models.
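As an illustration only, such a project configuration file could be organized as a mapping from the selected recognition and evaluation algorithms to the paths of their meta-information and runtime configuration files. The keys, file names and paths below are assumptions made for this sketch and are not prescribed by the embodiment.

```python
# Hypothetical project configuration, loaded here as a plain Python dict.
# All key names and paths are illustrative assumptions.
project_config = {
    "action_recognition": [
        {"meta_info": "model_zoo/human_flag_detector/meta.yaml",
         "runtime":   "model_zoo/human_flag_detector/runtime_semaphore.yaml"},
        {"meta_info": "model_zoo/keypoint_detector/meta.yaml",
         "runtime":   "model_zoo/keypoint_detector/runtime_default.yaml"},
        {"meta_info": "model_zoo/sequence_classifier/meta.yaml",
         "runtime":   "model_zoo/sequence_classifier/runtime_semaphore.yaml"},
    ],
    "action_evaluation": [
        {"meta_info": "model_zoo/value_diff/meta.yaml",
         "runtime":   "model_zoo/value_diff/runtime_default.yaml"},
        {"meta_info": "model_zoo/exec_diff_predictor/meta.yaml",
         "runtime":   "model_zoo/exec_diff_predictor/runtime_semaphore.yaml"},
    ],
}
```

Under this reading, the recognition and evaluation components would resolve each entry against their algorithm model libraries and chain the selected components into the comprehensive recognition and evaluation methods.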
The action evaluation component 400 is used for receiving the action evaluation indexes and the algorithm model combination configuration set by the user to form a video action comprehensive evaluation method; for the algorithms that the action evaluation component 400 needs to train, it entrusts the data annotation component 200 to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate second annotation data; the action evaluation component 400 uses the second annotation data to train the algorithms to be trained in the video action comprehensive evaluation method, and obtains and stores the corresponding evaluation models.
The action recognition component 300 is configured to perform inference, based on the video action comprehensive recognition method, on the video data acquired in real time by the data acquisition component, and to output a video action comprehensive feature recognition result; this result includes at least features of the human body itself and features of external objects whose changes are correlated with those of the human body.
In a concrete implementation where the original video data are the railway-semaphore videos, the worker waving a flag in the video belongs to the features of the human body itself, while the flag being waved belongs to the features of external objects.
It should be noted that the video action comprehensive recognition method of this embodiment uses multiple feature detection algorithms and models, on the basis of single-frame images or video time sequences in the video, to describe action definitions and feature depictions formed by the human body itself and by its interaction with the environment. The action definitions and feature types include, but are not limited to:
image features coded by various coding modes in a single-frame or multi-frame sequence;
the key point coordinates of a single person in a single frame and a coordinate sequence formed in multiple frames;
coordinates of key points of a plurality of human bodies in a single frame and a coordinate sequence formed in a plurality of frames;
object attributes, such as category, number and color, of objects of interest in a single frame, their position information such as bounding-box coordinates and boundary-point coordinates, and the sequences formed by these features over multiple frames;
a composite attribute defined by characteristics of a human body and various things in a single frame;
comprehensive attributes defined by characteristic sequences of human bodies and various things in multiple frames;
features and composite attributes of the human body and of various things that may appear in the future, as determined from the current feature sequences of the human body and various things;
the action evaluation component 400 executes reasoning on the video data collected by the data collection component in real time and the recognition result of the video action comprehensive characteristic output by the action recognition component based on the video action comprehensive evaluation method, and outputs a video action comprehensive evaluation result.
In this embodiment, the video action comprehensive recognition method includes hand-crafted feature extraction algorithms that work directly on images and optical flow, as well as deep learning algorithms for image classification, target detection, key point detection, skeleton modeling, sequence classification and the like, and these algorithms can be combined and nested. The deep learning algorithms require supervised learning on annotated data in the training phase to fit the neural network parameters before they can perform feature extraction and prediction in the inference phase. It can be understood that a deep learning algorithm here is an algorithm that uses an artificial neural network as its framework and performs representation learning on data; its training iteratively optimizes the neural network with a set of hyper-parameters to obtain estimates of the network parameters. Different deep learning algorithms require data of different types and formats: for example, a target detection algorithm requires single-frame images together with the bounding-box coordinates of the objects of interest, while a skeleton modeling algorithm requires single-frame or sequential coordinates of human key points; such data may come from the original video frame sequence or from the outputs of other algorithms.
Specifically, in this embodiment, the interactive video motion comprehensive identification and evaluation process is mainly divided into two periods, namely training and reasoning. During the training period, the data collection component 100 collects or receives a large amount of image and video data, the action recognition component 300 and the action evaluation component 400 receive an algorithm model component and an algorithm model combination scheme added by a user, and the data annotation component 200 is entrusted to start a data annotation service. The motion recognition component 300 and the motion evaluation component 400 train on the labeled data corresponding to the respective algorithms, and generate and store models. In the inference period, the data acquisition component 100 acquires images or videos and inputs the images or videos into the action recognition component 300, and the action recognition component 300 performs inference of an action recognition algorithm, stores a video frame pool and an action feature pool with a certain length, and outputs a recognition result. The action evaluation component 400 analyzes the video frame pool and the action feature pool, performs action evaluation algorithm reasoning by combining evaluation criteria, and outputs an evaluation result.
Referring to fig. 6, fig. 6 shows a physical deployment diagram of the system, which is composed of an image/video capture device and a server. The server is responsible for the operation of all system components and can carry out data transmission and interaction with the data acquisition device. Specifically, the data acquisition component 100 in the server controls the data acquisition device to acquire and store image and video data in the server when receiving a data acquisition request, and other component operations in the system such as data annotation, action recognition and action evaluation are all operated in the server.
Correspondingly, based on the system of fig. 1, the present invention corresponds to a set of method embodiments, in which the method includes the following steps:
video acquisition and manual operation stage:
calling a data acquisition component to acquire original video data;
receiving a request of a user for adding a motion recognition algorithm model component by a motion recognition component, and adding the request into a motion recognition algorithm model library;
receiving a request of a user for adding a new action evaluation algorithm model component by an action evaluation component, and adding the request into an action evaluation algorithm model library;
and (3) a motion recognition model training stage:
the action recognition component receives the action recognition categories and the algorithm model combination configuration set by the user to form a video action comprehensive recognition method; for the algorithms that the action recognition component needs to train, it entrusts the data annotation component to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate first annotation data; the action recognition component uses the first annotation data to train the algorithms to be trained in the video action comprehensive recognition method, and obtains and stores the corresponding recognition models; the video action comprehensive recognition method preferably includes at least a target detection algorithm, a human key point detection algorithm, a target tracking algorithm, a skeleton modeling algorithm and a sequence classification algorithm, used to describe action definitions and feature depictions formed by the interaction of the human body's own features and external object features in single frames and in video sequences;
and (3) a motion evaluation model training stage:
the action evaluation component receives the action evaluation indexes and the algorithm model combination configuration set by the user to form a video action comprehensive evaluation method; for the algorithms that the action evaluation component needs to train, it entrusts the data annotation component to start the data annotation service of the corresponding task, so that the user can perform data annotation through an interactive interface to generate second annotation data; the action evaluation component uses the second annotation data to train the algorithms to be trained in the video action comprehensive evaluation method, and obtains and stores the corresponding evaluation models;
and (3) action identification reasoning process stage:
the action recognition component executes reasoning on the video data collected by the data collection component in real time based on the video action comprehensive recognition method and outputs a video action comprehensive characteristic recognition result;
and (3) action evaluation reasoning process stage:
and the action evaluation component executes reasoning on the video data acquired by the data acquisition component in real time and the identification result of the video action comprehensive characteristic output by the action identification component based on the video action comprehensive evaluation method, and outputs a video action comprehensive evaluation result.
Further, for each stage of the interactive video motion comprehensive identification and evaluation method of the present invention, detailed descriptions are respectively provided in different method embodiments:
method embodiment 1< motion recognition model training phase >
In the present embodiment, a motion recognition model training embodiment is provided, which can be performed by the motion recognition component 300 in the present embodiment. As a specific example, this embodiment takes the video motion recognition of a railway signal semaphore as an example, and the recognition task needs to recognize all human bodies in the real-time video, recognize whether the motion they do is a railway signal semaphore, and classify a specific semaphore category.
In this embodiment, the action recognition algorithm model components include: a target detection algorithm for detecting human bodies and flags (distinguished by color), a target tracking algorithm for tracking human bodies and flags, a key point detection algorithm for detecting key points of human body parts, a human skeleton modeling algorithm for skeleton modeling, a feature encoder for general image feature encoding, and a sequence feature classification algorithm for classifying action categories. Except for the tracking algorithm, the skeleton modeling algorithm and the general feature encoder, the other algorithms need to be trained. The algorithms listed here are designed for the recognition requirements of this embodiment; the invention limits neither the specific action recognition scenario nor the specific algorithms adopted.
As shown in fig. 2, the motion recognition model training process in this embodiment includes the following steps:
in step S101, the action recognition component 300 requests the data annotation component 200 to start a data annotation function for the above algorithm.
Step S102, the data annotation component 200 starts the data annotation service of the corresponding algorithm, and the user performs a specific annotation operation. In this embodiment, the specific contents to be labeled are as follows:
category labels and bounding-box coordinates of human bodies and flags (distinguished by color) in single-frame images;
the category and the coordinates of key points of the human body in the single-frame image;
an action category label corresponding to the information of each person in the continuous frame sequence;
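Purely to make the three kinds of labels above concrete, the records below show one possible in-memory layout of the annotation data; the field names, coordinate convention and values are assumptions of this sketch, not a format defined by the embodiment.

```python
# Hypothetical annotation records for the railway-semaphore example.
frame_annotation = {
    "frame_id": 120,
    "objects": [
        {"category": "person",   "bbox": [412.0, 96.0, 588.0, 540.0]},   # x1, y1, x2, y2 (assumed)
        {"category": "flag_red", "bbox": [560.0, 110.0, 640.0, 190.0]},
    ],
    "keypoints": [  # one list per person: (keypoint category, x, y)
        [("left_wrist", 575.0, 150.0), ("right_wrist", 430.0, 300.0)],
    ],
}
sequence_annotation = {
    "person_track_id": 3,
    "frame_range": [100, 220],
    "action_label": "semaphore_stop",  # action category of the whole continuous sequence
}
```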
Step S103, the action recognition component 300 trains each algorithm; the data used by each algorithm are as follows:
the human body and flag target detection algorithm is trained using single-frame images with the corresponding category labels and bounding-box coordinates;
the human key point detection algorithm uses image slices defined by the human bounding boxes in single-frame images, together with the human key point coordinates and category labels relative to each slice;
For the sequence feature classification algorithm, the original annotations are first processed as follows. The target tracking algorithm is used to extract, from any continuous frame sequence, the key point coordinate sequence of the same human body ID (the human body's own features); the skeleton modeling algorithm converts it into a skeleton feature sequence, recorded as the actual human skeleton feature sequence F_sk ∈ R^(L×D_sk), where L is the sequence length and D_sk is the feature dimension output by the skeleton modeling algorithm. The target tracking algorithm is also used to extract the flags (external object features) appearing in these frames, and only flags whose distance from the person (the human body's own features) is less than a preset threshold are considered. If a qualifying target flag exists, the absolute coordinates of the center point of its bounding box are converted into coordinates relative to the center point of that person's bounding box, and the width and height of the flag's bounding box are normalized by the width and height of the person's bounding box; the processed bounding-box sequence of target flag i is recorded as BBox_flag,i ∈ R^(L×4). The one-hot-encoded color class sequence of the target flag is recorded as Cls_flag,i ∈ R^(L×C_flag), where C_flag is the number of color categories. The general feature encoder encodes a depth feature vector for the image slice defined by the original bounding box of the target flag in each frame and compresses it to a fixed length; the resulting feature sequence is recorded as FE_flag,i ∈ R^(L×D_fe), where D_fe is the feature dimension output by the general feature encoder. BBox_flag,i, Cls_flag,i and FE_flag,i are concatenated along the feature dimension as F_flag,i = Concat(BBox_flag,i, Cls_flag,i, FE_flag,i), and the feature values corresponding to frames in which the flag does not appear are filled with zeros. All qualifying target-flag features are concatenated along the feature dimension into F_flag = Concat(F_flag,i); the number of flags is truncated according to a preset maximum threshold n, and the features corresponding to missing flags are filled with zeros. Finally F_sk and F_flag are concatenated along the feature dimension into F = Concat(F_sk, F_flag) ∈ R^(L×(D_sk + n·(4 + C_flag + D_fe))). The sequence is truncated to F' according to a preset sequence window length L_max; if the sequence exceeds this length, several sub-sequences F' ∈ R^(L_max×(D_sk + n·(4 + C_flag + D_fe))) are generated with a sliding window, and the action class label of the complete sequence is used as the action class label of each truncated sub-sequence. The sequence feature classification algorithm thus requires annotated data {F', Y} consisting of the sequence features of each sub-sequence and the action class label index Y.
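The feature assembly just described can be pictured with a small numpy sketch: skeleton features of one tracked person are concatenated with per-flag bounding-box, color and image features, missing flags are zero-padded up to the maximum count n, and the result is cut into windows of length L_max. Function and variable names, the stride of the sliding window, and the layout of the flag feature arrays are assumptions of this sketch, not mandated by the embodiment.

```python
import numpy as np

def build_training_windows(f_sk, flag_feats, n_max, l_max):
    """Assemble F' windows as sketched above.

    f_sk:       (L, D_sk) actual human skeleton feature sequence of one person.
    flag_feats: list of (L, 4 + C_flag + D_fe) arrays, one per qualifying flag,
                already zero-filled for frames in which the flag is absent.
    """
    L = f_sk.shape[0]
    flags = list(flag_feats[:n_max])                       # truncate to at most n_max flags
    if flags:
        d_flag = flags[0].shape[1]
        while len(flags) < n_max:                          # zero-pad missing flags
            flags.append(np.zeros((L, d_flag)))
    f = np.concatenate([f_sk] + flags, axis=1) if flags else f_sk
    if L <= l_max:
        return [f]
    # Sliding window with stride 1 (the stride is an assumption of this sketch).
    return [f[i:i + l_max] for i in range(L - l_max + 1)]
```

Each returned window would then be paired with the action class label Y of the full sequence to form one training sample {F', Y}.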
In step S104, the motion recognition component 300 stores each model in the computer storage after training.
Method embodiment 2< action recognition inference flow stage >
In the present embodiment, a data inference approach for a motion recognition model is provided, which can be performed by the motion recognition component 300. Taking the specific scenario in < method embodiment 1> as an example, all the outputs of each model are stored in the motion feature pool, and since the scenario only concerns the identified motion category, only the sequence feature classification result needs to be presented as the motion identification result.
As shown in fig. 3, the action recognition model inference process in this embodiment includes the following steps:
step S201, the action recognition component 300 initializes the video frame PoolvideoPool of action characteristics Poolrec_featAnd identification result Poolrec_result
Step S202, the action recognition component 300 receives the video stream provided by the data acquisition component 100;
step S203, traversing each frame of the video stream in sequence, stopping if the traversal is finished, or executing step S204;
Step S204, the traversed target frame image Frame_i is added to the video frame pool and marked with its frame number i;
Step S205, Frame_i is fed into the forward propagation Detect(·) of the target detection algorithm to obtain all human bounding-box coordinates bbox_human, flag categories cls_flag and flag bounding-box coordinates bbox_flag in the current frame, which are temporarily stored in the action feature pool;
Step S206, the image slices of all human regions bbox_human are extracted from Frame_i and fed into the forward propagation KPDetect(·) of the human key point detection algorithm to obtain all human key point information s_kp of the current frame, which is temporarily stored in the action feature pool;
Step S207, the forward propagation SK(·) of the skeleton modeling algorithm is run on all human key point detection results, and the skeleton feature results f_sk are temporarily stored in the action feature pool;
Step S208, the image slices of all flag regions bbox_flag are extracted from Frame_i and fed into the forward propagation FE(·) of the general feature extraction model to obtain flag image features f_flag compressed to a fixed length, which are temporarily stored in the action feature pool;
Step S209, with all bounding boxes bbox_human and bbox_flag detected in the current frame as input, the target tracking algorithm Track(·) is run to determine the identity indexes id_human and id_flag corresponding to each target detection result in the current frame; the identity index information is added to the recognition results of the corresponding target features in the action feature pool, marked with the frame number i;
Step S210, whenever the identity index id_human of some person has existed continuously in the action feature pool for the sequence window length L_max described in <Method embodiment 1>, all flag detection results bbox_flag,i, cls_flag,i appearing in those frames are collected; only the flags whose distance from the person is less than the threshold of <Method embodiment 1> are retained, and then, according to the preset maximum flag number threshold n of <Method embodiment 1>, the n flags appearing in the most frames are kept. For these flags, the absolute coordinates of the bounding-box center points are converted into coordinates relative to the center point of the person's bounding box, and the width and height of each flag bounding box are normalized by the width and height of the person's bounding box. The processed flag bounding-box sequence BBox_flag, flag color class sequence Cls_flag and flag image feature sequence FE_flag are then concatenated along the feature dimension into the overall flag feature sequence F_flag = Concat(BBox_flag, Cls_flag, FE_flag); the feature vectors corresponding to frames in which a flag does not appear, or to vacancies where the number of flags is less than the maximum truncation value n, are filled with zeros. The actual human skeleton feature sequence F_sk formed by the person's skeleton features f_sk output in step S207 and the overall flag feature sequence F_flag are concatenated along the feature dimension into F = Concat(F_sk, F_flag) and fed into the sequence feature classification model Action(·). The action classification result act is written into both the action feature pool and the recognition result pool, marked with the frame-number range covered by the sequence.
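A compressed sketch of the per-frame flow of steps S201-S210 is given below, assuming a `models` object that exposes the detection, key point, skeleton, tracking and sequence classification components as callables; this interface, and the omission of the flag-feature assembly inside the window step, are simplifications made for this sketch rather than a prescribed API.

```python
import numpy as np
from collections import deque

def recognition_inference(frames, models, l_max):
    """Hedged sketch of inference steps S201-S210 for one video stream."""
    pool_video, pool_feat, pool_result = [], [], []            # S201
    windows = {}                                               # per-person skeleton feature windows
    for i, frame in enumerate(frames):                         # S202-S203
        pool_video.append((i, frame))                          # S204
        persons, flags = models.detect(frame)                  # S205: Detect(.)
        kps = [models.kp_detect(frame, p) for p in persons]    # S206: KPDetect(.)
        sks = [models.sk(kp) for kp in kps]                    # S207: SK(.)
        person_ids = models.track(persons, flags)              # S209: Track(.)  (S208 omitted here)
        pool_feat.append({"frame": i, "persons": persons, "flags": flags,
                          "keypoints": kps, "skeleton": sks, "ids": person_ids})
        for pid, f_sk in zip(person_ids, sks):                 # S210
            w = windows.setdefault(pid, deque(maxlen=l_max))
            w.append(f_sk)
            if len(w) == l_max:
                # In the full method, nearby flag features would be concatenated
                # onto the skeleton window here before classification.
                act = models.action(np.stack(list(w)))         # Action(.)
                pool_result.append({"person": pid,
                                    "frames": (i - l_max + 1, i),
                                    "action": act})
    return pool_video, pool_feat, pool_result
```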
Method embodiment 3< action evaluation model training phase >
In the present embodiment, a manner of motion estimation model training is provided, which may be performed by the motion estimation component 400. Taking the specific scenario in < method embodiment 1> as an example, in addition to detecting the motion category, this embodiment introduces a new task of evaluating the degree of difference between the motion and the standard motion on the basis of recognizing that the motion performed by the human body is the motion of the railroad semaphore signal.
In this embodiment, the action evaluation algorithm model components include: a numerical difference calculation method and a sequence execution difference prediction algorithm. The numerical difference calculation method outputs, based on a fixed rule and without training, the numerical difference between the features of two sequences at the same time point; the sequence execution difference prediction algorithm is based on deep learning, predicts the frame-rate difference and time delay between the execution of two sequences, and needs training. In addition, the data processing procedure also uses the human target detection algorithm, the target tracking model and the human skeleton modeling algorithm of <Method embodiment 1>. The algorithms listed here are designed for the action evaluation requirements of this embodiment; the invention limits neither the specific action evaluation scenario nor the specific algorithms adopted.
As shown in fig. 4, the motion evaluation model training process in this embodiment includes the following steps:
in step S301, the action evaluation component 400 requests the data labeling component 200 to start a data labeling function for the above algorithm.
Step S302, the data annotation component 200 starts the data annotation service of the corresponding algorithm, and the user performs a specific annotation operation. In this embodiment, the specific contents to be labeled are as follows:
category labels and bounding-box coordinates of human bodies in single-frame images;
the category and the coordinates of key points of the human body in the single-frame image;
the above annotation can also be obtained by the same annotation in the data annotation step of the motion recognition model training in < method embodiment 1> of the present invention.
Step S303, the action evaluation component 400 trains each algorithm; the sequence execution difference prediction algorithm requires a special training data generation method, as follows:
First, the target tracking algorithm is used to extract, from any continuous frame sequence in the original video data, the key point coordinate sequence of the same human body ID (the human body's own features). In each training iteration, using the sequence window length L_max preset in <Method embodiment 1>, a sub-sequence of this sequence is first randomly intercepted and recorded as S_1, where C_kp denotes the number of key point categories. A frame-number interval δ between the start frame of a second sub-sequence and that of S_1 is then sampled from one random distribution, and a frame-rate scaling factor γ is sampled from another random distribution. Starting δ frames after the start frame of S_1, a sequence of length int(γ·L_max) is truncated and recorded as S_2. The sequence S_2 is adjusted to length L_max by interpolation (e.g. bilinear interpolation), giving S_2'. A jitter distance ε for the key point coordinates is sampled from a further random distribution, and S_2' is adjusted to S_2' + ε. Finally, the human key point sequences S_1 and S_2' are transformed into skeleton features F_1 and F_2 using the skeleton modeling algorithm SK(·) and concatenated along the time-series dimension into F = Concat(F_1, F_2), which serves as the training input of the model, with δ and γ as the supervised output values.
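A numpy sketch of the sample-pair construction described above follows. The particular random distributions, their parameter ranges and the additive jitter model are assumptions of this sketch; the embodiment only requires that δ, γ and the jitter be drawn from random distributions.

```python
import numpy as np

def make_exec_diff_sample(kp_seq, sk, l_max, rng=None):
    """Build one training sample for the sequence execution difference predictor.

    kp_seq: (T, C_kp, 2) key point coordinate sequence of one tracked person, T >= 2*l_max.
    sk:     skeleton modeling function mapping a key point sequence to skeleton features.
    """
    rng = rng or np.random.default_rng()
    start = int(rng.integers(0, max(1, kp_seq.shape[0] - 2 * l_max)))
    s1 = kp_seq[start:start + l_max]                     # reference sub-sequence S_1
    delta = int(rng.integers(1, max(2, l_max // 2)))     # start-frame offset (supervision target)
    gamma = float(rng.uniform(0.5, 1.5))                 # frame-rate scaling factor (supervision target)
    s2 = kp_seq[start + delta:start + delta + int(gamma * l_max)]
    # Resample s2 back to l_max frames by linear interpolation along the time axis.
    idx = np.linspace(0.0, s2.shape[0] - 1.0, l_max)
    lo, hi = np.floor(idx).astype(int), np.ceil(idx).astype(int)
    w = (idx - lo)[:, None, None]
    s2 = (1.0 - w) * s2[lo] + w * s2[hi]
    s2 = s2 + rng.normal(scale=0.01, size=s2.shape)      # key point coordinate jitter
    f = np.concatenate([sk(s1), sk(s2)], axis=0)         # concatenate along the time dimension
    return f, np.array([delta, gamma])                   # model input, supervised output values
```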
In step S304, after training is completed, the action evaluation component 400 stores each model in computer storage.
Method embodiment 4< action evaluation inference flow stage >
In the present embodiment, a data inference approach for an action evaluation model is provided, which can be performed by the action evaluation component 400. This embodiment evaluates and compares a video to be evaluated with preset standard action videos using the algorithm models proposed in <Method embodiment 3>. The standard action videos are one or more preset video segments covering all the railway semaphore action categories of <Method embodiment 1>; the same action category may have multiple standard videos.
According to fig. 5, the action evaluation model inference flow in the present embodiment includes the following steps:
step S401: action evaluation component 400 initializes a pool of standard action features Pooleval_std_featAnd action evaluation output Pooleval_result
Step S402: the action evaluation component 400 extracts human key points s corresponding to all standard actions in the marked action video by using a target detection algorithm Detect (-), a target tracking algorithm Track (-), and a skeleton modeling algorithm SK (-), wherein the human key points s correspond to all standard actions in the marked action videostd_kp(comprising the sequence S)std_kp) And human skeleton characteristics fstd_sk(comprising sequence F)std_sk) And storing the data in a standard action feature pool.
Step S403: the action evaluation component 400 receives the action feature Pool output by the action recognition component 300 in real timerec_feat
Step S404: the motion evaluation component 400 calculates the human body key points s actually detected in the motion feature pool frame by frame using a numerical difference calculation method ValueDiff (·)kp and Standard human Key points s in the Standard motion feature poolstd_kpDifference diff ofkpDifference diffkpAnd marking the corresponding frame number i in the write action evaluation output pool.
Step S405: whenever the action feature pool contains a human skeleton feature sequence F_sk whose length equals the sequence window length L_max preset in <Method embodiment 3>, the action evaluation component 400 concatenates the actual human skeleton feature sequence F_sk and the standard human skeleton feature sequence F_std_sk of the corresponding frames in the standard action feature pool along the time-series dimension to obtain F = Concat(F_std_sk, F_sk), feeds it into the sequence execution difference prediction model ExeDiff(·) to obtain an action delay prediction δ̂ and an action frame-rate difference prediction γ̂, writes the results into the action evaluation output pool, and marks the corresponding frame-number range.
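A sketch combining the frame-by-frame key point difference of step S404 with the windowed execution-difference prediction of step S405 is given below. The per-key-point Euclidean distance used for ValueDiff(·) is only one possible fixed rule, and `exe_diff` stands in for the trained sequence execution difference predictor; both, like the pool record layout, are assumptions of this sketch.

```python
import numpy as np

def evaluation_inference(rec_feat_pool, std_feat_pool, exe_diff, l_max):
    """Hedged sketch of steps S404-S405 for one tracked person."""
    eval_pool = []
    actual_window, standard_window = [], []
    for i, (rec, std) in enumerate(zip(rec_feat_pool, std_feat_pool)):
        # S404: frame-by-frame numerical difference of key points (ValueDiff).
        diff_kp = np.linalg.norm(np.asarray(rec["keypoints"]) - np.asarray(std["keypoints"]), axis=-1)
        eval_pool.append({"frame": i, "keypoint_diff": diff_kp})
        # S405: once L_max skeleton features have accumulated, predict delay and
        # frame-rate difference from the concatenated standard/actual sequences.
        actual_window.append(rec["skeleton"])
        standard_window.append(std["skeleton"])
        if len(actual_window) == l_max:
            f = np.concatenate([np.stack(standard_window), np.stack(actual_window)], axis=0)
            delta_hat, gamma_hat = exe_diff(f)             # ExeDiff(.)
            eval_pool.append({"frames": (i - l_max + 1, i),
                              "delay": float(delta_hat),
                              "frame_rate_diff": float(gamma_hat)})
            actual_window.clear()
            standard_window.clear()
    return eval_pool
```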
The embodiment of the invention has the beneficial effects that:
The comprehensive action recognition and evaluation method constructs a general video action recognition and evaluation framework that greatly expands the scope of action description of existing methods; it can recognize actions and features in a variety of situations such as single person, multiple persons, person-object interaction, static and dynamic scenes, it also expands the indexes used for action evaluation, and it has good extensibility.
Through the design of the action recognition/evaluation algorithm meta-information and runtime configuration, a user can add any specific algorithm to the preset algorithm library and complete data annotation and model training for it. In the inference phase, starting from the original video frames and following the input and output data types declared in the meta-information configuration of the algorithms the user has chosen, the algorithm models can be executed implicitly in an orderly sequence: each takes data of its required input types from the feature pool as input and puts its output features back into the feature pool for later use by other algorithm models. The expert rules, the human skeleton sequence feature classification model and the background feature fusion mentioned in the background section can all be regarded as special cases or specific implementations of algorithm model combinations of this method in particular applications; at the same time, the method can also be extended to scenarios such as person-object interaction recognition and evaluation, a capability the other methods do not have.
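One way to realize the implicit ordering described above (an assumption of this sketch rather than a mechanism fixed by the embodiment) is to repeatedly run every algorithm component whose declared input data types are already available in the feature pool, until no further component can make progress.

```python
def run_by_declared_types(components, feature_pool):
    """Execute algorithm components in an order implied only by the input/output
    data types declared in their meta-information configurations.

    components:   list of dicts with keys "inputs", "outputs" (lists of type tags)
                  and "run" (a callable producing one value per declared output).
    feature_pool: dict mapping type tags (e.g. "frame", "obj_id") to data.
    """
    pending = list(components)
    while pending:
        progressed = False
        for comp in list(pending):
            if all(tag in feature_pool for tag in comp["inputs"]):
                outputs = comp["run"](*(feature_pool[tag] for tag in comp["inputs"]))
                feature_pool.update(zip(comp["outputs"], outputs))
                pending.remove(comp)
                progressed = True
        if not progressed:      # remaining components can never be satisfied
            break
    return feature_pool
```

Under this reading, adding a new algorithm only requires declaring its input and output type tags in its meta-information configuration; the execution order then follows from the type dependencies.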
In addition, the comprehensive action evaluation and recognition system provided by the invention can dynamically extend its data and algorithm models: data can be added, configurations changed, or models retrained on newly added data as requirements change, so that the latest data, data processing techniques, and the most advanced algorithm models can be integrated as the technology develops to achieve the best action recognition and evaluation effect.
In addition, in the specific implementation of the above embodiments, the action recognition algorithm model library and the action evaluation algorithm model library refer to collections of the code files, model files, and other related files of the action recognition algorithms/models and the action evaluation algorithms/models. Each algorithm/model must contain a meta-information configuration file, a runtime configuration file, and an inference script, and a trainable algorithm must also contain a training startup script.
The meta-information configuration file of an algorithm/model specifically includes: the algorithm/model name; the algorithm/model type; the training startup script path (if any) and the inference startup script path of the algorithm/model; the data types of the algorithm/model's inference inputs; and the data types of the algorithm/model's inference outputs. The data types of inference inputs and outputs cover the following cases: the real type of the object in memory (such as integer, string, list, dictionary, numpy array, etc.) together with the parameters describing the object's properties (such as list length, array shape, element type, etc.) and the nesting structure between them; and artificially defined data type tags, for example the string "obj_id" denoting the target identity index (whose real data type is an integer or a list of integers) output by the target tracking algorithm.
The runtime configuration file of an algorithm/model contains all parameter configuration information that the algorithm/model's training or inference process relies on, for example dataset paths and hyper-parameters used in model training. The same algorithm/model may have multiple different runtime configuration files.
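For illustration, a hypothetical meta-information configuration and runtime configuration for one algorithm entry might look as follows, written here as Python dictionaries; the field names mirror the description above, while the concrete values and file paths are invented examples.

```python
# hypothetical meta-information for a target tracking algorithm entry (illustrative values)
meta_info = {
    "name": "object_tracker",
    "type": "target_tracking",
    "train_script": None,                    # not a trainable algorithm, so no training script
    "infer_script": "tracker/infer.py",
    "input_types": ["frame", "bbox_list"],   # data-type tags taken from the feature pool
    "output_types": ["obj_id"],              # "obj_id": integer or list of integers
}

# hypothetical runtime configuration for a trainable sequence classification model
runtime_config = {
    "dataset_path": "/data/actions/train",
    "hyperparameters": {"batch_size": 16, "learning_rate": 1e-4, "epochs": 50},
}
```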
It should be noted that the foregoing is only a description of preferred embodiments of the invention and of the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; the scope of the invention is determined by the appended claims.

Claims (10)

1. An interactive video action comprehensive recognition and evaluation system is characterized by comprising a data acquisition component 100, a data annotation component 200, an action recognition component 300 and an action evaluation component 400:
the data acquisition component 100 is used for acquiring original video data;
the action recognition component 300 is used for receiving a request of a user for adding an action recognition algorithm model component and adding the action recognition algorithm model component into an action recognition algorithm model library;
the action evaluation component 400 is used for receiving a request of a user for adding an action evaluation algorithm model component and adding the action evaluation algorithm model component into an action evaluation algorithm model library;
the action recognition component 300 is used for receiving action recognition categories and algorithm model combination configuration set by a user to form a video action comprehensive recognition method, and, for an algorithm that the action recognition component needs to train, entrusts the data annotation component 200 to start a data annotation service of the corresponding task, so that the user can perform data annotation based on an interactive interface to generate first annotation data; the action recognition component uses the first annotation data to train the algorithms that need training in the video action comprehensive recognition method, and obtains and stores a corresponding recognition model;
the action evaluation component 400 receives action evaluation indexes and algorithm model combination configuration set by a user to form a video action comprehensive evaluation method, and for an algorithm to be trained by the action evaluation component 400, the data annotation component 200 is entrusted to start data annotation service of the corresponding task, so that the user can perform data annotation based on an interactive interface to generate second annotation data; the action evaluation component 400 uses the second labeled data to train an algorithm to be trained in the video action comprehensive evaluation method, and obtains and stores a corresponding evaluation model;
the motion recognition component 300 is configured to perform inference on the video data acquired by the data acquisition component in real time based on the video motion comprehensive recognition method, and output a video motion comprehensive feature recognition result;
the action evaluation component 400 is configured to perform inference on the video data acquired by the data acquisition component in real time and the recognition result of the video action comprehensive characteristic output by the action recognition component based on the video action comprehensive evaluation method, and output a video action comprehensive evaluation result.
2. The interactive video action comprehensive recognition and evaluation system according to claim 1, wherein the video action comprehensive feature recognition result output by the action recognition component 300 at least comprises the human body's own features and features of external objects whose changes are related to the human body's own features.
3. An interactive video motion comprehensive identification and evaluation method is characterized by comprising the following steps:
calling a data acquisition component to acquire original video data;
receiving, by an action recognition component, a user request to add an action recognition algorithm model component, and adding the component into an action recognition algorithm model library;
receiving, by an action evaluation component, a user request to add an action evaluation algorithm model component, and adding the component into an action evaluation algorithm model library;
receiving an action recognition category and algorithm model combination configuration set by a user by an action recognition component to form a video action comprehensive recognition method, entrusting a data annotation component to start a data annotation service of a corresponding task for an algorithm which needs to be trained by the action recognition component, and carrying out data annotation on the basis of an interactive interface by the user to generate first annotation data; the motion recognition component uses the first labeling data to train an algorithm needing to be trained in the video motion comprehensive recognition method, and obtains and stores a corresponding recognition model;
the action evaluation component receives action evaluation indexes and algorithm model combination configuration set by a user to form a video action comprehensive evaluation method, and for the algorithm which needs to be trained by the action evaluation component, the data annotation component is entrusted to start the data annotation service of the corresponding task, so that the user can perform data annotation based on an interactive interface to generate second annotation data; the action evaluation component uses the second labeling data to train an algorithm needing to be trained in the video action comprehensive evaluation method, and obtains and stores a corresponding evaluation model;
the action recognition component executes reasoning on the video data collected by the data collection component in real time based on the video action comprehensive recognition method and outputs a video action comprehensive characteristic recognition result;
the action evaluation component executes reasoning on the video data acquired by the data acquisition component in real time and the identification result of the video action comprehensive characteristic output by the action identification component based on the video action comprehensive evaluation method, and outputs a video action comprehensive evaluation result.
4. The method according to claim 3, wherein the video action comprehensive feature recognition result output by the action recognition component at least comprises the human body's own features and features of external objects whose changes are related to the human body's own features.
5. The method of claim 3, wherein the action recognition component performs inference on the video data collected by the data collection component in real time based on the video action comprehensive recognition method, and outputs a recognition result of the video action comprehensive characteristics, comprising:
the data acquisition component inputs acquired video data into the action recognition component, the action recognition component calls the video action comprehensive recognition method to perform reasoning to obtain a video frame pool and an action characteristic pool, and simultaneously outputs a recognition result;
correspondingly, the step of the action evaluation component executing reasoning on the video data collected by the data collection component in real time and the video action comprehensive characteristic representation output by the action recognition component based on the video action comprehensive evaluation method and outputting a video action comprehensive evaluation result comprises the following steps:
the action evaluation component analyzes the video frame pool and the action characteristic pool, carries out action evaluation algorithm reasoning based on the video action comprehensive evaluation method and combined with a preset standard action video, and outputs an evaluation result.
6. The method according to any one of claims 3-5, wherein the user-defined action recognition category and algorithm model combination configuration, and the user-defined action evaluation category and algorithm model combination configuration are configured as a project profile; the project configuration file comprises meta-information configuration file paths of all algorithms or models which a user desires to run in the algorithm model library and runtime configuration file paths corresponding to the algorithms or models.
7. The method according to any one of claims 3-5, wherein the video action comprehensive recognition method is based on single-frame images or video time sequences in a video, and uses a plurality of feature detection algorithms and models to describe the action definitions and feature depictions formed by the human body itself and by its interaction with the environment;
wherein the action definitions and feature types include, but are not limited to:
image features encoded by various encoding schemes in a single frame or a multi-frame sequence;
coordinates of the key points of a single person in a single frame and the coordinate sequence they form across multiple frames;
coordinates of the key points of multiple human bodies in a single frame and the coordinate sequence they form across multiple frames;
object attributes of the objects of interest in a single frame, such as category, number and color, position information such as bounding box coordinates and boundary point coordinates, and the sequences these features form across multiple frames;
comprehensive attributes defined by the features of the human body and various objects in a single frame;
comprehensive attributes defined by the feature sequences of the human body and various objects across multiple frames;
features and comprehensive attributes of the human body and various objects that may appear in the future, as determined by their current feature sequences.
8. The method according to claim 7, wherein the algorithms preferably adopted by the video action comprehensive recognition method can include a target detection algorithm, a human body key point detection algorithm, a target tracking algorithm, a skeleton modeling algorithm and a sequence classification algorithm, and the various algorithms can be combined and nested with one another to describe the action definitions and feature depictions formed by the interaction of the human body's own features and external object features in a single frame and in a video sequence.
9. The method according to any one of claims 3-5, wherein the action recognition algorithm model library and the action evaluation algorithm model library are a collection of code files, model files and other related files of action recognition algorithms and models and action evaluation algorithms and models; each algorithm/model must include a meta-information configuration file, a runtime configuration file, and an inference script, and the trainable algorithm must also include a training start script.
10. The method according to any one of claims 3 to 5, wherein the meta-information profiles of the algorithms and models specifically include: algorithm and model name; algorithm and model type; training starting script path and reasoning starting script path of the algorithm and the model; the algorithm model infers the input data type; the data type of the algorithm and the model inference output;
the data types of the algorithm and model inference inputs/outputs cover the following cases: the real type of the object in memory, the parameters describing the object's attributes, and the nesting structure between them; the runtime configuration file of the algorithm or model contains all parameter configuration information that the training or inference process of the algorithm or model relies on.
CN202210448232.8A 2022-04-24 2022-04-24 Interactive video motion comprehensive identification and evaluation system and method Pending CN114677765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210448232.8A CN114677765A (en) 2022-04-24 2022-04-24 Interactive video motion comprehensive identification and evaluation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210448232.8A CN114677765A (en) 2022-04-24 2022-04-24 Interactive video motion comprehensive identification and evaluation system and method

Publications (1)

Publication Number Publication Date
CN114677765A true CN114677765A (en) 2022-06-28

Family

ID=82079772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210448232.8A Pending CN114677765A (en) 2022-04-24 2022-04-24 Interactive video motion comprehensive identification and evaluation system and method

Country Status (1)

Country Link
CN (1) CN114677765A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935008A (en) * 2023-02-16 2023-04-07 杭州网之易创新科技有限公司 Video label generation method, device, medium and computing equipment


Similar Documents

Publication Publication Date Title
CN109919031B (en) Human behavior recognition method based on deep neural network
CN110147743B (en) Real-time online pedestrian analysis and counting system and method under complex scene
CN109146921B (en) Pedestrian target tracking method based on deep learning
Avola et al. 2-D skeleton-based action recognition via two-branch stacked LSTM-RNNs
CN107423398B (en) Interaction method, interaction device, storage medium and computer equipment
Lao et al. Automatic video-based human motion analyzer for consumer surveillance system
Hongeng et al. Video-based event recognition: activity representation and probabilistic recognition methods
CN111738218B (en) Human body abnormal behavior recognition system and method
CN111079658A (en) Video-based multi-target continuous behavior analysis method, system and device
KR20200075114A (en) System and Method for Matching Similarity between Image and Text
CN110991278A (en) Human body action recognition method and device in video of computer vision system
Gunawardena et al. Real-time automated video highlight generation with dual-stream hierarchical growing self-organizing maps
Hammam et al. Real-time multiple spatiotemporal action localization and prediction approach using deep learning
CN114677765A (en) Interactive video motion comprehensive identification and evaluation system and method
CN112101154B (en) Video classification method, apparatus, computer device and storage medium
CN113298015A (en) Video character social relationship graph generation method based on graph convolution network
CN115018215B (en) Population residence prediction method, system and medium based on multi-modal cognitive atlas
Kumar et al. Detection and Content Retrieval of Object in an Image using YOLO
Marchellus et al. Deep learning for 3d human motion prediction: State-of-the-art and future trends
CN115457620A (en) User expression recognition method and device, computer equipment and storage medium
CN111753657B (en) Self-training-based text detector training method and system
Zhao et al. Research on human behavior recognition in video based on 3DCCA
Bai et al. Continuous action recognition and segmentation in untrimmed videos
al Atrash et al. Detecting and Counting People's Faces in Images Using Convolutional Neural Networks
Indhuja et al. Suspicious Activity Detection using LRCN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination