CN114694257A - Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium - Google Patents


Info

Publication number
CN114694257A
CN114694257A (application CN202210356817.7A)
Authority
CN
China
Prior art keywords
picture
dimensional
dimensional coordinates
joint points
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210356817.7A
Other languages
Chinese (zh)
Inventor
黄凯
陈雪晨
冯锦祥
陈志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority: CN202210356817.7A
Publication of CN114694257A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413: Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F 18/24147: Distances to closest patterns, e.g. nearest neighbour classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras


Abstract

Embodiments of the disclosure provide a multi-person real-time three-dimensional action recognition and evaluation method, apparatus, device, and medium, belonging to the technical field of data recognition. The method comprises: obtaining target boxes and identifying joint points and the two-dimensional coordinates corresponding to each joint point; selecting any picture and judging whether it is a normal picture; if so, dividing the different persons in each picture; if not, ending the analysis of the current picture and proceeding to the next picture; converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates with a three-dimensional human-pose-estimation baseline; extracting a feature vector; and, according to the inherent characteristics of human actions, searching and comparing the feature vector of each normal picture against the sample vectors of the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture. By converting the obtained two-dimensional human-skeleton coordinate data into three-dimensional human coordinate data, the scheme improves the efficiency and accuracy of multi-person action recognition.

Description

Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to a method, a device, equipment and a medium for identifying and evaluating multi-user real-time three-dimensional actions.
Background
At present, human action recognition refers to recognizing human behavior in an image sequence or video through computer-vision and machine-learning methods. In recent years it has been widely applied to intelligent monitoring, video retrieval, human-computer interaction, behavior analysis, virtual reality, and similar fields, and its research paradigms, model algorithms, and description methods have developed substantially. Among them, human-skeleton keypoint detection has received increasing attention: it is one of the basic algorithms of computer vision, and multi-person pose detection in particular still faces a series of unprecedented challenges.
Most existing computer-vision-based action recognition describes human pose points in two dimensions, e.g. in video clips, images, or drawings. Because such representations carry no depth information, occlusion, background clutter, overlapping keypoints, illumination, and complex motions make recognition inaccurate and noisy, so the action information is hard to interpret; with a single-angle camera, recognizing the actions of several people is even harder and accuracy is difficult to guarantee.
A multi-person real-time three-dimensional action recognition and evaluation method that can recognize multi-person actions efficiently and accurately is therefore needed.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method, an apparatus, a device, and a medium for multi-person real-time three-dimensional action recognition and evaluation, which at least partially solve the poor efficiency and accuracy of multi-person action recognition in the prior art.
In a first aspect, an embodiment of the present disclosure provides a method for identifying and evaluating a multi-user real-time three-dimensional motion, including:
detecting each picture in an input data source with YOLOv5 to obtain target boxes, and identifying the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point with an OpenPose model;
selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, determining by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
converting two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using a three-dimensional human body posture estimation baseline;
extracting a feature vector from the three-dimensional coordinates in each normal picture;
and searching and comparing the feature vector of each normal picture against the sample vectors corresponding to the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture.
According to a specific implementation manner of the embodiment of the present disclosure, all the pictures in the data source are pictures arranged according to a time sequence.
According to a specific implementation manner of the embodiment of the present disclosure, after the step of identifying, with the OpenPose model, the joint points of all persons in each picture of the input data source and the two-dimensional coordinates corresponding to each joint point, and after the step of converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates with the three-dimensional human-pose-estimation baseline, the method further includes:
the two-dimensional coordinates and the three-dimensional coordinates are normalized by subtracting the mean value and dividing by the standard deviation.
According to a specific implementation manner of the embodiment of the present disclosure, the step of selecting any one of the pictures and determining whether the picture is a normal picture includes:
selecting any one picture and acquiring a two-dimensional coordinate corresponding to a joint point of the picture;
if the two-dimensional coordinates can be obtained, the picture is judged to be a normal picture;
and if the two-dimensional coordinates cannot be acquired, judging that the picture is an abnormal picture.
According to a specific implementation manner of the embodiment of the present disclosure, after the step of determining, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose and in combination with how the joint points change across adjacent pictures, that the closest joint points in the current picture and its adjacent pictures belong to the same person by breadth-first search, thereby dividing the different persons in each picture, the method further includes:
and filling the missing value of the current picture according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures.
According to a specific implementation manner of the embodiment of the present disclosure, the three-dimensional human-pose-estimation baseline is expressed as
f* = argmin_f (1/N) Σ_{i=1}^{N} L(f(x_i), y_i)
where x_i is the input two-dimensional coordinate, f(x_i) the converted three-dimensional coordinate, y_i the real human-body joint coordinate, and L the loss function.
In a second aspect, an embodiment of the present disclosure provides a device for identifying and evaluating a multi-user real-time three-dimensional motion, including:
the identification module is configured to detect each picture in the input data source with YOLOv5 to obtain target boxes, and to identify, with an OpenPose model, the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point;
the judging module is used for selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, determining by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
the conversion module is used for converting the two-dimensional coordinates in all the normal pictures into three-dimensional coordinates by utilizing the three-dimensional human body posture estimation baseline;
the extraction module is used for extracting a feature vector from the three-dimensional coordinates in each normal picture;
and the comparison module is used for searching and comparing the feature vector of each normal picture against the sample vectors corresponding to the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the multi-person real-time three-dimensional motion recognition and evaluation method in the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for identifying and evaluating a multi-person real-time three-dimensional motion in any implementation manner of the first aspect or the first aspect.
In a fifth aspect, the present disclosure also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute the method for identifying and evaluating a multi-person real-time three-dimensional motion in any of the foregoing first aspect or the foregoing implementation manners of the first aspect.
The scheme for multi-person real-time three-dimensional action recognition and evaluation in the embodiments of the disclosure comprises the following steps: detecting each picture in an input data source with YOLOv5 to obtain target boxes, and identifying the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point with an OpenPose model; selecting any picture and judging whether it is a normal picture; if so, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, determining by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture; if not, ending the analysis of the current picture and proceeding to the next picture; converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates with a three-dimensional human-pose-estimation baseline; extracting a feature vector from the three-dimensional coordinates in each normal picture; and searching and comparing the feature vector of each normal picture against the sample vectors corresponding to the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture.
The beneficial effects of the embodiments of the disclosure are: through the disclosed scheme, the two-dimensional human-skeleton coordinate data acquired by OpenPose are converted into three-dimensional human coordinate data, so that human pose points can be acquired more accurately and the efficiency and accuracy of multi-person action recognition are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for identifying and evaluating a multi-user real-time three-dimensional motion provided in an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a multi-user real-time three-dimensional motion recognition and evaluation device according to an embodiment of the disclosure;
fig. 3 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
Existing pose-estimation technologies fall mainly into three types. The first is an optical capture instrument such as OptiTrack: multiple markers are attached to the human body, their positions are detected, and the positions of the body's joint points are finally determined. The second, like Kinect, obtains the three-dimensional joint coordinates of the body by binocular positioning with multiple cameras. The third detects the two-dimensional joint coordinates of the body under a monocular camera with a deep-learning algorithm, but does not convert them into three dimensions, so further information about the body's joints cannot be obtained. When the prior art recognizes the actions of several people from a single camera at a single angle, only two-dimensional pose points are produced, and the unavoidable influence of occlusion, illumination, background, and other factors makes the pose points inaccurate, so action analysis and recognition cannot be completed correctly. Moreover, existing algorithms for converting 2D coordinate data into 3D coordinate data depend on limited training data; the converted results work well only on part of the images and cannot be used widely.
The embodiment of the disclosure provides a multi-user real-time three-dimensional motion recognition and evaluation method, which can be applied to a multi-user motion recognition process in a sports test or a daily training scene.
Referring to fig. 1, a flow chart of a method for identifying and evaluating a multi-user real-time three-dimensional motion provided by the embodiment of the disclosure is schematically shown. As shown in fig. 1, the method mainly comprises the following steps:
s101, detecting an obtained target frame in each picture in an input data source by using a YOLOV5, and identifying joint points of all people in each picture and two-dimensional coordinates corresponding to the joint points by using an openposition model;
furthermore, all the pictures in the data source are arranged according to a time sequence.
In specific implementation, considering that the existing approach of obtaining three-dimensional joint coordinates by binocular positioning with multiple cameras has large usage restrictions, requires much auxiliary hardware, is costly, and is not conducive to popularization, the data source can be an image-acquisition device such as a camera that captures, in real time, video containing several human bodies in a physical-training test; all obtained pictures are arranged in time order.
The data source may then be fed into the OpenPose model, which identifies the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point.
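As a hedged illustration only (the patent publishes no code), the detection step could be sketched as follows. The `torch.hub` loading call in the comment and the COCO class index 0 for "person" come from the public ultralytics/yolov5 release, not from the patent; the filtering helper is a hypothetical name.

```python
# Sketch under the assumption that the public ultralytics/yolov5 release is
# used; detections would be obtained roughly as (not executed here):
#   model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
#   rows = model(frame).xyxy[0].tolist()  # each row: [x1, y1, x2, y2, conf, cls]

def person_boxes(rows, person_cls=0, conf_thres=0.5):
    """Keep only confident 'person' detections (class 0 in the COCO label set)."""
    return [r for r in rows if int(r[5]) == person_cls and r[4] >= conf_thres]

rows = [[0, 0, 10, 10, 0.9, 0],   # confident person -> kept
        [0, 0, 5, 5, 0.3, 0],     # low-confidence person -> dropped
        [1, 1, 2, 2, 0.8, 56]]    # non-person class -> dropped
boxes = person_boxes(rows)
```

The per-picture boxes produced this way are what the later person-division step matches against the OpenPose joint distribution.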
For example, the overall flow of the OpenPose method can be summarized as the following steps:
(a) input a color image of persons, a video, or an RTSP video stream;
(b) predict the keypoint positions of the detection targets with a feed-forward network, obtaining two-dimensional confidence maps S and a set of 2D part-affinity vector fields L;
(c) encode, with S and L, the association vector fields between the parts of each detection target;
(d) finally, mark the 2D keypoints of all detection targets by parsing the confidence maps and affinity vector fields.
(e) the joint points corresponding to the identified point sequence are shown in Table 1.
[Table 1: mapping from OpenPose output point indices to human joint points; the table image is not recoverable from this text.]
S102, selecting any picture and judging whether the picture is a normal picture or not;
optionally, the step of selecting any one of the pictures and determining whether the picture is a normal picture includes: selecting any one picture and acquiring a two-dimensional coordinate corresponding to a joint point of the picture;
if the two-dimensional coordinates can be obtained, the picture is judged to be a normal picture;
and if the two-dimensional coordinates cannot be acquired, judging that the picture is an abnormal picture.
In specific implementation, considering the timeliness requirement, the persons in a video need to be divided by a simple and fast method so that multi-person actions can be recognized accurately. Before that, any picture can be selected and judged as normal or not: for example, when no person appears in the input data, no two-dimensional coordinate of any human skeleton can be obtained and the picture is judged abnormal; if two-dimensional coordinates can be obtained, the picture is judged normal.
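The normal/abnormal check described above reduces to testing whether any skeleton coordinates came back for the frame; a minimal sketch (the function name is an assumption, not from the patent):

```python
def is_normal_frame(skeletons):
    """A frame is 'normal' when at least one person's 2D joint coordinates
    were recovered from it; an empty or absent result marks it abnormal,
    and the pipeline skips to the next frame."""
    return skeletons is not None and len(skeletons) > 0

# skeletons: one list of (x, y) joint coordinates per detected person
assert is_normal_frame([[(120.0, 80.5), (118.2, 95.0)]])
```

A frame that fails this test is simply not analyzed further, mirroring step S104.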
If so, step S103 is executed: according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, it is determined by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture.
That is, if the current picture is normal, the correspondence between the target boxes detected by YOLOv5 and the OpenPose joint-point distribution can be obtained, and the search over adjacent pictures, combined with the joint-point changes between them, determines that the closest joint points in the current picture and its neighbors are the same person, dividing the different persons in each picture.
Optionally, after step S103 divides the different persons in each picture as described above, the method further includes:
and filling the missing value of the current picture according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures.
Of course, considering that the human body may be occluded during movement, after the division of the current picture is completed, the missing values of the current picture are filled according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures, making the subsequent recognition result more accurate.
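Filling a missing value from adjacent pictures can be sketched as simple temporal interpolation per joint; the averaging rule below is an assumption, since the patent only says the neighbours' coordinates are used:

```python
def fill_missing_joint(prev_pt, cur_pt, next_pt):
    """Fill an occluded joint from its temporal neighbours: keep the detected
    point when present; otherwise average the adjacent frames' points; if only
    one neighbour exists, copy it."""
    if cur_pt is not None:
        return cur_pt
    if prev_pt is not None and next_pt is not None:
        return tuple((p + n) / 2 for p, n in zip(prev_pt, next_pt))
    return prev_pt if prev_pt is not None else next_pt

# joint occluded in the middle frame: midpoint of the neighbours is used
filled = fill_missing_joint((0.0, 0.0), None, (2.0, 4.0))
```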
If not, executing the step S104, ending the analysis of the current picture and continuously identifying the next picture;
s105, converting two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using a three-dimensional human body posture estimation baseline;
further, the expression of the three-dimensional human body posture estimation baseline is
Figure BDA0003583378180000091
Figure BDA0003583378180000092
Wherein x isiAs input two-dimensional coordinates, f (x)i) For the converted three-dimensional coordinates, yiThe coordinates of the joint points of the real human body, L is a loss function.
Optionally, after the OpenPose model is used in step S101 to identify the joint points of all persons in each picture of the input data source and the two-dimensional coordinates corresponding to each joint point, and after the three-dimensional human-pose-estimation baseline is used in step S105 to convert the two-dimensional coordinates in all normal pictures into three-dimensional coordinates, the method further includes:
the two-dimensional coordinates and the three-dimensional coordinates are normalized by subtracting the mean value and dividing by the standard deviation.
In specific implementation, since a specific action type cannot be identified from a two-dimensional coordinate image, after all normal pictures have been divided by person, their two-dimensional coordinates need to be converted into three-dimensional coordinates. Specifically, the two-dimensional coordinates in all normal pictures can be converted into three-dimensional coordinates with the three-dimensional human-pose-estimation baseline, and the three-dimensional coordinates then normalized by subtracting the mean and dividing by the standard deviation, so that the subsequent recognition is more accurate.
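The subtract-mean, divide-by-standard-deviation normalization is standard z-scoring; a minimal sketch over one coordinate sequence:

```python
import statistics

def standardize(values):
    """Normalize a coordinate sequence by subtracting the mean and dividing
    by the (population) standard deviation, as the method describes."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

z = standardize([1.0, 2.0, 3.0, 4.0])  # zero mean, unit spread afterwards
```

In practice this would be applied per coordinate axis across the joint points of a skeleton, both to the 2D inputs and the converted 3D outputs.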
For example, the goal is to estimate the human joint positions in 3D space given a two-dimensional input. Formally, the input is a series of 2D points x ∈ R^{2n} and the output is a series of points f(x) ∈ R^{3n} in 3D space. The aim is to learn a function f* that minimizes the prediction error over a dataset of N poses:
f* = argmin_f (1/N) Σ_{i=1}^{N} L(f(x_i), y_i)
The model, a simple deep multi-layer neural network with batch normalization, dropout, ReLU activations, and residual connections, is then trained from existing open-source 3D human-pose datasets to build the mapping from two-dimensional poses to three-dimensional poses. It learns the projection relation of the two-dimensional poses of the 3D-pose dataset under the OpenPose model; training yields roughly 4 to 5 million parameters, enough to map two-dimensional pose points into three-dimensional space. Meanwhile, a human-pose enhancer is built from the geometric factors of human pose (body size and the fixed angles and distances of adjacent joint points) to judge the plausibility of local joint-angle proportions, overall body size, and the spatial distances between joint points, so that plausibility checking of the generated 3D pose and correction of local joint points can be completed. Many variations of the architecture and training hyper-parameters were tested extensively on the public COCO and Human3.6M datasets, and experiments show that this module completes the conversion from 2D to 3D coordinate data smoothly and quickly and effectively improves the accuracy of existing 3D human-pose estimation.
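The forward pass of such a lifting network can be sketched in a few lines of NumPy. This is a much-reduced illustration: one hidden layer and one residual block, with the batch normalization and dropout mentioned above omitted, and with random untrained weights; the hidden width and joint count are arbitrary choices, not values from the patent.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def lift_2d_to_3d(x2d, params):
    """Forward pass of a simplified residual lifting network: a linear layer
    into a hidden space, one residual block of two linear+ReLU layers, and a
    linear layer out to 3n joint coordinates."""
    W_in, W1, W2, W_out = params
    h = relu(x2d @ W_in)              # 2n inputs -> hidden
    h = h + relu(relu(h @ W1) @ W2)   # residual block (skip connection)
    return h @ W_out                  # hidden -> 3n outputs

rng = np.random.default_rng(0)
n_joints, hidden = 17, 64             # assumed sizes for the sketch
params = (rng.normal(size=(2 * n_joints, hidden)) * 0.1,
          rng.normal(size=(hidden, hidden)) * 0.1,
          rng.normal(size=(hidden, hidden)) * 0.1,
          rng.normal(size=(hidden, 3 * n_joints)) * 0.1)
pose3d = lift_2d_to_3d(rng.normal(size=2 * n_joints), params)  # shape (51,)
```

Training would fit the weights by minimizing the squared-error loss given earlier over (2D, 3D) pose pairs.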
S106, extracting a feature vector according to the three-dimensional coordinates in each normal picture;
in specific implementation, in a motion posture, the distance between joints and the body distortion direction visually represent the feature vector of the posture, and after three-dimensional coordinates corresponding to the joint points of different persons in each normal picture are obtained, the feature vector can be extracted according to the three-dimensional coordinates in each normal picture.
S107, searching and comparing the characteristic vector of each normal picture with the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm to obtain the action type corresponding to each normal picture.
After the feature vector of each normal picture has been obtained, it can be searched and compared against the sample vectors of the sample action types using the k-nearest neighbor algorithm. The method decides which class the current pose belongs to mainly according to its distance from the template poses. Before this calculation, some abnormal values must be handled according to the characteristics of human joints: in some poses, although the overall joint distance is close to the template distance, part of the joint bending directions are inconsistent, and such poses cannot be recognized as the same class as the template pose. Only when the distance is close and the joint bending directions are essentially consistent can a pose be recognized as belonging to the same class.
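The distance comparison with the bending-direction consistency check can be sketched as a k-nearest neighbor vote that skips templates whose bend signs disagree with the query. All names and the one-value bend encoding below are illustrative assumptions:

```python
import numpy as np

def knn_classify(feat, templates, labels, bend_dirs=None, tpl_bend=None, k=3):
    """k-NN over pose feature vectors.

    Neighbors whose joint bending-direction signs differ from the query
    pose are treated as abnormal values and excluded from the vote,
    even when their feature distance is small.
    """
    d = np.linalg.norm(templates - feat, axis=1)
    votes = []
    for i in np.argsort(d):
        if bend_dirs is not None and not np.array_equal(
                np.sign(tpl_bend[i]), np.sign(bend_dirs)):
            continue  # close in distance but bent the other way: reject
        votes.append(labels[i])
        if len(votes) == k:
            break
    if not votes:
        return None
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]

templates = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
labels = np.array(["squat", "squat", "jump"])
tpl_bend = np.array([[1.0], [1.0], [-1.0]])   # sign of a joint bend per template
query = np.array([0.05, 0.0])
result = knn_classify(query, templates, labels,
                      bend_dirs=np.array([1.0]), tpl_bend=tpl_bend, k=2)
print(result)  # squat
```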
According to the multi-person real-time three-dimensional motion recognition and evaluation method provided by this embodiment, the data source is acquired under a monocular camera, the target persons are then separated, and the 2-dimensional human skeleton coordinate data acquired by OpenPose is converted into 3-dimensional human coordinate data, so that human pose points can be acquired more accurately and the efficiency and accuracy of multi-person motion recognition are improved.
Corresponding to the above method embodiment, referring to fig. 2, the disclosed embodiment further provides a multi-person real-time three-dimensional motion recognition and evaluation apparatus 20, including:
the identification module 201 is configured to detect the target frame obtained in each picture of the input data source by using YOLOv5, and to identify the joint points of all persons in each picture and the two-dimensional coordinates corresponding to the joint points by using the OpenPose model;
the judging module 202 is used for selecting any one picture and judging whether the picture is a normal picture;
if so, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with the changes of the joint points in adjacent pictures, determine by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
the conversion module 203 is used for converting the two-dimensional coordinates in all the normal pictures into three-dimensional coordinates by using the three-dimensional human body posture estimation baseline;
an extracting module 204, configured to extract a feature vector according to the three-dimensional coordinate in each of the normal pictures;
a comparing module 205, configured to perform search and comparison on the feature vector of each normal picture and the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm, so as to obtain an action type corresponding to each normal picture.
The apparatus shown in fig. 2 may correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
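The person-separation step performed by the judging module 202 — associating mutually closest joint points across adjacent pictures — can be sketched as a greedy nearest-skeleton matching. This is a simplification of the breadth-first search described above, and the function name and cost (mean joint displacement) are illustrative assumptions:

```python
import numpy as np

def match_people(prev_poses, curr_poses):
    """Greedy nearest-skeleton association between two adjacent frames.

    prev_poses, curr_poses: lists of (J, 2) arrays of 2D joint coordinates.
    Returns (prev_index, curr_index) pairs: each current detection is
    assigned to the unclaimed previous person with the smallest mean
    joint displacement.
    """
    pairs, used = [], set()
    for ci, cp in enumerate(curr_poses):
        costs = [(np.linalg.norm(cp - pp, axis=1).mean(), pi)
                 for pi, pp in enumerate(prev_poses) if pi not in used]
        if not costs:
            break
        cost, pi = min(costs)
        used.add(pi)
        pairs.append((pi, ci))
    return pairs

rng = np.random.default_rng(2)
prev = [rng.normal(0, 1, (18, 2)), rng.normal(10, 1, (18, 2))]
curr = [p + 0.1 for p in prev[::-1]]  # same two people, small motion, order shuffled
print(match_people(prev, curr))  # [(1, 0), (0, 1)]
```

Because each skeleton moves only slightly between adjacent pictures, the nearest previous skeleton is almost always the same person, which is exactly the assumption the embodiment relies on.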
Referring to fig. 3, an embodiment of the present disclosure also provides an electronic device 30, including: at least one processor and a memory communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the multi-user real-time three-dimensional motion recognition and evaluation method in the method embodiment.
The disclosed embodiment also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to execute the multi-person real-time three-dimensional motion recognition and evaluation method in the foregoing method embodiment.
Embodiments of the present disclosure also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the multi-person real-time three-dimensional motion recognition evaluation method in the foregoing method embodiments.
Referring now to FIG. 3, a schematic diagram of an electronic device 30 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 30 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 30 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 30 to communicate wirelessly or by wire with other devices to exchange data. While the figures illustrate an electronic device 30 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program, when executed by the processing device 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the steps associated with the method embodiments.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, enable the electronic device to perform the steps associated with the method embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (9)

1. A multi-person real-time three-dimensional action recognition and evaluation method is characterized by comprising the following steps:
detecting the target frame obtained in each picture of an input data source by using YOLOv5, and identifying joint points of all persons in each picture and two-dimensional coordinates corresponding to the joint points by using an OpenPose model;
selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with the changes of the joint points in adjacent pictures, determining by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
converting two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using a three-dimensional human body posture estimation baseline;
extracting a characteristic vector according to the three-dimensional coordinates in each normal picture;
and searching and comparing the characteristic vector of each normal picture with the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm to obtain the action type corresponding to each normal picture.
2. The method of claim 1, wherein all the pictures in the data source are arranged according to a time sequence.
3. The method according to claim 1, wherein after the step of identifying, by using the OpenPose model, the joint points of all persons in each picture of the input data source and the two-dimensional coordinates corresponding to each joint point, and after the step of converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using the three-dimensional human body posture estimation baseline, the method further comprises:
the two-dimensional coordinates and the three-dimensional coordinates are normalized by subtracting the mean value and dividing by the standard deviation.
4. The method of claim 1, wherein the step of selecting any one of the pictures and determining whether the picture is a normal picture comprises:
selecting any one picture and acquiring a two-dimensional coordinate corresponding to a joint point of the picture;
if the two-dimensional coordinates can be obtained, the picture is judged to be a normal picture;
and if the two-dimensional coordinates cannot be acquired, judging that the picture is an abnormal picture.
5. The method according to claim 1, wherein after the step of determining, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, in combination with the changes of the joint points in adjacent pictures, by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture, the method further comprises:
and filling the missing value of the current picture according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures.
6. The method of claim 1, wherein the three-dimensional human body posture estimation baseline is expressed as

f* = \min_{f} \frac{1}{N} \sum_{i=1}^{N} L\left( f(x_i) - y_i \right)

wherein x_i is the input two-dimensional coordinates, f(x_i) is the converted three-dimensional coordinates, y_i is the coordinates of the real human body joint points, and L is a loss function.
7. A multi-person real-time three-dimensional motion recognition and evaluation device is characterized by comprising:
the identification module is configured to detect the target frame obtained in each picture of the input data source by using YOLOv5, and to identify the joint points of all persons in each picture and the two-dimensional coordinates corresponding to the joint points by using an OpenPose model;
the judging module is used for selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with the changes of the joint points in adjacent pictures, determine by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
the conversion module is used for converting the two-dimensional coordinates in all the normal pictures into three-dimensional coordinates by utilizing the three-dimensional human body posture estimation baseline;
the extraction module is used for extracting a characteristic vector according to the three-dimensional coordinate in each normal picture;
and the comparison module is used for searching and comparing the characteristic vector of each normal picture with the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm to obtain the action type corresponding to each normal picture.
8. An electronic device, characterized in that the electronic device comprises:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multi-person real-time three-dimensional motion recognition and evaluation method of any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the multi-person real-time three-dimensional motion recognition and evaluation method of any one of claims 1 to 6.
CN202210356817.7A 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium Pending CN114694257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210356817.7A CN114694257A (en) 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210356817.7A CN114694257A (en) 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114694257A true CN114694257A (en) 2022-07-01

Family

ID=82143235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210356817.7A Pending CN114694257A (en) 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114694257A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116403288A (en) * 2023-04-28 2023-07-07 中南大学 Motion gesture recognition method and device and electronic equipment


Similar Documents

Publication Publication Date Title
CN110866953B (en) Map construction method and device, and positioning method and device
CN110221690B (en) Gesture interaction method and device based on AR scene, storage medium and communication terminal
WO2020010979A1 (en) Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
US8442307B1 (en) Appearance augmented 3-D point clouds for trajectory and camera localization
CN109582880B (en) Interest point information processing method, device, terminal and storage medium
WO2015172679A1 (en) Image processing method and device
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
CN108960114A (en) Human body recognition method and device, computer readable storage medium and electronic equipment
CN111597975B (en) Personnel action detection method and device and electronic equipment
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
CN111008576B (en) Pedestrian detection and model training method, device and readable storage medium
CN112115894B (en) Training method and device of hand key point detection model and electronic equipment
CN112232311B (en) Face tracking method and device and electronic equipment
CN111784776A (en) Visual positioning method and device, computer readable medium and electronic equipment
Ryumin et al. Automatic detection and recognition of 3D manual gestures for human-machine interaction
KR20220098312A (en) Method, apparatus, device and recording medium for detecting related objects in an image
US20200357137A1 (en) Determining a Pose of an Object in the Surroundings of the Object by Means of Multi-Task Learning
AU2021204583A1 (en) Methods, apparatuses, devices and storage medium for predicting correlation between objects
CN113557546B (en) Method, device, equipment and storage medium for detecting associated objects in image
CN113610034B (en) Method and device for identifying character entities in video, storage medium and electronic equipment
CN109829431B (en) Method and apparatus for generating information
WO2022237048A1 (en) Pose acquisition method and apparatus, and electronic device, storage medium and program
CN114694257A (en) Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium
CN112270242B (en) Track display method and device, readable medium and electronic equipment
CN111914841B (en) CT image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination