CN114694257A - Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium - Google Patents


Info

Publication number
CN114694257A
CN114694257A (application CN202210356817.7A)
Authority
CN
China
Prior art keywords
picture
dimensional
dimensional coordinates
joint points
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210356817.7A
Other languages
Chinese (zh)
Inventor
黄凯
陈雪晨
冯锦祥
陈志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University
Priority: CN202210356817.7A
Publication of CN114694257A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2413: Classification techniques relating to the classification model based on distances to training or reference patterns
    • G06F 18/24147: Distances to closest patterns, e.g. nearest neighbour classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras


Abstract

Embodiments of the disclosure provide a multi-person real-time three-dimensional action recognition and evaluation method, apparatus, device, and medium, belonging to the technical field of data recognition. The method comprises: obtaining target boxes and identifying joint points and the two-dimensional coordinates corresponding to each joint point; selecting any picture and judging whether it is a normal picture; if so, dividing the different persons in each picture; if not, ending the analysis of the current picture and proceeding to the next picture; converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates with a three-dimensional human-pose-estimation baseline; extracting a feature vector; and, according to the inherent characteristics of human actions, searching and comparing the feature vector of each normal picture against the sample vectors of the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture. By converting the obtained two-dimensional human-skeleton coordinate data into three-dimensional human coordinate data, the scheme improves the efficiency and accuracy of multi-person action recognition.

Description

Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium
Technical Field
The embodiment of the disclosure relates to the technical field of image processing, in particular to a method, a device, equipment and a medium for identifying and evaluating multi-user real-time three-dimensional actions.
Background
At present, human action recognition refers to recognizing human behavior in an image sequence or video through computer-vision and machine-learning methods. In recent years it has been widely applied to intelligent monitoring, video retrieval, human-computer interaction, behavior analysis, virtual reality, and similar fields, and its research paradigms, model algorithms, and description methods have developed substantially. Among them, human-skeleton keypoint detection has received increasing attention: it is one of the basic algorithms of computer vision, and multi-person pose detection in particular still faces a series of unprecedented challenges.
Most existing computer-vision-based action recognition describes human pose points in two dimensions, e.g. in video clips, images, or drawings. Because such representations carry no depth information, occlusion, background clutter, overlapping keypoints, illumination, and complex motions make recognition inaccurate and noisy, so the action information is hard to interpret; with a single-angle camera, recognizing the actions of several people is even harder and accuracy is difficult to guarantee.
A multi-person real-time three-dimensional action recognition and evaluation method that can recognize multi-person actions efficiently and accurately is therefore needed.
Disclosure of Invention
In view of this, embodiments of the present disclosure provide a method, an apparatus, a device, and a medium for multi-person real-time three-dimensional action recognition and evaluation, which at least partially solve the poor efficiency and accuracy of multi-person action recognition in the prior art.
In a first aspect, an embodiment of the present disclosure provides a method for identifying and evaluating a multi-user real-time three-dimensional motion, including:
detecting each picture in an input data source with YOLOv5 to obtain target boxes, and identifying the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point with an OpenPose model;
selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, determining by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
converting two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using a three-dimensional human body posture estimation baseline;
extracting a feature vector from the three-dimensional coordinates in each normal picture;
and searching and comparing the feature vector of each normal picture against the sample vectors corresponding to the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture.
According to a specific implementation manner of the embodiment of the present disclosure, all the pictures in the data source are pictures arranged according to a time sequence.
According to a specific implementation manner of the embodiment of the present disclosure, after the step of identifying, with the OpenPose model, the joint points of all persons in each picture of the input data source and the two-dimensional coordinates corresponding to each joint point, and after the step of converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates with the three-dimensional human-pose-estimation baseline, the method further includes:
the two-dimensional coordinates and the three-dimensional coordinates are normalized by subtracting the mean value and dividing by the standard deviation.
According to a specific implementation manner of the embodiment of the present disclosure, the step of selecting any one of the pictures and determining whether the picture is a normal picture includes:
selecting any one picture and acquiring a two-dimensional coordinate corresponding to a joint point of the picture;
if the two-dimensional coordinates can be obtained, the picture is judged to be a normal picture;
and if the two-dimensional coordinates cannot be acquired, judging that the picture is an abnormal picture.
According to a specific implementation manner of the embodiment of the present disclosure, after the step of determining, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose and in combination with how the joint points change across adjacent pictures, that the closest joint points in the current picture and its adjacent pictures belong to the same person by breadth-first search, thereby dividing the different persons in each picture, the method further includes:
and filling the missing value of the current picture according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures.
According to a specific implementation manner of the embodiment of the present disclosure, the three-dimensional human-pose-estimation baseline is expressed as
f* = argmin_f (1/N) Σ_{i=1}^{N} L(f(x_i), y_i)
where x_i is the input two-dimensional coordinate, f(x_i) the converted three-dimensional coordinate, y_i the real human-body joint coordinate, and L the loss function.
In a second aspect, an embodiment of the present disclosure provides a device for identifying and evaluating a multi-user real-time three-dimensional motion, including:
the identification module is configured to detect each picture in the input data source with YOLOv5 to obtain target boxes, and to identify, with an OpenPose model, the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point;
the judging module is used for selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, determining by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
the conversion module is used for converting the two-dimensional coordinates in all the normal pictures into three-dimensional coordinates by utilizing the three-dimensional human body posture estimation baseline;
the extraction module is used for extracting a feature vector from the three-dimensional coordinates in each normal picture;
and the comparison module is used for searching and comparing the feature vector of each normal picture against the sample vectors corresponding to the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture.
In a third aspect, an embodiment of the present disclosure further provides an electronic device, where the electronic device includes:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the multi-person real-time three-dimensional motion recognition and evaluation method in the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the disclosed embodiments also provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the method for identifying and evaluating a multi-person real-time three-dimensional motion in any implementation manner of the first aspect or the first aspect.
In a fifth aspect, the present disclosure also provides a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, where the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is caused to execute the method for identifying and evaluating a multi-person real-time three-dimensional motion in any of the foregoing first aspect or the foregoing implementation manners of the first aspect.
The scheme for multi-person real-time three-dimensional action recognition and evaluation in the embodiments of the disclosure comprises the following steps: detecting each picture in an input data source with YOLOv5 to obtain target boxes, and identifying the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point with an OpenPose model; selecting any picture and judging whether it is a normal picture; if so, according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, determining by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture; if not, ending the analysis of the current picture and proceeding to the next picture; converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates with a three-dimensional human-pose-estimation baseline; extracting a feature vector from the three-dimensional coordinates in each normal picture; and searching and comparing the feature vector of each normal picture against the sample vectors corresponding to the sample action types with a k-nearest-neighbor algorithm to obtain the action type corresponding to each normal picture.
The beneficial effects of the embodiments of the disclosure are: through the disclosed scheme, the two-dimensional human-skeleton coordinate data acquired by OpenPose are converted into three-dimensional human coordinate data, so that human pose points can be acquired more accurately and the efficiency and accuracy of multi-person action recognition are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings needed to be used in the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present disclosure, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a method for identifying and evaluating a multi-user real-time three-dimensional motion provided in an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of a multi-user real-time three-dimensional motion recognition and evaluation device according to an embodiment of the disclosure;
fig. 3 is a schematic diagram of an electronic device according to an embodiment of the disclosure.
Detailed Description
The embodiments of the present disclosure are described in detail below with reference to the accompanying drawings.
The embodiments of the present disclosure are described below with specific examples, and other advantages and effects of the present disclosure will be readily apparent to those skilled in the art from the disclosure in the specification. It is to be understood that the described embodiments are merely illustrative of some, and not restrictive, of the embodiments of the disclosure. The disclosure may be embodied or carried out in various other specific embodiments, and various modifications and changes may be made in the details within the description without departing from the spirit of the disclosure. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
It is noted that various aspects of the embodiments are described below within the scope of the appended claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the disclosure, one skilled in the art should appreciate that one aspect described herein may be implemented independently of any other aspects and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. Additionally, such an apparatus may be implemented and/or such a method may be practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present disclosure, and the drawings only show the components related to the present disclosure rather than the number, shape and size of the components in actual implementation, and the type, amount and ratio of the components in actual implementation may be changed arbitrarily, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided to facilitate a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
Existing pose-estimation technologies fall mainly into three types. The first is an optical capture instrument such as OptiTrack: multiple markers are attached to the human body, their positions are detected, and the positions of the body's joint points are finally determined. The second, like Kinect, obtains the three-dimensional joint coordinates of the body by binocular positioning with multiple cameras. The third detects the two-dimensional joint coordinates of the body under a monocular camera with a deep-learning algorithm, but does not convert them into three dimensions, so further information about the body's joints cannot be obtained. When the prior art recognizes the actions of several people from a single camera at a single angle, only two-dimensional pose points are produced, and the unavoidable influence of occlusion, illumination, background, and other factors makes the pose points inaccurate, so action analysis and recognition cannot be completed correctly. Moreover, existing algorithms for converting 2D coordinate data into 3D coordinate data depend on limited training data; the converted results work well only on part of the images and cannot be used widely.
The embodiment of the disclosure provides a multi-user real-time three-dimensional motion recognition and evaluation method, which can be applied to a multi-user motion recognition process in a sports test or a daily training scene.
Referring to fig. 1, a flow chart of a method for identifying and evaluating a multi-user real-time three-dimensional motion provided by the embodiment of the disclosure is schematically shown. As shown in fig. 1, the method mainly comprises the following steps:
s101, detecting an obtained target frame in each picture in an input data source by using a YOLOV5, and identifying joint points of all people in each picture and two-dimensional coordinates corresponding to the joint points by using an openposition model;
furthermore, all the pictures in the data source are arranged according to a time sequence.
In specific implementation, considering that the existing approach of obtaining three-dimensional joint coordinates by binocular positioning with multiple cameras has large usage restrictions, requires much auxiliary hardware, is costly, and is not conducive to popularization, the data source can be an image-acquisition device such as a camera that captures, in real time, video containing several human bodies in a physical-training test; all obtained pictures are arranged in time order.
The data source may then be fed into the OpenPose model, which identifies the joint points of all persons in each picture and the two-dimensional coordinates corresponding to each joint point.
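As a hedged illustration only (the patent publishes no code), the detection step could be sketched as follows. The `torch.hub` loading call in the comment and the COCO class index 0 for "person" come from the public ultralytics/yolov5 release, not from the patent; the filtering helper is a hypothetical name.

```python
# Sketch under the assumption that the public ultralytics/yolov5 release is
# used; detections would be obtained roughly as (not executed here):
#   model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
#   rows = model(frame).xyxy[0].tolist()  # each row: [x1, y1, x2, y2, conf, cls]

def person_boxes(rows, person_cls=0, conf_thres=0.5):
    """Keep only confident 'person' detections (class 0 in the COCO label set)."""
    return [r for r in rows if int(r[5]) == person_cls and r[4] >= conf_thres]

rows = [[0, 0, 10, 10, 0.9, 0],   # confident person -> kept
        [0, 0, 5, 5, 0.3, 0],     # low-confidence person -> dropped
        [1, 1, 2, 2, 0.8, 56]]    # non-person class -> dropped
boxes = person_boxes(rows)
```

The per-picture boxes produced this way are what the later person-division step matches against the OpenPose joint distribution.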
For example, the overall flow of the OpenPose method can be summarized as the following steps:
(a) input a color image of persons, a video, or an RTSP video stream;
(b) predict the keypoint positions of the detection targets with a feed-forward network, obtaining two-dimensional confidence maps S and a set of 2D part-affinity vector fields L;
(c) encode, with S and L, the association vector fields between the parts of each detection target;
(d) finally, mark the 2D keypoints of all detection targets by parsing the confidence maps and affinity vector fields.
(e) the joint points corresponding to the identified point sequence are shown in Table 1.
[Table 1: mapping from OpenPose output point indices to human joint points; the table image is not recoverable from this text.]
S102, selecting any picture and judging whether the picture is a normal picture or not;
optionally, the step of selecting any one of the pictures and determining whether the picture is a normal picture includes: selecting any one picture and acquiring a two-dimensional coordinate corresponding to a joint point of the picture;
if the two-dimensional coordinates can be obtained, the picture is judged to be a normal picture;
and if the two-dimensional coordinates cannot be acquired, judging that the picture is an abnormal picture.
In specific implementation, considering the timeliness requirement, the persons in a video need to be divided by a simple and fast method so that multi-person actions can be recognized accurately. Before that, any picture can be selected and judged as normal or not: for example, when no person appears in the input data, no two-dimensional coordinate of any human skeleton can be obtained and the picture is judged abnormal; if two-dimensional coordinates can be obtained, the picture is judged normal.
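The normal/abnormal check described above reduces to testing whether any skeleton coordinates came back for the frame; a minimal sketch (the function name is an assumption, not from the patent):

```python
def is_normal_frame(skeletons):
    """A frame is 'normal' when at least one person's 2D joint coordinates
    were recovered from it; an empty or absent result marks it abnormal,
    and the pipeline skips to the next frame."""
    return skeletons is not None and len(skeletons) > 0

# skeletons: one list of (x, y) joint coordinates per detected person
assert is_normal_frame([[(120.0, 80.5), (118.2, 95.0)]])
```

A frame that fails this test is simply not analyzed further, mirroring step S104.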
If so, step S103 is executed: according to the correspondence between the target boxes detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with how the joint points change across adjacent pictures, it is determined by breadth-first search that the closest joint points in the current picture and its adjacent pictures belong to the same person, thereby dividing the different persons in each picture.
That is, if the current picture is normal, the correspondence between the target boxes detected by YOLOv5 and the OpenPose joint-point distribution can be obtained, and the search over adjacent pictures, combined with the joint-point changes between them, determines that the closest joint points in the current picture and its neighbors are the same person, dividing the different persons in each picture.
Optionally, after step S103 divides the different persons in each picture as described above, the method further includes:
and filling the missing value of the current picture according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures.
Of course, considering that the human body may be occluded during movement, after the division of the current picture is completed, the missing values of the current picture are filled according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures, making the subsequent recognition result more accurate.
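Filling a missing value from adjacent pictures can be sketched as simple temporal interpolation per joint; the averaging rule below is an assumption, since the patent only says the neighbours' coordinates are used:

```python
def fill_missing_joint(prev_pt, cur_pt, next_pt):
    """Fill an occluded joint from its temporal neighbours: keep the detected
    point when present; otherwise average the adjacent frames' points; if only
    one neighbour exists, copy it."""
    if cur_pt is not None:
        return cur_pt
    if prev_pt is not None and next_pt is not None:
        return tuple((p + n) / 2 for p, n in zip(prev_pt, next_pt))
    return prev_pt if prev_pt is not None else next_pt

# joint occluded in the middle frame: midpoint of the neighbours is used
filled = fill_missing_joint((0.0, 0.0), None, (2.0, 4.0))
```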
If not, executing the step S104, ending the analysis of the current picture and continuously identifying the next picture;
s105, converting two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using a three-dimensional human body posture estimation baseline;
further, the expression of the three-dimensional human body posture estimation baseline is
Figure BDA0003583378180000091
Figure BDA0003583378180000092
Wherein x isiAs input two-dimensional coordinates, f (x)i) For the converted three-dimensional coordinates, yiThe coordinates of the joint points of the real human body, L is a loss function.
Optionally, after the OpenPose model is used in step S101 to identify the joint points of all persons in each picture of the input data source and the two-dimensional coordinates corresponding to each joint point, and after the three-dimensional human-pose-estimation baseline is used in step S105 to convert the two-dimensional coordinates in all normal pictures into three-dimensional coordinates, the method further includes:
the two-dimensional coordinates and the three-dimensional coordinates are normalized by subtracting the mean value and dividing by the standard deviation.
In specific implementation, since a specific action type cannot be identified from a two-dimensional coordinate image, after all normal pictures have been divided by person, their two-dimensional coordinates need to be converted into three-dimensional coordinates. Specifically, the two-dimensional coordinates in all normal pictures can be converted into three-dimensional coordinates with the three-dimensional human-pose-estimation baseline, and the three-dimensional coordinates then normalized by subtracting the mean and dividing by the standard deviation, so that the subsequent recognition is more accurate.
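The subtract-mean, divide-by-standard-deviation normalization is standard z-scoring; a minimal sketch over one coordinate sequence:

```python
import statistics

def standardize(values):
    """Normalize a coordinate sequence by subtracting the mean and dividing
    by the (population) standard deviation, as the method describes."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

z = standardize([1.0, 2.0, 3.0, 4.0])  # zero mean, unit spread afterwards
```

In practice this would be applied per coordinate axis across the joint points of a skeleton, both to the 2D inputs and the converted 3D outputs.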
For example, the goal is to estimate the human joint positions in 3D space given a two-dimensional input. Formally, the input is a series of 2D points x ∈ R^{2n} and the output is a series of points f(x) ∈ R^{3n} in 3D space. The aim is to learn a function f* that minimizes the prediction error over a dataset of N poses:
f* = argmin_f (1/N) Σ_{i=1}^{N} L(f(x_i), y_i)
The model, a simple deep multi-layer neural network with batch normalization, dropout, ReLU activations, and residual connections, is then trained from existing open-source 3D human-pose datasets to build the mapping from two-dimensional poses to three-dimensional poses. It learns the projection relation of the two-dimensional poses of the 3D-pose dataset under the OpenPose model; training yields roughly 4 to 5 million parameters, enough to map two-dimensional pose points into three-dimensional space. Meanwhile, a human-pose enhancer is built from the geometric factors of human pose (body size and the fixed angles and distances of adjacent joint points) to judge the plausibility of local joint-angle proportions, overall body size, and the spatial distances between joint points, so that plausibility checking of the generated 3D pose and correction of local joint points can be completed. Many variations of the architecture and training hyper-parameters were tested extensively on the public COCO and Human3.6M datasets, and experiments show that this module completes the conversion from 2D to 3D coordinate data smoothly and quickly and effectively improves the accuracy of existing 3D human-pose estimation.
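The forward pass of such a lifting network can be sketched in a few lines of NumPy. This is a much-reduced illustration: one hidden layer and one residual block, with the batch normalization and dropout mentioned above omitted, and with random untrained weights; the hidden width and joint count are arbitrary choices, not values from the patent.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def lift_2d_to_3d(x2d, params):
    """Forward pass of a simplified residual lifting network: a linear layer
    into a hidden space, one residual block of two linear+ReLU layers, and a
    linear layer out to 3n joint coordinates."""
    W_in, W1, W2, W_out = params
    h = relu(x2d @ W_in)              # 2n inputs -> hidden
    h = h + relu(relu(h @ W1) @ W2)   # residual block (skip connection)
    return h @ W_out                  # hidden -> 3n outputs

rng = np.random.default_rng(0)
n_joints, hidden = 17, 64             # assumed sizes for the sketch
params = (rng.normal(size=(2 * n_joints, hidden)) * 0.1,
          rng.normal(size=(hidden, hidden)) * 0.1,
          rng.normal(size=(hidden, hidden)) * 0.1,
          rng.normal(size=(hidden, 3 * n_joints)) * 0.1)
pose3d = lift_2d_to_3d(rng.normal(size=2 * n_joints), params)  # shape (51,)
```

Training would fit the weights by minimizing the squared-error loss given earlier over (2D, 3D) pose pairs.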
S106, extracting a feature vector according to the three-dimensional coordinates in each normal picture;
in specific implementation, in a motion posture, the distance between joints and the body distortion direction visually represent the feature vector of the posture, and after three-dimensional coordinates corresponding to the joint points of different persons in each normal picture are obtained, the feature vector can be extracted according to the three-dimensional coordinates in each normal picture.
S107, searching and comparing the characteristic vector of each normal picture with the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm to obtain the action type corresponding to each normal picture.
After the feature vector of each normal picture has been obtained, it can be searched and compared against the sample vectors of the sample action types using the k-nearest neighbor algorithm. The method decides which class the current pose belongs to mainly according to its distance from the template poses. Before this calculation, some abnormal values must be handled according to the characteristics of human joints: in some poses, although the overall joint distance is close to the template distance, part of the joint bending directions are inconsistent, and such poses cannot be recognized as the same class as the template pose. Only when the distance is close and the joint bending directions are essentially consistent can a pose be recognized as belonging to the same class.
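The distance comparison with the bending-direction consistency check can be sketched as a k-nearest neighbor vote that skips templates whose bend signs disagree with the query. All names and the one-value bend encoding below are illustrative assumptions:

```python
import numpy as np

def knn_classify(feat, templates, labels, bend_dirs=None, tpl_bend=None, k=3):
    """k-NN over pose feature vectors.

    Neighbors whose joint bending-direction signs differ from the query
    pose are treated as abnormal values and excluded from the vote,
    even when their feature distance is small.
    """
    d = np.linalg.norm(templates - feat, axis=1)
    votes = []
    for i in np.argsort(d):
        if bend_dirs is not None and not np.array_equal(
                np.sign(tpl_bend[i]), np.sign(bend_dirs)):
            continue  # close in distance but bent the other way: reject
        votes.append(labels[i])
        if len(votes) == k:
            break
    if not votes:
        return None
    vals, counts = np.unique(votes, return_counts=True)
    return vals[np.argmax(counts)]

templates = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])
labels = np.array(["squat", "squat", "jump"])
tpl_bend = np.array([[1.0], [1.0], [-1.0]])   # sign of a joint bend per template
query = np.array([0.05, 0.0])
result = knn_classify(query, templates, labels,
                      bend_dirs=np.array([1.0]), tpl_bend=tpl_bend, k=2)
print(result)  # squat
```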
According to the multi-person real-time three-dimensional motion recognition and evaluation method provided by this embodiment, the data source is acquired under a monocular camera, the target persons are then separated, and the 2-dimensional human skeleton coordinate data acquired by OpenPose is converted into 3-dimensional human coordinate data, so that human pose points can be acquired more accurately and the efficiency and accuracy of multi-person motion recognition are improved.
Corresponding to the above method embodiment, referring to fig. 2, the disclosed embodiment further provides a multi-person real-time three-dimensional motion recognition and evaluation apparatus 20, including:
the identification module 201 is configured to detect the target frame obtained in each picture of the input data source by using YOLOv5, and to identify the joint points of all persons in each picture and the two-dimensional coordinates corresponding to the joint points by using the OpenPose model;
the judging module 202 is used for selecting any one picture and judging whether the picture is a normal picture;
if so, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with the changes of the joint points in adjacent pictures, determine by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
the conversion module 203 is used for converting the two-dimensional coordinates in all the normal pictures into three-dimensional coordinates by using the three-dimensional human body posture estimation baseline;
an extracting module 204, configured to extract a feature vector according to the three-dimensional coordinate in each of the normal pictures;
a comparing module 205, configured to perform search and comparison on the feature vector of each normal picture and the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm, so as to obtain an action type corresponding to each normal picture.
The apparatus shown in fig. 2 may correspondingly execute the content in the above method embodiment, and details of the part not described in detail in this embodiment refer to the content described in the above method embodiment, which is not described again here.
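The person-separation step performed by the judging module 202 — associating mutually closest joint points across adjacent pictures — can be sketched as a greedy nearest-skeleton matching. This is a simplification of the breadth-first search described above, and the function name and cost (mean joint displacement) are illustrative assumptions:

```python
import numpy as np

def match_people(prev_poses, curr_poses):
    """Greedy nearest-skeleton association between two adjacent frames.

    prev_poses, curr_poses: lists of (J, 2) arrays of 2D joint coordinates.
    Returns (prev_index, curr_index) pairs: each current detection is
    assigned to the unclaimed previous person with the smallest mean
    joint displacement.
    """
    pairs, used = [], set()
    for ci, cp in enumerate(curr_poses):
        costs = [(np.linalg.norm(cp - pp, axis=1).mean(), pi)
                 for pi, pp in enumerate(prev_poses) if pi not in used]
        if not costs:
            break
        cost, pi = min(costs)
        used.add(pi)
        pairs.append((pi, ci))
    return pairs

rng = np.random.default_rng(2)
prev = [rng.normal(0, 1, (18, 2)), rng.normal(10, 1, (18, 2))]
curr = [p + 0.1 for p in prev[::-1]]  # same two people, small motion, order shuffled
print(match_people(prev, curr))  # [(1, 0), (0, 1)]
```

Because each skeleton moves only slightly between adjacent pictures, the nearest previous skeleton is almost always the same person, which is exactly the assumption the embodiment relies on.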
Referring to fig. 3, an embodiment of the present disclosure also provides an electronic device 30, including: at least one processor and a memory communicatively coupled to the at least one processor. The memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the multi-user real-time three-dimensional motion recognition and evaluation method in the method embodiment.
The disclosed embodiment also provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to execute the multi-person real-time three-dimensional motion recognition and evaluation method in the foregoing method embodiment.
Embodiments of the present disclosure also provide a computer program product comprising a computer program stored on a non-transitory computer readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the multi-person real-time three-dimensional motion recognition evaluation method in the foregoing method embodiments.
Referring now to FIG. 3, a schematic diagram of an electronic device 30 suitable for use in implementing embodiments of the present disclosure is shown. The electronic devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), in-vehicle terminals (e.g., car navigation terminals), and the like, and fixed terminals such as digital TVs, desktop computers, and the like. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 30 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 302 or a program loaded from a storage means 308 into a Random Access Memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic apparatus 30 are also stored. The processing device 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, image sensor, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 30 to communicate wirelessly or by wire with other devices to exchange data. While the figures illustrate an electronic device 30 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program, when executed by the processing device 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the steps associated with the method embodiments.
Alternatively, the computer readable medium carries one or more programs which, when executed by the electronic device, enable the electronic device to perform the steps associated with the method embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware.
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The above description is only for the specific embodiments of the present disclosure, but the scope of the present disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present disclosure should be covered within the scope of the present disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (9)

1. A multi-person real-time three-dimensional action recognition and evaluation method is characterized by comprising the following steps:
detecting the target frame obtained in each picture of an input data source by using YOLOv5, and identifying joint points of all persons in each picture and two-dimensional coordinates corresponding to the joint points by using an OpenPose model;
selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with the changes of the joint points in adjacent pictures, determining by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
converting two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using a three-dimensional human body posture estimation baseline;
extracting a characteristic vector according to the three-dimensional coordinates in each normal picture;
and searching and comparing the characteristic vector of each normal picture with the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm to obtain the action type corresponding to each normal picture.
2. The method of claim 1, wherein all the pictures in the data source are arranged according to a time sequence.
3. The method according to claim 1, wherein after the step of identifying, by using the OpenPose model, the joint points of all persons in each picture of the input data source and the two-dimensional coordinates corresponding to each joint point, and after the step of converting the two-dimensional coordinates in all normal pictures into three-dimensional coordinates by using the three-dimensional human body posture estimation baseline, the method further comprises:
the two-dimensional coordinates and the three-dimensional coordinates are normalized by subtracting the mean value and dividing by the standard deviation.
4. The method of claim 1, wherein the step of selecting any one of the pictures and determining whether the picture is a normal picture comprises:
selecting any one picture and acquiring a two-dimensional coordinate corresponding to a joint point of the picture;
if the two-dimensional coordinates can be obtained, the picture is judged to be a normal picture;
and if the two-dimensional coordinates cannot be acquired, judging that the picture is an abnormal picture.
5. The method according to claim 1, wherein after the step of determining, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, in combination with the changes of the joint points in adjacent pictures, by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture, the method further comprises:
and filling the missing value of the current picture according to the two-dimensional coordinates corresponding to the joint points in the adjacent pictures.
6. The method of claim 1, wherein the three-dimensional human body posture estimation baseline is expressed as

f* = \min_{f} \frac{1}{N} \sum_{i=1}^{N} L\left( f(x_i) - y_i \right)

wherein x_i is the input two-dimensional coordinates, f(x_i) is the converted three-dimensional coordinates, y_i is the coordinates of the real human body joint points, and L is a loss function.
7. A multi-person real-time three-dimensional motion recognition and evaluation device is characterized by comprising:
the identification module is configured to detect the target frame obtained in each picture of the input data source by using YOLOv5, and to identify the joint points of all persons in each picture and the two-dimensional coordinates corresponding to the joint points by using an OpenPose model;
the judging module is used for selecting any one picture and judging whether the picture is a normal picture or not;
if so, according to the correspondence between the target frames detected by YOLOv5 and the joint-point distribution obtained by OpenPose, and in combination with the changes of the joint points in adjacent pictures, determine by breadth-first search that the mutually closest joint points in the current picture and the adjacent pictures belong to the same person, thereby separating the different persons in each picture;
if not, ending the analysis of the current picture and continuously identifying the next picture;
the conversion module is used for converting the two-dimensional coordinates in all the normal pictures into three-dimensional coordinates by utilizing the three-dimensional human body posture estimation baseline;
the extraction module is used for extracting a characteristic vector according to the three-dimensional coordinate in each normal picture;
and the comparison module is used for searching and comparing the characteristic vector of each normal picture with the sample vector corresponding to the sample action type by using a k-nearest neighbor algorithm to obtain the action type corresponding to each normal picture.
8. An electronic device, characterized in that the electronic device comprises:
at least one processor; and the number of the first and second groups,
a memory communicatively coupled to the at least one processor; wherein the content of the first and second substances,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the multi-person real-time three-dimensional motion recognition and evaluation method of any one of claims 1 to 6.
9. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the multi-person real-time three-dimensional motion recognition and evaluation method of any one of claims 1 to 6.
CN202210356817.7A 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium Pending CN114694257A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210356817.7A CN114694257A (en) 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210356817.7A CN114694257A (en) 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN114694257A true CN114694257A (en) 2022-07-01

Family

ID=82143235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210356817.7A Pending CN114694257A (en) 2022-04-06 2022-04-06 Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN114694257A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116403288A (en) * 2023-04-28 2023-07-07 中南大学 Motion gesture recognition method and device and electronic equipment


Similar Documents

Publication Publication Date Title
CN110866953B (en) Map construction method and device, and positioning method and device
CN110221690B (en) Gesture interaction method and device based on AR scene, storage medium and communication terminal
WO2020010979A1 (en) Method and apparatus for training model for recognizing key points of hand, and method and apparatus for recognizing key points of hand
US8442307B1 (en) Appearance augmented 3-D point clouds for trajectory and camera localization
CN109582880B (en) Interest point information processing method, device, terminal and storage medium
WO2015172679A1 (en) Image processing method and device
CN112435338B (en) Method and device for acquiring position of interest point of electronic map and electronic equipment
CN108960114A (en) Human body recognition method and device, computer readable storage medium and electronic equipment
CN111597975B (en) Personnel action detection method and device and electronic equipment
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
CN111008576B (en) Pedestrian detection and model training method, device and readable storage medium
CN112115894B (en) Training method and device of hand key point detection model and electronic equipment
CN112232311B (en) Face tracking method and device and electronic equipment
CN111784776A (en) Visual positioning method and device, computer readable medium and electronic equipment
Ryumin et al. Automatic detection and recognition of 3D manual gestures for human-machine interaction
KR20220098312A (en) Method, apparatus, device and recording medium for detecting related objects in an image
US20200357137A1 (en) Determining a Pose of an Object in the Surroundings of the Object by Means of Multi-Task Learning
AU2021204583A1 (en) Methods, apparatuses, devices and storage medium for predicting correlation between objects
CN113557546B (en) Method, device, equipment and storage medium for detecting associated objects in image
CN113610034B (en) Method and device for identifying character entities in video, storage medium and electronic equipment
CN109829431B (en) Method and apparatus for generating information
WO2022237048A1 (en) Pose acquisition method and apparatus, and electronic device, storage medium and program
CN114694257A (en) Multi-user real-time three-dimensional action recognition and evaluation method, device, equipment and medium
CN112270242B (en) Track display method and device, readable medium and electronic equipment
CN111914841B (en) CT image processing method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination