WO2022100119A1 - Multi-person three-dimensional motion capture method, storage medium and electronic device - Google Patents

Multi-person three-dimensional motion capture method, storage medium and electronic device Download PDF

Info

Publication number
WO2022100119A1
Authority
WO
WIPO (PCT)
Prior art keywords
joint point
person
matching
joint
information
Prior art date
Application number
PCT/CN2021/105486
Other languages
English (en)
French (fr)
Inventor
邱又海
唐毅
徐倩茹
Original Assignee
深圳市洲明科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN202011260863.4A (CN112379773B)
Application filed by 深圳市洲明科技股份有限公司
Priority to JP2022546634A (JP7480312B2)
Publication of WO2022100119A1
Priority to US18/166,649 (US20230186684A1)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00Manipulating 3D models or images for computer graphics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition

Definitions

  • the present invention relates to the technical field of human-computer interaction, and in particular, to a multi-person three-dimensional motion capture method, a storage medium and an electronic device.
  • since 2013, VR (Virtual Reality) technology has gradually become popular worldwide; motion capture technology is a key technology in VR.
  • traditional motion capture systems adopt optical, inertial or mechanical working principles; their users need to rely on third-party hardware and wear specific motion capture suits and props to interact with the system, so the human-computer interaction is not direct enough.
  • with the technical progress of deep learning, computing power and algorithm accuracy have improved effectively, especially in image processing: computers can perform pose estimation on human movements, facial expressions and finger movements in images.
  • the OpenPose human pose recognition project (an open-source project on GitHub) realizes pose estimation of human movements, facial expressions and finger movements, and supports the motion capture function for a single person.
  • after a single person enters the scene, OpenPose can perform 3D (three-dimensional) reconstruction of the person's 2D (two-dimensional) joint points well; in crowded scenes, however, owing to the coverage of the camera viewing angles, the numbers of persons recognized by the cameras differ, and the person-selection algorithm can hardly match the correct 2D joint points of each person, so that the 3D joint point information of persons is reconstructed incorrectly.
  • One aspect of the present application provides a multi-person three-dimensional motion capture method, comprising the steps of:
  • acquiring synchronized video frames from a plurality of cameras, performing joint point identification and positioning on each of the synchronized video frames of each of the cameras, and obtaining 2D joint point information of each person under each of the cameras;
  • calculating a back-projection ray of each 2D joint point, and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to the 2D joint point towards the 2D joint point;
  • performing 3D reconstruction of each person according to the best 2D person matching scheme, and generating 3D information of each person for three-dimensional motion capture (a minimal pipeline sketch follows below).
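A minimal sketch of this three-step pipeline is given below; all helper and method names (capture_frame, read_synchronized_frame, detect, cluster_backprojection_rays, reconstruct_3d) are hypothetical placeholders, not identifiers from the publication:

```python
# Hypothetical top-level pipeline: the three claimed steps in order.
def capture_frame(cameras, pose_detector):
    frames = [cam.read_synchronized_frame() for cam in cameras]      # step 1: synchronized frames
    joints_2d = [pose_detector.detect(f) for f in frames]            # 2D joints per camera
    matching = cluster_backprojection_rays(joints_2d, cameras)       # step 2: best 2D person matching
    return [reconstruct_3d(item, cameras) for item in matching]      # step 3: per-person 3D info
```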
  • Another aspect of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above-mentioned multi-person three-dimensional motion capture method is implemented.
  • Another aspect of the present application provides an electronic device including a memory and a processor; the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the above-mentioned multi-person three-dimensional motion capture method is implemented.
  • Another aspect of the present application provides a multi-person three-dimensional motion capture device, comprising:
  • a synchronized video frame acquisition module, configured to acquire synchronized video frames of a plurality of cameras, perform joint point identification and positioning on each of the synchronized video frames of each of the cameras, and obtain the 2D joint point information of each person under each of the cameras;
  • a clustering module, configured to calculate the back-projection ray of each 2D joint point and cluster according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to the 2D joint point towards the 2D joint point; and
  • a reconstruction module, configured to perform 3D reconstruction of each person according to the best 2D person matching scheme and generate the 3D information of each person for three-dimensional motion capture.
  • FIG. 1 is a schematic flowchart of a multi-person 3D motion capture method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of an overall flow of a multi-person 3D motion capture method involved in an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a matching flow of 2D joint points involved in an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a reconstruction process of a 3D joint point of a person involved in an embodiment of the present invention
  • FIG. 5 is a schematic structural diagram of an OpenPose 25-type joint point involved in an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a simulated scene of a multi-person 3D motion capture method involved in an embodiment of the present invention
  • FIG. 7 is a schematic diagram of a clustering effect of the multi-person 3D motion capture method involved in an embodiment of the present invention in a simulated scene;
  • FIG. 8 is a schematic diagram of matching of 2D joint points involved in an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of the effect of filtering the confidence level of joint points involved in an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of matching of left hand joints of five persons involved in an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of matching in a multi-person scene of a multi-person 3D motion capture method involved in an embodiment of the present invention
  • FIG. 12 is a schematic diagram of calculation of the shortest line segment between two back-projection rays involved in an embodiment of the present invention
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
  • a multi-person three-dimensional motion capture method including steps:
  • acquiring synchronized video frames from a plurality of cameras, performing joint point identification and positioning on each of the synchronized video frames of each of the cameras, and obtaining 2D joint point information of each person under each of the cameras;
  • calculating a back-projection ray of each 2D joint point, and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme;
  • performing 3D reconstruction of each person according to the best 2D person matching scheme, and generating 3D information of each person for three-dimensional motion capture.
  • the beneficial effects of the present invention are: multiple cameras collect and recognize different viewing angles in the scene, and the shortest distances are calculated and clustered on the principle that the back-projection rays of different cameras towards the same joint point should coincide, so that the multiple joint points of different cameras are matched correctly; this solves the 2D point-set matching problem caused by occlusion and 2D misrecognition in dense scenes and realizes motion capture of multiple persons.
  • the calculation of the back-projection ray of each 2D joint point specifically includes the following steps:
  • for each 2D joint point, the camera 3D coordinates are obtained from the coordinate origin and the corresponding camera extrinsic parameters and are used as the endpoint of the back-projection ray, and the direction vector of the back-projection ray is obtained from the corresponding camera intrinsic parameters, camera extrinsic parameters and the 2D joint point coordinate information of each 2D joint point, so as to obtain the back-projection ray of each of the 2D joint points.
  • the clustering according to the shortest distance between the back-projection rays to obtain the joint point matching scheme of each joint point type includes the following steps:
  • all the endpoints within each of the joint point types are clustered to output a clustering result, and a joint point matching scheme for each of the joint point types is generated by combination according to the clustering result.
  • the obtaining of the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays specifically includes the following steps:
  • the starting point coordinate and direction vector of back-projection ray $\vec{r_s}$ are $s_0$ and $\vec{u}$, and those of back-projection ray $\vec{r_t}$ are $t_0$ and $\vec{v}$, so that ray $\vec{r_s}$ has the vector expression $S(s)=s_0+s\,\vec{u}$ and ray $\vec{r_t}$ has the vector expression $T(t)=t_0+t\,\vec{v}$;
  • supposing the two endpoints of the shortest segment are $s_j=s_0+s_c\,\vec{u}$ and $t_j=t_0+t_c\,\vec{v}$, and writing the vector $\vec{w_0}=s_0-t_0$, the vector expression of the shortest segment between ray $\vec{r_s}$ and ray $\vec{r_t}$ is the first formula: $w(s_c,t_c)=\vec{w_0}+s_c\,\vec{u}-t_c\,\vec{v}$, where $s_c$ and $t_c$ are scalars;
  • the back-projection rays of different cameras towards the same joint point should coincide; because of observation error and 2D recognition error they may not coincide, but they will certainly be very close. Therefore, the calculation for the shortest distance between two straight lines in space is used to obtain the shortest distance $d_{min}$ between back-projection rays, so as to judge whether joint points from different cameras are the same joint point.
  • the k-nearest-neighbour algorithm is used to perform nearest-neighbour matching on all the endpoints of the endpoint set R within each of the joint point types, so as to obtain the clusters formed by neighbouring endpoints; the clusters are sorted from high to low by the number of endpoints in each cluster, giving the sorted cluster information within each of the joint point types;
  • a matching scheme is generated for each of the joint point types: the current matching item is obtained and each endpoint in each cluster is traversed; if the traversed endpoint is not used by the current matching item and the current matching item does not contain the camera identifier of the traversed endpoint, the endpoint is added to the current matching item, until every endpoint in the cluster has been traversed, whereby a joint point matching scheme for each joint point type is obtained;
  • the merging of the joint point matching schemes of all the joint point types to obtain the best 2D person matching scheme specifically includes the following step:
  • the number of occurrences of each matching item in the joint point matching schemes of all joint point types is counted, and the matching items with the most occurrences are combined into the best 2D person matching scheme.
  • although errors exist, the back-projection rays of different cameras towards the same joint point should still be very close; if the distance exceeds the preset distance threshold, the endpoints are considered not to belong to the same joint point. This does not affect the accuracy of joint point matching, and it also reduces the computation of the algorithm because the number of endpoints is reduced, further improving the processing speed and working efficiency; at the same time, the endpoint clustering adopts a density clustering algorithm that only requires setting a neighbourhood radius parameter, without giving the number of clusters in advance, and generating the person matching scheme through this clustering algorithm and a combinatorial optimization algorithm can identify the number of persons in the scene to the greatest extent.
  • performing 3D reconstruction of each person according to the best 2D person matching scheme and generating the 3D information of each person specifically includes the following steps:
  • the plurality of pieces of 2D joint point coordinate information are calculated into 3D space coordinates, and the 3D joint point information is obtained;
  • the 3D information of a person is generated according to all the retained 3D joint point information;
  • the 3D information of multiple persons corresponding to the number of matching items is obtained.
  • if the offset distance between a person's centre of gravity in the next synchronized video frame and those of all persons in the current synchronized video frame is greater than the preset maximum movement distance threshold, that person is identified as a newly appearing person, a new unique person ID is assigned to the newly appearing person, and the position information of the newly appearing person is added to the historical person position set;
  • if the offset distance between a person in the current synchronized video frame and the centre-of-gravity positions of all persons in the next synchronized video frame is greater than the preset maximum movement distance threshold, that person is determined to be a leaving person,
  • and the position information of the unique person ID corresponding to the leaving person is removed from the historical person position set.
  • the 3D position information of each person can thus be tracked through the unique person ID and the offset distance of the person's centre-of-gravity position between consecutive frames.
  • another embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above-mentioned multi-person three-dimensional motion capture method is implemented.
  • as shown in FIG. 13, another embodiment of the present invention provides an electronic device, including a memory and a processor; the memory stores a computer program that can run on the processor, and when the processor executes the computer program, the above-mentioned multi-person three-dimensional motion capture method is implemented.
  • in the following embodiments, a plurality of cameras are arranged in a scene to capture the three-dimensional motion of the persons in the scene, for example multi-person three-dimensional motion capture in a VR scene.
  • the details are as follows:
  • the first embodiment of the present invention is:
  • a multi-person three-dimensional motion capture method comprising the steps of:
  • multiple cameras are set up in a scene where three-dimensional motion capture is required, and the synchronized video frames of the multiple cameras are acquired, wherein each video frame is a 3-channel BGR format image.
  • let $A=\{n\mid 1\le n\le N\}$ be the set of cameras, with $n$ the camera index.
  • OpenPose is used to detect and locate the joint points of each human body in all cameras, obtaining the 2D joint point information of each person under each camera.
  • the number of 2D persons recognized by each camera is $K_n$, where $k$ denotes a person number with value range $1\ldots K_n$.
  • let $X=\{Rnode_{nk}\mid n\in A,\,1\le k\le K_n\}$ denote all 2D persons recognized by all cameras, where $Rnode_{nk}$ is a recognized 2D person.
  • the reconstruction input of each 3D person consists of a group of 2D persons whose cameras have no intersection; since the persons entering the scene are uncertain, multiple groups of 2D persons need to be matched.
  • threshold filtering is performed on the confidence of the joint points to filter out misidentified joint points.
  • this method lets users change the threshold condition of the filtering to solve misidentification caused by site interference.
  • the confidence threshold can be set flexibly according to the on-site scene or the practical application.
  • the back-projection rays of different cameras towards the same joint point should coincide; because of observation error and 2D recognition error they may not coincide, but they will certainly be very close. Therefore, by clustering the coordinates of the two endpoints of the shortest segments between back-projection rays, the matching relations of the joint points in different cameras can be judged, so as to determine the correspondence of different persons in different cameras. In this way, different viewing angles in the scene are collected and recognized, and the shortest-distance calculation and clustering are carried out on the principle that the back-projection rays of different cameras towards the same joint point should coincide, so that the multiple joint points of different cameras are matched correctly, which solves the 2D point-set matching problem caused by occlusion and 2D misrecognition in dense scenes and realizes motion capture of multiple persons.
  • the second embodiment of the present invention is:
  • a multi-person three-dimensional motion capture method which, on the basis of the above-mentioned first embodiment, is further limited as follows:
  • step S2 specifically includes the following steps:
  • the recognized 2D joint points consist of 25 types, so there are 25 joint point types in this embodiment; in other embodiments, selecting the joint point types appropriately according to actual requirements also belongs to equivalent embodiments of the present application.
  • matching calculations for the 25 types of 2D joint points are performed only between joint points of the same joint point type, which reduces the number of matching operations and improves the processing speed, especially noticeably when the number of persons is large.
  • the joint point matching schemes of the 25 types yield one 2D person matching scheme; the 2D person matching scheme includes several matching items, and the number of matching items is the same as the number of persons entering the scene.
  • the third embodiment of the present invention is:
  • a multi-person three-dimensional motion capture method which, on the basis of the above-mentioned second embodiment and as shown in FIG. 2, is further limited as follows:
  • step S22 specifically includes the following steps:
  • for each 2D joint point, the camera 3D coordinates are obtained from the coordinate origin and the corresponding camera extrinsic parameters and are taken as the endpoint of the back-projection ray, and the direction vector of the back-projection ray is obtained from the corresponding camera intrinsic parameters, camera extrinsic parameters and the 2D joint point coordinate information of each 2D joint point, so as to obtain the back-projection ray of each 2D joint point.
  • the back-projection ray through the 2D joint point $uv_{nk}$ is calculated, where $uv_{nk}\in Rnode_{nk}$.
  • the back-projection ray consists of the camera 3D coordinates and the direction vector through the 2D joint point $uv_{nk}$; the camera 3D coordinates are $P_n=\mathrm{extrinsic}_n\cdot O$,
  • where $\mathrm{extrinsic}_n$ is the camera extrinsic parameter matrix,
  • and $O$ is the coordinate origin $(0,0,0)$.
  • the direction vector can be obtained from the 2D joint point coordinate information and the intrinsic and extrinsic parameters of the camera, in the form $\vec{d}_{nk}=\mathrm{extrinsic}_n\cdot\mathrm{intrinsic}_n^{-1}\cdot(u_{nk},v_{nk},1)^T$,
  • where $\mathrm{intrinsic}_n^{-1}$ is the inverse of the camera intrinsic matrix.
  • according to the basic principle of multi-camera vision, the back-projection rays of a 3D joint point of a person observed by multiple cameras must intersect at that joint point; but because of observation error and 2D recognition error, the back-projection rays of different cameras towards the same joint point may not coincide, although they will certainly be very close. Therefore, in 3D space, a large number of endpoints should gather near the 3D joint points.
  • the distance between two back-projection rays is the distance between two lines in space.
  • FIG. 12 is used only as an illustration for the calculation; obtaining the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays therefore specifically includes the following steps:
  • the starting point coordinate and direction vector of back-projection ray $\vec{r_s}$ are $s_0$ and $\vec{u}$, and those of back-projection ray $\vec{r_t}$ are $t_0$ and $\vec{v}$; then ray $\vec{r_s}$ has the vector expression $S(s)=s_0+s\,\vec{u}$ and ray $\vec{r_t}$ has the vector expression $T(t)=t_0+t\,\vec{v}$.
  • S223. All the endpoints within each joint point type are clustered to output a clustering result, and a joint point matching scheme for each joint point type is generated by combination according to the clustering result.
  • step S223 specifically includes the following steps:
  • S2232. The shortest segments whose shortest distance $d_{min}$ exceeds the preset distance threshold within each joint point type are filtered out, and the endpoints of the shortest segments retained within each joint point type form the endpoint set R; each endpoint corresponds to a 2D joint point, a camera identifier and a person identifier.
  • S2233. Nearest-neighbour matching is performed by the k-nearest-neighbour algorithm on all endpoints of the endpoint set R within each joint point type to obtain the clusters formed by neighbouring endpoints; the clusters are sorted from high to low by the number of endpoints in each cluster, giving the sorted cluster information within each joint point type.
  • the clustering is performed according to the spatial density, and the only parameter input by the user is the ε (Eps) neighbourhood.
  • the method is implemented as follows:
  • the distances between endpoints are obtained by the k-nearest-neighbour algorithm knn; assuming that at most two endpoints appear within the ε (Eps) neighbourhood, the knn value can be set accordingly.
  • each endpoint in a cluster corresponds to a 2D joint point $uv_{nk}$ and a 2D person $Rnode_{nk}$.
  • as shown in FIG. 10, a matching scheme for the left-hand joints of five persons is illustrated.
  • the number of occurrences of each matching item in the joint point matching schemes of all relevant node types is counted, and the matching items with the most occurrences are combined into the best 2D personnel matching scheme.
  • step S3 specifically includes the following steps:
  • each point corresponds to one back-projection ray,
  • and each line is intersected with the lines of the other cameras to find the shortest line distance.
  • 24 shortest segments can be obtained, with 48 endpoints in total.
  • with the radius of the density clustering algorithm set to 0.2 metres,
  • 13 results can be clustered, and the endpoint counts of two of the results are 12, as shown within the circles.
  • those 12 points are exactly the endpoints of the shortest segments between 4 lines; then, according to the clustering results of the close points, a 2D point matching scheme is generated:
  • Group 0 clustering result: {0-1, 1-0, 2-0, 3-0}; Group 1 clustering result: {0-0, 1-1, 2-1, 3-1}.
  • this embodiment is described with an application scenario of four cameras and two persons.
  • the four cameras are 0, 1, 2 and 3 from left to right and the two persons are 0 and 1 respectively, so the final matching scheme should be: {0-0,1-0,2-1,3-0}, {0-1,1-1,2-0,3-1}.
  • {0-1,1-1,2-0,3-1}, {0-0,1-0,2-1,3-0} is the joint point matching scheme that occurs most often, as many as 21 times, and the matching schemes of the other joints are subsets of this scheme, so the person matching scheme {0-1,1-1,2-0,3-1}, {0-0,1-0,2-1,3-0} is selected, which is consistent with the expected scheme.
  • the fourth embodiment of the present invention is:
  • a multi-person three-dimensional motion capture method which, on the basis of the above-mentioned third embodiment, further includes the following steps:
  • S36. Acquire the 3D information of the multiple persons obtained from the current synchronized video frame and the next synchronized video frame respectively, and judge whether the offset distance of each person's body centre-of-gravity position between the two synchronized video frames is less than the preset maximum movement distance threshold; if so, update the position information corresponding to the person's unique ID (Identity Document) in the historical person position set according to each person's centre-of-gravity position in the next synchronized video frame;
  • c′ is the centre position of a person in the historical set LC. A maximum person movement distance D_max is set; if d(c, c′) ≤ D_max, meaning the change of the person's centre position is less than the threshold, c and c′ are regarded as the same person and the centre position of person c′ is updated. If the distance from c to every person in LC is greater than the threshold, c is a newly appearing person: an ID is generated for the person and the person is added to LC. If the distance from c′ to every person in C is greater than the threshold, c′ is regarded as a leaving person and LC removes the person.
  • another embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the multi-person three-dimensional motion capture method of any one of the above-mentioned first to fourth embodiments is implemented.
  • the sixth embodiment of the present invention is:
  • an electronic device 1 comprising a memory 3, a processor 2 and a computer program stored on the memory 3 and executable on the processor 2; when the processor 2 executes the computer program, it implements the multi-person three-dimensional motion capture method of any one of the above-mentioned first to fourth embodiments.
  • the multi-person three-dimensional motion capture method, storage medium and electronic device can perform posture detection through OpenPose pose detection and a deep convolutional network without any wearable device; multiple cameras collect and recognize different viewing angles in the scene, and the shortest distances are calculated and clustered on the principle that the back-projection rays of different cameras towards the same joint point should coincide, so that the multiple joint points of different cameras are matched correctly, which solves the 2D point-set matching problem caused by occlusion and 2D misrecognition in dense scenes and realizes motion capture of multiple persons.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are a multi-person three-dimensional motion capture method, a storage medium and an electronic device. The method includes the steps of: acquiring synchronized video frames from multiple cameras, performing joint point identification and positioning on each synchronized video frame of each camera, and obtaining the 2D joint point information of each person under each camera; calculating a back-projection ray of each 2D joint point, and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to a 2D joint point towards that 2D joint point; and performing 3D reconstruction of each person according to the best 2D person matching scheme and generating the 3D information of each person for three-dimensional motion capture.

Description

Multi-person three-dimensional motion capture method, storage medium and electronic device
This application claims priority to Chinese patent application No. 202011260863.4, filed on November 12, 2020 and entitled "Multi-person three-dimensional motion capture method, storage medium and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of human-computer interaction, and in particular to a multi-person three-dimensional motion capture method, a storage medium and an electronic device.
Background
Since 2013, VR (Virtual Reality) technology has gradually become popular worldwide, and motion capture technology is a key technology in VR.
At present, traditional motion capture systems adopt optical, inertial or mechanical working principles; their users all need to rely on third-party hardware and wear specific motion capture suits and props to interact with the system, so the human-computer interaction is not direct enough.
With the technical progress of deep learning, computing power and algorithm accuracy have been improved effectively, especially in the field of image processing: computers can perform pose estimation on human movements, facial expressions, finger movements and the like in images. Among them, the OpenPose human pose recognition project (an open-source human pose recognition project on GitHub) can realize pose estimation of human movements, facial expressions and finger movements, and supports the motion capture function for a single person. After a single person enters the scene, OpenPose can perform 3D (three-dimensional) reconstruction of the person's 2D (two-dimensional) joint points well; in crowded scenes, however, owing to the coverage of the camera viewing angles, the numbers of persons recognized by the cameras differ, and the person-selection algorithm can hardly match the correct 2D joint points of the persons, so that the 3D joint point information of the persons is reconstructed incorrectly.
Summary of the Invention
One aspect of the present application provides a multi-person three-dimensional motion capture method, comprising the steps of:
acquiring synchronized video frames from a plurality of cameras, performing joint point identification and positioning on each synchronized video frame of each camera, and obtaining the 2D joint point information of each person under each camera;
calculating a back-projection ray of each 2D joint point, and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to the 2D joint point towards the 2D joint point;
performing 3D reconstruction of each person according to the best 2D person matching scheme, and generating the 3D information of each person for three-dimensional motion capture.
Another aspect of the present application provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above multi-person three-dimensional motion capture method is implemented.
A further aspect of the present application provides an electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor; when the processor executes the computer program, the above multi-person three-dimensional motion capture method is implemented.
A further aspect of the present application provides a multi-person three-dimensional motion capture device, comprising:
a synchronized video frame acquisition module, configured to acquire synchronized video frames from a plurality of cameras, perform joint point identification and positioning on each synchronized video frame of each camera, and obtain the 2D joint point information of each person under each camera;
a clustering module, configured to calculate the back-projection ray of each 2D joint point and cluster according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to the 2D joint point towards the 2D joint point; and
a reconstruction module, configured to perform 3D reconstruction of each person according to the best 2D person matching scheme and generate the 3D information of each person for three-dimensional motion capture. Details of various embodiments of the present invention will be set forth in the drawings and the description below. From the description, drawings and claims, those skilled in the art will readily understand other features of the present invention, the problems it solves, and its beneficial effects.
Brief Description of the Drawings
For better description and illustration of the embodiments of the present application, reference may be made to one or more of the accompanying drawings, but the additional details or examples used to describe the drawings should not be considered as limiting the scope of any of the inventive creation of the present application, the presently described embodiments, or the preferred modes.
FIG. 1 is a schematic flowchart of a multi-person three-dimensional motion capture method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall flow of the multi-person three-dimensional motion capture method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the matching flow of 2D joint points according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the reconstruction flow of a person's 3D joint points according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the OpenPose 25-type joint points according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a simulated scene of the multi-person three-dimensional motion capture method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the clustering effect of the multi-person three-dimensional motion capture method in the simulated scene according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of the matching of 2D joint points according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of the effect of joint point confidence filtering according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of the matching of the left-hand joints of five persons according to an embodiment of the present invention;
FIG. 11 is a schematic diagram of matching in a multi-person scene of the multi-person three-dimensional motion capture method according to an embodiment of the present invention;
FIG. 12 is a schematic diagram of the calculation of the shortest segment between two back-projection rays according to an embodiment of the present invention;
FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description of the Embodiments
To explain the technical content, objects and effects of the present invention in detail, the following description is given in conjunction with the embodiments and the accompanying drawings.
Referring to FIG. 1 to FIG. 11, a multi-person three-dimensional motion capture method comprises the steps of:
acquiring synchronized video frames from a plurality of cameras, performing joint point identification and positioning on each synchronized video frame of each camera, and obtaining the 2D joint point information of each person under each camera;
calculating a back-projection ray of each 2D joint point, and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to the 2D joint point towards the 2D joint point;
performing 3D reconstruction of each person according to the best 2D person matching scheme, and generating the 3D information of each person for three-dimensional motion capture.
From the above description, the beneficial effects of the present invention are: multiple cameras collect and recognize different viewing angles in the scene, and the shortest distances are calculated and clustered on the principle that the back-projection rays of different cameras towards the same joint point should coincide, so that the multiple joint points of different cameras are matched correctly; this solves the 2D point-set matching problem caused by occlusion and 2D misrecognition in dense scenes and realizes motion capture of multiple persons.
Further, calculating the back-projection ray of each 2D joint point and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays to obtain the best 2D person matching scheme specifically includes the following steps:
dividing all the 2D joint point information by joint point type to obtain all the 2D joint point information corresponding to each joint point type;
calculating the back-projection rays for the 2D joint point information of each joint point type separately, and clustering according to the shortest distances between the back-projection rays to obtain the joint point matching scheme of each joint point type;
merging the joint point matching schemes of all the joint point types to obtain the best 2D person matching scheme.
From the above description, when performing joint point positioning, the joint points are classified by joint point type and matching calculations are performed only between joint points of the same type, which reduces the number of matching operations and improves the processing speed.
Further, calculating the back-projection ray of each 2D joint point specifically includes the following step:
for each 2D joint point, obtaining the camera 3D coordinates from the coordinate origin and the corresponding camera extrinsic parameters, taking the camera 3D coordinates as the endpoint of the back-projection ray, and obtaining the direction vector of the back-projection ray from the corresponding camera intrinsic parameters, camera extrinsic parameters and the 2D joint point coordinate information of each 2D joint point, so as to obtain the back-projection ray of each 2D joint point.
From the above description, the coordinate system to be established has a coordinate origin; according to the actual spatial relations, the camera 3D coordinates and the 3D coordinates corresponding to the 2D joint points can be obtained, and thus the back-projection ray of each 2D joint point.
Further, clustering according to the shortest distances between the back-projection rays to obtain the joint point matching scheme of each joint point type includes the following steps:
performing pairwise shortest-segment calculations between the back-projection ray of each 2D joint point in each camera and the back-projection rays of the 2D joint points of the other cameras under the same joint point type, and obtaining the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays;
clustering all the endpoints within each joint point type to output a clustering result, and generating the joint point matching scheme of each joint point type by combination according to the clustering result.
Further, obtaining the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays specifically includes the following steps:
the starting point coordinate and direction vector of back-projection ray $\vec{r_s}$ are $s_0$ and $\vec{u}$, and the starting point coordinate and direction vector of back-projection ray $\vec{r_t}$ are $t_0$ and $\vec{v}$; then the vector expression of ray $\vec{r_s}$ is $S(s)=s_0+s\,\vec{u}$ and the vector expression of ray $\vec{r_t}$ is $T(t)=t_0+t\,\vec{v}$;
supposing that the endpoint coordinates of the two endpoints of the shortest segment between ray $\vec{r_s}$ and ray $\vec{r_t}$ are $s_j$ and $t_j$ respectively, the vector expressions of the two endpoint coordinates are $s_j=s_0+s_c\,\vec{u}$ and $t_j=t_0+t_c\,\vec{v}$; writing the vector $\vec{w_0}=s_0-t_0$, the vector expression of the shortest segment between ray $\vec{r_s}$ and ray $\vec{r_t}$ is the first formula: $w(s_c,t_c)=\vec{w_0}+s_c\,\vec{u}-t_c\,\vec{v}$, where $s_c$ and $t_c$ are scalars;
substituting the vector expression of the shortest segment into $\vec{u}\cdot w(s_c,t_c)=0$ and $\vec{v}\cdot w(s_c,t_c)=0$ gives the second formula and the third formula; the second formula is $a\,s_c-b\,t_c+d=0$ and the third formula is $b\,s_c-c\,t_c+e=0$, where $a=\vec{u}\cdot\vec{u}$, $b=\vec{u}\cdot\vec{v}$, $c=\vec{v}\cdot\vec{v}$, $d=\vec{u}\cdot\vec{w_0}$ and $e=\vec{v}\cdot\vec{w_0}$; the scalar $s_c=(be-cd)/(ac-b^2)$ and the scalar $t_c=(ae-bd)/(ac-b^2)$ are then obtained;
it is judged whether $ac-b^2$ equals 0; if so, ray $\vec{r_s}$ and ray $\vec{r_t}$ are parallel, and a fixed point on either back-projection ray is designated as one of the endpoints and substituted into the second formula and the third formula to obtain the two endpoint coordinates $s_j$ and $t_j$; otherwise, the two endpoint coordinates $s_j$ and $t_j$ are found according to the scalar $s_c=(be-cd)/(ac-b^2)$ and the scalar $t_c=(ae-bd)/(ac-b^2)$.
From the above description, the back-projection rays of different cameras towards the same joint point should coincide; because of observation error and 2D recognition error they may not coincide, but they will certainly be very close. Therefore, the calculation for the shortest distance between two straight lines in space is used to obtain the shortest distance between back-projection rays, which is used to judge whether joint points from different cameras are the same joint point.
Further, clustering all the endpoints within each joint point type to output a clustering result and generating the joint point matching scheme of each joint point type by combination according to the clustering result specifically includes the following steps:
when the two endpoint coordinates of the shortest segment between any two back-projection rays have been obtained, obtaining the shortest distance of the shortest segment from the two endpoint coordinates $s_j$ and $t_j$ as $d_{min}=|s_j-t_j|$;
filtering out the shortest segments whose shortest distance $d_{min}$ exceeds the preset distance threshold within each joint point type, and forming the endpoint set R from the endpoints of the shortest segments retained within each joint point type, each endpoint corresponding to a 2D joint point, a camera identifier and a person identifier;
performing nearest-neighbour matching by the k-nearest-neighbour algorithm on all endpoints of the endpoint set R within each joint point type to obtain the clusters formed by neighbouring endpoints, and sorting the clusters from high to low by the number of endpoints in each cluster to obtain the sorted cluster information within each joint point type;
generating a matching scheme for each joint point type: obtaining the current matching item and traversing each endpoint in each cluster; if the traversed endpoint is not used by the current matching item and the current matching item does not contain the camera identifier of the traversed endpoint, adding the endpoint to the current matching item, until every endpoint in the cluster has been traversed, thereby obtaining the joint point matching scheme of each joint point type;
merging the joint point matching schemes of all the joint point types to obtain the best 2D person matching scheme specifically includes the following step:
counting the number of occurrences of each matching item in the joint point matching schemes of all joint point types, and combining the matching items with the most occurrences into the best 2D person matching scheme.
From the above description, although errors exist, the back-projection rays of different cameras towards the same joint point should still be very close; if the distance exceeds the preset distance threshold, the endpoints are considered not to belong to the same joint point. This does not affect the accuracy of joint point matching, and it also reduces the computation of the algorithm because the number of endpoints is reduced, further improving the processing speed and working efficiency; at the same time, the endpoint clustering adopts a density clustering algorithm that only requires setting the neighbourhood radius parameter, without giving the number of clusters in advance; generating the person matching scheme through this clustering algorithm and a combinatorial optimization algorithm can identify the number of persons in the scene to the greatest extent.
Further, performing 3D reconstruction of each person according to the best 2D person matching scheme and generating the 3D information of each person specifically includes the following steps:
traversing each matching item of the 2D person matching scheme, and obtaining the 2D joint point coordinate information of each joint point included in each matching item and the corresponding camera parameter information;
calculating the multiple pieces of 2D joint point coordinate information into 3D space coordinates according to multi-camera visual three-dimensional measurement, obtaining the 3D joint point information;
calculating the reprojection error according to the 3D joint point information, and filtering out the 3D joint point information whose reprojection error exceeds the preset projection error;
once all joint points in each matching item have been traversed, generating a person's 3D information from all the retained 3D joint point information;
once all the matching items have been traversed, obtaining the 3D information of the multiple persons corresponding to the number of matching items.
From the above description, when two persons are very close to each other, a joint point easily belongs to both persons at the same time; such joint points are deleted, so that each person's own joint point cluster is obtained, which reduces the probability of misrecognition and improves the accuracy of 3D reconstruction.
Further, the method also includes the following steps:
acquiring the 3D information of the multiple persons obtained from the current synchronized video frame and the next synchronized video frame respectively, and judging whether the offset distance of each person's body centre-of-gravity position between the two synchronized video frames is less than the preset maximum movement distance threshold; if so, updating the position information corresponding to the person's unique ID in the historical person position set according to each person's centre-of-gravity position in the next synchronized video frame;
if the offset distance between a person in the next synchronized video frame and the body centre-of-gravity positions of all persons in the current synchronized video frame is greater than the preset maximum movement distance threshold, identifying that person as a newly appearing person, assigning a new unique person ID to the newly appearing person, and adding the position information of the newly appearing person to the historical person position set;
if the offset distance between a person in the current synchronized video frame and the body centre-of-gravity positions of all persons in the next synchronized video frame is greater than the preset maximum movement distance threshold, identifying that person as a leaving person, and removing the position information of the unique person ID corresponding to the leaving person from the historical person position set.
From the above description, the 3D position information of each person is tracked through the unique person ID and the offset distance of the body centre-of-gravity position between consecutive frames.
Another implementation of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above multi-person three-dimensional motion capture method is implemented.
Referring to FIG. 13, another implementation of the present invention provides an electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor; when the processor executes the computer program, the above multi-person three-dimensional motion capture method is implemented.
For the specific implementation processes and corresponding effects of the multi-person three-dimensional motion capture method contained in the computer programs of the above two implementations, reference may be made to the related description of the multi-person three-dimensional motion capture method in the foregoing implementations.
Thus, in the following embodiments, a plurality of cameras are arranged in a scene to capture the three-dimensional motion of the persons in that scene, for example multi-person three-dimensional motion capture in a VR scene; the details are as follows.
Referring to FIG. 1 to FIG. 11, a first embodiment of the present invention is:
a multi-person three-dimensional motion capture method, comprising the steps of:
S1. Acquire synchronized video frames from multiple cameras, perform joint point identification and positioning on each synchronized video frame of each camera, and obtain the 2D joint point information of each person under each camera.
In this embodiment, multiple cameras are set up in the scene where three-dimensional motion capture is required, and the synchronized video frames of the multiple cameras are acquired; each video frame is a 3-channel BGR format image. Let $A=\{n\mid 1\le n\le N\}$ be the set of cameras, with $n$ the camera index. In this embodiment there are 12 cameras; in other equivalent embodiments the number of cameras can be set according to actual requirements.
In this embodiment, OpenPose is used to detect and locate the joint points of every human body in all cameras, obtaining the 2D joint point information of each person under each camera. The number of 2D persons recognized by each camera is $K_n$; let $k$ denote a person number, with value range $1\ldots K_n$. Let $X=\{Rnode_{nk}\mid n\in A,\,1\le k\le K_n\}$ denote all 2D persons recognized by all cameras, where $Rnode_{nk}$ is a recognized 2D person. The reconstruction input of each 3D person consists of a group of 2D persons, and the cameras to which these 2D persons belong have no intersection. Since the persons entering the scene are uncertain, multiple groups of 2D persons need to be matched.
As shown in FIG. 9, in the 2D point matching algorithm, threshold filtering is applied to the confidence of the joint points to filter out misidentified joint points. This lets users change the threshold condition of the filtering to solve misidentification caused by site interference; the confidence threshold can be set flexibly according to the on-site scene or the practical application.
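The confidence filtering of step S1 can be pictured with the sketch below; the 0.4 threshold and the BODY_25 joint layout are illustrative assumptions, since the patent leaves the threshold to the user:

```python
import numpy as np

def filter_joints_by_confidence(joints, conf_threshold=0.4):
    """Drop misidentified 2D joints whose confidence is below a user-set
    threshold. `joints` is a (25, 3) array of (u, v, confidence) rows in the
    OpenPose BODY_25 layout; filtered joints are marked as missing (NaN)."""
    kept = joints.astype(float)
    kept[joints[:, 2] < conf_threshold, :2] = np.nan  # keep the shape, void the position
    return kept
```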
S2. Calculate the back-projection ray of each 2D joint point, and cluster according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain the best 2D person matching scheme; the back-projection ray is the ray from the camera corresponding to a 2D joint point towards that 2D joint point.
In this embodiment, the back-projection rays of different cameras towards the same joint point should coincide; because of observation error and 2D recognition error they may not coincide, but they will certainly be very close. Therefore, by clustering the coordinates of the two endpoints of the shortest segments between back-projection rays, the matching relations of joint points in different cameras can be judged, and thus the correspondence of different persons in different cameras can be determined, yielding the 2D person matching scheme. In this way, different viewing angles in the scene are collected and recognized, and the shortest distances are calculated and clustered on the principle that the back-projection rays of different cameras towards the same joint point should coincide, so that the multiple joint points of different cameras are matched correctly, which solves the 2D point-set matching problem caused by occlusion and 2D misrecognition in dense scenes and realizes motion capture of multiple persons.
S3. Perform the 3D reconstruction of each person according to the best 2D person matching scheme and generate the 3D information of each person for three-dimensional motion capture.
Referring to FIG. 1 to FIG. 11, a second embodiment of the present invention is:
a multi-person three-dimensional motion capture method which, on the basis of the first embodiment above, is further limited as follows.
Step S2 specifically includes the following steps:
S21. Divide all the 2D joint point information by joint point type to obtain all the 2D joint point information corresponding to each joint point type.
As shown in FIG. 5, the 2D joint points recognized by OpenPose consist of 25 types, so there are 25 joint point types in this embodiment. In other embodiments, selecting the joint point types appropriately according to actual requirements also belongs to equivalent embodiments of the present application.
S22. Calculate the back-projection rays for the 2D joint point information of each joint point type separately, and cluster according to the shortest distances between the back-projection rays to obtain the joint point matching scheme of each joint point type.
In this embodiment, matching calculations for the 25 types of 2D joint points are performed only between joint points of the same joint point type, which reduces the number of matching operations and improves the processing speed, especially noticeably when the number of persons is large.
S23. Merge the joint point matching schemes of all joint point types to obtain the best 2D person matching scheme.
That is, the 25 types of joint point matching schemes yield one 2D person matching scheme; this scheme includes several matching items, and the number of matching items is the same as the number of persons entering the scene.
Referring to FIG. 1 to FIG. 11, a third embodiment of the present invention is:
a multi-person three-dimensional motion capture method which, on the basis of the second embodiment above and as shown in FIG. 2, is further limited as follows.
As shown in FIG. 3, step S22 specifically includes the following steps:
S221. For each 2D joint point, obtain the camera 3D coordinates from the coordinate origin and the corresponding camera extrinsic parameters, take the camera 3D coordinates as the endpoint of the back-projection ray, and obtain the direction vector of the back-projection ray from the corresponding camera intrinsic parameters, camera extrinsic parameters and the 2D joint point coordinate information of each 2D joint point, so as to obtain the back-projection ray of each 2D joint point.
In this embodiment, the back-projection ray through the 2D joint point $uv_{nk}$ is calculated, where $uv_{nk}\in Rnode_{nk}$.
The back-projection ray consists of the camera 3D coordinates and the direction vector through the 2D joint point $uv_{nk}$.
The camera 3D coordinates $P_n$ can be calculated by multiplying the camera extrinsic matrix by the coordinate origin, with the formula $P_n=\mathrm{extrinsic}_n\cdot O$, where $\mathrm{extrinsic}_n$ is the camera extrinsic parameter matrix and $O$ is the coordinate origin $(0,0,0)$.
The direction vector can be calculated from the 2D joint point coordinate information and the intrinsic and extrinsic parameters of the camera to which it belongs, in the form $\vec{d}_{nk}=\mathrm{extrinsic}_n\cdot\mathrm{intrinsic}_n^{-1}\cdot(u_{nk},v_{nk},1)^T$, where $\mathrm{intrinsic}_n^{-1}$ is the inverse of the camera intrinsic matrix.
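A sketch of step S221 follows. It assumes the extrinsic parameter is a 4x4 camera-to-world transform, consistent with the formula $P_n=\mathrm{extrinsic}_n\cdot O$ above; if the extrinsic is stored world-to-camera, its inverse would be used instead:

```python
import numpy as np

def backprojection_ray(uv, intrinsic, extrinsic_cam2world):
    """Build the back-projection ray of one 2D joint point: the camera's 3D
    position P_n as the ray endpoint, plus a unit direction through the pixel."""
    p = (extrinsic_cam2world @ np.array([0.0, 0.0, 0.0, 1.0]))[:3]    # P_n = extrinsic_n * O
    d_cam = np.linalg.inv(intrinsic) @ np.array([uv[0], uv[1], 1.0])  # pixel -> camera-frame ray
    d_world = extrinsic_cam2world[:3, :3] @ d_cam                     # rotate into the world frame
    return p, d_world / np.linalg.norm(d_world)
```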
S222. For the back-projection ray of each 2D joint point in each camera, perform pairwise shortest-segment calculations with the back-projection rays of the 2D joint points of the other cameras under the same joint point type, and obtain the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays.
In this embodiment, according to the basic principle of multi-camera vision, the back-projection rays of a 3D joint point of a person observed by multiple cameras must intersect at that joint point; but because of observation error and 2D recognition error, the back-projection rays of different cameras towards the same joint point may not coincide, although they will certainly be very close. Hence, in 3D space, a large number of endpoints should gather near the 3D joint points.
In this embodiment, as shown in FIG. 12, it should be understood that the distance between two back-projection rays is the distance between two lines in space; the rays may appear to intersect in FIG. 12 but may not intersect in three-dimensional space, so FIG. 12 serves only as an illustration for the calculation. Accordingly, obtaining the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays specifically includes the following steps:
S2221. The starting point coordinate and direction vector of back-projection ray $\vec{r_s}$ are $s_0$ and $\vec{u}$, and the starting point coordinate and direction vector of back-projection ray $\vec{r_t}$ are $t_0$ and $\vec{v}$; then the vector expression of ray $\vec{r_s}$ is $S(s)=s_0+s\,\vec{u}$ and the vector expression of ray $\vec{r_t}$ is $T(t)=t_0+t\,\vec{v}$.
S2222. Suppose the endpoint coordinates of the two endpoints of the shortest segment between ray $\vec{r_s}$ and ray $\vec{r_t}$ are $s_j$ and $t_j$ respectively; then the vector expressions of the two endpoint coordinates are $s_j=s_0+s_c\,\vec{u}$ and $t_j=t_0+t_c\,\vec{v}$. Writing the vector $\vec{w_0}=s_0-t_0$, the vector expression of the shortest segment between ray $\vec{r_s}$ and ray $\vec{r_t}$ is the first formula: $w(s_c,t_c)=\vec{w_0}+s_c\,\vec{u}-t_c\,\vec{v}$, where $s_c$ and $t_c$ are scalars.
S2223. Substituting the vector expression of the shortest segment into $\vec{u}\cdot w(s_c,t_c)=0$ and $\vec{v}\cdot w(s_c,t_c)=0$ gives the second formula and the third formula; the second formula is $a\,s_c-b\,t_c+d=0$ and the third formula is $b\,s_c-c\,t_c+e=0$, where $a=\vec{u}\cdot\vec{u}$, $b=\vec{u}\cdot\vec{v}$, $c=\vec{v}\cdot\vec{v}$, $d=\vec{u}\cdot\vec{w_0}$ and $e=\vec{v}\cdot\vec{w_0}$; the scalar $s_c=(be-cd)/(ac-b^2)$ and the scalar $t_c=(ae-bd)/(ac-b^2)$ are then obtained.
S2224. Judge whether $ac-b^2$ equals 0. If so, ray $\vec{r_s}$ and ray $\vec{r_t}$ are parallel, and a fixed point on either back-projection ray is designated as one of the endpoints and substituted into the second formula and the third formula to obtain the two endpoint coordinates $s_j$ and $t_j$; otherwise, the two endpoint coordinates $s_j$ and $t_j$ are found according to the scalar $s_c=(be-cd)/(ac-b^2)$ and the scalar $t_c=(ae-bd)/(ac-b^2)$.
When $ac-b^2$ equals 0, ray $\vec{r_s}$ and ray $\vec{r_t}$ are parallel and the distance between them is constant, so the distance computed from any chosen endpoint is the same.
S223. Cluster all the endpoints within each joint point type to output a clustering result, and generate the joint point matching scheme of each joint point type by combination according to the clustering result.
In this embodiment, step S223 specifically includes the following steps:
S2231. When the two endpoint coordinates of the shortest segment between any two back-projection rays have been obtained, obtain the shortest distance of the shortest segment from the two endpoint coordinates $s_j$ and $t_j$ as $d_{min}=|s_j-t_j|$.
S2232. Filter out the shortest segments whose shortest distance $d_{min}$ exceeds the preset distance threshold within each joint point type, and form the endpoint set R from the endpoints of the shortest segments retained within each joint point type; each endpoint corresponds to a 2D joint point, a camera identifier and a person identifier.
S2233. Perform nearest-neighbour matching by the k-nearest-neighbour algorithm on all endpoints of the endpoint set R within each joint point type to obtain the clusters formed by neighbouring endpoints, and sort the clusters from high to low by the number of endpoints in each cluster, obtaining the sorted cluster information within each joint point type.
In this embodiment, the clustering is performed according to the spatial density, and the only parameter input by the user is the ε (Eps) neighbourhood. The method is implemented as follows:
the distances between endpoints are obtained by the k-nearest-neighbour algorithm knn; assuming that at most two endpoints appear within the ε (Eps) neighbourhood, the knn value can be set accordingly;
after the search is completed, the result set is traversed, the neighbour distances in the search results are filtered with ε (Eps), and a neighbour matching relation is generated for each endpoint;
the neighbour matching relations of the endpoints are then sorted;
the unprocessed endpoints are taken out in turn, and the neighbouring endpoints density-reachable from each endpoint are found to form a cluster, until all the neighbour matching relations have been processed. Finally the clusters are sorted by density from high to low and the cluster information is output; the effect is shown in FIG. 8.
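As a sketch of step S2233, scikit-learn's DBSCAN can stand in for the density clustering described above (the patent does not name a specific implementation); the only parameter is the ε (Eps) neighbourhood radius, 0.2 m being the value used in the simulated scene:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_endpoints(endpoints, eps=0.2):
    """Density-cluster the retained shortest-segment endpoints.
    `endpoints` is an (N, 3) array of 3D points; returns index clusters
    sorted from high to low by the number of endpoints they contain."""
    labels = DBSCAN(eps=eps, min_samples=1).fit_predict(endpoints)
    clusters = [np.flatnonzero(labels == l) for l in np.unique(labels)]
    return sorted(clusters, key=len, reverse=True)
```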
S2234. Generate a matching scheme for each joint point type: obtain the current matching item and traverse each endpoint in each cluster; if the traversed endpoint has not been used by the current matching item and the current matching item has no endpoint with the camera identifier of the traversed endpoint, add the endpoint to the current matching item, until every endpoint in the cluster has been traversed, thereby obtaining the joint point matching scheme of each joint point type.
In this embodiment, every endpoint in a cluster corresponds to a 2D joint point $uv_{nk}$ and a 2D person $Rnode_{nk}$. When the algorithm starts running, let $Rnode_{available}$ denote the set of unused 2D persons and $m_{current}$ the current matching item, with $Rnode_{available}=X$ and $m_{current}=\Phi$. Each endpoint $Rnode_{nk}$ is traversed; according to the mutual-exclusion principle of cameras, if the endpoint has not been used and the current matching item $m_{current}$ has no endpoint of the same camera, then $m_{current}\mathrel{+}=Rnode$ and $Rnode_{available}\mathrel{-}=Rnode$. All endpoints in the cluster are traversed until processing is finished, at which point the matching item has been generated.
After all clusters have been processed in turn, or when $Rnode_{available}=\Phi$, the algorithm finishes and generates multiple matching items, completing the generation of the matching scheme.
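Step S2234 can be sketched as the greedy loop below; `endpoint_info`, mapping an endpoint index to its (camera identifier, person identifier), is an assumed bookkeeping structure, not a name from the patent:

```python
def build_matching_items(clusters, endpoint_info):
    """Generate matching items under the camera mutual-exclusion principle:
    an endpoint is added only if its 2D person is still unused and the current
    matching item holds no endpoint from the same camera."""
    used = set()                       # consumed (camera, person) pairs
    items = []
    for cluster in clusters:           # clusters are pre-sorted by density
        item = {}                      # m_current: camera_id -> person_id
        for idx in cluster:
            cam, person = endpoint_info[idx]
            if (cam, person) not in used and cam not in item:
                item[cam] = person
                used.add((cam, person))
        if item:
            items.append(item)
    return items
```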
In this embodiment, as shown in FIG. 10, a matching scheme for the left-hand joints of five persons is illustrated.
S2235. Merge the joint point matching schemes of all joint point types to obtain the best 2D person matching scheme, which specifically includes the following step:
count the number of occurrences of each matching item in the joint point matching schemes of all joint point types, and combine the matching items with the most occurrences into the best 2D person matching scheme.
In this embodiment, the 25 groups of joint point matching schemes need to be merged. The specific method is: count all matching items in the 25 groups of matching schemes and sort them by number of occurrences; when the algorithm starts running, the matching scheme is empty, $M=\Phi$; the matching items $m$ are then traversed in turn, and if $m$ does not conflict with the matching scheme, the matching item is added to the scheme, i.e. $M\mathrel{+}=m$ if $\lnot\,isConflict(m,M)$,
where $isConflict(m,M)$ is the conflict function.
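The merge of step S2235, including the conflict function $isConflict(m,M)$, might look as follows; representing each matching item as a frozenset of (camera, person) pairs is an assumption made for this sketch:

```python
from collections import Counter

def merge_schemes(schemes):
    """Merge per-joint-type schemes: sort matching items by occurrence count
    and greedily keep each item that does not conflict with the scheme built
    so far. `schemes` is a list (one per joint type) of lists of items, each
    item a frozenset of (camera_id, person_id) pairs."""
    counts = Counter(item for scheme in schemes for item in scheme)
    merged, taken = [], set()
    for item, _ in counts.most_common():
        if not (item & taken):         # isConflict(m, M): some 2D person reused
            merged.append(item)
            taken |= item
    return merged
```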
As shown in FIG. 4, step S3 specifically includes the following steps:
S31. Traverse each matching item of the 2D person matching scheme, and obtain the 2D joint point coordinate information of each joint point included in each matching item and the corresponding camera parameter information.
S32. Calculate the multiple pieces of 2D joint point coordinate information into 3D space coordinates according to multi-camera visual three-dimensional measurement, obtaining the 3D joint point information.
S33. Calculate the reprojection error according to the 3D joint point information, and filter out the 3D joint point information whose reprojection error exceeds the preset projection error.
S34. Once all joint points in each matching item have been traversed, generate a person's 3D information from all the retained 3D joint point information.
S35. Once all the matching items have been traversed, obtain the 3D information of the multiple persons corresponding to the number of matching items.
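Steps S32 and S33 can be sketched with a standard multi-view DLT triangulation followed by the reprojection-error filter; the 10-pixel threshold is an illustrative value, since the patent only speaks of a preset projection error:

```python
import numpy as np

def triangulate_joint(uvs, projections, max_reproj_err=10.0):
    """Triangulate one joint from its matched 2D observations and filter it
    out if the mean reprojection error exceeds the preset threshold.
    `projections` are the 3x4 camera matrices P = intrinsic @ [R | t]."""
    rows = []
    for (u, v), P in zip(uvs, projections):
        rows.append(u * P[2] - P[0])           # DLT equations for one view
        rows.append(v * P[2] - P[1])
    X = np.linalg.svd(np.asarray(rows))[2][-1]
    X = X[:3] / X[3]                           # homogeneous -> 3D point
    errs = []
    for (u, v), P in zip(uvs, projections):
        x = P @ np.append(X, 1.0)
        errs.append(np.hypot(x[0] / x[2] - u, x[1] / x[2] - v))
    return X if np.mean(errs) <= max_reproj_err else None
```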
As shown in FIG. 6 and FIG. 7, the 3D reconstruction process is demonstrated with 4 cameras and 2 bowling balls, the cameras being numbered [0-3] from left to right and the 2D bowling-ball points in the images being numbered [0-0], [0-1], [1-0], [1-1], [2-0], [2-1], [3-0] and [3-1]. The correct 2D matching scheme for the 3D reconstruction of the bowling balls should be: {0-1,1-0,2-0,3-0}, {0-0,1-1,2-1,3-1}.
As shown in FIG. 7, with 4 cameras and 2 bowling balls there are 8 bowling-ball 2D points; each point corresponds to one back-projection ray, and each line is intersected with the lines of the other cameras to find the shortest line distance. 24 shortest segments are obtained, 48 endpoints in total. With the radius of the density clustering algorithm set to 0.2 metres, 13 results can be clustered; the endpoint counts of two of the results are 12, as shown within the circles, and those 12 points are exactly the endpoints of the shortest segments between 4 lines. According to the clustering results of the close points, the 2D point matching scheme is generated:
Group 0 clustering result: {0-1,1-0,2-0,3-0}
Group 1 clustering result: {0-0,1-1,2-1,3-1}.
The above 2D point matching scheme is consistent with expectation, thereby achieving accurate identification of the bowling-ball 2D points.
As shown in FIG. 11, for a practical application scenario, this embodiment is described with four cameras and two persons: the four cameras are 0, 1, 2 and 3 from left to right and the two persons are 0 and 1 respectively, so the final matching scheme should be: {0-0,1-0,2-1,3-0}, {0-1,1-1,2-0,3-1}.
Through the identification of joint points and the calculation of matching relations, 25 groups of matching schemes are finally obtained, as shown in Table 1.
Table 1: the 25 groups of matching schemes (the table is reproduced as images in the original publication).
According to the above analysis, {0-1,1-1,2-0,3-1}, {0-0,1-0,2-1,3-0} is the matching scheme for the majority of joint points, occurring as many as 21 times, and the matching schemes of the other joints are subsets of this scheme; therefore the person matching scheme {0-1,1-1,2-0,3-1}, {0-0,1-0,2-1,3-0} is selected, which is consistent with the expected scheme.
Referring to FIG. 1 to FIG. 11, a fourth embodiment of the present invention is:
a multi-person three-dimensional motion capture method which, on the basis of the third embodiment above, further includes the following steps:
S36. Acquire the 3D information of the multiple persons obtained from the current synchronized video frame and the next synchronized video frame respectively, and judge whether the offset distance of each person's body centre-of-gravity position between the two synchronized video frames is less than the preset maximum movement distance threshold; if so, update the position information corresponding to the person's unique ID (Identity Document) in the historical person position set according to each person's centre-of-gravity position in the next synchronized video frame.
S37. If the offset distance between a person in the next synchronized video frame and the body centre-of-gravity positions of all persons in the current synchronized video frame is greater than the preset maximum movement distance threshold, identify that person as a newly appearing person, assign a new unique person ID to the newly appearing person, and add the position information of the newly appearing person to the historical person position set.
S38. If the offset distance between a person in the current synchronized video frame and the body centre-of-gravity positions of all persons in the next synchronized video frame is greater than the preset maximum movement distance threshold, identify that person as a leaving person, and remove the position information of the unique person ID corresponding to the leaving person from the historical person position set.
In this embodiment, let the historical person position set be LC, with $LC=\Phi$ at initial run. When a group of person 3D information P is recognized in the next frame, the centre position information of the group, $C=\{c_*\}$, is calculated, where $c$ denotes the centre position of a single person, computed from the person's 3D joint point coordinates.
Through a nearest-neighbour search, the mutual distances between the person centres in LC and C are obtained by the formula $d(c,c')=|c-c'|$, $c\in C$, $c'\in LC$, where $c$ is a person centre position in C and $c'$ a person centre position in the historical set LC. A maximum person movement distance $D_{max}$ is set; if $d(c,c')\le D_{max}$, meaning the change of the person centre position is less than the threshold, $c$ and $c'$ are regarded as the same person and the centre position of person $c'$ is updated. If the distance from $c$ to every person in LC is greater than the threshold, $c$ is a newly appearing person: an ID is generated for the person and the person is added to LC. If the distance from $c'$ to every person in C is greater than the threshold, $c'$ is regarded as a leaving person and LC removes the person.
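Steps S36 to S38 amount to nearest-neighbour ID assignment on the person centres; the sketch below assumes the centre c of a person is the mean of that person's 3D joint points and uses an illustrative D_max of 0.5 m:

```python
import numpy as np
from itertools import count

_new_id = count()

def track_persons(lc, centers, d_max=0.5):
    """One tracking step: `lc` maps person ID -> last centre position (the
    historical set LC), `centers` are the next frame's person centres.
    Returns the updated LC; IDs left unmatched in `lc` are the leaving persons."""
    remaining = dict(lc)
    updated = {}
    for c in centers:
        pid = min(remaining, default=None,
                  key=lambda i: np.linalg.norm(c - remaining[i]))
        if pid is not None and np.linalg.norm(c - remaining[pid]) <= d_max:
            updated[pid] = c                   # d(c, c') <= D_max: same person
            del remaining[pid]
        else:
            updated[next(_new_id)] = c         # newly appearing person: new ID
    return updated
```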
A fifth embodiment of the present invention is:
another implementation of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the multi-person three-dimensional motion capture method of any one of the first to fourth embodiments above is implemented.
Referring to FIG. 13, a sixth embodiment of the present invention is:
an electronic device 1, comprising a memory 3, a processor 2 and a computer program stored on the memory 3 and executable on the processor 2; when the processor 2 executes the computer program, it implements the multi-person three-dimensional motion capture method of any one of the first to fourth embodiments above.
In summary, with the multi-person three-dimensional motion capture method, storage medium and electronic device provided by the present invention, human posture detection is performed through OpenPose pose detection with a deep convolutional network, without relying on any wearable device; multiple cameras collect and recognize different viewing angles in the scene, and the shortest distances are calculated and clustered on the principle that the back-projection rays of different cameras towards the same joint point should coincide, so that the multiple joint points of different cameras are matched correctly, which solves the 2D point-set matching problem caused by occlusion and 2D misrecognition in dense scenes and realizes motion capture of multiple persons. Meanwhile, further measures such as classifying the joint points by joint point type, filtering with a preset distance threshold, confidence filtering, adopting a density clustering algorithm for endpoint clustering, and deleting occluded joint points improve the processing speed and working efficiency while ensuring recognition accuracy.
The above are only embodiments of the present invention and do not thereby limit the patent scope of the present invention; all equivalent transformations made using the contents of the specification and drawings of the present invention, whether applied directly or indirectly in related technical fields, are likewise included within the patent protection scope of the present invention.

Claims (11)

  1. A multi-person three-dimensional motion capture method, comprising the steps of:
    acquiring synchronized video frames from a plurality of cameras, performing joint point identification and positioning on each synchronized video frame of each of the cameras, and obtaining 2D joint point information of each person under each of the cameras;
    calculating a back-projection ray of each 2D joint point, and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain a best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to the 2D joint point towards the 2D joint point;
    performing 3D reconstruction of each person according to the best 2D person matching scheme, and generating 3D information of each person for three-dimensional motion capture.
  2. The multi-person three-dimensional motion capture method according to claim 1, wherein calculating the back-projection ray of each 2D joint point and clustering according to the coordinates of the two endpoints of the shortest segment between the back-projection rays to obtain the best 2D person matching scheme specifically comprises the following steps:
    dividing all the 2D joint point information by joint point type to obtain all the 2D joint point information corresponding to each joint point type;
    calculating the back-projection rays for the 2D joint point information of each joint point type separately, and clustering according to the shortest distances between the back-projection rays to obtain a joint point matching scheme of each joint point type;
    merging the joint point matching schemes of all the joint point types to obtain the best 2D person matching scheme.
  3. The multi-person three-dimensional motion capture method according to claim 1, wherein calculating the back-projection ray of each 2D joint point specifically comprises the following step:
    for each 2D joint point, obtaining camera 3D coordinates from the coordinate origin and the corresponding camera extrinsic parameters, taking the camera 3D coordinates as the endpoint of the back-projection ray, and obtaining the direction vector of the back-projection ray from the corresponding camera intrinsic parameters, camera extrinsic parameters and the 2D joint point coordinate information of each 2D joint point, so as to obtain the back-projection ray of each 2D joint point.
  4. The multi-person three-dimensional motion capture method according to claim 2, wherein clustering according to the shortest distances between the back-projection rays to obtain the joint point matching scheme of each joint point type comprises the following steps:
    performing pairwise shortest-segment calculations between the back-projection ray of each 2D joint point in each camera and the back-projection rays of the 2D joint points of the other cameras under the same joint point type, and obtaining the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays;
    clustering all the endpoints within each joint point type to output a clustering result, and generating the joint point matching scheme of each joint point type by combination according to the clustering result.
  5. The multi-person three-dimensional motion capture method according to claim 4, wherein obtaining the endpoint coordinates of the two endpoints of the shortest segment between two back-projection rays specifically comprises the following steps:
    the starting point coordinate and direction vector of back-projection ray $\vec{r_s}$ are $s_0$ and $\vec{u}$, and the starting point coordinate and direction vector of back-projection ray $\vec{r_t}$ are $t_0$ and $\vec{v}$; then the vector expression of ray $\vec{r_s}$ is $S(s)=s_0+s\,\vec{u}$ and the vector expression of ray $\vec{r_t}$ is $T(t)=t_0+t\,\vec{v}$;
    supposing that the endpoint coordinates of the two endpoints of the shortest segment between ray $\vec{r_s}$ and ray $\vec{r_t}$ are $s_j$ and $t_j$ respectively, the vector expressions of the two endpoint coordinates are $s_j=s_0+s_c\,\vec{u}$ and $t_j=t_0+t_c\,\vec{v}$; writing the vector $\vec{w_0}=s_0-t_0$, the vector expression of the shortest segment between ray $\vec{r_s}$ and ray $\vec{r_t}$ is the first formula: $w(s_c,t_c)=\vec{w_0}+s_c\,\vec{u}-t_c\,\vec{v}$, where $s_c$ and $t_c$ are scalars;
    substituting the vector expression of the shortest segment into $\vec{u}\cdot w(s_c,t_c)=0$ and $\vec{v}\cdot w(s_c,t_c)=0$ to obtain the second formula and the third formula, the second formula being $a\,s_c-b\,t_c+d=0$ and the third formula being $b\,s_c-c\,t_c+e=0$, where $a=\vec{u}\cdot\vec{u}$, $b=\vec{u}\cdot\vec{v}$, $c=\vec{v}\cdot\vec{v}$, $d=\vec{u}\cdot\vec{w_0}$ and $e=\vec{v}\cdot\vec{w_0}$, so as to obtain the scalar $s_c=(be-cd)/(ac-b^2)$ and the scalar $t_c=(ae-bd)/(ac-b^2)$;
    judging whether $ac-b^2$ equals 0; if so, ray $\vec{r_s}$ and ray $\vec{r_t}$ are parallel, and a fixed point on either back-projection ray is designated as one of the endpoints and substituted into the second formula and the third formula to obtain the two endpoint coordinates $s_j$ and $t_j$; otherwise, the two endpoint coordinates $s_j$ and $t_j$ are found according to the scalar $s_c=(be-cd)/(ac-b^2)$ and the scalar $t_c=(ae-bd)/(ac-b^2)$.
  6. The multi-person three-dimensional motion capture method according to claim 4, wherein clustering all the endpoints within each joint point type to output a clustering result and generating the joint point matching scheme of each joint point type by combination according to the clustering result specifically comprises the following steps:
    when the two endpoint coordinates of the shortest segment between any two back-projection rays are obtained, obtaining the shortest distance of the shortest segment from the two endpoint coordinates $s_j$ and $t_j$ as $d_{min}=|s_j-t_j|$;
    filtering out the shortest segments whose shortest distance $d_{min}$ exceeds a preset distance threshold within each joint point type, and forming an endpoint set R from the endpoints of the shortest segments retained within each joint point type, each endpoint corresponding to a 2D joint point, a camera identifier and a person identifier;
    performing nearest-neighbour matching by the k-nearest-neighbour algorithm on all endpoints of the endpoint set R within each joint point type to obtain the clusters formed by neighbouring endpoints, and sorting the clusters from high to low by the number of endpoints in each cluster to obtain the sorted cluster information within each joint point type;
    generating a matching scheme for each joint point type: obtaining the current matching item and traversing each endpoint in each cluster; if the traversed endpoint is not used by the current matching item and the current matching item does not contain the camera identifier of the traversed endpoint, adding the endpoint to the current matching item, until every endpoint in the cluster has been traversed, thereby obtaining the joint point matching scheme of each joint point type;
    wherein merging the joint point matching schemes of all the joint point types to obtain the best 2D person matching scheme specifically comprises the following step:
    counting the number of occurrences of each matching item in the joint point matching schemes of all joint point types, and combining the matching items with the most occurrences into the best 2D person matching scheme.
  7. The multi-person three-dimensional motion capture method according to claim 4, wherein performing 3D reconstruction of each person according to the best 2D person matching scheme and generating the 3D information of each person specifically comprises the following steps:
    traversing each matching item of the 2D person matching scheme, and obtaining the 2D joint point coordinate information of each joint point included in each matching item and the corresponding camera parameter information;
    calculating the multiple pieces of 2D joint point coordinate information into 3D space coordinates according to multi-camera visual three-dimensional measurement, obtaining 3D joint point information;
    calculating the reprojection error according to the 3D joint point information, and filtering out the 3D joint point information whose reprojection error exceeds a preset projection error;
    once all joint points in each matching item have been traversed, generating a person's 3D information from all the retained 3D joint point information;
    once all the matching items have been traversed, obtaining the 3D information of the multiple persons corresponding to the number of matching items.
  8. The multi-person three-dimensional motion capture method according to claim 1, further comprising the following steps:
    acquiring the 3D information of the multiple persons obtained from the current synchronized video frame and the next synchronized video frame respectively, and judging whether the offset distance of each person's body centre-of-gravity position between the two synchronized video frames is less than a preset maximum movement distance threshold; if so, updating the position information corresponding to the person's unique ID in a historical person position set according to each person's centre-of-gravity position in the next synchronized video frame;
    if the offset distance between a person in the next synchronized video frame and the body centre-of-gravity positions of all persons in the current synchronized video frame is greater than the preset maximum movement distance threshold, identifying that person as a newly appearing person, assigning a new unique person ID to the newly appearing person, and adding the position information of the newly appearing person to the historical person position set;
    if the offset distance between a person in the current synchronized video frame and the body centre-of-gravity positions of all persons in the next synchronized video frame is greater than the preset maximum movement distance threshold, identifying that person as a leaving person, and removing the position information of the unique person ID corresponding to the leaving person from the historical person position set.
  9. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed by a processor, the multi-person three-dimensional motion capture method according to any one of claims 1-8 is implemented.
  10. An electronic device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein when the processor executes the computer program, the multi-person three-dimensional motion capture method according to any one of claims 1-8 is implemented.
  11. A multi-person three-dimensional motion capture device, comprising:
    a synchronized video frame acquisition module, configured to acquire synchronized video frames from a plurality of cameras, perform joint point identification and positioning on each synchronized video frame of each of the cameras, and obtain 2D joint point information of each person under each of the cameras;
    a clustering module, configured to calculate the back-projection ray of each 2D joint point and cluster according to the coordinates of the two endpoints of the shortest segment between the back-projection rays, so as to obtain a best 2D person matching scheme, the back-projection ray being the ray from the camera corresponding to the 2D joint point towards the 2D joint point; and
    a reconstruction module, configured to perform 3D reconstruction of each person according to the best 2D person matching scheme and generate 3D information of each person for three-dimensional motion capture.
PCT/CN2021/105486 2020-11-12 2021-07-09 Multi-person three-dimensional motion capture method, storage medium and electronic device WO2022100119A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
JP2022546634A (JP7480312B2) 2020-11-12 2021-07-09 Multi-person three-dimensional motion capture method, storage medium and electronic device
US18/166,649 (US20230186684A1) 2020-11-12 2023-02-09 Method for capturing three dimensional actions of multiple persons, storage medium and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011260863.4A (CN112379773B) 2020-11-12 Multi-person three-dimensional motion capture method, storage medium and electronic device
CN202011260863.4 2020-11-12

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/166,649 Continuation US20230186684A1 (en) 2020-11-12 2023-02-09 Method for capturing three dimensional actions of multiple persons, storage medium and electronic device

Publications (1)

Publication Number Publication Date
WO2022100119A1 (zh) 2022-05-19

Family

ID=74583279

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/105486 WO2022100119A1 (zh) 2020-11-12 2021-07-09 多人三维动作捕捉方法、存储介质及电子设备

Country Status (3)

Country Link
US (1) US20230186684A1 (zh)
JP (1) JP7480312B2 (zh)
WO (1) WO2022100119A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898471B (zh) * 2022-07-12 2022-09-30 华中科技大学 Behavior detection method based on human skeleton features, and storage medium


Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101424942B1 (ko) 2004-07-30 2014-08-01 익스트림 리얼리티 엘티디. System and method for 3D space dimension based on image processing
US8384714B2 (en) 2008-05-13 2013-02-26 The Board Of Trustees Of The Leland Stanford Junior University Systems, methods and devices for motion capture using video imaging
US9183631B2 (en) 2012-06-29 2015-11-10 Mitsubishi Electric Research Laboratories, Inc. Method for registering points and planes of 3D data in multiple coordinate systems
CN110544301A (zh) 2019-09-06 2019-12-06 广东工业大学 Three-dimensional human motion reconstruction system and method, and motion training system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100097262A1 (en) * 2008-10-21 2010-04-22 Lang Hong 3D Video-Doppler-Radar (VIDAR) Imaging System
US20170323443A1 (en) * 2015-01-20 2017-11-09 Indian Institute Of Technology, Bombay Systems and methods for obtaining 3-d images from x-ray information
CN111028271A (zh) * 2019-12-06 2020-04-17 浩云科技股份有限公司 Multi-camera person three-dimensional positioning and tracking system based on human skeleton detection
CN111797714A (zh) * 2020-06-16 2020-10-20 浙江大学 Multi-view human motion capture method based on key point clustering
CN112379773A (zh) * 2020-11-12 2021-02-19 深圳市洲明科技股份有限公司 Multi-person three-dimensional motion capture method, storage medium and electronic device

Also Published As

Publication number Publication date
JP7480312B2 (ja) 2024-05-09
CN112379773A (zh) 2021-02-19
US20230186684A1 (en) 2023-06-15
JP2023512282A (ja) 2023-03-24

Similar Documents

Publication Publication Date Title
Rogez et al. Lcr-net++: Multi-person 2d and 3d pose detection in natural images
Ji et al. Interactive body part contrast mining for human interaction recognition
Han et al. Space-time representation of people based on 3D skeletal data: A review
Mitra et al. Multiview-consistent semi-supervised learning for 3d human pose estimation
Aggarwal et al. Human motion analysis: A review
Huang et al. Data-driven segmentation and labeling of freehand sketches
CN110555412B (zh) 基于rgb和点云相结合的端到端人体姿态识别方法
Gao et al. Multi-perspective and multi-modality joint representation and recognition model for 3D action recognition
Ghazal et al. Human posture classification using skeleton information
CN109101864A (zh) 基于关键帧和随机森林回归的人体上半身动作识别方法
Liu et al. HDS-SP: A novel descriptor for skeleton-based human action recognition
Liu et al. A stochastic attribute grammar for robust cross-view human tracking
Cong et al. Weakly supervised 3d multi-person pose estimation for large-scale scenes based on monocular camera and single lidar
WO2022100119A1 (zh) Multi-person three-dimensional motion capture method, storage medium and electronic device
Lu et al. Exploring high-order spatio–temporal correlations from skeleton for person Re-identification
Tang et al. Using a selective ensemble support vector machine to fuse multimodal features for human action recognition
Liang et al. Lower limb action recognition with motion data of a human joint
Chai et al. Human gait recognition: approaches, datasets and challenges
Zhang et al. On the correlation among edge, pose and parsing
Cohen et al. 3D body reconstruction for immersive interaction
CN112379773B (zh) Multi-person three-dimensional motion capture method, storage medium and electronic device
Liu et al. Better dense trajectories by motion in videos
Cao et al. Anatomy and geometry constrained one-stage framework for 3d human pose estimation
CN114548224A (zh) 2D human pose generation method and device for strongly interactive human motion
Zhou et al. Human interaction recognition with skeletal attention and shift graph convolution

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21890650

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2022546634

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21890650

Country of ref document: EP

Kind code of ref document: A1