WO2023223508A1 - Video processing device, video processing method, and program - Google Patents

Video processing device, video processing method, and program

Info

Publication number
WO2023223508A1
Authority
WO
WIPO (PCT)
Prior art keywords
posture information
frame
subject
pattern
video processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/JP2022/020848
Other languages
English (en)
French (fr)
Japanese (ja)
Inventor
明男 亀田
誠明 松村
裕司 青野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
NTT Inc
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2022-05-19
Filing date
2022-05-19
Publication date
2023-11-23
Application filed by Nippon Telegraph and Telephone Corp
Priority to PCT/JP2022/020848
Priority to JP2024521489A
Publication of WO2023223508A1
Anticipated expiration
Legal status: Ceased (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis

Definitions

  • One aspect of the present invention relates to a video processing device, a computer-implemented video processing method, and a program that, for example, detect a person's posture and create a 3D (three-dimensional) wireframe model.
  • Embodied knowledge (bodily knowledge) is expressed in a person's posture and movements, for example in the form of skills (techniques).
  • OpenPose is an open-source library that can detect skeletal coordinates from image data of a person (the subject) and generate a wireframe model.
  • A known technique uses such a library to estimate the position of each joint point (2D posture information) in each image frame and to quantify the 2D posture (see, for example, Non-Patent Document 1).
  • Techniques that convert 2D (two-dimensional) posture information into three dimensions to obtain 3D (three-dimensional) posture information are also known (see, for example, Non-Patent Documents 2 and 3).
  • However, the left and right coordinates are sometimes swapped when the 2D pose is estimated (for example, skeletal coordinates belonging to the right arm are wrongly assigned to the left arm). If 2D skeletal coordinates with left and right swapped are used, the accuracy of the estimated 3D posture information decreases. This is undesirable because, for example, it degrades the processing accuracy of skill capture.
  • The present invention was made in view of the above circumstances, and its purpose is to provide a technology that suppresses left-right reversal of skeletal coordinates in 2D posture information and thereby enables accurate estimation of 3D posture information.
  • According to one aspect of the present invention, a video processing device includes a 2D posture information generation unit, a posture information processing unit, and a 3D conversion processing unit.
  • The 2D posture information generation unit generates 2D posture information of the subject from video data of the subject.
  • The posture information processing unit corrects the 2D posture information.
  • The 3D conversion processing unit generates 3D posture information of the subject from the corrected 2D posture information.
  • The posture information processing unit includes a left-right reversal determination unit that performs left-right reversal determination processing. In this processing, first to fourth patterns, corresponding to whether or not the upper body and the lower body are each reversed, are set for the image frame of the subject in the video data. From these four patterns, the processing selects the one that minimizes a score calculated by an evaluation formula whose variables are the variance and the average of the distances between corresponding joint positions in the previous frame and the current frame.
  • FIG. 1 is a diagram showing an example of a workflow for reproducing and transmitting physical knowledge.
  • FIG. 2 is a diagram showing an example of a wireframe model used in skill capture.
  • FIG. 3 is a diagram for explaining estimating 3D posture information from 2D posture information.
  • FIG. 4 is a diagram for explaining horizontal reversal.
  • FIG. 5 is a functional block diagram showing an example of a video processing device according to an embodiment.
  • FIG. 6 is a diagram showing an example of a table stored in the 2D posture information 61.
  • FIG. 7 is a diagram showing an example of a table stored in the 3D posture information 62.
  • FIG. 8 is a diagram showing the flow of data between the functional blocks shown in FIG. 5.
  • FIG. 9 is a diagram for explaining an overview of [basic processing].
  • FIG. 10 is a diagram for explaining the process of [Correction 2].
  • FIG. 11 is a diagram for explaining the process of [Correction 3].
  • FIG. 12 is a diagram for explaining details of the [Correction 3] process.
  • FIG. 1 is a diagram showing an example of a workflow for reproducing and teaching physical knowledge (including simulated experiences, etc.).
  • Embodied knowledge is put to effective use by analyzing skills captured with cameras and sensors: for example, comparing video data of an expert with video data of a practitioner, extracting points for improvement, and feeding them back through an appropriate presentation method.
  • Skill capture can be described as a technology that uses multiple cameras, installed without particular attention to their camera parameters (position, orientation, angle of view, distortion) and without camera calibration, to acquire in one pass the camera parameters, the tracking of each subject, and the 3D posture of each subject (three-dimensional joint coordinates).
  • Skill capture consists of a series of processes: 2D pose estimation from video data, digitization, rotation angle fitting, and rotation angle denoising.
  • In 2D pose estimation, the poses of feature points (2D skeletal coordinates, etc.) are estimated with each video frame as input. Multiple people can also be handled.
  • In the digitization process, each subject is separated and tracked, the 3D skeletal coordinates are estimated, and the camera parameters are estimated at the same time.
  • In rotation angle fitting and noise removal, the 3D coordinates are converted into a 3D rotation-angle model, and noise is then removed.
  • The "skills" to be analyzed include physical movements (including those of the fingertips), physiological information, physiological reactions, psychological states, and the like.
  • The scope of analysis can be expanded further.
  • FIG. 2 is a diagram showing an example of a wireframe model used in skill capture.
  • In the wireframe model, a number (part ID) is assigned to each body part, such as a joint.
  • 2D posture information is represented by the acquired coordinates of each part of the wireframe model.
  • FIG. 3 is a diagram for explaining estimating 3D posture information from 2D posture information.
  • In FIG. 3, 2D posture information is obtained from multiple cameras (camera 1, camera 2) installed at different viewpoints.
  • The cameras are synchronized with one another, and each acquires video data of the subject (such as a person).
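The text does not say how Non-Patent Documents 2 and 3 lift synchronized 2D views into 3D. Purely as an illustration, the following minimal sketch shows standard linear (DLT) triangulation of a single joint, assuming each camera's 3x4 projection matrix is already known (as in the pre-calibrated case of FIG. 8). The function name `triangulate_joint` is hypothetical, and this is not presented as the patented method.

```python
import numpy as np

def triangulate_joint(projections, points_2d):
    """Linear (DLT) triangulation of one joint seen by several synchronized cameras.

    ASSUMPTION: standard DLT, not necessarily the method of Non-Patent Documents 2 and 3.
    projections: list of known 3x4 camera projection matrices
    points_2d:   list of (x, y) image coordinates of the same joint, one per camera
    Returns the 3D point (X, Y, Z) that minimizes the algebraic error.
    """
    rows = []
    for P, (x, y) in zip(projections, points_2d):
        P = np.asarray(P, dtype=float)
        # Each view gives two linear equations in the homogeneous point X:
        # x * (P[2] @ X) - P[0] @ X = 0 and y * (P[2] @ X) - P[1] @ X = 0
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    A = np.vstack(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # back from homogeneous coordinates
```

With noiseless projections the estimate reproduces the original point. If one view has its left and right joints swapped, the rows for that joint come from the opposite limb, so the triangulated point is pulled away from the true joint, which is the error the embodiment targets.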
  • FIG. 4 is a diagram for explaining horizontal reversal.
  • In FIG. 4(a), the left-right correspondence of the subject's upper-body parts is reversed. This state is called "left-right reversal."
  • In FIG. 4(b), the right hand correctly corresponds to the right hand and the left hand to the left hand, so no left-right reversal occurs.
  • Left-right reversal is caused by recognition errors during video processing and can occur in either the upper body or the lower body.
  • Therefore, combinations of four cases are considered: (upper body, not reversed), (upper body, reversed), (lower body, not reversed), and (lower body, reversed).
  • In 2D pose estimation, the postures of feature points (2D skeletal coordinates, etc.) are estimated with each video frame as input.
  • The estimated 2D skeletal coordinates may be left-right reversed. This is undesirable because it lowers the accuracy of the 3D skeletal coordinates estimated from them.
  • Existing techniques do not include a determination algorithm that addresses the left-right reversal problem in the input 2D pose information.
  • In the embodiment, the presence or absence of left-right reversal is determined for each frame and each subject ID, and correction processing is then performed based on the result.
  • The following describes a [basic process] that determines the presence or absence of left-right reversal, and several correction processes, [Correction 1], [Correction 2], and [Correction 3].
  • In the first frame of the video data, past 3D skeletal coordinates cannot be referenced.
  • In subsequent frames, past 3D skeletal coordinates can be referenced (Case 1).
  • In Case 1, the subject can be tracked between frames (a predicted subject for the current (n-th) frame can be generated from the past (n-1)-th frame). The embodiment therefore describes Case 1 in detail.
  • FIG. 5 is a functional block diagram showing an example of a video processing device according to an embodiment.
  • The video processing device 10 is an information processing device (computer) that includes a processor 20 and a memory 90.
  • The video processing device 10 also includes a storage 60 and an interface unit 70 connected to the cameras 1-1 to 1-n. The parts of the video processing device 10 are connected via a bus 80.
  • The storage 60 stores 2D posture information 61, 3D posture information 62, and a program 63.
  • The program 63 is loaded into the memory 90 by the OS (operating system) of the video processing device 10 and executed by the processor 20.
  • The program 63 causes the processor 20 to function as the 2D posture information generation section 30, the posture information processing section 40, and the 3D conversion processing section 50.
  • The posture information processing section 40 includes a left-right reversal determination section 41 and a left-right reversal correction processing section 42 as processing routines.
  • The 2D posture information generation section 30 generates 2D posture information of the subject from video data of the subject.
  • The posture information processing section 40 corrects the 2D posture information generated by the 2D posture information generation section 30.
  • The 3D conversion processing section 50 generates 3D posture information of the subject from the 2D posture information corrected by the posture information processing section 40.
  • The left-right reversal determination section 41 sets four patterns for the image frame of the subject in the video data: [as detected], [upper body reversed], [lower body reversed], and [entire body reversed]. These patterns are combinations of four cases: (upper body, not reversed), (upper body, reversed), (lower body, not reversed), and (lower body, reversed). In other words, they cover every combination of the upper body and the lower body being reversed or not.
  • From these four patterns, the left-right reversal determination section 41 selects the one that minimizes the score calculated by the evaluation formula whose variables are the variance and the average of the distances between corresponding joint positions in the previous frame and the current frame.
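As a hedged sketch of how the four patterns could be generated for one detected subject in one camera, the code below swaps left/right part IDs. The part-ID pairs are assumptions loosely following the OpenPose BODY_25 keypoint order, since the actual IDs of FIG. 2 are not given; `swap_pairs` and `make_patterns` are hypothetical names.

```python
# ASSUMPTION: hypothetical (right, left) part-ID pairs, loosely following OpenPose BODY_25.
UPPER_LR_PAIRS = [(2, 5), (3, 6), (4, 7)]       # shoulders, elbows, wrists
LOWER_LR_PAIRS = [(9, 12), (10, 13), (11, 14)]  # hips, knees, ankles

def swap_pairs(joints, pairs):
    """Return a copy of joints {part_id: (x, y)} with each left/right pair exchanged.

    A joint detected on only one side is moved to the other side's part ID.
    """
    out = dict(joints)
    for right, left in pairs:
        r, l = joints.get(right), joints.get(left)
        out.pop(right, None)
        out.pop(left, None)
        if l is not None:
            out[right] = l
        if r is not None:
            out[left] = r
    return out

def make_patterns(joints):
    """Build the four candidate patterns for one detected subject in one camera."""
    return {
        "as_detected": dict(joints),
        "upper_reversed": swap_pairs(joints, UPPER_LR_PAIRS),
        "lower_reversed": swap_pairs(joints, LOWER_LR_PAIRS),
        "entire_reversed": swap_pairs(joints, UPPER_LR_PAIRS + LOWER_LR_PAIRS),
    }
```

Joints that have no left/right counterpart (for example, part ID 0 here) are copied unchanged into every pattern.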
  • [Correction 1]: The left-right reversal correction processing section 42 performs left-right reversal determination within a frame.
  • [Correction 2]: The left-right reversal correction processing section 42 performs two-stage left-right reversal determination within a frame.
  • [Correction 3]: The left-right reversal correction processing section 42 performs left-right reversal determination between consecutive frames.
  • The correction processing performed by the left-right reversal correction processing section 42 is similar to the processing performed by the left-right reversal determination section 41.
  • FIG. 6 is a diagram showing an example of a table stored in the 2D posture information 61.
  • The 2D posture information 61 is a table whose columns are subject ID, camera ID, part ID, part x coordinate, and part y coordinate.
  • FIG. 7 is a diagram showing an example of a table stored in the 3D posture information 62.
  • The 3D posture information 62 is a table whose columns are subject ID, part ID, part x coordinate, part y coordinate, and part z coordinate.
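One hypothetical in-memory representation of these two tables is sketched below. The class and function names are illustrative assumptions, and because no frame column is listed, each table is assumed to hold a single frame.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical row types mirroring the columns listed for FIG. 6 and FIG. 7.
# ASSUMPTION: no frame index column, so each table holds one frame.

@dataclass(frozen=True)
class Posture2DRow:
    subject_id: int
    camera_id: int
    part_id: int
    x: float
    y: float

@dataclass(frozen=True)
class Posture3DRow:
    subject_id: int
    part_id: int
    x: float
    y: float
    z: float

def group_2d(rows):
    """Group 2D rows into {(subject_id, camera_id): {part_id: (x, y)}}."""
    grouped = defaultdict(dict)
    for r in rows:
        grouped[(r.subject_id, r.camera_id)][r.part_id] = (r.x, r.y)
    return dict(grouped)

def group_3d(rows):
    """Group 3D rows into {subject_id: {part_id: (x, y, z)}}."""
    grouped = defaultdict(dict)
    for r in rows:
        grouped[r.subject_id][r.part_id] = (r.x, r.y, r.z)
    return dict(grouped)
```

Grouping by (subject ID, camera ID) gives one joint dictionary per detected subject per camera, which is the unit on which the left-right reversal determination operates.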
  • FIG. 8 is a diagram showing the flow of data between the functional blocks shown in FIG. 5.
  • FIG. 8 shows the case where the camera parameters are calibrated in advance.
  • Multi-view video data from cameras 1-1 to 1-n is passed, together with the camera IDs, to the current frame extraction unit 21.
  • The current frame extraction unit 21 extracts the n-th frame image for each camera ID from the video data and passes it to the 2D posture information generation section 30.
  • The 2D posture information generation section 30 calculates 2D posture information for the n-th frame image and passes it to the left-right reversal determination section 41 of the posture information processing section 40. The 2D posture information is also stored in the 2D posture information storage section of the posture information processing section 40.
  • The posture information processing section 40 performs [basic processing] with the left-right reversal determination section 41 and performs [Correction 1], [Correction 2], and [Correction 3] with the left-right reversal correction processing section 42 to obtain the 3D posture information.
  • The 3D posture information is output to the outside and is also stored in the 3D posture information storage section.
  • FIG. 9 is a diagram for explaining an overview of [basic processing].
  • The idea of the basic processing is as follows: "Project the predicted subject onto each camera in the current frame, and if the distance between the projected prediction and the 2D posture information of the current frame is large, perform left-right reversal processing."
  • Specifically, the predicted three-dimensional subject is projected onto each camera and matched with the subject detected by that camera; that is, matching is performed on the 2D image plane of each camera. The three-dimensional subject generated from the past (n-1)-th frame (the predicted 3D subject) is assumed to have correct left-right assignment (no reversal). Then, for each two-dimensional subject detected in the current (n-th) frame (detected subject), four patterns are generated: [as detected], [upper body reversed], [lower body reversed], and [entire body reversed].
  • From these, the pattern whose joint-position distances (errors) relative to the predicted subject (the predicted 3D subject projected onto each camera) have a small variance or average is selected. The reasoning is that, if left and right are correct, the distances for all joints should be roughly equal, so their variance should be small. Focusing on the variance is particularly effective.
  • The evaluation value can be calculated with equation (1).
  • The parameter in equation (1) is set to 1, for example.
  • The evaluation value (score) of each of the four patterns is calculated, and the pattern with the smallest score is selected. For example, if the upper body is reversed, the score for [as detected] is large, and the score for [upper body reversed] is smaller.
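Equation (1) itself is not reproduced in this text. The sketch below therefore assumes one plausible form that uses the variance and the average of the joint distances as variables, score = variance + alpha * average, with alpha = 1 by default to match the example value above; it may differ from the actual equation (1). It projects the predicted 3D subject into one camera with an assumed 3x4 projection matrix and returns the lowest-scoring pattern. All names are hypothetical.

```python
import math
from statistics import mean, pvariance

def project(P, X):
    """Project a 3D point X = (x, y, z) with a 3x4 projection matrix P (nested lists)."""
    Xh = (*X, 1.0)  # homogeneous coordinates
    h = [sum(P[r][c] * Xh[c] for c in range(4)) for r in range(3)]
    return h[0] / h[2], h[1] / h[2]

def pattern_score(predicted_2d, candidate_2d, alpha=1.0):
    """ASSUMED score: variance + alpha * mean of per-joint distances (pixels)."""
    common = predicted_2d.keys() & candidate_2d.keys()
    d = [math.dist(predicted_2d[k], candidate_2d[k]) for k in common]
    return pvariance(d) + alpha * mean(d)

def select_pattern(P, predicted_3d, patterns, alpha=1.0):
    """Project the predicted 3D subject into one camera and return the
    lowest-scoring pattern name together with all scores."""
    predicted_2d = {k: project(P, X) for k, X in predicted_3d.items()}
    scores = {name: pattern_score(predicted_2d, joints, alpha)
              for name, joints in patterns.items()}
    return min(scores, key=scores.get), scores
```

Because the variance term is unaffected by a uniform offset between prediction and detection, a correctly assigned but slightly displaced subject still scores low, while a swapped limb produces a few large distances and a high variance.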
  • [Correction 1] is, in effect, a left-right reversal determination within a frame. The predicted subject itself may be wrong. However, assuming that 2D skeleton information is rarely left-right reversed, if a reversed pattern is selected for many cameras (for a given subject), the selection of the reversed pattern has most likely failed. Therefore, in [Correction 1], if many cameras are judged to be reversed, the result is regarded as incorrect and is not adopted (no correction is made).
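A minimal sketch of the [Correction 1] check for one subject follows. The text only says "many cameras"; the threshold `max_reversed_fraction` (more than half of the cameras by default) and the function name are hypothetical.

```python
def apply_correction1(selected, max_reversed_fraction=0.5):
    """Sketch of [Correction 1] for one subject.

    selected: {camera_id: pattern_name} chosen by the basic processing.
    ASSUMPTION: "many cameras" means more than max_reversed_fraction of them.
    If too many cameras were judged reversed, the selection is treated as failed
    and every camera falls back to "as_detected" (no correction is made).
    """
    reversed_cams = [c for c, s in selected.items() if s != "as_detected"]
    if len(reversed_cams) > max_reversed_fraction * len(selected):
        return {c: "as_detected" for c in selected}
    return dict(selected)
```

A single camera with a reversed pattern keeps its correction, while a majority of reversed cameras is read as a sign that the predicted subject, not the detections, is wrong.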
  • FIG. 10 is a diagram for explaining the process of [Correction 2].
  • In [Correction 2], the estimated 3D subject obtained in the first stage is input as the predicted 3D subject in the second stage.
  • In the second stage, left-right reversal is determined from the estimated 3D coordinates and the detected subject, and a second estimate of the 3D coordinates is obtained.
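The two-stage flow of [Correction 2] can be sketched with the determination and 3D estimation steps passed in as functions. Both callables and the function name are hypothetical placeholders, so this only illustrates how the first-stage estimate replaces the prediction in the second stage.

```python
def two_stage_estimate(predicted_3d, detections, determine_reversal, estimate_3d):
    """Sketch of [Correction 2]: run the reversal determination twice.

    determine_reversal(predicted_3d, detections) -> left-right-corrected 2D detections
    estimate_3d(corrected_detections)            -> estimated 3D subject
    Both callables are hypothetical stand-ins for the basic processing and the
    3D conversion processing.
    """
    # First stage: the prediction comes from the past (n-1)-th frame.
    first_estimate = estimate_3d(determine_reversal(predicted_3d, detections))
    # Second stage: the first-stage estimate replaces the prediction.
    return estimate_3d(determine_reversal(first_estimate, detections))
```

The second pass matches the detections against a prediction that already reflects the current frame, which is why it can recover from a poor prediction carried over from the past frame.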
  • FIG. 11 is a diagram for explaining the process of [Correction 3].
  • The following explanation focuses on left-right reversal of the upper body only; in practice, the same processing is performed for the lower body.
  • As shown in FIG. 11, if (1) there is no reversal in the past frame and (2) a reversal is detected between frames, a reversal has most likely occurred in the current (n-th) frame.
  • FIG. 12 is a diagram for explaining details of the correction 3 process.
  • The inter-frame left-right reversal determination, which judges the current frame by comparing it with the past frame, is described in detail below.
  • length denotes the two-dimensional on-screen length obtained when the three-dimensional length between the waist joints of the target subject is displayed on the screen.
  • p_previous denotes a joint position in the past frame.
  • p_current denotes a joint position in the current frame.
  • avg() is a function that computes the average.
  • S denotes each left-right reversal pattern.
  • S_pred denotes the left-right reversal pattern of the current frame predicted from the results of [1] and [2].
  • A coefficient (parameter) weights the correction term.
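The equation these symbols belong to is not reproduced in this text. The following sketch assumes one form consistent with the definitions above: avg() of the joint distances between p_previous and p_current, normalized by length, plus a correction term weighted by a coefficient `beta` that penalizes patterns other than S_pred. The prediction of S_pred covers only the case stated for FIG. 11; everything else is an assumption, and all names are hypothetical.

```python
import math

def predict_current_reversal(past_reversed, interframe_reversed):
    """Predict whether the current frame is reversed (used to choose S_pred).

    The text states one case: no reversal in the past frame (1) and a reversal
    between frames (2) -> reversal likely in the current frame. Treating every
    other case as "not reversed" is an ASSUMPTION of this sketch.
    """
    return (not past_reversed) and interframe_reversed

def interframe_score(p_previous, p_current, length, is_pred, beta=1.0):
    """ASSUMED score: avg() of joint distances normalized by `length`,
    plus a correction term `beta` for patterns other than S_pred."""
    common = p_previous.keys() & p_current.keys()
    distances = [math.dist(p_previous[k], p_current[k]) / length for k in common]
    return sum(distances) / len(distances) + (0.0 if is_pred else beta)

def select_interframe_pattern(p_previous, patterns, length, s_pred, beta=1.0):
    """patterns: {pattern name S: {part_id: (x, y)}}; returns the lowest-scoring S."""
    scores = {s: interframe_score(p_previous, p_cur, length, s == s_pred, beta)
              for s, p_cur in patterns.items()}
    return min(scores, key=scores.get)
```

Normalizing by length makes the distance term independent of how large the subject appears on screen, so a single coefficient can balance it against the correction term.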
  • As described above, in the embodiment, the pattern with a small variance or average of the distances between corresponding joint positions in the previous frame and the current frame is selected from four patterns combining the upper body and the lower body with and without reversal. Intra-frame and inter-frame corrections are then performed based on the selected pattern. This suppresses left-right reversal errors, so 2D and 3D posture information can be estimated accurately.
  • In addition, an algorithm for determining the left-right reversal problem of 2D posture information is introduced as post-processing. This makes the processing more robust and improves the accuracy of 3D posture estimation. For these reasons, the embodiment can suppress left-right reversal of skeletal coordinates in 2D posture information and thereby enable accurate estimation of 3D posture information.
  • At the implementation stage, the present invention can be embodied with its constituent elements modified without departing from its gist.
  • Various inventions can also be formed by appropriately combining the components disclosed in the above embodiments. For example, some components may be omitted from all the components shown in an embodiment. Components from different embodiments may also be combined as appropriate.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)
PCT/JP2022/020848 2022-05-19 2022-05-19 Video processing device, video processing method, and program Ceased WO2023223508A1 (ja)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/JP2022/020848 WO2023223508A1 (ja) 2022-05-19 2022-05-19 Video processing device, video processing method, and program
JP2024521489A JP7726390B2 (ja) 2022-05-19 2022-05-19 Video processing device, video processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/020848 WO2023223508A1 (ja) 2022-05-19 2022-05-19 Video processing device, video processing method, and program

Publications (1)

Publication Number Publication Date
WO2023223508A1 (ja) 2023-11-23

Family

ID=88834939

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/020848 Ceased WO2023223508A1 (ja) 2022-05-19 2022-05-19 Video processing device, video processing method, and program

Country Status (2)

Country Link
JP (1) JP7726390B2
WO (1) WO2023223508A1

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
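JP7209333B2 (ja) * 2018-09-10 2023-01-20 The University of Tokyo Method and device for acquiring joint positions, and method and device for acquiring motion
JP7427188B2 (ja) * 2019-12-26 2024-02-05 The University of Tokyo 3D pose acquisition method and device
JP7316236B2 (ja) * 2020-02-28 2023-07-27 KDDI Corporation Skeleton tracking method, device, and program
JP7375921B2 (ja) * 2020-04-27 2023-11-08 NEC Corporation Image classification device, image classification method, and program
JP7519665B2 (ja) * 2020-05-22 2024-07-22 The University of Tokyo Method and device for acquiring motion features using skin information
JP7491380B2 (ja) * 2020-07-06 2024-05-28 NEC Corporation Image selection device, image selection method, and program
WO2022018811A1 (ja) * 2020-07-20 2022-01-27 Nippon Telegraph and Telephone Corp Three-dimensional posture estimation device for a subject, three-dimensional posture estimation method, and program
JP7586189B2 (ja) * 2020-10-26 2024-11-19 NEC Corporation Tracking device, tracking system, tracking method, and program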

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
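WO2021048988A1 (ja) * 2019-09-12 2021-03-18 Fujitsu Limited Skeleton recognition method, skeleton recognition program, and information processing device
WO2022074886A1 (ja) * 2020-10-05 2022-04-14 Shimadzu Corporation Posture detection device, posture detection method, and sleeping posture determination method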

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MORIMOTO TAKASHI, IKUHISA MIKAMI: "Through spatiotemporal integration of posture data from multiple Kinects motion capture system ", IPSJ SIG TECHNICAL REPORT, vol. 2019-CVIM-2017, no. 25, 30 May 2019 (2019-05-30), XP093109000 *

Also Published As

Publication number Publication date
JPWO2023223508A1 2023-11-23
JP7726390B2 (ja) 2025-08-20


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22942712

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2024521489

Country of ref document: JP

Kind code of ref document: A

WWE WIPO information: entry into national phase

Ref document number: 18861227

Country of ref document: US

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 22942712

Country of ref document: EP

Kind code of ref document: A1