WO2023223508A1 - Video processing device, video processing method, and program - Google Patents

Video processing device, video processing method, and program

Info

Publication number
WO2023223508A1
Authority
WO
WIPO (PCT)
Prior art keywords
posture information
frame
subject
pattern
video processing
Application number
PCT/JP2022/020848
Other languages
French (fr)
Japanese (ja)
Inventor
明男 亀田
誠明 松村
裕司 青野
Original Assignee
Nippon Telegraph and Telephone Corporation
Application filed by Nippon Telegraph and Telephone Corporation
Priority to PCT/JP2022/020848
Publication of WO2023223508A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis


Abstract

A video processing device according to one aspect of the present invention comprises a 2D posture information generation unit, a posture information processing unit, and a 3D conversion processing unit. The 2D posture information generation unit generates 2D posture information of a subject from video data of the subject. The posture information processing unit corrects the 2D posture information. The 3D conversion processing unit generates 3D posture information of the subject from the corrected 2D posture information. The posture information processing unit comprises a left-right reversal determination unit that performs left-right reversal determination processing. In the left-right reversal determination processing, first through fourth patterns are set for an image frame of the subject in the video data, the patterns corresponding to the upper body and the lower body each being reversed or not reversed, and the pattern with the lowest score among the first through fourth patterns is selected, the score being calculated by an evaluation formula that takes as variables the variance and the average of the distances between corresponding joint positions in the previous frame and the current frame.

Description

Video processing device, video processing method, and program
One aspect of the present invention relates to a video processing device for detecting, for example, a person's posture and creating a 3D (three-dimensional) wireframe model, to a computer-implemented video processing method, and to a program.
[Embodied knowledge] is expressed in a person's posture and movements, for example as [techniques]. Technology that uses video processing to digitize and record such information, which is not verbalized and is based, so to speak, on human senses, is being actively researched. If a person's posture can be quantified (hereinafter, skill capture), later analysis and comparison become possible, for example visualization for analysis by an instructor. Furthermore, to effectively reproduce an instructor's techniques and efficiently teach them to practitioners, it is desirable to be able to estimate (acquire) a person's posture with high accuracy.
OpenPose is an open-source library that can detect skeletal coordinates from image data of a person as a subject and generate a wireframe model. A technique is known that uses it to estimate the position of each joint point (2D posture information) for each image frame and thereby quantify the 2D posture (for example, see Non-Patent Document 1). Techniques for converting 2D (two-dimensional) posture information into three dimensions to obtain 3D (three-dimensional) posture information are also known (for example, see Non-Patent Documents 2 and 3).
In existing techniques for the three-dimensional reconstruction of a group of subjects using posture estimation, the left and right coordinates are sometimes reversed when the 2D posture is estimated (for example, skeletal coordinates belonging to the right arm are erroneously estimated as skeletal coordinates belonging to the left arm). If 2D skeletal coordinate data whose left and right sides are reversed is used, the accuracy of estimating 3D posture information decreases. This is undesirable because it degrades, for example, the processing accuracy of skill capture.
The present invention was made in view of the above circumstances, and its purpose is to provide a technology that suppresses left-right reversal of the skeletal coordinates in 2D posture information, thereby making it possible to accurately estimate 3D posture information.
A video processing device according to one aspect of the present invention includes a 2D posture information generation unit, a posture information processing unit, and a 3D conversion processing unit. The 2D posture information generation unit generates 2D posture information of a subject from video data of the subject. The posture information processing unit corrects the 2D posture information. The 3D conversion processing unit generates 3D posture information of the subject from the corrected 2D posture information. The posture information processing unit includes a left-right reversal determination unit that performs left-right reversal determination processing. In the left-right reversal determination processing, first through fourth patterns corresponding to the upper body and the lower body each being reversed or not reversed are set for an image frame of the subject in the video data, and from among the first through fourth patterns, the pattern that minimizes a score calculated by an evaluation formula taking as variables the variance and the average of the distances between corresponding joint positions in the previous frame and the current frame is selected.
According to one aspect of the present invention, it is possible to provide a technique that suppresses left-right reversal of the skeletal coordinates in 2D posture information and thereby makes it possible to accurately estimate 3D posture information.
FIG. 1 is a diagram showing an example of a workflow for reproducing and transmitting embodied knowledge.
FIG. 2 is a diagram showing an example of a wireframe model used in skill capture.
FIG. 3 is a diagram for explaining the estimation of 3D posture information from 2D posture information.
FIG. 4 is a diagram for explaining left-right reversal.
FIG. 5 is a functional block diagram showing an example of a video processing device according to an embodiment.
FIG. 6 is a diagram showing an example of a table stored as the 2D posture information 61.
FIG. 7 is a diagram showing an example of a table stored as the 3D posture information 62.
FIG. 8 is a diagram showing the flow of data between the functional blocks shown in FIG. 5.
FIG. 9 is a diagram for explaining an overview of the [basic processing].
FIG. 10 is a diagram for explaining the [correction 2] process.
FIG. 11 is a diagram for explaining the [correction 3] process.
FIG. 12 is a diagram for explaining the details of the [correction 3] process.
Embodiments of the present invention will be described below with reference to the drawings.
FIG. 1 is a diagram showing an example of a workflow for reproducing and transmitting embodied knowledge (including simulated experiences and the like). Embodied knowledge is put to effective use by analyzing skills captured by cameras and sensors, for example by comparing video data of an expert with video data of a practitioner, extracting points for improvement, and feeding them back through an appropriate presentation method.
Skill capture can be described as a technology that, using multiple cameras installed without particular care, acquires at once items such as the camera parameters (position, orientation, viewing angle, distortion), the tracking of each subject, and the three-dimensional posture of each subject (three-dimensional coordinates of the joints), without camera calibration.
Skill capture comprises a series of processes: 2D posture estimation from video data, a digitization process, rotation angle fitting, and rotation angle noise removal.
In 2D posture estimation, the posture of feature points (2D skeletal coordinates and the like) is estimated using each frame of the video as input. Multiple people can also be handled.
In the digitization process, each subject is separated and tracked, the posture of the 3D skeletal coordinates of the skeleton is estimated, and the camera parameters are estimated at the same time.
In rotation angle fitting and noise removal, the 3D coordinates are converted into a 3D rotation angle model, and noise is further removed.
To reproduce and transmit embodied knowledge more effectively, it is desirable to refine each element shown in FIG. 1. Here, the "skills" to be analyzed include physical movements (down to the fingertips), physiological information, physiological reactions, psychological states, and the like. The scope of analysis can be expanded further.
FIG. 2 is a diagram showing an example of a wireframe model used in skill capture. In the wireframe model, a number (part ID) is assigned to each body part, such as a joint. Two-dimensional posture information is expressed by acquiring the coordinates of each part of the wireframe model.
FIG. 3 is a diagram for explaining the estimation of 3D posture information from 2D posture information. 2D posture information is acquired by each of a plurality of cameras (camera 1, camera 2) installed at different viewpoints. The plurality of cameras are synchronized with each other, and each acquires video data of a subject (such as a person).
2D posture information consists of 2D coordinates representing the joint positions of the subject in the n-th frame of each camera image. For example, it can be represented by a set of 4-element vectors (camera ID, part ID, part x coordinate, part y coordinate). In FIG. 3, the right wrist in the frame of camera 2 is expressed by 2D posture information such as (camera ID = 2, part ID = 4, part x coordinate = 30, part y coordinate = 100).
3D posture information consists of 3D coordinates representing the joint positions estimated in 3D from the 2D posture information of the multiple cameras in the n-th frame of each camera image. For example, it can be represented by a set of 4-element vectors (part ID, part x coordinate, part y coordinate, part z coordinate). In the example of FIG. 3, the 3D posture information is expressed, for example, as (part ID = 4, part x coordinate = 50, part y coordinate = 300, part z coordinate = 150). Although two cameras are shown in FIG. 3, the same applies to three or more cameras.
FIG. 4 is a diagram for explaining left-right reversal. In FIG. 4(a), it can be seen that the left-right correspondence of the subject's upper-body parts is reversed. This state is "left-right reversal." In FIG. 4(b), on the other hand, the right hand correctly corresponds to the right hand and the left hand to the left hand, and no left-right reversal has occurred.
Left-right reversal is caused by recognition errors and the like during video processing, and can occur in either the upper body or the lower body. The embodiment considers the combinations of four cases: (upper body, not reversed), (upper body, reversed), (lower body, not reversed), and (lower body, reversed).
Thus, in the skill capture process, the posture of feature points (2D skeletal coordinates and the like) is estimated using each frame of the video as input. At this time, the 2D skeletal coordinates may be left-right reversed. This is undesirable because it lowers the accuracy of the 3D skeletal coordinates estimated from the 2D skeletal coordinates. When estimating the posture of the 3D skeletal coordinates, existing techniques do not include a determination algorithm that solves the left-right reversal problem for the input 2D posture information.
Therefore, in the embodiment, when estimating 3D skeletal coordinates from 2D skeletal coordinates, the presence or absence of left-right reversal is determined for each frame and each subject ID, and correction processing is further performed based on the result. The following describes the [basic processing] that determines the presence or absence of left-right reversal, and a plurality of correction processes distinguished as [correction 1], [correction 2], and [correction 3].
Here, past 3D skeletal coordinates cannot be referenced in the first frame of the video data. From the next frame onward, however, past 3D skeletal coordinates become available (case 1). That is, the subject can be tracked between frames (a predicted subject for the current (n-th) frame can be generated from the past ((n-1)-th) frame). The embodiment therefore describes (case 1) in detail.
[Basic processing]
The predicted subject is assumed to be correct (it is corrected in a later step), and the pattern with the smallest variance and average of the joint position distances to the current frame is selected from the four patterns (upper/lower body, reversed or not).

(Intra-frame correction)
(1-1) [Correction 1] Determine and correct based on the number of cameras showing the reversed and non-reversed patterns.
(1-2) [Correction 2] In addition to the basic processing, perform correction 1 twice (the predicted subject for the second pass is generated from the result of the first pass; this is referred to as two-stage left-right reversal determination).

(Inter-frame correction)
(1-3) [Correction 3] Determine the presence or absence of left-right reversal between the current frame and a past frame (inter-frame left-right determination), and perform correction.
FIG. 5 is a functional block diagram showing an example of a video processing device according to an embodiment. The video processing device 10 is an information processing device (computer) that includes a processor 20 and a memory 90. In addition, the video processing device 10 includes a storage 60 and an interface unit 70 connected to the plurality of cameras 1-1 to 1-n. The units of the video processing device 10 are connected to one another via a bus 80.
The storage 60 stores 2D posture information 61, 3D posture information 62, and a program 63. The program 63 is loaded into the memory 90 by the OS (operating system) of the video processing device 10 and executed by the processor 20. The program 63 causes the processor 20 to function as a 2D posture information generation unit 30, a posture information processing unit 40, and a 3D conversion processing unit 50. The posture information processing unit 40 includes, as processing routines, a left-right reversal determination unit 41 and a left-right reversal correction processing unit 42.
The 2D posture information generation unit 30 generates 2D posture information of the subject from video data of the subject.
The posture information processing unit 40 corrects the 2D posture information generated by the 2D posture information generation unit 30.
The 3D conversion processing unit 50 generates 3D posture information of the subject from the 2D posture information corrected by the posture information processing unit 40.
The left-right reversal determination unit 41 sets four patterns for an image frame of the subject in the video data of the subject: [as detected], [upper body reversed], [lower body reversed], and [entirely reversed]. These patterns are the combinations of the four cases (upper body, not reversed), (upper body, reversed), (lower body, not reversed), and (lower body, reversed). In other words, these patterns correspond to the upper body and the lower body each being reversed or not reversed.
The left-right reversal determination unit 41 then selects, from among these four patterns, the pattern that minimizes the score calculated by an evaluation formula taking as variables the variance and the average of the distances between the corresponding joint positions in the previous frame and the current frame.
The left-right reversal correction processing unit 42 performs left-right reversal determination processing within a frame.
Alternatively, the left-right reversal correction processing unit 42 performs two-stage left-right reversal determination processing within a frame.
Alternatively, the left-right reversal correction processing unit 42 performs left-right reversal determination processing between consecutive frames.
Here, the left-right reversal correction processing performed by the left-right reversal correction processing unit 42 is similar to the processing performed by the left-right reversal determination unit 41.
FIG. 6 is a diagram showing an example of a table stored as the 2D posture information 61. The 2D posture information 61 is a table whose columns are subject ID, camera ID, part ID, part x coordinate, and part y coordinate.
FIG. 7 is a diagram showing an example of a table stored as the 3D posture information 62. The 3D posture information 62 is a table whose columns are subject ID, part ID, part x coordinate, part y coordinate, and part z coordinate.
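As a minimal sketch, the two tables can be modeled as record types like the following (the field names mirror the columns above; the Python types and the example subject ID are assumptions, not taken from the publication):

```python
from dataclasses import dataclass

@dataclass
class Pose2DRow:
    """One row of the 2D posture information 61 (FIG. 6)."""
    subject_id: int
    camera_id: int
    part_id: int
    x: float  # part x coordinate on the camera image
    y: float  # part y coordinate on the camera image

@dataclass
class Pose3DRow:
    """One row of the 3D posture information 62 (FIG. 7)."""
    subject_id: int
    part_id: int
    x: float
    y: float
    z: float

# The right-wrist example from FIG. 3 (subject_id is illustrative).
wrist_2d = Pose2DRow(subject_id=1, camera_id=2, part_id=4, x=30, y=100)
wrist_3d = Pose3DRow(subject_id=1, part_id=4, x=50, y=300, z=150)
```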
FIG. 8 is a diagram showing the flow of data between the functional blocks shown in FIG. 5. FIG. 8 shows the case in which the camera parameters are calibrated in advance. In FIG. 8, multi-view video data from the cameras 1-1 to 1-n is passed to the current frame extraction unit 21 together with the camera IDs.
The current frame extraction unit 21 extracts the n-th frame image for each camera ID from the video data and passes it to the 2D posture information generation unit 30.
The 2D posture information generation unit 30 calculates the 2D posture information in the n-th frame image and passes it to the left-right reversal determination unit 41 of the posture information processing unit 40. The 2D posture information is also stored in the 2D posture information storage unit of the posture information processing unit 40.
The posture information processing unit 40 performs the [basic processing] with the left-right reversal determination unit 41, and performs the [correction 1], [correction 2], and [correction 3] processes with the left-right reversal correction processing unit 42 to calculate the 3D posture information. The 3D posture information is output to the outside and is also stored in the 3D posture information storage unit.
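The per-frame flow can be sketched roughly as follows; detect_2d, project, and estimate_3d are assumed callables standing in for the 2D posture information generation unit, the camera projection, and the 3D conversion processing unit, and select_pattern and correction1 are sketched in the sections below:

```python
def process_frame(frames_by_camera, predicted_3d, detect_2d, project, estimate_3d):
    """One pass of the FIG. 8 data flow for the n-th frame (a sketch)."""
    # 2D posture information generation unit 30: per-camera 2D poses.
    poses_2d = {cam: detect_2d(img) for cam, img in frames_by_camera.items()}
    # Posture information processing unit 40: basic processing + correction 1
    # (select_pattern and correction1 are defined in the sketches that follow).
    selected = correction1({
        cam: select_pattern(pose, project(predicted_3d, cam))
        for cam, pose in poses_2d.items()
    })
    # 3D conversion processing unit 50: corrected 2D poses -> 3D posture.
    return estimate_3d(poses_2d, selected)
```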
[About the basic processing]
FIG. 9 is a diagram for explaining an overview of the [basic processing]. The [basic processing] follows the idea that "in the current frame, the predicted subject is projected for each camera, and if the distance between the predicted subject and the two-dimensional posture information of the current frame is large, left-right reversal processing is performed."
To deal with left-right reversal, the [basic processing] projects the predicted three-dimensional subject onto each camera and matches it with the detected subject for each camera. That is, the matching processing is performed on the 2D plane of each camera image.
It is assumed that the three-dimensional subject generated from the past ((n-1)-th) frame (the predicted three-dimensional subject) has correct left-right assignments (no reversal). Then, for one of the two-dimensional subjects detected from the current (n-th) frame (the detected subject), four patterns are generated: [as detected], [upper body reversed], [lower body reversed], and [entirely reversed]. From these, the pattern with the small variance or average of the joint position distances (errors) to the predicted subject (the predicted three-dimensional subject projected onto each camera) is selected. This is because, if the left-right assignments are correct, the average distance should be about the same for all joints and the variance should be small. Focusing on the variance is particularly effective.
For example, the evaluation value can be determined using equation (1). The publication gives the equation only as an image; a form consistent with the surrounding description, taking the variance and the average of the joint position distances d as variables with a weight α, is:

$$\mathrm{Score} = \mathrm{Var}(d) + \alpha \cdot \mathrm{Avg}(d) \qquad \text{(1)}$$
In equation (1), α is a parameter and is set to 1, for example. Using equation (1), the evaluation value of each of the four patterns is calculated, and the pattern with the smallest evaluation value (the score of each pattern) is selected. For example, if the upper body is actually reversed, the score of [as detected] becomes large, and the score of [upper body reversed] becomes smaller than it.
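A minimal sketch of this pattern generation and scoring, assuming the score takes the form Var(d) + α·Avg(d) above and using illustrative left/right joint-index pairs (neither the exact formula nor the indices are specified in the text):

```python
import numpy as np

# Illustrative left/right joint-index pairs (not from the publication).
UPPER_PAIRS = [(2, 5), (3, 6), (4, 7)]      # shoulders, elbows, wrists
LOWER_PAIRS = [(8, 11), (9, 12), (10, 13)]  # hips, knees, ankles

def swap_pairs(joints, pairs):
    """Return a copy of the (N, 2) joint array with the given pairs swapped."""
    out = joints.copy()
    for l, r in pairs:
        out[[l, r]] = out[[r, l]]
    return out

def make_patterns(detected):
    """The four candidate patterns for one detected 2D subject."""
    return {
        "as_detected": detected,
        "upper_reversed": swap_pairs(detected, UPPER_PAIRS),
        "lower_reversed": swap_pairs(detected, LOWER_PAIRS),
        "entirely_reversed": swap_pairs(swap_pairs(detected, UPPER_PAIRS), LOWER_PAIRS),
    }

def score(pattern, projected_prediction, alpha=1.0):
    """Equation (1) sketch: variance plus alpha times average joint distance."""
    d = np.linalg.norm(pattern - projected_prediction, axis=1)
    return d.var() + alpha * d.mean()

def select_pattern(detected, projected_prediction, alpha=1.0):
    """Pick the pattern name with the smallest evaluation value."""
    patterns = make_patterns(detected)
    return min(patterns, key=lambda name: score(patterns[name], projected_prediction, alpha))
```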
[About correction 1]
[Correction 1] is, so to speak, a left-right reversal determination process within a frame. The predicted subject may be incorrect. However, assuming that 2D skeletal information is only rarely left-right reversed, when many reversed patterns are selected (per camera and per subject), it is likely that the selection of the reversed pattern has failed. Therefore, in [correction 1], if many cameras show reversal, the result is judged to be erroneous and is not adopted (no correction is made).
If the number of cameras with reversal is large, the prediction is judged to be wrong and the detection is adopted as-is. That is, with [as detected] = n1 cameras, [upper body reversed] = n2 cameras, [lower body reversed] = n3 cameras, and [entirely reversed] = n4 cameras, the total number of cameras N is N = n1 + n2 + n3 + n4. If n2 + n3 + n4 > N/2 at this time, all cameras are corrected back to [as detected].
[About correction 2]
FIG. 10 is a diagram for explaining the [correction 2] process. [Correction 2] is, so to speak, a two-stage left-right reversal determination process within a frame.
If the 2D skeletal coordinates are only rarely left-right reversed and the left-right reversal determination of [correction 1] makes few errors, then the 3D posture information (the estimated three-dimensional subject) estimated from the predicted 3D posture information (the predicted three-dimensional subject, that is, the three-dimensional subject of one frame earlier) is likely to be close to the true coordinates. Therefore, to approach the true coordinates more closely, [correction 2] repeats the basic processing plus [correction 1] once more.
(a) The estimated three-dimensional subject obtained in the first stage is input to the second stage as the predicted three-dimensional subject.
(b) In the second stage, left-right reversal is determined from the estimated three-dimensional coordinates and the detected subject, and a second estimate of the three-dimensional coordinates is obtained.
[About correction 3]
FIG. 11 is a diagram for explaining the [correction 3] process. To simplify the explanation, FIG. 11 and the following description focus on left-right reversal of the upper body only; in practice, the same processing is performed for the lower body. In FIG. 11, if (1) there is no reversal in the past frame and (2) there is reversal between the frames, there is a high possibility of reversal in the current frame (the n-th frame).
[Correction 3] is, so to speak, an inter-frame left-right reversal determination process.
By introducing logic that detects left-right reversal from the 2D posture information between frames, left-right reversal in the current frame can be predicted. For example, if there is no reversal in the past frame and there is reversal between the frames, reversal in the current frame is highly likely. The following comparisons therefore yield the estimates (1) and (2).
(1) The left-right reversal state of the past frame can be estimated by comparing the 3D posture information with the 2D posture information of the past frame.
(2) By comparing the 2D posture information of the past frame with that of the current frame (inter-frame comparison), it can be estimated whether the 2D posture information is left-right reversed between the current frame and the past frame.
FIG. 12 is a diagram for explaining the details of the [correction 3] process. The inter-frame left-right reversal determination process for the current frame, which uses an inter-frame comparison, is described in detail below with reference to FIG. 12. For simplicity, two patterns are described, with and without reversal of the upper body; in practice, similar processing is performed for the lower body.
By comparing FIG. 12(a) and FIG. 12(b), left-right reversal in the current frame can be predicted. That is, if "the prediction of left-right reversal between the past frame and the current frame (= d = a + b)" matches "the presence or absence of left-right reversal in the current frame (= c)", the result of c is considered reliable. A reliable pattern is then made easier to adopt by lowering its score value.
[1] Left-right reversal determination for the past frame (FIG. 12(a))
The three-dimensional subject estimated in the past frame is projected onto the screen and compared with the 2D posture information of the past frame. The comparison determines the left-right reversal of the target camera and subject in the past frame. For this determination, left-right reversal patterns are generated, and the pattern with the smallest positional distance between the joints and the projected subject is selected. The left-right reversal estimate at this point is based not on a prediction from the motion of past frames but on three-dimensional coordinates estimated from the observation points of multiple cameras, so it is considered more accurate than a result that relies on prediction.
[2] Inter-frame left-right reversal determination (FIG. 12(b))
The 2D posture information of the past frame and that of the current frame are compared to determine whether left-right reversal has occurred between the frames for the target camera and subject.
The difference in joint positions between the past frame and the current frame is calculated, and when the positional differences of the joints of the right half and the left half of the body each exceed a threshold, it is determined that left-right reversal has occurred between the frames. This rests on the assumption that, unless left and right are flipped, the joint positions do not change significantly. The joint position differences are scaled by the size of the subject on the screen, as in the following expression. The threshold d_th is set to 0.8, and left-right reversal between frames is assumed when the condition of equation (2) is satisfied for both the right half and the left half of the body.
The publication gives equation (2) only as an image; a form consistent with the variable definitions below is:

$$\frac{\operatorname{avg}\left(\lVert p_{\mathrm{previous}} - p_{\mathrm{current}} \rVert\right)}{\mathrm{length}} > d_{th} \qquad \text{(2)}$$
In equation (2), length indicates the two-dimensional length on the screen when the three-dimensional length between the hip joints of the target subject is projected onto the screen. p_previous indicates a joint position in the past frame. p_current indicates a joint position in the current frame. avg() is a function that computes the average.
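A sketch of this check (continuing the numpy-based sketches above; the joint-index sets and the use of the on-screen hip distance as length are assumptions):

```python
def interframe_reversed(prev_joints, curr_joints, right_ids, left_ids,
                        hip_ids=(8, 11), d_th=0.8):
    """Equation (2) sketch: reversal between frames is assumed only when the
    scaled joint displacement exceeds d_th for BOTH body halves."""
    # 'length': on-screen distance between the hip joints (an approximation).
    length = np.linalg.norm(prev_joints[hip_ids[0]] - prev_joints[hip_ids[1]])

    def exceeds(ids):
        d = np.linalg.norm(prev_joints[list(ids)] - curr_joints[list(ids)], axis=1)
        return d.mean() / length > d_th

    return exceeds(right_ids) and exceeds(left_ids)
```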
[3] Score calculation for the left-right reversal patterns of the current frame (FIG. 12(c))
The current-frame left-right reversal determination is the same as the method based on comparing left-right reversal patterns: using the predicted three-dimensional subject and the 2D posture information of the current frame, a score is calculated for each reversal pattern using equation (1).
[4] Left-right reversal determination for the current frame (FIG. 12(d))
The left-right reversal of the current frame is predicted from the results of [1] and [2]. If either [1] or [2] determines that there is reversal, left-right reversal is predicted. If [1] and [2] both show no reversal, or both show reversal, no left-right reversal is predicted. When both show reversal, a state that was left-right reversed in the past has been reversed again between frames; this is a reversal of a reversal, so the result is no reversal. The reliability of the prediction obtained here is insufficient for it to be used on its own, so it is treated as a correction to the score calculated in [3].
The score calculated in [3] is corrected so that the predicted left-right reversal pattern is more likely to be selected. Adding this correction to the score of equation (1) yields equation (3).
The publication gives equation (3) only as an image; a form consistent with the variable definitions below, which lowers the score of the predicted pattern by a correction term, is:

$$\mathrm{Score}'(S) = \mathrm{Score}(S) - \beta \cdot \mathrm{length} \cdot \mathbb{1}\left[S = S_{\mathrm{pred}}\right] \qquad \text{(3)}$$
In equation (3), S indicates each left-right reversal pattern. S_pred indicates the left-right reversal pattern of the current frame predicted from the results of [1] and [2]. length indicates the two-dimensional length on the screen when the three-dimensional length between the hip joints of the target subject is projected onto the screen. β is the coefficient (parameter) of the correction term.
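A sketch of this correction, assuming the correction term simply lowers the predicted pattern's score by β·length as in the reconstructed equation (3):

```python
def corrected_scores(raw_scores, predicted_pattern, length, beta=1.0):
    """Equation (3) sketch: subtract beta * length from the score of the
    pattern predicted in [4], making it more likely to be selected."""
    return {
        name: s - beta * length * (name == predicted_pattern)
        for name, s in raw_scores.items()
    }

# Usage sketch: the pattern with the smallest corrected score wins.
# adj = corrected_scores(scores, predicted, length)
# best = min(adj, key=adj.get)
```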
As described above, according to the embodiment, from the four patterns corresponding to whether the upper body and the lower body are each reversed or not, the pattern with the smallest variance and average of the distances between corresponding joint positions in the previous frame and the current frame is selected. Intra-frame correction and inter-frame correction are then performed based on the selected pattern. This suppresses left-right reversal errors, and therefore makes it possible to estimate 2D posture information and 3D posture information accurately.
In other words, according to the embodiment, an algorithm that resolves the left-right reversal problem in 2D posture information is introduced as post-processing. This improves the robustness of the processing and raises the estimation accuracy of the 3D posture information. The embodiment thus suppresses the left-right reversal of skeletal coordinates in 2D posture information, making accurate estimation of 3D posture information possible.
The present invention is not limited to the above embodiment; at the implementation stage, the constituent elements may be modified without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the plurality of components disclosed in the above embodiment: for example, some components may be deleted from all the components shown in the embodiment, and components from different embodiments may be combined as appropriate.
1-1 to 1-n … Camera
10 … Video processing device
20 … Processor
21 … Current frame extraction unit
30 … 2D posture information generation unit
40 … Posture information processing unit
41 … Left-right reversal determination unit
42 … Left-right reversal correction processing unit
50 … 3D conversion processing unit
60 … Storage
61 … 2D posture information
62 … 3D posture information
63 … Program
70 … Interface unit
80 … Bus
90 … Memory

Claims (6)

1.  A video processing device comprising:
    a 2D posture information generation unit that generates 2D posture information of a subject from video data of the subject;
    a posture information processing unit that corrects the 2D posture information; and
    a 3D conversion processing unit that generates 3D posture information of the subject from the corrected 2D posture information,
    wherein the posture information processing unit includes a left-right reversal determination unit that performs left-right reversal determination processing, and
    the left-right reversal determination processing sets, for an image frame of the subject in the video data, first to fourth patterns corresponding to whether the upper body and the lower body are each reversed or not, and selects, from among the first to fourth patterns, the pattern that minimizes a score calculated by an evaluation formula whose variables are the variance and the average of the distances between corresponding joint positions in a previous frame and a current frame.
2.  The video processing device according to claim 1, wherein the posture information processing unit further comprises a left-right reversal correction processing unit that performs the left-right reversal determination processing within a frame.
3.  The video processing device according to claim 1, wherein the posture information processing unit further comprises a left-right reversal correction processing unit that performs the left-right reversal determination processing in two stages within a frame.
4.  The video processing device according to claim 1, wherein the posture information processing unit further comprises a left-right reversal correction processing unit that performs the left-right reversal determination processing between consecutive frames.
5.  A video processing method executed by a processor of a video processing device comprising the processor and a storage, the method comprising:
    a step in which the processor generates 2D posture information of a subject from video data of the subject;
    a step in which the processor corrects the 2D posture information; and
    a step in which the processor generates 3D posture information of the subject from the corrected 2D posture information,
    wherein the step of correcting the 2D posture information includes:
    a step in which the processor sets, for an image frame of the subject in the video data, first to fourth patterns corresponding to whether the upper body and the lower body are each reversed or not;
    a step in which the processor calculates, for each of the first to fourth patterns, a score using an evaluation formula whose variables are the variance and the average of the distances between corresponding joint positions in a previous frame and a current frame; and
    a step in which the processor selects the pattern that minimizes the score.
6.  A program that causes a computer to function as each unit of the video processing device according to any one of claims 1 to 4.

PCT/JP2022/020848 2022-05-19 2022-05-19 Video processing device, video processing method, and program WO2023223508A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/020848 WO2023223508A1 (en) 2022-05-19 2022-05-19 Video processing device, video processing method, and program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2022/020848 WO2023223508A1 (en) 2022-05-19 2022-05-19 Video processing device, video processing method, and program

Publications (1)

Publication Number Publication Date
WO2023223508A1 true WO2023223508A1 (en) 2023-11-23

Family

ID=88834939

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2022/020848 WO2023223508A1 (en) 2022-05-19 2022-05-19 Video processing device, video processing method, and program

Country Status (1)

Country Link
WO (1) WO2023223508A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021048988A1 (en) * 2019-09-12 2021-03-18 富士通株式会社 Skeleton recognition method, skeleton recognition program, and information processing device
WO2022074886A1 (en) * 2020-10-05 2022-04-14 株式会社島津製作所 Posture detection device, posture detection method, and sleeping posture determination method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MORIMOTO TAKASHI, IKUHISA MIKAMI: "Motion capture system through spatiotemporal integration of posture data from multiple Kinects", IPSJ SIG TECHNICAL REPORT, vol. 2019-CVIM-2017, no. 25, 30 May 2019 (2019-05-30), XP093109000 *

Similar Documents

Publication Publication Date Title
Zago et al. 3D tracking of human motion using visual skeletonization and stereoscopic vision
US11763603B2 (en) Physical activity quantification and monitoring
US9330470B2 (en) Method and system for modeling subjects from a depth map
KR101616926B1 (en) Image processing apparatus and method
JP4349367B2 (en) Estimation system, estimation method, and estimation program for estimating the position and orientation of an object
JP4148281B2 (en) Motion capture device, motion capture method, and motion capture program
US11398049B2 (en) Object tracking device, object tracking method, and object tracking program
CN108921907B (en) Exercise test scoring method, device, equipment and storage medium
Ye et al. A depth camera motion analysis framework for tele-rehabilitation: Motion capture and person-centric kinematics analysis
JP7367764B2 (en) Skeleton recognition method, skeleton recognition program, and information processing device
US20100208038A1 (en) Method and system for gesture recognition
US11403882B2 (en) Scoring metric for physical activity performance and tracking
JP6584208B2 (en) Information processing apparatus, information processing method, and program
US20200311395A1 (en) Method and apparatus for estimating and correcting human body posture
CN110751100A (en) Auxiliary training method and system for stadium
JP2005339288A (en) Image processor and its method
JP2016170605A (en) Posture estimation device
WO2023223508A1 (en) Video processing device, video processing method, and program
WO2019244536A1 (en) Object tracking device, object tracking system, and object tracking method
CN115841602A (en) Construction method and device of three-dimensional attitude estimation data set based on multiple visual angles
JPH06213632A (en) Image measurement device
Munn et al. FixTag: An algorithm for identifying and tagging fixations to simplify the analysis of data collected by portable eye trackers
JP2005309782A (en) Image processor
Ahmed Unified Skeletal Animation Reconstruction with Multiple Kinects.
WO2023062762A1 (en) Estimation program, estimation method, and information processing device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22942712

Country of ref document: EP

Kind code of ref document: A1