JP2020080096A

JP2020080096A - Object identifying apparatus, identifying system and identifying method

Info

Publication number: JP2020080096A
Application number: JP2018213719A
Authority: JP
Inventors: 建鋒徐; Kenho Jo; 和之田坂; Kazuyuki Tasaka
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2018-11-14
Filing date: 2018-11-14
Publication date: 2020-05-28
Anticipated expiration: 2038-11-14
Also published as: JP7012631B2

Abstract

PROBLEM TO BE SOLVED: To provide an object identification apparatus that identifies a large number of person objects in real time with a small number of GPUs. SOLUTION: In a frame image integration unit 28, a division method determination unit 281 determines how a frame memory M is divided. A resizing unit 282 resizes each camera image obtained from a user terminal 1 to fit the size of each partial area of the frame memory M. An image arrangement unit 283 sequentially places the resized camera images, obtained at a predetermined cycle, in the partial areas allocated to the respective user terminals 1, thereby sequentially forming an integrated frame image I at the predetermined cycle. A posture estimation unit 29 estimates, from a feature map generated from the integrated frame image, the positions of the body parts of person objects and their connectivity in a single inference by a bottom-up approach. SELECTED DRAWING: Figure 5

Description

The present invention relates to an object identification device, an identification system, and an identification method, and in particular to an object identification device, identification system, and identification method that estimate the positions and connectivity of body parts by a bottom-up approach from a plurality of camera images of person objects and thereby identify each object.

In fields such as crime prevention and marketing, various techniques have been proposed for identifying the actions of many person objects appearing in camera images. Conventional techniques adopt a top-down method that extracts person objects from a camera image, detects a bounding box for each person object, and then performs posture estimation on each box. In the top-down method, however, the amount of computation grows in proportion to the number of person objects to be identified. Moreover, if extraction of a person object fails, its action cannot be identified either.

To address this technical problem, Non-Patent Document 1 discloses a technique that estimates the positions and connectivity of the body parts of person objects extracted from a camera image by a bottom-up approach, using two sequential prediction processes based on Confidence Maps and Part Affinity Fields (PAFs), and thereby estimates the action of each object in real time and with high accuracy regardless of the number of person objects.

Meanwhile, people sometimes train at home or at facilities such as training gyms to maintain or improve their health. Expert guidance is essential for safe and effective training, but it is difficult to receive such guidance when training at home, so the training tends to become self-taught.

A training gym is fully equipped with training equipment and has professional instructors on site. However, it is inconvenient because the user has to travel to the gym. In addition, one-on-one instruction imposes a corresponding cost and adds to the time constraints.

To address this technical problem, Patent Document 1 proposes an image processing device that includes a recognition unit that recognizes the exercise of a person shown in an input image and a display control unit that superimposes on the input image a virtual object that varies with the effectiveness of the recognized exercise. The device calculates a score indicating the effectiveness of the exercise recognized by the recognition unit and superimposes the result on the input image, thereby presenting feedback on the effectiveness of the exercise to the user in a visible form.

Non-Patent Document 2 discloses a strength training support system that acquires skeleton information by analyzing video of a training user with a motion sensor system, detects the user's strength training motions, and supports the detected motions in terms of speed and angle.

Patent Document 1: JP 2013-103010 A

Non-Patent Document 1: Z. Cao, T. Simon, S. Wei and Y. Sheikh, "Realtime Multi-person 2D Pose Estimation Using Part Affinity Fields," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 1302-1310.

Non-Patent Document 2: "Muscular strength training support system using Kinect", Proc. of the 77th National Convention (pp. 437-438, 2015-03-17).

For the server to acquire each user's camera images sent by the user terminals and apply a suitable analysis algorithm to identify the actions of person objects, the camera images showing the person objects are loaded into the frame memory of a GPU and posture estimation is executed on that memory.

In a posture estimation algorithm such as that of Non-Patent Document 1, a camera image showing a person object is loaded into the frame memory of a GPU. If one camera image is loaded across the entire frame memory, one GPU can estimate the postures of persons in only one camera image. Therefore, when many camera images are received in parallel from a plurality of user terminals during the same period, as many GPUs as user terminals are needed to identify objects in real time.

An object of the present invention is to solve the above technical problem and to provide an object identification device, identification system, and identification method that allow one GPU to perform posture estimation on a plurality of camera videos concurrently with a low processing load, so that a large number of person objects located in separate places can be identified in real time with few GPUs.

To achieve the above object, the present invention is characterized in that an object identification device that loads camera images into a frame memory and estimates the postures of person objects has the following configuration.

(1) The device includes: means for acquiring time series of camera images transmitted by a plurality of user terminals; means for virtually dividing the frame memory into a plurality of partial areas; means for generating an integrated frame image, in which a plurality of camera images are integrated into one frame, by placing each camera image in a partial area of the frame memory; and posture estimation means for estimating, in a single inference by a bottom-up approach, the positions and connectivity of the body parts of person objects based on a feature map generated from the integrated frame image.

(2) The posture estimation means estimates the posture of a person object for each partial area by excluding from estimation the connectivity between body parts extracted from different partial areas.

(3) The dividing means increases the number of partial areas as the estimation likelihood output by the posture estimation means becomes higher.

(4) The device includes means for requesting a user terminal to change the size of the camera images it transmits.

(5) The device includes means for requesting a user terminal to change the transmission cycle of its camera images.

(6) The frame memory includes non-uniformly divided partial areas, and the partial area in which a camera image is placed is determined based on attribute information of the user who transmitted that camera image.

(7) For m camera images, the means for generating the integrated frame image places the 1st to n-th (n < m) camera images in the partial areas of the frame memory to generate the current integrated frame image, places the (n+1)-th to m-th camera images in the partial areas of the frame memory in the next cycle to generate an integrated frame image, and repeats this for each subsequent set of m camera images.
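The alternation described in item (7) can be sketched as a simple batching step. This is a minimal illustration, not the patented implementation: the function name `batches_for_round` is hypothetical, and splitting into more than two batches when m exceeds 2n is an assumption that goes beyond the two-cycle case stated in the text.

```python
from typing import List, Sequence


def batches_for_round(camera_ids: Sequence[str], n_regions: int) -> List[List[str]]:
    """Split m camera ids into consecutive batches of at most n_regions.

    Each batch fills the partial areas of one integrated frame; the batches
    are processed in successive cycles and the pattern repeats for the next
    set of m camera images. (Hypothetical helper, not from the source.)
    """
    if n_regions <= 0:
        raise ValueError("n_regions must be positive")
    return [
        list(camera_ids[start:start + n_regions])
        for start in range(0, len(camera_ids), n_regions)
    ]


# Six terminals sharing a frame memory divided into four partial areas.
schedule = batches_for_round(["A", "B", "C", "D", "E", "F"], 4)
```

With six terminals and four partial areas, the first cycle carries terminals A to D and the second carries E and F, so each image keeps the resolution of a four-way division.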

According to the present invention, the following effects are achieved.

(1) Because the frame memory is virtually divided into a plurality of partial areas and an integrated frame image is generated by placing each user's camera image in a partial area, the postures of many users can be estimated simultaneously with a small number of GPUs.

(2) Because the connectivity between body parts extracted from different partial areas is excluded from estimation and the posture of a person object is estimated for each partial area, the postures of a plurality of users can be estimated simultaneously and with low load on each GPU while erroneous estimation is suppressed.

(3) Because the number of partial areas is increased as the posture estimation likelihood becomes higher, the postures of a plurality of users can be estimated simultaneously and with low load on each GPU while the image resolution needed for reliable posture estimation is maintained.

(4) Because means for requesting a user terminal to change the camera image size is provided, even when the number of divisions of the frame memory changes and the camera image size must change, the resizing load is distributed across the user terminals, which prevents load concentration. In addition, requesting a smaller size reduces the traffic from the user terminals to the server.

(5) Because means for requesting a user terminal to change the camera image transmission cycle is provided, buffer overflow of camera images and breakdown of real-time processing can be prevented even when the number of users transmitting camera images increases.

(6) Because the frame memory includes non-uniformly divided partial areas and the partial area in which a camera image is placed is determined based on attribute information of the user who transmitted it, the service provided can be optimized for each user.

(7) Because, for m camera images, the means for generating the integrated frame image places the 1st to n-th (n < m) camera images in the partial areas of the frame memory to generate the current integrated frame image, places the (n+1)-th to m-th camera images in the partial areas in the next cycle to generate an integrated frame image, and repeats this for each subsequent set of m camera images, each GPU can handle more camera images than the number of divisions of the frame memory while limiting the loss of image resolution.

FIG. 1 is a block diagram showing the configuration of a training support device to which the present invention is applied.
FIG. 2 is a functional block diagram showing the configuration of the main parts of the training support server (2) and the training DB (3).
FIG. 3 shows a list of training menus.
FIG. 4 shows an example of a training table that manages the contents of the training menus.
FIG. 5 is a block diagram showing the configurations of the user terminal (1), the frame image integration unit (28), and the posture estimation unit (29).
FIG. 6 shows examples of how a camera image is trimmed.
FIG. 7 shows examples of dividing the frame memory equally.
FIG. 8 shows an example of dividing the frame memory non-uniformly.
FIG. 9 shows a balance diagnosis menu in which the upper body is rotated left and right.
FIG. 10 shows a balance diagnosis menu in which the left and right heels are raised and lowered one at a time in turn.
FIG. 11 is a sequence flow showing, in time series, the communication and processing among the user terminal (1), the training support server (2), and the training DB (3).
FIG. 12 shows an example of a screen on which the user selects a training purpose.
FIG. 13 shows an example of a screen that presents the operating procedure of the user terminal and has the user adjust the relative position of the user terminal and the user.
FIG. 14 shows a display example of guidance for trunk balance diagnosis.
FIG. 15 shows a display example of a diagnosis result.
FIG. 16 shows an example of the start screen of a training menu.
FIG. 17 shows an example of a list display of the current training contents.
FIG. 18 shows a display example of the contents and cautions for each training.
FIG. 19 shows an example of a total score screen.
FIG. 20 shows an example in which the history of total scores is listed in time series.
FIG. 21 shows another integration method used by the frame image integration unit (28).

Embodiments of the present invention are described in detail below with reference to the drawings. FIG. 1 is a block diagram showing the configuration of the main part of a training support device to which the object identification device of the present invention is applied. Its main components are a user terminal 1, a training support server 2, and a training DB 3.

The user terminal 1 has a video capture function, a wireless communication function, and a display, and can be implemented with, for example, a smartphone. The training support server 2 acquires from the user terminal 1, via Wi-Fi, a base station BS, and a network NW, training video of the user during training, analyzes the video, and provides an appropriate training evaluation to the user terminal 1. The training DB 3 stores attribute information for each user and a large number of training menus and diagnosis menus.

FIG. 2 is a functional block diagram showing the configuration of the main parts of the training support server 2 and the training DB 3. The training DB 3 includes an attribute information storage unit 31, a training level storage unit 32, a diagnosis menu storage unit 33, and a training menu storage unit 34.

The attribute information storage unit 31 stores, for each user ID, the user's age, sex, height, weight, blood pressure, medical history, and the like as attribute information. The training level storage unit 32 stores the current training level for each user ID. The diagnosis menu storage unit 33 stores a plurality of diagnosis menus.

The training menu storage unit 34 stores a plurality of training menus. FIG. 3 shows a list of the training menus stored in the training menu storage unit 34. In the present embodiment, 32 training menus are provided, such as "squat", "arm twist", and "wood chopper".

FIG. 4 shows an example of a training table that manages the contents of the training menus. For each training menu, the table registers its benefit (the main body part trained), reference tempo, whether it has left and right sides, voice guide identifier, coaching points, NG threshold, and so on. The NG threshold is a threshold condition under which the training is not recognized as effective. For example, if the training menu is "squat", the training is not recognized as effective when the angle of the lower leg is 40° or more.
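As a rough illustration of how an NG threshold might be checked against skeleton keypoints, consider the sketch below. Only the squat threshold of 40° comes from the text. The definition of the lower-leg angle (here, the angle between the knee-to-ankle segment and the vertical image axis) and all function names are assumptions, since the text does not define them.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# NG thresholds in degrees; only the squat value is taken from the text.
NG_THRESHOLD_DEG = {"squat": 40.0}


def lower_leg_angle_deg(knee: Point, ankle: Point) -> float:
    """Angle between the knee->ankle segment and the vertical image axis.

    This definition is an assumption made for illustration.
    """
    dx = ankle[0] - knee[0]
    dy = ankle[1] - knee[1]
    return math.degrees(math.atan2(abs(dx), abs(dy)))


def is_effective(menu: str, angle_deg: float) -> bool:
    """A repetition counts as effective only while below the NG threshold."""
    return angle_deg < NG_THRESHOLD_DEG[menu]


angle = lower_leg_angle_deg(knee=(100.0, 200.0), ankle=(100.0, 300.0))
```

A vertical lower leg gives an angle of 0°, which is below the squat threshold, so `is_effective("squat", angle)` holds for this sample.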

In the present embodiment, the many training menus are also classified into a plurality of groups so that the optimal training menu is selected according to the user's training level and attribute information. In addition, as contraindication information, training menus unsuitable for each pre-existing condition, such as hypertension, heart disease, or low back pain, are registered.
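The selection logic described above could look like the following sketch. The table rows, field names, and the use of a numeric level as the grouping key are placeholders invented for illustration; the text does not specify how groups or contraindications are encoded.

```python
from typing import Dict, Iterable, List

# Hypothetical table rows. Names, levels, and contraindications are
# placeholders, not data from the source.
MENUS: List[Dict] = [
    {"name": "menu_a", "level": 1, "unsuitable_for": {"hypertension"}},
    {"name": "menu_b", "level": 1, "unsuitable_for": set()},
    {"name": "menu_c", "level": 2, "unsuitable_for": {"low back pain"}},
]


def select_menus(menus: List[Dict], level: int, conditions: Iterable[str]) -> List[str]:
    """Return menus in the user's level group that no listed condition rules out."""
    user_conditions = set(conditions)
    return [
        m["name"]
        for m in menus
        if m["level"] == level and not (m["unsuitable_for"] & user_conditions)
    ]


candidates = select_menus(MENUS, level=1, conditions=["hypertension"])
```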

Returning to FIG. 2, in the training support server 2, an input/output interface 21 establishes a communication session with each user terminal 1 and sends and receives messages using voice, video, text, or the like. A user authentication unit 22 authenticates the user by matching the user ID transmitted from the user terminal 1 against the user IDs registered in the training DB 3. A training purpose receiving unit 23 receives the training purpose that the user specifies from the user terminal 1.

As shown in FIG. 5, the frame image integration unit 28 and the posture estimation unit 29 cooperate with the user terminals 1 to acquire, in a single inference, the skeleton information of the training users from a plurality of camera images transmitted from a plurality of user terminals 1 during the same period, and to estimate each user's posture.

In the user terminal 1, a camera unit 101 records video of the user and outputs camera video. A capture unit 102 captures the camera video at a predetermined sampling period, converting it into a time series of still images. As described in detail later, the sampling period of the capture unit 102 can be changed in response to a change request fed back from the training support server 2 according to the posture estimation likelihood.

As shown in FIG. 6, a trimming unit 103 trims a portrait or landscape captured image to a rectangle around the central portion in which the user appears. This trimming may instead be performed by the camera unit 101. An image reduction unit 104 changes the reduction ratio of the trimmed rectangular camera image according to a size change request fed back from the frame image integration unit 28 or according to the network environment of the user terminal 1.
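The central trimming step can be sketched as a crop-box calculation. The text only says the image is trimmed to a rectangle; cropping to a centered square is an assumption suggested by the square sizes (such as 160×160) used in this embodiment, and `center_square_box` is a hypothetical name.

```python
from typing import Tuple


def center_square_box(width: int, height: int) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square.

    Works for both landscape and portrait captures. The square shape is an
    assumption; the text only specifies a rectangle around the center.
    """
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


landscape_box = center_square_box(1920, 1080)
portrait_box = center_square_box(1080, 1920)
```

The resulting box uses the same (left, top, right, bottom) convention that common imaging libraries accept for cropping.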

In the present embodiment, when the user terminal 1 connects to the network via WiFi, the image is compressed in JPEG format with, for example, a size of 160×160 and a quality of 0.65, whereas when it connects by wire, the image is compressed in JPEG format with a size of 320×320 and a quality of 0.65. When it connects over a 4G line, the image is compressed in JPEG format with, for example, a size of 160×160 and a quality of 0.45.
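The three network-dependent settings above can be collected into a lookup table. The sizes and quality values are the ones given in the text; the key names, the `JpegProfile` type, and the `profile_for` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JpegProfile:
    width: int
    height: int
    quality: float  # 0.0 to 1.0, as quoted in the text


# Values from the embodiment; the dictionary keys are illustrative names.
PROFILES = {
    "wifi": JpegProfile(160, 160, 0.65),
    "wired": JpegProfile(320, 320, 0.65),
    "4g": JpegProfile(160, 160, 0.45),
}


def profile_for(network: str) -> JpegProfile:
    """Look up the compression profile for a network type (case-insensitive)."""
    return PROFILES[network.lower()]


wifi_profile = profile_for("WiFi")
```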

A transmission unit 105 sequentially transmits the time series of reduced rectangular camera images (A1, A2, A3…), (B1, B2, B3…), (C1, C2, C3…), (D1, D2, D3…) to the training support server 2 over WebSocket. The camera image transmission cycle of the transmission unit 105 depends on the sampling period of the capture unit 102.

In the frame image integration unit 28, a division method determination unit 281 determines how to divide the frame memory M based on the number of user terminals transmitting camera images during the same period and on the estimation likelihood and recognition rate obtained in the posture estimation described later. In the present embodiment, when camera images are being transmitted from many user terminals 1 and the posture estimation likelihood is sufficient, the frame memory M is virtually divided into, for example, four or nine equal parts as shown in FIGS. 7(a) and 7(b), and one of the partial areas is assigned to each user terminal 1.

A resizing unit 282 decodes each acquired JPEG camera image and then resizes it to the size of the corresponding partial area of the frame memory M. An image arrangement unit 283 sequentially places the resized camera images (a1, a2, a3…), (b1, b2, b3…), (c1, c2, c3…), (d1, d2, d3…) in the partial areas assigned to the respective user terminals 1, thereby sequentially constructing one integrated frame image I at a predetermined cycle.
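The resize-and-place step can be illustrated with plain Python lists standing in for image buffers. This is a toy sketch under stated assumptions: images are 2D lists of pixel values, resizing uses nearest-neighbor sampling (the text does not name an interpolation method), and the frame is split into a k×k grid of equal partial areas in row-major order. All function names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]


def grid_regions(frame_w: int, frame_h: int, k: int) -> List[Tuple[int, int, int, int]]:
    """Split the frame into k x k equal areas; return (x, y, w, h), row-major."""
    cell_w, cell_h = frame_w // k, frame_h // k
    return [(c * cell_w, r * cell_h, cell_w, cell_h)
            for r in range(k) for c in range(k)]


def resize_nearest(img: Image, w: int, h: int) -> Image:
    """Nearest-neighbor resize of a 2D list (an assumed interpolation choice)."""
    src_h, src_w = len(img), len(img[0])
    return [[img[y * src_h // h][x * src_w // w] for x in range(w)]
            for y in range(h)]


def compose(frame_w: int, frame_h: int, k: int, images: List[Image]) -> Image:
    """Place each resized image in its partial area of one integrated frame."""
    frame = [[0] * frame_w for _ in range(frame_h)]
    for (x, y, w, h), img in zip(grid_regions(frame_w, frame_h, k), images):
        tile = resize_nearest(img, w, h)
        for row in range(h):
            frame[y + row][x:x + w] = tile[row]
    return frame


# Four 1x1 "camera images" from terminals A-D placed in a 4x4 frame.
integrated = compose(4, 4, 2, [[[1]], [[2]], [[3]], [[4]]])
```

Areas with no assigned image stay zero-filled, so a frame with fewer active terminals than areas is still well defined.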

As in Non-Patent Document 1, the posture estimation unit 29 first extracts a feature map from the integrated frame image I. It then sequentially applies to the extracted feature map two sequential prediction processes, one using Confidence Maps, which encode the positions of body parts, and one using Part Affinity Fields (PAFs), which encode the connectivity between body parts. In this way it estimates, in a single inference by a bottom-up approach, the positions and connectivity of the body parts of the person objects (users) extracted from the integrated frame image, and constructs a skeleton model.

Here, by implementing processing that excludes from estimation the connectivity between body parts extracted from different partial areas, the positions and connectivity of body parts, and hence the skeleton model of the object, can be estimated for each partial area, that is, for each user.
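One way to picture the exclusion of cross-area connections is as a filter over candidate part pairs before they are scored. The sketch below assumes an equal k×k division and represents a candidate connection as a pair of (x, y) keypoints; the function names and this data layout are illustrative and are not taken from the text or from Non-Patent Document 1.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def region_index(x: float, y: float, frame_w: int, frame_h: int, k: int) -> int:
    """Row-major index of the partial area containing (x, y) in a k x k grid."""
    col = min(int(x * k / frame_w), k - 1)
    row = min(int(y * k / frame_h), k - 1)
    return row * k + col


def same_region_pairs(candidates: List[Tuple[Point, Point]],
                      frame_w: int, frame_h: int, k: int) -> List[Tuple[Point, Point]]:
    """Keep only candidate connections whose two parts lie in the same area."""
    kept = []
    for p, q in candidates:
        if region_index(*p, frame_w, frame_h, k) == region_index(*q, frame_w, frame_h, k):
            kept.append((p, q))
    return kept


pairs = [
    ((10.0, 10.0), (100.0, 150.0)),    # both parts in the top-left area
    ((150.0, 100.0), (170.0, 100.0)),  # straddles the boundary at x = 160
]
kept = same_region_pairs(pairs, frame_w=320, frame_h=320, k=2)
```

Dropping such pairs before association prevents a limb from being linked across two users' images and also shrinks the number of candidate pairs to evaluate.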

The estimation likelihood obtained in the estimation process is fed back to the frame image integration unit 28 as a request to change the division size of the frame memory M. When the estimation likelihood is sufficiently high and the number of user terminals transmitting camera images exceeds a predetermined threshold, the division method determination unit 281 increases the number of camera images integrated into one frame, for example by virtually dividing the frame memory M into nine equal parts as shown in FIG. 7(b). The resizing unit 282 resizes the camera images according to the new number of equal divisions.

Conversely, when the estimation likelihood falls below a predetermined lower limit, the number of camera images integrated into one frame is reduced, for example from nine equal divisions to four, or from four equal divisions to no division, so that a high resolution is maintained and a sufficient estimation likelihood is secured.
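The adaptive behavior described in the two paragraphs above amounts to a small state machine over the division counts 1, 4, and 9. The sketch below is one possible reading: the likelihood thresholds 0.8 and 0.5 are made-up values, and treating "the number of terminals exceeds a predetermined threshold" as "more terminals than current partial areas" is an assumption.

```python
DIVISIONS = (1, 4, 9)  # no division, 4 equal parts, 9 equal parts


def next_division(current: int, likelihood: float, n_terminals: int,
                  high: float = 0.8, low: float = 0.5) -> int:
    """Pick the next division count from the estimation likelihood.

    high/low are hypothetical thresholds; the terminal-count condition
    (more terminals than current areas) is likewise an assumption.
    """
    i = DIVISIONS.index(current)
    if likelihood < low and i > 0:
        return DIVISIONS[i - 1]          # fewer, larger areas: more resolution
    if likelihood >= high and n_terminals > current and i < len(DIVISIONS) - 1:
        return DIVISIONS[i + 1]          # more areas: more users per frame
    return current


step_up = next_division(4, likelihood=0.9, n_terminals=6)
step_down = next_division(9, likelihood=0.3, n_terminals=6)
```

Using two separate thresholds leaves a band in which the division count stays unchanged, which avoids flipping between layouts on every frame.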

The posture estimation unit 29 sequentially transfers the posture estimation results, which include the positions and connectivity of the body parts extracted for each user terminal (hereinafter sometimes collectively called skeleton information), to the downstream balance diagnosis unit 24 or training evaluation unit 26.

Thus, in the present embodiment, the number of camera images integrated into one frame can be changed adaptively based on the posture estimation likelihood, so action recognition for a plurality of users can be performed simultaneously on one GPU without lowering the estimation accuracy.

Although the above embodiment has been described as dividing the frame memory M of the GPU equally and placing camera images of the same size in the partial areas, the present invention is not limited to this. As shown in FIG. 8, the partial areas may be given different sizes by dividing the frame memory non-uniformly for each user according to, for example, the content or fee of the contracted service.

In this way, a user who has contracted, for example, a specialized diagnosis service or a higher-priced service can be assigned a larger partial area as a priority user. This enables advanced diagnosis using skeleton information that can be estimated only from high-resolution camera images, such as fingertip movements and facial expressions.

Whether each user is a priority user may be registered in advance in the attribute information storage unit 31 as part of the user's attribute information, associated with the user ID, and determined from the user ID at the time of user authentication.
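A minimal sketch of attribute-based placement follows, assuming the non-uniform layout is given as a list of (x, y, w, h) rectangles and each user carries a boolean priority flag looked up at authentication. The rule "priority users receive the largest areas first" and the function name `assign_regions` are illustrative choices, not details from the text.

```python
from typing import Dict, List, Tuple

Region = Tuple[int, int, int, int]  # (x, y, w, h)


def assign_regions(users: List[Tuple[str, bool]],
                   regions: List[Region]) -> Dict[str, Region]:
    """Give the largest partial areas to priority users first.

    users: (user_id, is_priority) pairs. Both sorts are stable, so ties
    keep their input order.
    """
    by_area = sorted(regions, key=lambda r: r[2] * r[3], reverse=True)
    ordered = sorted(users, key=lambda u: not u[1])  # priority users first
    return {uid: region for (uid, _), region in zip(ordered, by_area)}


layout = [(200, 0, 100, 100), (0, 0, 200, 200), (200, 100, 100, 100)]
placement = assign_regions([("u1", False), ("u2", True), ("u3", False)], layout)
```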

Furthermore, the above embodiment has been described as reducing the number of divisions of the frame memory M when the estimation likelihood in the posture estimation unit 29 falls. However, if only the number of divisions of the frame memory M is reduced, processing of the camera images acquired from the user terminals can fall behind, which may cause problems such as buffer overflow on the training support server side.

In this embodiment, therefore, when the rate at which the training support server 2 processes the integrated frame images I falls below the rate at which the user terminals 1 transmit camera images, the training support server 2 sends each user terminal 1 a change request asking it to transmit camera images less frequently.

In each user terminal 1, the capture unit 102 responds to the change request by lengthening the sampling period at which still images are captured from the moving image of the user, and camera images are transmitted less frequently in step with that period. As a result, buffer overflow of camera images and breakdown of real-time processing can be prevented even when the number of users transmitting camera images increases.
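The rate control described above can be sketched as follows. This is a minimal illustration, assuming rates are expressed in frames per second and that the terminal simply lengthens its sampling period to match the server's processing rate; the function names and that policy are hypothetical and are not stated in the embodiment.

```python
def needs_change_request(processing_fps, transmission_fps):
    """Server side: True when processing has fallen behind transmission."""
    return processing_fps < transmission_fps


def lengthened_sampling_period(current_period_s, processing_fps):
    """Terminal side: never shorten the period, only lengthen it so that
    frames are sent no faster than the server can process them (assumed policy)."""
    return max(current_period_s, 1.0 / processing_fps)


# Example: terminals send 30 frames/s but the server processes only 10 frames/s.
send_request = needs_change_request(processing_fps=10.0, transmission_fps=30.0)
new_period = lengthened_sampling_period(current_period_s=1.0 / 30.0, processing_fps=10.0)
```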

Returning to FIG. 2, the balance diagnosis unit 24 includes a diagnosis menu selection unit 241, a diagnosis menu distribution unit 242, and a first action recognition unit 243. It diagnoses the left-right balance of the trunk based on the posture estimated from the skeleton information of a user performing a balance diagnosis exercise.

The diagnosis menu selection unit 241 selects, from the diagnosis menu storage unit 33, a diagnosis menu for diagnosing the user's trunk balance based on the user's attribute information, training purpose, and training level. The diagnosis menu distribution unit 242 distributes the selected diagnosis menu to the user terminal 1 and causes it to be shown on the terminal's display.

The first action recognition unit 243 performs action recognition of the user based on skeleton information extracted from the diagnostic video of the user carrying out the diagnosis menu. An action recognition technique based on skeleton information is disclosed in, for example, Non-Patent Document 1. The balance diagnosis unit 24 can diagnose the user's trunk balance based on the result of the action recognition.

FIGS. 9 and 10 show examples of diagnosis menus distributed by the diagnosis menu distribution unit 242 to evaluate the user's trunk balance. Each user receives either a diagnosis menu in which the upper body is rotated to the left and right, as in FIG. 9, or a diagnosis menu in which the left and right heels are raised and lowered one at a time in turn, as in FIG. 10.

Based on the user's posture and its changes, estimated from the diagnostic video of the user carrying out the diagnosis menu of FIG. 9, the first action recognition unit 243 compares the angle of left rotation with the angle of right rotation and can diagnose the trunk balance from the difference. When a diagnostic video of the user carrying out the diagnosis menu of FIG. 10 is acquired, the trunk balance can be diagnosed by comparing the height of the waist when the left heel is raised with its height when the right heel is raised.
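The left/right comparison for the FIG. 9 menu might look like the sketch below. The tolerance value and the result labels are assumptions introduced for illustration; the embodiment only states that the difference between the two angles is used.

```python
def diagnose_rotation_balance(left_deg, right_deg, tolerance_deg=5.0):
    """Compare left and right rotation angles (degrees) and label the result.

    tolerance_deg and the returned labels are hypothetical.
    """
    diff = left_deg - right_deg
    if abs(diff) <= tolerance_deg:
        return "balanced"
    return "left-dominant" if diff > 0 else "right-dominant"


result = diagnose_rotation_balance(left_deg=40.0, right_deg=30.0)
```

The same shape of comparison would apply to the FIG. 10 menu, with waist heights in place of rotation angles.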

The physical condition diagnosis unit 25 diagnoses the user's current physical condition based on the video of the user carrying out the balance diagnosis menu. Such a diagnosis can be realized, for example, by measuring the tempo of the user's movements in the diagnostic video of the user carrying out the balance diagnosis menu.

The training evaluation unit 26 includes a training menu selection unit 261, a training menu distribution unit 262, a second action recognition unit 263, an improvement point coaching unit 264, a training quality determination unit 265, a training amount counting unit 266, a score calculation unit 267, and an evaluation result distribution unit 268. It evaluates the training by analyzing the posture estimated from the training video of the user performing the training and observing the transitions of specific body parts.

The training menu selection unit 261 selects, from the training DB 3, the training menu best suited to the user at the present time based on the user's training purpose, trunk balance diagnosis result, and training level and, where necessary, the user's attribute information and physical condition.

The training menu distribution unit 262 distributes the selected training menu to the user terminal 1. The second action recognition unit 263 performs action recognition of the user based on the changes in posture estimated from the training video of the user carrying out the distributed training menu.

Based on the result of the user's action recognition, the improvement point coaching unit 264 determines whether each "coaching point" shown in FIG. 4 clears its corresponding "NG threshold". If an NG threshold is not cleared, the unit coaches the user with that coaching point as an improvement point.

In this embodiment, for the training menu "squat", for example, the "lower leg angle" is registered as one of the coaching points and its NG threshold is "40° or more". The improvement point coaching unit 264 determines the lower leg angle from the result of the action recognition and, if it is 40° or more, selects the coaching message registered under voice number "201" from the training DB 3 and distributes it to the user terminal 1. The coaching message contains content such as "Please bend your lower legs more". The coaching message is not limited to a voice message and may be a video message with audio.
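The NG-threshold check can be sketched as a table lookup. The squat entry mirrors the example above (lower leg angle, 40° or more, voice number 201); the table layout, key names, and function name are hypothetical and do not reproduce the actual structure of FIG. 4.

```python
# Hypothetical in-memory stand-in for the coaching-point table of the training DB.
COACHING_POINTS = {
    "squat": [
        {
            "point": "lower_leg_angle_deg",
            "is_ng": lambda value: value >= 40.0,  # NG threshold: 40 degrees or more
            "voice_number": 201,
        },
    ],
}


def coaching_voice_numbers(menu, measurements):
    """Return the voice numbers of coaching messages whose NG condition is met."""
    numbers = []
    for entry in COACHING_POINTS.get(menu, []):
        value = measurements.get(entry["point"])
        if value is not None and entry["is_ng"](value):
            numbers.append(entry["voice_number"])
    return numbers


to_send = coaching_voice_numbers("squat", {"lower_leg_angle_deg": 45.0})
```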

The training quality determination unit 265 judges the quality of the training based on, for example, the number of times the improvement point coaching unit 264 has coached the user on improvement points and the tempo of the training. In this embodiment, the tempo is obtained by measuring, for example, the time required for a predetermined movement (for example, bending and stretching). The more often improvement points are coached, and the further the tempo deviates from the standard tempo, the lower the quality of the training is judged to be. The result is quantified and normalized to a value from, for example, 0.5 (low quality) to 1 (high quality).

The training amount counting unit 266 counts the user's training amount by applying the result of the action recognition to an evaluation policy prepared in advance. In this embodiment, a count condition is defined for each distributed training menu: predetermined parts of the skeleton must satisfy predetermined transition conditions in a predetermined order. The training amount is incremented each time the count condition is satisfied, based on the skeleton information acquired from the training video.

For example, if the provided training menu is the squat, the registered evaluation items include the knee angle θ when bending, the time (tempo) t required for one bend-and-stretch, and the relative position P of the head and elbows in side view during the movement. The number of times that the angle θ, the time t, and the relative position P satisfy predetermined conditions is counted.
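A minimal sketch of such counting is given below. The concrete thresholds, the `Rep` record, and the reduction of the relative position P to a boolean flag are all assumptions; the embodiment says only that θ, t, and P must satisfy "predetermined conditions".

```python
from dataclasses import dataclass


@dataclass
class Rep:
    knee_angle_deg: float  # angle theta at the bottom of the squat
    duration_s: float      # time t for one bend-and-stretch
    position_ok: bool      # stand-in for the head/elbow relative position P


def count_valid_reps(reps, max_knee_angle_deg=90.0, tempo_range_s=(2.0, 4.0)):
    """Count repetitions that satisfy all three (hypothetical) conditions."""
    low, high = tempo_range_s
    return sum(
        1
        for rep in reps
        if rep.knee_angle_deg <= max_knee_angle_deg
        and low <= rep.duration_s <= high
        and rep.position_ok
    )


reps = [
    Rep(85.0, 3.0, True),    # counted
    Rep(110.0, 3.0, True),   # knee not bent enough
    Rep(80.0, 1.0, True),    # too fast
    Rep(88.0, 2.5, False),   # posture condition not met
]
valid = count_valid_reps(reps)
```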

The score calculation unit 267 scores the current training by reflecting the judgment on training quality in the count value of the training amount. In this embodiment, the count value is reduced according to the training quality. For example, if the count value of the training amount is "100" and the training quality is "0.8", the score is calculated as 100 × 0.8 = 80. The evaluation result scored in this way is distributed to the user terminal 1 by the evaluation result distribution unit 268.
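The scoring rule in the example above is a single multiplication, sketched here. The function name and the range check on the quality value are assumptions added for illustration.

```python
def training_score(count, quality):
    """Scale the counted training amount by the normalized quality value."""
    if not 0.0 < quality <= 1.0:
        raise ValueError("quality is expected to be normalized to (0, 1]")
    return count * quality


score = training_score(count=100, quality=0.8)
```

With the values from the text, a count of 100 and a quality of 0.8, this gives a score of 80.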

The training information update unit 27 updates the user's training level, which is stored in the training level storage unit 32 in association with the user ID, based on the current evaluation result. The training level starts at level 1 and is updated to level 2, level 3, and so on up to level N (advanced) each time the cumulative training amount exceeds a predetermined reference value.

FIG. 11 is a sequence flow showing, in chronological order, the communication and processing steps among the user terminal 1, the training support server 2, and the training DB 3.

At time t1, the user starts the training support application installed in advance on the user terminal 1. At time t2, user authentication is performed and a communication session is established between the user terminal 1 and the training support server 2.

At time t3, as shown in FIG. 12(a), a selection screen for having the user enter the purpose of the current training is shown on the display of the user terminal 1. When the user selects "Diet", a screen for selecting the specific body part to be slimmed is displayed, as shown in FIG. 12(b). In this embodiment, the user can select one of "Whole body", "Trunk / around the stomach", "Upper body / upper arms", and "Lower body / hip lift".

If the user instead selects "Conditioning" on the selection screen, a screen for selecting the symptom of concern is displayed, as shown in FIG. 12(c). In this embodiment, the user can select one of "Relieve stiff shoulders", "Relieve back pain", and "Improve flexibility".

At time t4, the selected training purpose and its "body part" or "symptom" are transmitted from the user terminal 1 to the training support server 2. At time t5, the training support server 2 refers to the attribute information storage unit 31 and the training level storage unit 32 of the training DB 3 based on the user ID and extracts the user's attribute information and training level. Based on the extracted attribute information and training level and on the transmitted training purpose, it then selects the balance diagnosis menu shown in FIG. 9 or FIG. 10.

At time t6, the selected balance diagnosis menu is transmitted to the user terminal 1. At time t7, the built-in camera of the user terminal 1 is started in video recording mode. At time t8, as shown in FIG. 13, a screen explaining the operating procedure of the user terminal 1 and guiding adjustment of the relative position between the user terminal 1 and the user is shown on the display of the user terminal 1 in a scrolling format.

When the user then taps the "Start measurement" button, guidance for the trunk balance diagnosis is shown on the display of the user terminal 1 at time t9, as shown in FIG. 14(a). When the user follows the message "Stand in front of the camera, aligned with the human-shaped guide on the screen" and aligns his or her on-screen position with the guide, the balance diagnosis menu starts, as shown in FIG. 14(b), and the user is asked to move according to the video and messages. When the user moves according to the balance diagnosis menu, the resulting video is transmitted from the user terminal 1 to the training support server 2 as diagnostic video from time t10 onward.

At time t11, the balance diagnosis unit 24 of the training support server 2 evaluates the user's trunk balance. The physical condition diagnosis unit 25 analyzes the user's complexion and movement (tempo) in the balance diagnosis video to diagnose the user's current physical condition. At time t12, the diagnosis result is distributed to the user terminal 1. At time t13, as shown in FIG. 15, the diagnosis result is shown on the display of the user terminal 1.

In this embodiment, the trunk balance diagnosis result "center of gravity shifted left" is displayed together with the training purpose "Diet" and the body part of concern "around the stomach" selected by the user.

When the user taps the "OK" button on the diagnosis result screen, at time t14 the training menu selection unit 261 of the training support server 2 applies the trunk balance analysis result, the training purpose, the user's training level and current physical condition and, where necessary, the user's attribute information to a training menu selection policy constructed in advance, and thereby selects the training menu best suited to the user at present from the training menu storage unit 34.

For example, if the result of the physical condition diagnosis is "good", a training menu suited to the trunk balance, training level, and training purpose is selected. If the result is "poor", a training menu with a lighter load than the one suited to the trunk balance, training level, and training purpose is selected. Depending on how poor the physical condition is, a message advising against training this time may be distributed instead.

If the user's attribute information is also taken into account, the load of the training menu may be increased or decreased according to age or sex. Alternatively, for a user who has "high blood pressure" registered as a pre-existing condition, a training menu that is unlikely to raise blood pressure may be selected. Similarly, for a user who has "back pain" registered as a pre-existing condition, a training menu that places little strain on the lower back may be selected.

At time t15, the training menu distribution unit 262 distributes the selected training menu from the training support server 2 to the user terminal 1. At time t16, the selected training menu is displayed, as shown in FIG. 16.

FIG. 16 shows an example of the start screen of a training menu, on which the content and a description of the selected training menu are displayed.

In this embodiment, the screen indicates that the number of training items is "4", that the estimated time is 20 minutes, and that all four items are "reverse lunge". The user does not have to complete the proposed menu in full. If, for example, the user does not wish to perform part of it because of time constraints or physical condition, the user can skip an item at time t17 by tapping the item to be omitted.

When modification of the training menu is completed in this way, the content of the current training is displayed as a list, as shown in FIG. 17. Skipped training items are grayed out.

When the user taps the start training button, the training menu starts at time t18 and, as shown in FIG. 18, the content of and cautions for each training item are displayed as it is performed.

At time t19, the training video of the user performing the training is transmitted to the training support server 2 frame by frame. At time t20, the training evaluation unit 26 of the training support server 2 performs action recognition based on the changes in the user's posture estimated from the training video.

At time t21, the evaluation result is transmitted to the user terminal 1, and at time t22 it is shown on the display of the user terminal 1. At time t23, the registered training level is updated by adding the current score to the history of past scores.

In this embodiment, training levels are defined from, for example, a first level (beginner) to a fifth level (advanced). When the cumulative score earned at the n-th level exceeds a predetermined threshold, the user is promoted to the (n+1)-th level.
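The promotion rule can be sketched as follows. The threshold values, the function name, and the choice to reset the accumulated score to zero on promotion are assumptions; the embodiment states only that promotion occurs when the cumulative score at the n-th level exceeds a predetermined threshold.

```python
def update_level(level, level_score, new_score, thresholds, max_level=5):
    """Add the new score and promote by one level when the threshold is exceeded.

    thresholds maps a level n to the (hypothetical) cumulative score required
    to leave that level. Returns (level, accumulated score at that level).
    """
    level_score += new_score
    if level < max_level and level_score > thresholds[level]:
        return level + 1, 0  # assumed: accumulation restarts at the new level
    return level, level_score


THRESHOLDS = {1: 1000, 2: 2000, 3: 3000, 4: 4000}  # illustrative values only
promoted = update_level(level=1, level_score=700, new_score=390, thresholds=THRESHOLDS)
```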

FIG. 19 shows an example of the evaluation result displayed on the user terminal after training. The training items performed, the individual score for each item, and messages are listed in chronological order, and the total score for the current training, "390", is displayed.

The total score is managed as history information and can also be listed in chronological order, as shown in FIG. 20. This lets the user check the training history easily. By referring to the scores in particular, the user can see objectively that the training level is improving, which can be expected to increase motivation for training.

FIG. 21 schematically shows another image integration method used by the frame image integration unit 28.

In the embodiment above, the frame memory M is, for example, divided into four equal parts and four camera images (a, b, c, d) are integrated [FIG. 21(a)], so that posture estimation is performed for four users at once. When the number of users grows and eight camera images (a, b, c, d, e, f, g, h) must be integrated, the frame memory M is adaptively divided into nine equal parts [FIG. 21(b)], so that posture estimation can be performed for up to nine users at once.
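One way to sketch this adaptive equal division is to choose the smallest k × k grid that holds all camera images, which yields 4 areas for four users and 9 areas for eight users as in the example. The square-grid rule and the function names are assumptions made for illustration; the embodiment does not state how the number of divisions is chosen.

```python
import math


def grid_side(num_users):
    """Smallest k such that a k x k grid has at least num_users cells."""
    if num_users < 1:
        raise ValueError("num_users must be at least 1")
    return math.isqrt(num_users - 1) + 1


def partial_areas(frame_w, frame_h, num_users):
    """Return the (x, y, width, height) of every cell in row-major order."""
    k = grid_side(num_users)
    cell_w, cell_h = frame_w // k, frame_h // k
    return [
        (col * cell_w, row * cell_h, cell_w, cell_h)
        for row in range(k)
        for col in range(k)
    ]


areas_for_eight = partial_areas(900, 900, num_users=8)
```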

The present invention is not limited to this, however. As shown in FIG. 21(c), the four-way equal division of the frame memory M may be kept while the set of camera images to be integrated is split into two groups, (a, b, c, d) and (e, f, g, h), and the group that is integrated is switched in turn frame by frame. In this way a single GPU can perform posture estimation for eight users substantially simultaneously.

That is, the camera image sets of the groups may be integrated in turn: in the k-th processing step, the camera image set of one group (a1, b1, c1, d1) is placed in the frame memory M; in the (k+1)-th step, the camera image set of the other group (e1, f1, g1, h1) is placed; in the (k+2)-th step, the camera image set of the first group (a2, b2, c2, d2) is placed again; and so on.

In this way, for m camera images, the 1st to n-th (n < m) camera images are placed in the partial areas of the frame memory to generate the current integrated frame image, and in the next cycle the (n+1)-th to m-th camera images are placed in the partial areas of the frame memory to generate the next integrated frame image. Repeating this for each subsequent set of m camera images allows each GPU to process more camera images than the number of divisions of the frame memory at the same time while limiting the loss of image resolution.
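The group switching described above amounts to round-robin selection over chunks of n camera images, as in the following sketch. The function name and the use of camera identifiers in place of image data are illustrative assumptions.

```python
def group_for_cycle(camera_ids, n_regions, cycle):
    """Return the camera images to place in the frame memory for this cycle.

    The m camera images are split into consecutive groups of at most
    n_regions, and the groups are used in turn, one per processing cycle.
    """
    groups = [
        camera_ids[i:i + n_regions]
        for i in range(0, len(camera_ids), n_regions)
    ]
    return groups[cycle % len(groups)]


cameras = ["a", "b", "c", "d", "e", "f", "g", "h"]
first = group_for_cycle(cameras, n_regions=4, cycle=0)
second = group_for_cycle(cameras, n_regions=4, cycle=1)
third = group_for_cycle(cameras, n_regions=4, cycle=2)
```

With eight cameras and four partial areas, the first and third cycles use cameras a to d and the second uses cameras e to h, so each user is processed in every other cycle.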

If the rate at which the training support server 2 processes frame images falls below the rate at which the user terminals 1 transmit camera images, problems such as buffer overflow can arise on the training support server side. In this embodiment as well, therefore, when the frame image processing rate of the training support server 2 falls below the camera image transmission rate of the user terminals 1, the training support server 2 sends each user terminal 1 a change request asking it to transmit camera images less frequently.

As illustrated in FIG. 21(d), each user terminal 1 responds to the change request by lengthening the sampling period at which still images are captured from the moving image of the user, and transmits camera images less frequently in step with that period.

Furthermore, although the embodiment above was described with the camera image capture unit 102, the trimming unit 103, and the reduction unit 104 provided in the user terminal 1, the present invention is not limited to this, and all or some of these units may be provided in the training support server 2.

DESCRIPTION OF SYMBOLS: 1... user terminal, 2... training support server, 3... training DB, 22... user authentication unit, 23... training purpose reception unit, 24... balance diagnosis unit, 25... physical condition diagnosis unit, 26... training evaluation unit, 27... training information update unit, 28... frame image integration unit, 29... posture estimation unit, 31... attribute information storage unit, 32... training level storage unit, 33... diagnosis menu storage unit, 34... training menu storage unit

Claims

1. An object identification device that loads camera images into a frame memory and performs posture estimation of person objects, comprising:
means for acquiring a time series of camera images transmitted by a plurality of user terminals;
means for virtually dividing the frame memory into a plurality of partial areas;
means for generating an integrated frame image, in which a plurality of camera images are integrated into one frame, by arranging each camera image in a partial area of the frame memory; and
posture estimation means for estimating the positions and connectivity of body parts of a person object in a single inference by a bottom-up approach, based on a feature map generated from the integrated frame image.

2. The object identification device according to claim 1, wherein the posture estimation means excludes the connectivity of body parts extracted from different partial areas from the estimation targets.

3. The object identification device according to claim 1, wherein the posture estimation means estimates the posture of the person object for each partial area.

4. The described object identification device, wherein the dividing means, based on the estimation likelihood output from the posture estimation means, increases the number of partial areas as the estimation likelihood increases.

5. The object identification device according to claim 4, further comprising means for requesting the user terminal to change the size of the camera images to be transmitted.

6. The object identification device according to claim 4, further comprising means for requesting the user terminal to change the transmission cycle of the camera images.

7. The object identification device according to claim 1, wherein the frame memory is divided equally into a plurality of partial areas.

8. The object identification device according to claim 7, wherein the frame memory includes unequally divided partial areas.

9. The described object identification device, wherein the means for generating the integrated frame image determines the partial area in which a camera image is arranged based on attribute information of the user who transmitted the camera image.

10. The object identification device according to claim 1, wherein the means for generating the integrated frame image, for m camera images, generates the current integrated frame image by arranging the 1st to n-th (n < m) camera images in the partial areas of the frame memory, generates an integrated frame image in the next cycle by arranging the (n+1)-th to m-th camera images in the partial areas of the frame memory, and repeats this for each subsequent set of m camera images.

11. An object identification system that loads camera images transmitted by a plurality of user terminals into a frame memory of an object identification device and performs posture estimation of person objects, wherein
each of the user terminals comprises:
means for capturing camera images at a predetermined sampling period; and
means for transmitting a time series of the captured camera images to the object identification device, and
the object identification device comprises:
means for acquiring the time series of camera images from each user terminal;
means for virtually dividing the frame memory into a plurality of partial areas;
means for generating an integrated frame image, in which a plurality of camera images are integrated into one frame, by arranging each camera image in a partial area of the frame memory; and
posture estimation means for estimating the positions and connectivity of body parts of a person object in a single inference by a bottom-up approach, based on a feature map generated from the integrated frame image.

12. The object identification system according to claim 11, wherein the object identification device further comprises means for requesting the user terminal to change the size of the camera images, and
the user terminal changes the size of the camera images to be transmitted in response to the size change request.

13. The object identification system according to claim 11 or 12, wherein the object identification device further comprises means for requesting the user terminal to change the transmission cycle of the camera images, and
the user terminal changes the transmission cycle of the camera images in response to the change request.

14. The object identification system according to claim 11, wherein the user terminal further comprises means for trimming the camera images.

15. An object identification method in which a computer loads camera images into a frame memory and performs posture estimation of person objects, comprising:
a procedure of acquiring and storing a time series of camera images transmitted by a plurality of user terminals;
a procedure of virtually dividing the frame memory into a plurality of partial areas;
a procedure of generating an integrated frame image, in which a plurality of camera images are integrated into one frame, by arranging each camera image in a partial area of the frame memory; and
a procedure of estimating the positions and connectivity of body parts of a person object in a single inference by a bottom-up approach, based on a feature map generated from the integrated frame image.