JP2022541709A

JP2022541709A - Attitude detection and video processing method, device, electronic device and storage medium

Info

Publication number: JP2022541709A
Application number: JP2021567798A
Authority: JP
Inventors: チェンチエン; ジュンイーリン; モンティンチェン
Original assignee: ベイジン・センスタイム・テクノロジー・デベロップメント・カンパニー・リミテッド
Priority date: 2020-06-19
Filing date: 2020-12-21
Publication date: 2022-09-27
Also published as: KR20210157470A

Abstract

The present invention relates to pose detection and video processing methods and apparatuses, an electronic device, and a storage medium. The pose detection method includes: acquiring a target image; acquiring, according to the target image, continuous depth information and position information of a target object in the target image; and determining the pose of the target object according to the continuous depth information and the position information. Through this process, the pose of the target object can be detected more accurately, improving the accuracy and effect of pose detection. [Selected drawing] FIG. 1

Description

This application is based on, and claims priority to, Chinese patent application No. 202010566388.7, filed with the Chinese Patent Office on June 19, 2020, the entire contents of which are incorporated herein by reference.

The present invention relates to the field of image processing technology, and in particular to pose detection and video processing methods and apparatuses, an electronic device, and a storage medium.

Three-dimensional human pose estimation refers to estimating the three-dimensional position of a human body in an image or a video. The task is an active research topic in computer vision and a key step in many applications (for example, action recognition, human-computer interaction, and autonomous driving). Achieving high-precision prediction of three-dimensional position information from an input image is currently a pressing problem.

The present invention proposes a technical solution for pose detection.

According to one aspect of the present invention, a pose detection method is provided, the method including:
acquiring a target image; acquiring, according to the target image, continuous depth information and position information of a target object in the target image; and determining a pose of the target object according to the continuous depth information and the position information.

In one possible implementation, acquiring the continuous depth information and position information of the target object in the target image according to the target image includes: passing the target image through a first neural network model to obtain the continuous depth information and position information of the target object in the target image, where the first neural network model is obtained by training with first training data and second training data. The first training data is a training image containing a training object, and the second training data includes continuous depth information of the training object and position information of the training object.

In one possible implementation, determining the pose of the target object according to the continuous depth information and the position information includes: passing the continuous depth information and the position information through a second neural network model to obtain the pose of the target object, where the second neural network model is obtained by training with second training data and third training data. The second training data includes continuous depth information of a training object and position information of the training object, and the third training data includes the pose of the training object.

In one possible implementation, the second training data is generated from third training data, the third training data including the pose of the training object. Generating the second training data from the third training data includes: obtaining discrete depth information of the training object and position information of the training object according to the pose of the training object in the third training data; processing at least part of the discrete depth information to obtain continuous depth information of the training object; and generating the second training data according to the continuous depth information of the training object and the position information of the training object.
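The first step of this generation process can be sketched as follows. This is only an illustration under an assumed data layout: the pose is taken to be a mapping from keypoint names to (x, y, z) coordinates with z as the depth, and `split_pose` is a hypothetical helper name that does not appear in the specification.

```python
from typing import Dict, Tuple

# Assumed layout: keypoint name -> (x, y, z), where z is the depth.
Pose3D = Dict[str, Tuple[float, float, float]]


def split_pose(pose: Pose3D):
    """Split a labelled 3D pose into 2D positions and discrete depths."""
    # Position information: the image-plane coordinates of each keypoint.
    positions = {name: (x, y) for name, (x, y, _z) in pose.items()}
    # Discrete depth information: one depth value per keypoint.
    discrete_depths = {name: z for name, (_x, _y, z) in pose.items()}
    return positions, discrete_depths
```

The discrete depths returned here would then be densified into continuous depth information before being paired with the positions as second training data.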

In one possible implementation, processing at least part of the discrete depth information to obtain the continuous depth information of the training object includes: obtaining at least one connection corresponding to at least part of the discrete depth information; determining continuous depth sub-information of the at least one connection according to the discrete depth information corresponding to the connection; and aggregating the at least one piece of continuous depth sub-information to obtain the continuous depth information of the training object.

In one possible implementation, determining the continuous depth sub-information of the at least one connection according to the discrete depth information corresponding to the connection includes: obtaining, by linear interpolation according to the discrete depth information corresponding to the connection, first continuous depth sub-information of at least one point on the connection; determining a connection range corresponding to the at least one connection; determining, according to the first continuous depth sub-information, second continuous depth sub-information of at least one point within the connection range corresponding to the connection; and obtaining the continuous depth sub-information corresponding to the connection according to the first continuous depth sub-information and/or the second continuous depth sub-information, thereby obtaining the continuous depth sub-information of the at least one connection.

In one possible implementation, determining, according to the first continuous depth sub-information, the second continuous depth sub-information of at least one point within the connection range corresponding to the connection includes: when the connection range lies within a preset range of the discrete depth information corresponding to the connection, using that discrete depth information as the second continuous depth sub-information of at least one point within the connection range; and when the connection range lies outside the preset range of the discrete depth information corresponding to the connection, obtaining the second continuous depth sub-information of at least one point within the connection range according to the first continuous depth sub-information on the connection that is nearest to that point.
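The interpolation and connection-range rules above can be illustrated with a minimal sketch for a single connection between two keypoints. It rests on one reading of the text that the specification does not spell out: the "preset range" is modelled as a circle of radius `keypoint_radius` around each keypoint, and the "connection range" as the band of half-width `range_width` around the segment. The function and parameter names are hypothetical.

```python
import math
from typing import Optional, Tuple

Point = Tuple[float, float]


def depth_in_connection_range(
    q: Point,
    p0: Point, d0: float,
    p1: Point, d1: float,
    range_width: float,
    keypoint_radius: float,
) -> Optional[float]:
    """Depth assigned to point q for the connection p0-p1, or None if q is outside its range."""
    # Near a keypoint (assumed "preset range"): reuse that keypoint's discrete depth.
    for p, d in ((p0, d0), (p1, d1)):
        if math.dist(q, p) <= keypoint_radius:
            return d
    # Otherwise find the nearest point on the connection by projecting q onto the segment.
    vx, vy = p1[0] - p0[0], p1[1] - p0[1]
    seg_len_sq = vx * vx + vy * vy
    if seg_len_sq == 0.0:
        return None  # degenerate connection: both keypoints coincide
    t = ((q[0] - p0[0]) * vx + (q[1] - p0[1]) * vy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    nearest = (p0[0] + t * vx, p0[1] + t * vy)
    if math.dist(q, nearest) > range_width:
        return None  # q lies outside the connection range
    # First continuous depth sub-information: linear interpolation along the connection.
    return d0 + t * (d1 - d0)
```

Evaluating this function over every pixel of an image, for every connection, would give one continuous depth sub-map per connection; how the sub-maps are combined where connections overlap is left open here, as it is in the text.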

According to one aspect of the present invention, a video processing method is provided, the method including:
performing image capture on a current scene to obtain a captured video; selecting, from the captured video, at least two frames of target images containing a target object; and performing pose detection on the target object in the at least two frames of target images by the pose detection method of any one of the above, to determine at least two poses of the target object in the captured video.

In one possible implementation, the method further includes: obtaining continuous poses of the target object according to the at least two poses of the target object and the times of the frames in the captured video; and tracking the target object according to the continuous poses of the target object.
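One way to picture how two timestamped poses could yield a pose at an intermediate time is linear interpolation of the keypoints. The specification does not say how the continuous poses are constructed, so the interpolation, the dictionary layout, and the name `interpolate_pose` are all assumptions made for this sketch.

```python
from typing import Dict, Tuple

Pose3D = Dict[str, Tuple[float, ...]]  # assumed layout: keypoint name -> coordinates


def interpolate_pose(t: float, t0: float, pose0: Pose3D, t1: float, pose1: Pose3D) -> Pose3D:
    """Linearly interpolate keypoint coordinates between two frame times."""
    if t1 == t0:
        raise ValueError("frame times must differ")
    w = (t - t0) / (t1 - t0)  # 0 at the first frame, 1 at the second
    return {
        name: tuple(a + w * (b - a) for a, b in zip(pose0[name], pose1[name]))
        for name in pose0
    }
```

Sampling `t` between the two frame times gives a sequence of intermediate poses that a tracker could follow.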

According to one aspect of the present invention, a pose detection apparatus is provided, the apparatus including:
a target image acquisition unit configured to acquire a target image; an information acquisition unit configured to acquire, according to the target image, continuous depth information and position information of a target object in the target image; and a pose determination unit configured to determine the pose of the target object according to the continuous depth information and the position information.

In one possible implementation, the information acquisition unit is configured to pass the target image through a first neural network model to obtain the continuous depth information and position information of the target object in the target image, where the first neural network model is obtained by training with first training data and second training data. The first training data is a training image containing a training object, and the second training data includes continuous depth information of the training object and position information of the training object.

In one possible implementation, the pose determination unit is configured to pass the continuous depth information and the position information through a second neural network model to obtain the pose of the target object, where the second neural network model is obtained by training with second training data and third training data. The second training data includes continuous depth information of a training object and position information of the training object, and the third training data includes the pose of the training object.

In one possible implementation, determining the continuous depth sub-information of the at least one connection according to the discrete depth information corresponding to the connection includes: obtaining, by linear interpolation according to the discrete depth information corresponding to the connection, first continuous depth sub-information of at least one point on the connection; determining a connection range corresponding to the at least one connection; determining, according to the first continuous depth sub-information, second continuous depth sub-information of at least one point within the connection range corresponding to the connection; and obtaining the continuous depth sub-information corresponding to the connection according to the first continuous depth sub-information and/or the second continuous depth sub-information.

According to one aspect of the present invention, a video processing apparatus is provided, the apparatus including:
an image capture unit configured to perform image capture on a current scene to obtain a captured video; a selection unit configured to select, from the captured video, at least two frames of target images containing a target object; and a pose acquisition unit configured to perform pose detection on the target object in the at least two frames of target images by the pose detection method of any one of the above, to determine at least two poses of the target object in the captured video.

In one possible implementation, the video processing apparatus is further configured to obtain continuous poses of the target object according to the at least two poses of the target object and the times of the frames in the captured video, and to track the target object according to the continuous poses of the target object.

According to one aspect of the present invention, an electronic device is provided, the electronic device including:
a processor; and a memory configured to store processor-executable instructions, where the processor is configured to invoke the instructions stored in the memory to perform the pose detection method described above.

According to one aspect of the present invention, a computer-readable storage medium storing computer program instructions is provided, where the computer program instructions, when executed by a processor, implement the pose detection method described above.

According to one aspect of the present invention, a computer program including computer-readable code is provided, where the computer-readable code, when run on an electronic device and executed by a processor in the electronic device, implements the pose detection method described above.

In the embodiments of the present invention, a target image is acquired together with continuous depth information and position information of a target object in the target image, and the pose of the target object is determined according to the continuous depth information and the position information. Through this process, the pose of the target object can be predicted using its continuous depth information. Because the depth information is continuous, the pose of the target object can be detected more accurately than with discrete depth information, improving the accuracy and effect of pose detection.

It is to be understood that the foregoing general description and the following detailed description are exemplary and explanatory only and do not limit the present invention.

Other features and aspects of the present invention will become apparent from the following detailed description of exemplary embodiments with reference to the drawings.

The drawings herein are incorporated in and constitute a part of this specification. They illustrate embodiments consistent with the present invention and, together with the specification, serve to explain the technical solutions of the embodiments of the present invention. The drawings show, in order:
A flowchart of a pose detection method according to an embodiment of the present invention.
A schematic diagram of acquiring continuous depth information of a target object according to an embodiment of the present invention.
A schematic diagram of determining a connection range according to an embodiment of the present invention.
A flowchart of a video processing method according to an embodiment of the present invention.
A schematic diagram of an application example of the present invention.
A block diagram of a pose detection apparatus according to an embodiment of the present invention.
A block diagram of a video processing apparatus according to an embodiment of the present invention.
A block diagram of an electronic device according to an embodiment of the present invention.
A block diagram of an electronic device according to an embodiment of the present invention.

Various exemplary embodiments, features, and aspects of the present invention are described in detail below with reference to the drawings. The same reference numbers in the drawings denote elements with the same or similar functions. Although various aspects of the embodiments are shown in the drawings, the drawings are not necessarily drawn to scale unless otherwise specified.

The word "exemplary" as used herein means "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as superior to or better than other embodiments.

The term "and/or" herein merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" covers three cases: A alone, both A and B, and B alone. In addition, the term "at least one" herein denotes any one of a plurality or any combination of at least two of a plurality. For example, "including at least one of A, B, and C" means including any one or more elements selected from the set consisting of A, B, and C.

In addition, numerous specific details are given in the following embodiments in order to better explain the present invention. Those skilled in the art will understand that the present invention can equally be practiced without some of these details. In some examples, methods, means, elements, and circuits well known to those skilled in the art are not described in detail, so as to highlight the gist of the present disclosure.

FIG. 1 shows a flowchart of a pose detection method according to an embodiment of the present invention. The method can be applied to a pose detection apparatus, which may be a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a mobile phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like.
In some possible implementations, the pose detection method can be implemented by a processor invoking computer-readable instructions stored in a memory.

As shown in FIG. 1, the pose detection method may include the following steps.

In step S11, a target image is acquired.

In step S12, continuous depth information and position information of a target object in the target image are acquired according to the target image.

In step S13, the pose of the target object is determined according to the continuous depth information and the position information.
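The three steps can be read as a simple pipeline. The sketch below only shows the data flow; the three callables stand in for whatever image source and models an implementation actually uses, and all names are hypothetical rather than taken from the specification.

```python
from typing import Any, Callable, Tuple


def detect_pose(
    acquire_image: Callable[[], Any],
    extract_depth_and_position: Callable[[Any], Tuple[Any, Any]],
    estimate_pose: Callable[[Any, Any], Any],
) -> Any:
    """Run steps S11 to S13 in order and return the detected pose."""
    # S11: acquire the target image.
    image = acquire_image()
    # S12: obtain continuous depth information and position information from the image.
    continuous_depth, position = extract_depth_and_position(image)
    # S13: determine the pose from the continuous depth and position information.
    return estimate_pose(continuous_depth, position)
```

Passing the stages in as callables keeps the sketch independent of any particular model; in the implementations described later, step S12 and step S13 would each be backed by a trained neural network.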

Here, the target image may be any image used for pose detection. Its form is not limited in the embodiments of the present invention and can be determined flexibly according to the actual situation. In one possible implementation, the target image may be a human body image for human pose detection; in another possible implementation, the target image may be a machine image for machine pose detection (such as robot pose detection).

The target object is the object in the target image on which pose detection is performed, and its form can be determined flexibly according to the target image and the actual pose detection scenario. In one possible implementation, when the target image is a human body image for human pose detection, the target object may be the complete human body contained in the target image, or a part or joint of that body, such as an arm, thigh, torso, or head. In another possible implementation, when the target image is a machine image for machine pose detection, the target object may be the whole machine contained in the target image, or a part of that machine, such as a mechanical arm or a walking mechanism that supports the machine's movement.

The embodiments disclosed below all take the human body as the target object and three-dimensional human pose detection as the pose detection task. Other possible implementations can be derived flexibly from these embodiments and are not described in detail.

The number of target images is not limited in the embodiments of the present invention and may be one or more. That is, a single pose detection pass may operate on only one target image or on several target images simultaneously, as determined by the actual detection needs. The number of target objects contained in a target image is likewise not limited: in one possible implementation a target image contains one target object, and in another it contains several target objects at once, as determined by the actual situation.

In step S11, the manner of acquiring the target image is also not limited in the embodiments of the present invention. In one possible implementation, the pose detection apparatus acquires the target image by actively capturing images of the target object (for example, taking a photograph or a video). In another possible implementation, the pose detection apparatus receives the target image passively. Which manner is used can be determined flexibly according to the actual situation of the pose detection apparatus.

The continuous depth information of the target object may be the depth information of a plurality of contiguous sampling points on the target object. When detecting a target object, several of its keypoints can be detected, such as the head, neck, shoulders, elbows, hands, waist, knees, and feet. Because of the structure of the human body, these keypoints are usually relatively far apart, that is, they are discrete from one another. For example, the elbow and the wrist of the target object are separated by the length of the forearm. The depth information of the keypoints is therefore usually discrete depth information. Continuous depth information is obtained by performing depth estimation, based on the rigid structure of the human torso, for all points between adjacent discrete keypoints. In one possible implementation, the continuous depth information can be represented as a continuous depth feature map.

Accordingly, acquiring the continuous depth information of the target object in the target image in step S12 may mean acquiring the depth information of a plurality of contiguous sampling points within the area covered by the target object in the target image. The manner of acquiring the continuous depth information can be chosen flexibly according to the actual situation.

When performing pose detection on a target object, what ultimately needs to be obtained may be the three-dimensional coordinates of the target object, and depth information is only one of the three dimensions. In one possible implementation, step S12 therefore also requires two-dimensional position information of the target object to help determine its pose. The form of the position information is not limited and can be chosen flexibly according to the actual situation. In one possible implementation, the position information of the target object may include two-dimensional heatmaps of the target object, where the number and type of the heatmaps can be determined flexibly. In one example, the position information of the target object may include a two-dimensional heatmap of the keypoints of the target object and/or a two-dimensional heatmap of the torso of the target object.
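As an illustration of what a two-dimensional keypoint heatmap can look like, the sketch below places a Gaussian bump at a keypoint location. The Gaussian form is a common convention in pose estimation and is an assumption here; the specification does not state how the heatmaps are built, and `keypoint_heatmap` and `sigma` are hypothetical names.

```python
import math
from typing import List, Tuple


def keypoint_heatmap(
    width: int, height: int, center: Tuple[float, float], sigma: float = 1.0
) -> List[List[float]]:
    """Return a height x width grid whose values peak at 1.0 on the keypoint."""
    cx, cy = center
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]
```

The grid is indexed as `heatmap[y][x]`, so a keypoint at `(2, 3)` produces its peak at row 3, column 2.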

In some embodiments of the present invention, the manner of acquiring the position information of the target object can likewise be determined flexibly according to the actual situation; see the disclosed embodiments for details.

In one possible implementation, step S12 may include passing the target image through a first neural network model to obtain the continuous depth information and position information of the target object in the target image.

The first neural network model may be a neural network model capable of extracting the continuous depth information and position information of the target object. Its form is not limited in the embodiments of the present invention and can be set flexibly according to the actual situation. Because its input data is the target image and its output data is the continuous depth information of the target object in the target image, in one possible implementation the first neural network model may be obtained by training a first initial neural network with first training data and second training data. The first training data may be a training image containing a training object, and the second training data may include continuous depth information of the training object and position information of the training object.

The training object contained in the training image may be the same as the target object, or may be a related object that belongs to the same type as the target object without being the same object. Its form can follow that of the target object and is not repeated here. The form of the training image can follow that of the target image and is not repeated here. How the second training data is obtained, and how the continuous depth information and position information of the target object in the second training data are obtained, are described in the subsequent disclosed embodiments and are not detailed here. The amounts of first training data and second training data are not limited in the embodiments of the present invention and can be determined flexibly according to the actual situation.

The first initial neural network may be a general-purpose neural network such as VGG, ResNet or GoogLeNet, or a lightweight neural network such as MobileNetV2 or ShuffleNetV2. Which network is chosen as the first initial neural network can be decided flexibly according to actual needs.

The first neural network model, which has the continuous depth information extraction function, is obtained by inputting the first training data and the second training data into the first initial neural network for training, and is then used to obtain the continuous depth information and position information of the target object in the target image. In this way the continuous depth information and position information of the target object can be obtained in a relatively convenient manner, and the obtained continuous depth information is highly accurate, which lowers the difficulty of pose detection while improving its accuracy. In addition, training the first neural network model on second training data that contains continuous depth information makes it easier for the model to learn more structured depth information, which gives the model more robust generalization and makes it suitable for more application scenarios.

In one possible implementation, step S12 may include:
passing the target image through a fourth neural network model to obtain the continuous depth information of the target object in the target image; and
passing the target image through a third neural network model to obtain the position information of the target object in the target image.

Following the embodiments disclosed above, in one possible implementation the continuous depth information and the position information of the target object may also be obtained separately, by inputting the target image into two different neural networks. In this case, the fourth neural network model may be a neural network model that only extracts the continuous depth information of the target object, and the third neural network model may be a neural network model that only extracts the position information of the target object. Accordingly, in one possible implementation the fourth neural network model may be obtained by training a fourth initial neural network with the first training data and fourth training data. The implementation of the first training data follows the embodiments disclosed above and is not repeated here. The form of the fourth training data can vary flexibly in this case; for example, the fourth training data may contain only the continuous depth information of the training object. The position information of the training object can then be used together with the first training data as the training data of the third neural network model, to train a third initial neural network and so obtain the third neural network model. The implementation of the third initial neural network can be determined flexibly and may follow that of the first initial neural network; it is not repeated here.

The continuous depth information and the position information of the target object are obtained by inputting the target image into the fourth neural network model and the third neural network model, respectively. Because each of the two models has a simpler function, its accuracy can be improved, which effectively improves the accuracy of the obtained continuous depth information and position information and, in turn, the accuracy of pose detection.

The way step S13 determines the pose of the target object from the continuous depth information and the position information is likewise not limited in the embodiments of the present invention. In one possible implementation, step S13 may include passing the continuous depth information and the position information through a second neural network model to obtain the pose of the target object.

The second neural network model may be a neural network model with a pose detection function. Its implementation is not limited in the embodiments of the present invention and can be set flexibly according to the actual situation. Because its input is the continuous depth information and position information and its output is the pose of the target object, in one possible implementation the second neural network model may be obtained by training a second initial neural network with the second training data and third training data. The second training data may include the continuous depth information and position information of the training object, and the third training data may include the pose of the training object. The second training data here is the same as the second training data mentioned in the embodiments above and is not described again. The third training data may be the pose prediction result of the training object; its form is not limited, and in one possible implementation the three-dimensional position information (for example, three-dimensional coordinates) of the keypoints of the training object can be used as the pose prediction result of the training object.

The implementation of the second initial neural network may follow that of the first initial neural network described above; that is, it may be built from a general-purpose neural network. In one possible implementation, the second initial neural network may be a neural network composed of sequentially connected convolutional layers and pooling layers; the numbers of convolutional and pooling layers and the order in which they are connected are not limited in the embodiments of the present invention.

The second neural network model, which has the pose detection function, is obtained by inputting the second training data and the third training data into the second initial neural network for training, and is then used to obtain the pose of the target object from the continuous depth information and position information. Because the second neural network model is trained on second training data that contains continuous depth information, it receives more useful information, yields more correct predictions, and achieves more accurate pose prediction. It also makes it easy to process multiple target images or multiple target objects at the same time and obtain multiple pose detection results, improving both the accuracy and the convenience of pose detection.

In the embodiments of the present invention, the target image is obtained, the continuous depth information and position information of the target object in the target image are obtained, and the pose of the target object is determined from the continuous depth information and the position information. In this process the continuous depth information serves as intermediate supervision information for predicting the pose of the target object, so that the pose can be detected more accurately and the accuracy and effect of pose detection are improved.
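The data flow of steps S11 to S13 can be sketched as follows. This is only an illustrative sketch: the function names `detect_pose` and `combine` are hypothetical, and the models are passed in as plain callables standing in for the trained first (or third and fourth) and second neural network models, whose actual form the description leaves open.

```python
def detect_pose(image, depth_and_position, pose_model):
    """S11-S13: image -> (continuous depth, position) -> pose."""
    # Step S12: obtain continuous depth information and position information.
    depth, position = depth_and_position(image)
    # Step S13: determine the pose from both pieces of information.
    return pose_model(depth, position)


def combine(depth_model, position_model):
    """Two-model variant of S12: separate depth and position models
    (the fourth and third models in the description) behind one callable."""
    return lambda image: (depth_model(image), position_model(image))


# Stub models (assumptions for illustration only, not real networks).
stub_depth = lambda image: "depth-map"
stub_position = lambda image: "heat-map"
stub_pose = lambda depth, position: {"from": (depth, position)}

result = detect_pose("image", combine(stub_depth, stub_position), stub_pose)
```

A single model that returns both outputs can be passed directly as `depth_and_position`, so the one-model and two-model variants of step S12 are interchangeable from the point of view of step S13.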

As the embodiments disclosed above show, in one possible implementation the key to realizing pose detection through steps S11 to S13 is training the first neural network model and the second neural network model with training data that contains the continuous depth information of the target object. For an image used as training data, the depth information of keypoints can be collected by having a real person wear specific sensor equipment, but obtaining the depth information of every continuous sampling point in this way is difficult, and labeling it manually would create an enormous workload and consume time and effort.

Therefore, in one possible implementation, the second training data can be generated from the third training data, where the third training data includes the pose of the training object.

Generating the second training data from the third training data may include the following steps.

At S21, the discrete depth information of the training object and the position information of the training object are obtained from the pose of the training object in the third training data.

At S22, at least part of the discrete depth information is processed to obtain the continuous depth information of the training object.

At S23, the second training data is generated from the continuous depth information and the position information of the training object.

The implementation of the third training data is the same as in the embodiments disclosed above and is not repeated here. As those embodiments describe, in one possible implementation the three-dimensional position information of the keypoints of the target object can be used as the pose prediction result of the target object; correspondingly, the three-dimensional position information of the keypoints of the training object can be used as the pose prediction result of the training object. The third training data can therefore directly contain the discrete depth information and position information of the training object, so that in step S21 both can be obtained directly from the third training data.
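Step S21 can be illustrated with a minimal sketch, assuming the pose in the third training data is stored as one (x, y, z) triple per keypoint with z as the depth. The storage layout, the keypoint names and the function name `split_pose` are assumptions made for illustration only.

```python
def split_pose(pose):
    """S21: split a 3D keypoint pose into 2D positions and discrete depths.

    `pose` maps a keypoint name to an (x, y, z) triple, where z is depth.
    """
    positions = {name: (x, y) for name, (x, y, z) in pose.items()}
    depths = {name: z for name, (x, y, z) in pose.items()}
    return positions, depths


# Hypothetical pose with two keypoints.
pose = {"wrist": (10.0, 40.0, 2.5), "elbow": (20.0, 30.0, 2.1)}
positions, depths = split_pose(pose)
```

Under this assumption no extra measurement is needed: the position information and the discrete depth information are simply two views of the same three-dimensional keypoint coordinates.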

As described in the embodiments disclosed above, the discrete depth information may be the depth information of certain keypoints or sampling points. The amount of discrete depth information obtained in step S21 can therefore be determined by the number of keypoints, or of sampling points that have depth values, in the third training data, and is not limited in the embodiments of the present invention.

When the continuous depth information of the training object is obtained in step S22, it may be derived from all of the discrete depth information obtained, or from a selected part of it. Whether to use all or part of the discrete depth information, and how to select which part to use, can be chosen flexibly according to the actual situation and is not limited in the embodiments of the present invention.

After the continuous depth information of the training object is obtained, step S23 generates the second training data from the continuous depth information and position information of the training object. The implementation of step S23 is not limited in the embodiments of the present invention. In one possible implementation, the continuous depth information can be represented as a continuous depth feature map and the position information as a two-dimensional heat map, so the two can be used together directly as the second training data.
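One way to picture the two-dimensional heat map and its pairing with the depth map is the sketch below. The description does not say how the heat map is built; the Gaussian peak used here is a common convention and is an assumption, as are the names `keypoint_heatmap` and `make_sample` and the dictionary layout of a sample.

```python
import math


def keypoint_heatmap(width, height, center, sigma=1.0):
    """Return a height x width grid that peaks at `center` = (x, y).

    Assumption: a Gaussian bump around the keypoint position.
    """
    cx, cy = center
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def make_sample(depth_map, heatmaps):
    """S23: bundle the depth map and heat maps as one training sample."""
    return {"depth_map": depth_map, "heatmaps": heatmaps}


heatmap = keypoint_heatmap(5, 5, center=(2, 2))
empty_depth = [[0.0] * 5 for _ in range(5)]  # placeholder depth map
sample = make_sample(depth_map=empty_depth, heatmaps=[heatmap])
```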

Generating the second training data, which contains the continuous depth information and position information of the training object, from the third training data, which contains the pose of the training object, greatly reduces the difficulty and workload of obtaining the second training data and makes pose detection more convenient to implement.

In some embodiments of the present invention, how step S22 processes at least part of the discrete depth information to obtain the continuous depth information of the training object can be determined flexibly according to the actual situation. In one possible implementation, step S22 may include the following steps.

In step S221, at least one connection corresponding to at least part of the discrete depth information is obtained.

In step S222, the continuous depth sub-information of the at least one connection is determined from the discrete depth information corresponding to the connection.

In step S223, the at least one piece of continuous depth sub-information is aggregated to obtain the continuous depth information of the training object.

The form of the connection corresponding to the discrete depth information can be determined flexibly according to the actual situation. In one possible implementation, the connection may be an actual connection: for example, the keypoints corresponding to the discrete depth information are joined to obtain a connecting line, and the connecting line is taken as the connection corresponding to the discrete depth information. In another possible implementation, the connection may be a correspondence without an actual connecting line: it is determined which keypoints corresponding to the discrete depth information are related to one another, but those keypoints are not joined by a connecting line. The subsequent embodiments all take an actual connecting line as the example; the variant without an actual connecting line can be derived from those embodiments and is not described in detail.

The embodiments disclosed above noted that the continuous depth information of the training object may be obtained by processing all of the discrete depth information or only part of it. Accordingly, when at least one connection corresponding to at least part of the discrete depth information is obtained in step S221, the connections may join any two pieces of the discrete depth information obtained, may join a selected part of the discrete depth information, or may be chosen selectively according to where the discrete depth information lies on the training object; this can be decided flexibly according to actual needs.

In one possible implementation, when the discrete depth information is the depth information of keypoints of the training object, the discrete depth information of the keypoints can be connected selectively according to the positions of the keypoints on the training object. For example, if the discrete depth information obtained covers the wrist, elbow, shoulder and head, then based on the limb structure of the human body the wrist and elbow, the elbow and shoulder, and the shoulder and head can each be connected, giving three connecting lines. Other pairings, such as wrist to head or elbow to head, do not match the way the limbs of the human body are arranged, so there is no need to connect that discrete depth information.
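The selective connection in this example can be sketched as a fixed list of keypoint pairs. The list `LIMBS` and the function `select_connections` are hypothetical names; the three pairs are the ones named in the example above, and any other skeleton definition could be substituted.

```python
# Hypothetical limb list: only anatomically adjacent keypoints are paired.
LIMBS = [("wrist", "elbow"), ("elbow", "shoulder"), ("shoulder", "head")]


def select_connections(depths, limbs=LIMBS):
    """S221: keep the limbs whose two endpoints both have a discrete depth."""
    return [(a, b) for a, b in limbs if a in depths and b in depths]


depths = {"wrist": 2.5, "elbow": 2.1, "shoulder": 2.0, "head": 1.9}
connections = select_connections(depths)
```

Pairs such as wrist to head never appear because they are not in the limb list, and a limb is skipped when one of its endpoints has no discrete depth information.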

After at least two pieces of discrete depth information have been connected to obtain at least one connection, step S222 determines the continuous depth sub-information of the at least one connection from the discrete depth information corresponding to the connection.

In one possible implementation, when the connection is a connecting line, the continuous depth sub-information corresponding to at least one connecting line can be obtained from the discrete depth information at the endpoints of the connecting line, that is, the discrete depth information that was connected when the connecting line was obtained in step S221. For example, the embodiment above mentioned connecting the discrete depth information of the wrist and elbow, of the elbow and shoulder, and of the shoulder and head to obtain three connecting lines. In that example, the continuous depth sub-information of the wrist-elbow connecting line can be obtained from the discrete depth information of the wrist and of the elbow, that of the elbow-shoulder connecting line from the discrete depth information of the elbow and of the shoulder, and likewise that of the shoulder-head connecting line from the discrete depth information of the shoulder and of the head. How the continuous depth sub-information of a connecting line is obtained from the discrete depth information at its endpoints is described in the subsequent embodiments and is not detailed here.

FIG. 2 is a schematic diagram of obtaining the continuous depth information of a training object according to an embodiment of the present invention. As shown in the figure, in one possible implementation the discrete depth information of the training object obtained from the third training data may include the wrist discrete depth information Pw and the elbow discrete depth information Pe of the training object. In this case, Pw and Pe are connected to obtain the arm connecting line of the training object, and the continuous depth information corresponding to the arm connecting line is obtained from Pw and Pe.

After the continuous depth sub-information corresponding to at least one connection has been obtained, step S223 aggregates these pieces of continuous depth sub-information to obtain the continuous depth information of the training object. The aggregation method can be determined flexibly according to the actual situation and is not limited in the embodiments of the present invention. In one possible implementation, the continuous depth sub-information of all the connections obtained is combined and used together as the continuous depth information of the training object. In another possible implementation, continuous depth sub-information that is clearly erroneous or that falls outside the extent of the training object is removed, and the remaining continuous depth sub-information is kept and used together as the continuous depth information of the training object; the screening method can be chosen flexibly according to the actual situation.
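A minimal sketch of the aggregation in step S223 follows, assuming each piece of continuous depth sub-information is stored as a mapping from a pixel (x, y) to a depth value. The function name `aggregate`, the optional `inside_object` predicate used for screening, and the rule of keeping the smaller depth where two connections overlap are all assumptions; the description leaves both the aggregation and the screening method open.

```python
def aggregate(sub_maps, inside_object=None):
    """S223: merge per-connection depth maps {(x, y): depth} into one map."""
    merged = {}
    for sub in sub_maps:
        for pixel, depth in sub.items():
            # Screening: drop values that fall outside the training object.
            if inside_object is not None and not inside_object(pixel):
                continue
            # Assumption: where connections overlap, keep the nearer depth.
            if pixel not in merged or depth < merged[pixel]:
                merged[pixel] = depth
    return merged


forearm = {(0, 0): 2.5, (1, 0): 2.3}
upper_arm = {(1, 0): 2.1, (2, 0): 2.0, (9, 9): 5.0}
merged = aggregate([forearm, upper_arm], inside_object=lambda p: p[0] < 5)
```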

By obtaining at least one connection corresponding to at least part of the discrete depth information, obtaining the continuous depth sub-information of the at least one connection from the discrete depth information corresponding to the connection, and aggregating the at least one piece of continuous depth sub-information, the continuous depth information of the training object is obtained. This process uses the existing discrete depth information and the body structure of the training object to establish a relationship between discrete and continuous depth information, so that continuous values can be derived from discrete ones. The continuous depth information of the training object can thus be determined more conveniently, which further reduces the difficulty and workload of obtaining the second training data and makes pose detection more convenient to implement.

As noted in the embodiments disclosed above, the implementation of step S222 is not limited. In one possible implementation, step S222 may include the following steps.

In step S2221, the first continuous depth sub-information of at least one point on the connection is obtained by linear interpolation from the discrete depth information corresponding to the connection.

In step S2222, the connection range corresponding to the at least one connection is determined.

In step S2223, the second continuous depth sub-information of at least one point within the connection range corresponding to the connection is determined from the first continuous depth sub-information.

In step S2224, the continuous depth sub-information corresponding to the connection is obtained from the first continuous depth sub-information and/or the second continuous depth sub-information.

The first continuous depth sub-information may be the depth information of points located on the connection or connecting line. As described in the embodiments above, in one possible implementation at least one connecting line is obtained by connecting discrete depth information, and the discrete depth information at the endpoints of a connecting line is the discrete depth information that was connected. Each connecting line therefore has two points whose positions on the line and whose depth information are both known, so the depth information of the remaining points on the line can be derived from the depth information of these two known points. In one possible implementation, step S2221 thus uses the discrete depth information at the endpoints of the connecting line to obtain, by linear interpolation, the first continuous depth sub-information of the remaining points on the connecting line. Whether this is done for every point on the connecting line or only for some of them can be decided flexibly as needed. In one possible implementation, linear interpolation over the discrete depth information at the endpoints yields a function relating the first continuous depth sub-information on the connecting line to the position of a point; then, whether the first continuous depth sub-information is needed for all points on the line or only some, it can be obtained by substituting the position of each such point into that function.
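The relational function obtained by linear interpolation can be sketched as below, assuming each endpoint is given as an (x, y, depth) triple and a point on the connecting line is identified by a parameter t between 0 and 1. The name `depth_on_line` and the parameterization are illustrative choices, not taken from the description.

```python
def depth_on_line(p_w, p_e, t):
    """Point and depth at parameter t in [0, 1] along the line p_w -> p_e.

    Each endpoint is an (x, y, depth) triple; t = 0 gives p_w, t = 1 gives p_e.
    """
    x = p_w[0] + t * (p_e[0] - p_w[0])
    y = p_w[1] + t * (p_e[1] - p_w[1])
    depth = p_w[2] + t * (p_e[2] - p_w[2])
    return x, y, depth


# Hypothetical wrist and elbow keypoints (x, y, depth).
p_w = (0.0, 0.0, 2.0)
p_e = (10.0, 0.0, 3.0)
midpoint = depth_on_line(p_w, p_e, 0.5)
```

Sampling t at as many values as needed yields the first continuous depth sub-information for all points on the connecting line or for only some of them.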

The second continuous depth sub-information may be the depth information of points within the connection range corresponding to the connection. As described in the embodiments above, the discrete depth information obtained may cover only some of the keypoints of the training object. Taking FIG. 2 again as an example, when the discrete depth information obtained is the depth information Pw and Pe of the wrist keypoint and the elbow keypoint, the connecting line between Pw and Pe cannot fully cover the arm region of the training object, and the first continuous depth sub-information on the connecting line alone may not fully reflect the continuous depth information of the arm. In one possible implementation, the connecting line is therefore expanded appropriately to determine the connection range corresponding to it, and, in addition to the first continuous depth sub-information of the points on the connecting line, the second continuous depth sub-information of the remaining points within the connection range is obtained.

The manner of determining the connection range is not limited in the embodiments of the present invention. In one possible implementation, the connection range can be obtained by expanding outward from the connecting line, taken as the center, to the boundary of the training object. In another possible implementation, to reduce the workload and determine the connection range more efficiently, a preset radius can be set and the connecting line expanded outward by that radius to obtain the connection range. How the preset radius is set can be chosen flexibly according to the actual situation and is not limited in the embodiments of the present invention. In one example, an expansion radius of length R is set; a rectangle whose width is 2R and whose length equals that of the connecting line is then constructed with the connecting line as its center line, and at each of the two end points of the connecting line a semicircle of radius R is constructed on the side facing away from the rectangle. The value of R can be set flexibly according to the actual situation. The two semicircles of radius R and the rectangle of width 2R together form the connection range. FIG. 3 shows a schematic diagram of determining the connection range according to an embodiment of the present invention. As shown in the figure, in one example, the connection range corresponding to the connection can be constructed in the above manner with the line connecting Pw and Pe as the center line.
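A rectangle of width 2R capped by two semicircles of radius R is exactly the set of points whose distance to the connecting line segment is at most R, so membership in the connection range can be tested with a point-to-segment distance. The sketch below assumes 2D tuple coordinates; the function names are hypothetical and are not taken from the embodiments.

```python
import math


def distance_to_segment(p_w, p_e, point):
    """Euclidean distance from `point` to the segment p_w -> p_e."""
    (xw, yw), (xe, ye) = p_w, p_e
    dx, dy = xe - xw, ye - yw
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(point[0] - xw, point[1] - yw)
    # Projection parameter, clamped so the nearest point stays on the segment.
    t = ((point[0] - xw) * dx + (point[1] - yw) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    fx, fy = xw + t * dx, yw + t * dy
    return math.hypot(point[0] - fx, point[1] - fy)


def in_connection_range(p_w, p_e, radius, point):
    """True if `point` lies in the rectangle-plus-two-semicircles region of radius R."""
    return distance_to_segment(p_w, p_e, point) <= radius
```

The clamping step is what produces the rounded ends: beyond an end point, the nearest point on the segment is the end point itself, so the accepted region there is a semicircle of radius R.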

接続範囲を決定した後、第１連続深度サブ情報に従って、接続範囲に含まれる点の第２連続深度サブ情報を決定することができ、ここで、どうやって決定するかは、以下の各開示された実施例を参照でき、ここでは詳細に説明しない。 After determining the connected range, the second continuous depth sub-information of the points included in the connected range can be determined according to the first continuous depth sub-information, where how to determine is described in each of the following disclosures. Reference can be made to the embodiments, which are not described in detail herein.

After the first continuous depth sub-information and the second continuous depth sub-information are obtained, the continuous depth sub-information can be obtained according to both. Whether the first continuous depth sub-information alone, the second continuous depth sub-information alone, both together, or a selected part of this information is used as the continuous depth sub-information can be chosen flexibly according to the actual situation and is not limited in the embodiments of the present invention.

As described in the embodiments disclosed above, in one possible implementation a connection between pieces of discrete depth information may be a correspondence rather than a physical connection. In this case there is no connecting line between the connected pieces of discrete depth information, but, based on their positional relationship and with reference to the implementations of the embodiments disclosed above, the depth information of different points within the coverage of the connection can still be determined by linear interpolation. For the determination process, refer to the embodiments disclosed above; it is not repeated here.

The first continuous depth sub-information of at least one point on the connection is obtained by linear interpolation according to the discrete depth information corresponding to the connection; the connection range corresponding to the connection is determined, and the second continuous depth sub-information of at least one point within the connection range is obtained according to the first continuous depth sub-information; the continuous depth sub-information of at least one connection is then obtained according to the first continuous depth sub-information and/or the second continuous depth sub-information. Through the above process, on the one hand, the continuous depth sub-information on the connection can be obtained more conveniently; on the other hand, the coverage of the connection can be expanded to obtain the continuous depth sub-information within the connection range corresponding to the connection. More comprehensive and accurate continuous depth information of the training object is thereby obtained for training the neural network model, which further improves the accuracy of pose detection.

一可能な実現方式において、ステップＳ２２２３は、以下のステップを含み得る。 In one possible implementation, step S2223 may include the following steps.

In step S22231, when the connection range lies within the preset range of the discrete depth information corresponding to the connection, the discrete depth information corresponding to the connection is used as the second continuous depth sub-information of at least one point within the connection range.

In step S22232, when the connection range lies outside the preset range of the discrete depth information corresponding to the connection, the second continuous depth sub-information of at least one point within the connection range is obtained according to the first continuous depth sub-information on the connection that is closest to that point.

Here, the preset range of the discrete depth information corresponding to a connection may be a part of the connection range, and its size and definition can be determined flexibly according to how the connection range is determined. As can be seen from the embodiments disclosed above, in one possible implementation the preset range can be used to distinguish the region covered by the end points of the connecting line from the region covered by the other points on the connecting line. That is, the second continuous depth sub-information of points within the preset range can be determined from the discrete depth information of the end points, and the second continuous depth sub-information of points outside the preset range can be determined from the first continuous depth sub-information on the connecting line. In one possible implementation, the connection range can be constructed as described in the embodiments disclosed above: a rectangle whose width is 2R and whose length equals that of the connecting line is constructed with the connecting line as its center line, and at each of the two end points of the connecting line a semicircle of radius R is constructed on the side facing away from the rectangle. In this case, the two semicircles of radius R can be regarded as the preset range of the connection end points, and the remaining rectangle of width 2R can be regarded as lying outside the preset range of the connection end points.

As can be seen from step S22231, in one possible implementation, for a point within the preset range, the discrete depth information of the connecting line end point corresponding to that preset range can be used as the second continuous depth sub-information. Taking FIG. 3 as an example, for the semicircle of radius R centered on Pw, the second continuous depth sub-information of every point within the semicircle is the same as the discrete depth information of Pw; likewise, for the semicircle of radius R centered on Pe, the second continuous depth sub-information of every point within the semicircle is the same as the discrete depth information of Pe.

As can be seen from step S22232, in one possible implementation, for a point outside the preset range (denoted P here), the first continuous depth sub-information of the point on the connecting line closest to P can be used as the second continuous depth sub-information of P. The manner of determining the point on the connecting line closest to P is not limited. Again taking FIG. 3 as an example, in one possible implementation a perpendicular can be dropped from P to the connecting line to obtain the foot of the perpendicular P', and the first continuous depth sub-information of P' can be used as the second continuous depth sub-information of P.

Through the above process, the second continuous depth sub-information within the preset range covered by the discrete depth information corresponding to the connection and the second continuous depth sub-information within the range covered by the connection are obtained separately, so that the second continuous depth sub-information corresponding to every point within the connection range can be obtained. This way of determining the second continuous depth sub-information is relatively simple and requires little computation, which reduces the difficulty and workload of obtaining the second training data and improves the convenience and accuracy of the pose detection process.
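Steps S22231 and S22232 can be sketched together for the rectangle-plus-semicircles connection range of FIG. 3. This is an illustrative sketch under stated assumptions: 2D tuple coordinates, hypothetical names d_w and d_e for the discrete depths at Pw and Pe, and returning None for points outside the connection range (the embodiments do not specify that case).

```python
import math


def second_continuous_depth(p_w, p_e, d_w, d_e, radius, point):
    """Depth assigned to `point`, or None if it is outside the connection range."""
    (xw, yw), (xe, ye) = p_w, p_e
    px, py = point
    dx, dy = xe - xw, ye - yw
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return d_w if math.hypot(px - xw, py - yw) <= radius else None
    # Position of the foot of the perpendicular along the line (0 at Pw, 1 at Pe).
    t = ((px - xw) * dx + (py - yw) * dy) / length_sq
    if t < 0.0:
        # Semicircle around Pw (preset range): copy the end-point depth.
        return d_w if math.hypot(px - xw, py - yw) <= radius else None
    if t > 1.0:
        # Semicircle around Pe (preset range): copy the end-point depth.
        return d_e if math.hypot(px - xe, py - ye) <= radius else None
    # Rectangle part: use the interpolated depth at the foot of the perpendicular P'.
    fx, fy = xw + t * dx, yw + t * dy
    if math.hypot(px - fx, py - fy) > radius:
        return None
    return d_w + t * (d_e - d_w)
```

Because the foot of the perpendicular carries the linearly interpolated depth, every point on a line perpendicular to the connecting line receives the same depth value, which keeps the computation to a single projection per point.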

一可能な実現方式において、さらに、上記の各開示された実施例によって決定されるターゲット対象の姿勢を、動き識別、人間とコンピュータとの対話および自律運転などの異なるシナリオに適用すうことができる。一可能な実現方式において、上記の各開示された実施例によって決定されるターゲット対象の姿勢を、ビデオデータの処理過程に適用することができる。そのため、本発明の実施例は、さらに、ビデオ処理方法を提案する。 In one possible implementation, the target subject pose determined by each of the above disclosed embodiments can also be applied to different scenarios such as motion identification, human-computer interaction and autonomous driving. In one possible implementation, the pose of the target object determined by each disclosed embodiment above can be applied to the processing of the video data. Therefore, an embodiment of the present invention further proposes a video processing method.

FIG. 4 shows a flowchart of a video processing method according to an embodiment of the present invention. The method can be applied to a video processing device, whose implementation may be the same as or different from that of the pose detection device in the embodiments disclosed above; for its implementation, refer to the embodiments disclosed above, which are not repeated here.
In some possible embodiments, the video processing method may also be implemented by a processor invoking computer-readable instructions stored in a memory.

図４の記載のように、前記ビデオ処理方法は、以下のステップを含み得る。 As described in FIG. 4, the video processing method may include the following steps.

ステップＳ２１において、現在のシナリオに対して画像収集を実行して、収集ビデオを取得する。 In step S21, image acquisition is performed for the current scenario to obtain an acquired video.

ステップＳ２２において、収集ビデオから、少なくとも２フレームの、ターゲット対象を含むターゲット画像を選択する。 At step S22, at least two frames of target images containing the target object are selected from the acquired video.

In step S23, pose detection is performed on the target object in the at least two frames of target images by the pose detection method of the embodiments disclosed above, to determine at least two poses of the target object in the acquired video.

ここで、ターゲット対象の実現方式は、上記の姿勢検出における各開示された実施例と同じであり、ここでは繰り返して説明しない。現在のシナリオは、ターゲット対象を含む任意のシナリオであり得、以下の各開示された実施例に限定されない。一可能な実現方式において、現在のシナリオは、歩行者検出シナリオ、自律運転シナリオ、教室内の対象捕獲シナリオおよび会社の環境検出シナリオなどであり得る。 Here, the implementation scheme of the target object is the same as each disclosed embodiment in pose detection above, and will not be repeated here. The current scenario can be any scenario, including the target subject, and is not limited to each disclosed example below. In one possible implementation, the current scenario can be a pedestrian detection scenario, an autonomous driving scenario, an object capture scenario in a classroom, and a company environment detection scenario.

The manner of performing image acquisition for the current scenario can be determined flexibly according to the actual situation of the current scenario. For example, if the current scenario is a pedestrian detection scenario, image acquisition can be performed by imaging equipment deployed along the sidewalk; if the current scenario is an autonomous driving scenario, image acquisition can be performed by imaging equipment deployed on the vehicle.

ビデオを収集する実現形態は、画像収集の実際の場合によって柔軟に决定することができ、本発明の実施例では制限しない。 The implementation of video acquisition can be flexibly determined according to the actual case of image acquisition, and is not limited in the embodiments of the present invention.

After the video is acquired in step S21, at least two frames of target images containing the target object can be selected from the acquired video in step S22. The implementation of the target image is the same as in the pose detection embodiments disclosed above and is not repeated here. The manner of selecting at least two frames of target images containing the target object from the acquired video can be determined flexibly according to the actual situation and is not limited to the following embodiments. In one possible implementation, target object detection is performed on at least some of the frames of the acquired video, and then, from the frames in which the target object is detected, at least some frames are selected as target images, either at random or according to a criterion such as image quality.
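The selection in step S22 can be sketched as a filter followed by a ranking. Everything in this sketch is hypothetical: the detector and the quality score are passed in as callables because the embodiments do not prescribe either, and ranking by quality is only one of the selection criteria mentioned above.

```python
def select_target_frames(frames, contains_target, quality, count=2):
    """Pick up to `count` frames that contain the target, best quality first.

    frames: iterable of frame objects (any type).
    contains_target: callable(frame) -> bool, a hypothetical target detector.
    quality: callable(frame) -> float, a hypothetical image-quality score.
    """
    candidates = [frame for frame in frames if contains_target(frame)]
    ranked = sorted(candidates, key=quality, reverse=True)
    return ranked[:count]
```

Random selection, the other option mentioned above, could be obtained by replacing the ranking with random.sample over the candidate frames.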

ステップＳ２２を介して、少なくとも２フレームのターゲット対象を含むターゲット画像を取得した後、上記の任意の開示された実施例による姿勢検出方法を介して、選択されるターゲット画像に対して姿勢検出を実行して、収集ビデオ内のターゲット対象の少なくとも２つの姿勢を決定することができ、ここで、姿勢をどうやって決定するかは、ターゲット画像の実際の場合によって柔軟に决定することができ、本発明の実施例では限定しない。 After obtaining a target image containing at least two frames of the target object, via step S22, perform pose detection on the selected target image via the pose detection method according to any of the disclosed embodiments above. to determine at least two poses of the target object in the acquired video, where how the poses are determined can be flexibly determined according to the actual case of the target image, and the present invention Examples are not limiting.

An acquired video is obtained by performing image acquisition for the current scenario, at least two frames of target images containing the target object are selected from the acquired video, and at least two poses of the target object in the acquired video are then determined by the pose detection method of any of the embodiments disclosed above. Through the above process, the pose detection process based on continuous depth information can be applied to video processing, so that multiple poses of the target object can be determined more accurately from a dynamic video, which effectively improves the accuracy of pose detection in video.

一可能な実現方式において、収集ビデオ内のターゲット対象の少なくとも２つの姿勢を取得した後、さらに、取得される複数の姿勢を処理することができ、そのため、一可能な実現方式において、本発明の実施例によるビデオ処理方法は、以下のステップをさらに含み得る。 In one possible implementation, after acquiring at least two poses of the target object in the acquired video, the acquired multiple poses can be further processed, so that in one possible implementation, the present invention: A video processing method according to an embodiment may further include the following steps.

ステップＳ２４において、ターゲット対象の少なくとも２つの姿勢、および収集ビデオ内のフレームの時間に従って、ターゲット対象の連続姿勢を取得する。 In step S24, successive poses of the target object are obtained according to the at least two poses of the target object and the time of the frames in the acquired video.

ステップＳ２５において、ターゲット対象の連続姿勢に従って、ターゲット対象を追跡する。 In step S25, the target object is tracked according to the continuous pose of the target object.

As described in the embodiments disclosed above, the poses of the target object in the acquired video can be determined from the frames of the acquired video that contain the target object. Because the frames of the acquired video are arranged in chronological order, the order in which the obtained poses occur can be determined from the times of these frames in the acquired video, and the continuous pose of the target object can thereby be determined.
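Ordering the detected poses by frame time, as described for step S24, can be sketched as a simple sort. The pairing of each pose with its frame time as a (time, pose) tuple and the function name are assumptions made for illustration.

```python
def continuous_pose(timed_poses):
    """Return the poses ordered by the time of the frame each was detected in.

    timed_poses: iterable of (frame_time, pose) pairs, in any order.
    """
    ordered = sorted(timed_poses, key=lambda item: item[0])
    return [pose for _, pose in ordered]
```

Sorting on the time alone, rather than on the whole tuple, avoids comparing pose objects with each other when two frames share the same time.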

本発明のいくつかの実施例において、ターゲット対象の連続姿勢を決定した後、ターゲット対象に対する追跡を実現することができる。上記の過程を介して、上記の連続深度情報を使用する姿勢検出過程を、ターゲット対象の追跡過程に使用して、追跡の精度を向上し、より高効率の追跡を実現することができる。 In some embodiments of the present invention, after determining the continuous pose of the target object, tracking can be implemented for the target object. Through the above process, the above pose detection process using continuous depth information can be used in the target object tracking process to improve tracking accuracy and achieve more efficient tracking.

For example, in a city security monitoring scenario, image acquisition can be performed on the city by a city monitoring device to obtain an acquired video. After obtaining the acquired video, the pose detection device obtains from it at least two frames of images containing the target object; obtains, via the first neural network model, at least two corresponding groups of continuous depth information and position information of the target object in those images; obtains, via the first neural network model, the at least two corresponding poses according to the at least two groups of continuous depth information and position information; and obtains the continuous pose of the target object according to the at least two poses and the time of the frame corresponding to each pose. The device then judges whether the behavior of the target object is dangerous behavior such as fighting, theft, or robbery. If the behavior is determined to be dangerous, the device automatically raises an alarm and sends the acquired video of the target object and the result of the dangerous behavior judgment to the police station.

以下、本願実施例の実際の適用シナリオにおける例示的な適用を説明する。 An exemplary application in a practical application scenario of the embodiments of the present application is described below.

三次元人体の姿勢推定は、コンピュータビジョン分野における１つの基本的なタスクであり、動き識別、人間とコンピュータとの対話および自律運転などのシナリオに幅広く適用され、三次元人体姿勢推定の精度を向上することは、緊急の問題になった。 3D human pose estimation is one of the fundamental tasks in the field of computer vision, and is widely applied in scenarios such as motion identification, human-computer interaction and autonomous driving to improve the accuracy of 3D human pose estimation. has become a matter of urgency.

図５は、本発明の一応用例による概略図を示し、図に示されたように、本発明の実施例は、姿勢検出方法を提案し、この方法は、ターゲット画像内の人体の三次元姿勢を決定することができ、図に示されたように、本発明の応用例の姿勢検出の過程は、以下の通りである。 FIG. 5 shows a schematic diagram according to one application of the present invention, as shown in the figure, an embodiment of the present invention proposes a pose detection method, which is to detect the three-dimensional pose of the human body in the target image. can be determined, and as shown in the figure, the process of pose detection in the application of the present invention is as follows.

As shown in the figure, in the application example of the present invention, the input target image is first fed into the first neural network model (BackBone, the backbone neural network). The first neural network model processes the input target image and produces three outputs: a two-dimensional heat map of the human body keypoints, a two-dimensional heat map of the human torso, and a depth interpolation map containing the continuous depth information of the human torso. These three outputs are then fed together, as input data, into the second neural network model (Regression, the regression network), and the three-dimensional position coordinates of each continuous point of the human torso are obtained through the processing of the second neural network model.

Here, the first neural network model consists of a shallow network followed by three prediction branches. The shallow network performs feature extraction on the human body in the target image, and the extracted features are fed into each of the three prediction branches: one branch predicts and outputs the two-dimensional heat map of the human body keypoints, one branch outputs the two-dimensional heat map of the human torso, and one branch outputs the depth interpolation map containing the continuous depth information of the human torso. The first neural network model can therefore be obtained by joint training on four kinds of images: training images containing a human body, two-dimensional heat maps of the human body keypoints, two-dimensional heat maps of the human torso, and depth interpolation maps containing the continuous depth information of the human torso. The depth interpolation map containing the continuous depth information of the human torso can be obtained by processing a discrete depth information map of the human torso in any of the manners of the embodiments disclosed above.

The second neural network model consists of convolutional layers connected to pooling layers. When processing the input data, the second neural network model first concatenates the input two-dimensional heat map of the human body keypoints, the two-dimensional heat map of the human torso, and the depth interpolation map containing the continuous depth information of the human torso along the image channel dimension. It then performs feature fusion on the concatenated data through the convolutional layers, makes the prediction through the pooling layers, and finally outputs the three-dimensional position coordinates of each continuous point of the human torso. The second neural network model can therefore be obtained by joint training on four kinds of data: the two-dimensional heat map of the human body keypoints, the two-dimensional heat map of the human torso, the depth interpolation map containing the continuous depth information of the human torso, and the three-dimensional position coordinates of each continuous point of the human torso.

本発明の実施例の画像処理方法は、上記の人体姿勢検出の過程に適用されることに限定されなく、任意のターゲット対象の姿勢検出に適用されることができ、本発明はこれに対して限定しないことに留意されたい。 The image processing method of the embodiment of the present invention is not limited to being applied to the process of human body pose detection described above, but can be applied to pose detection of any target object. Note that it is not limiting.

It should be understood that the method embodiments described above can be combined with one another to form combined embodiments without violating their principles and logic; owing to space limitations, these combinations are not described again in the present invention. Those skilled in the art will understand that, in the above methods of the specific embodiments, the order in which the steps are executed should be determined by their functions and possible internal logic.

加えて、本発明は、さらに、姿勢検出装置、電子機器、コンピュータ可読記憶媒体、プログラムを提供し、上記は、すべて本発明で提供された任意の姿勢検出方法を実現することができ、対応する技術的解決策と説明および方法部分を参照した対応する記載は、繰り返しない。 In addition, the present invention further provides a posture detection device, an electronic device, a computer-readable storage medium, and a program, all of which can implement any posture detection method provided in the present invention and correspond to The corresponding descriptions with reference to the technical solution and the description and the method part will not be repeated.

FIG. 6 shows a block diagram of a pose detection device according to an embodiment of the present invention. The pose detection device may be a terminal device, a server, or another processing device. The terminal device may be user equipment (UE: User Equipment), a mobile device, a user terminal, a terminal, a mobile phone, a cordless phone, a personal digital assistant (PDA: Personal Digital Assistant), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like.
In some possible implementations, the pose detection device can be implemented by a processor invoking computer-readable instructions stored in a memory.

As shown in FIG. 6, the pose detection device 30 includes:
a target image acquisition unit 31 configured to acquire a target image;
an information acquisition unit 32 configured to acquire, according to the target image, continuous depth information and position information of a target object in the target image; and
a pose determination unit 33 configured to determine the pose of the target object according to the continuous depth information and the position information.

In one possible implementation, the information acquisition unit is configured to pass the target image through a first neural network model to obtain the continuous depth information and position information of the target object in the target image. The first neural network model is obtained by training on first training data and second training data, where the first training data is a training image containing a training object, and the second training data includes the continuous depth information of the training object and the position information of the training object.

In one possible implementation, the pose determination unit is configured to pass the continuous depth information and the position information through a second neural network model to obtain the pose of the target object. The second neural network model is obtained by training on the second training data and third training data, where the second training data includes the continuous depth information of the training object and the position information of the training object, and the third training data includes the pose of the training object.

一可能な実現方式において、第２トレーニングデータは、第３トレーニングデータに従って生成され、第３トレーニングデータは、トレーニング対象の姿勢を含み、第３トレーニングデータに従って、第２トレーニングデータを生成することは、第３トレーニングデータ内のトレーニング対象の姿勢に従って、トレーニング対象の離散深度情報、およびトレーニング対象の位置情報を取得することと、少なくとも離散深度情報の一部を処理して、トレーニング対象の連続深度情報を取得することと、トレーニング対象の連続深度情報およびトレーニング対象の位置情報に従って、第２トレーニングデータを生成することと、を含む。 In one possible implementation, the second training data is generated according to third training data, the third training data includes a pose of a training object, and generating the second training data according to the third training data includes: obtaining discrete depth information for the training object and position information for the training object according to the pose of the training object in the third training data; and processing at least a portion of the discrete depth information to obtain continuous depth information for the training object. obtaining and generating second training data according to the continuous depth information of the training object and the position information of the training object.

In one possible implementation, processing at least a portion of the discrete depth information to obtain the continuous depth information of the training object includes: obtaining at least one connection corresponding to at least a portion of the discrete depth information; determining continuous depth sub-information of the at least one connection according to the discrete depth information corresponding to the connection; and aggregating the at least one piece of continuous depth sub-information to obtain the continuous depth information of the training object.
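The aggregation step can be illustrated with a small sketch. The text does not state which statistic is used, so the rule below is an assumption: where several connections cover the same pixel, the smallest depth (nearest to the camera) is kept, and uncovered pixels receive a background value. `merge_sub_depths` and its parameters are hypothetical names.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]


def merge_sub_depths(sub_maps: List[Dict[Pixel, float]], width: int,
                     height: int, background: float = 0.0) -> List[List[float]]:
    """Merge per-connection depth sub-maps into one dense depth map.

    Assumed rule: on overlap, keep the smallest depth value.
    """
    depth = [[background] * width for _ in range(height)]
    covered = [[False] * width for _ in range(height)]
    for sub in sub_maps:
        for (x, y), z in sub.items():
            if 0 <= x < width and 0 <= y < height:
                if not covered[y][x] or z < depth[y][x]:
                    depth[y][x] = z
                    covered[y][x] = True
    return depth
```

Other statistics, such as averaging overlapping values, would fit the same interface.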

In one possible implementation, determining the continuous depth sub-information of the at least one connection according to the discrete depth information corresponding to the connection includes: obtaining, by linear interpolation according to the discrete depth information corresponding to the connection, first continuous depth sub-information of at least one point on the connection; determining a connection range corresponding to the at least one connection; determining, according to the first continuous depth sub-information, second continuous depth sub-information of at least one point within the connection range corresponding to the connection; and obtaining the continuous depth sub-information corresponding to the connection according to the first continuous depth sub-information and/or the second continuous depth sub-information, thereby obtaining the continuous depth sub-information of the at least one connection.
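The linear-interpolation step for the first continuous depth sub-information can be sketched as below, assuming a connection is the straight segment between two joints given as (x, y, z). The function name and the `steps` parameter are illustrative choices, not taken from the source.

```python
from typing import List, Tuple

Joint3D = Tuple[float, float, float]


def interpolate_connection(p0: Joint3D, p1: Joint3D,
                           steps: int) -> List[Joint3D]:
    """Sample steps + 1 evenly spaced points on the segment p0 -> p1.

    Depth z is interpolated linearly together with x and y.
    """
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0, z0), (x1, y1, z1) = p0, p1
    samples = []
    for i in range(steps + 1):
        t = i / steps  # 0.0 at p0, 1.0 at p1
        samples.append((x0 + t * (x1 - x0),
                        y0 + t * (y1 - y0),
                        z0 + t * (z1 - z0)))
    return samples
```

For example, a connection from depth 1.0 to depth 3.0 sampled with `steps=4` yields depths 1.0, 1.5, 2.0, 2.5, and 3.0 along the segment.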

In one possible implementation, determining, according to the first continuous depth sub-information, the second continuous depth sub-information of at least one point within the connection range corresponding to the connection includes: when the connection range lies within a preset range of the discrete depth information corresponding to the connection, using the discrete depth information corresponding to the connection as the second continuous depth sub-information of the at least one point within the connection range; and when the connection range lies outside the preset range of the discrete depth information corresponding to the connection, obtaining the second continuous depth sub-information of the at least one point within the connection range according to the first continuous depth sub-information on the connection that is closest to that point.
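One way to read this two-case rule is sketched below. The interpretation is an assumption: the "preset range" is taken to be a circle of radius `preset_radius` around each joint of the connection, and "closest" is measured as Euclidean distance in the image plane. All names are hypothetical.

```python
import math
from typing import List, Tuple

Joint3D = Tuple[float, float, float]


def second_sub_depth(point: Tuple[float, float], joints: List[Joint3D],
                     line_samples: List[Joint3D],
                     preset_radius: float) -> float:
    """Depth for a point in the connection range that is off the connection."""
    px, py = point
    # Case 1 (assumed reading): the point is within the preset range of a
    # joint, so the joint's discrete depth is used directly.
    for jx, jy, jz in joints:
        if math.hypot(px - jx, py - jy) <= preset_radius:
            return jz
    # Case 2: otherwise copy the interpolated depth of the nearest sample
    # on the connection (the first continuous depth sub-information).
    nearest = min(line_samples,
                  key=lambda s: math.hypot(px - s[0], py - s[1]))
    return nearest[2]
```

Under this reading, depth is flat around each joint and varies smoothly along the connection between joints.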

FIG. 7 shows a block diagram of a video processing device according to an embodiment of the present invention. The video processing device may be a terminal device, a server, or another processing device. The terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a mobile phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like.
In some possible embodiments, the video processing device may be implemented by a processor invoking computer-readable instructions stored in a memory.

As shown in FIG. 7, the video processing device 40 comprises:
an image acquisition unit 41 configured to perform image acquisition on the current scene to obtain a captured video;
a selection unit 42 configured to select, from the captured video, at least two frames of target images containing a target object; and
a pose acquisition unit 43 configured to perform pose detection on the target object in the at least two frames of target images via the pose detection method according to any one of the embodiments disclosed above, and to determine at least two poses of the target object in the captured video.

In one possible implementation, the video processing device 40 is further configured to obtain a continuous pose of the target object according to the at least two poses of the target object and the times of the frames in the captured video, and to track the target object according to the continuous pose of the target object.
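How a continuous pose might be derived from a few per-frame poses can be sketched as follows. The source does not specify the method, so linear interpolation over frame time is an assumption here, as are the function name `pose_at` and the clamping behaviour outside the observed time range.

```python
from typing import List, Tuple

Joint3D = Tuple[float, ...]
TimedPose = Tuple[float, List[Joint3D]]  # (frame time, pose)


def pose_at(t: float, timed_poses: List[TimedPose]) -> List[Joint3D]:
    """Estimate the pose at time t from at least two timed poses.

    Assumed method: per-joint linear interpolation between the two
    frames that bracket t; times outside the range are clamped.
    """
    if len(timed_poses) < 2:
        raise ValueError("at least two timed poses are required")
    frames = sorted(timed_poses, key=lambda f: f[0])
    if t <= frames[0][0]:
        return list(frames[0][1])
    if t >= frames[-1][0]:
        return list(frames[-1][1])
    for (t0, p0), (t1, p1) in zip(frames, frames[1:]):
        if t0 <= t <= t1:
            if t1 == t0:  # duplicate timestamps: nothing to interpolate
                return list(p0)
            a = (t - t0) / (t1 - t0)
            return [tuple(c0 + a * (c1 - c0) for c0, c1 in zip(j0, j1))
                    for j0, j1 in zip(p0, p1)]
    return list(frames[-1][1])  # not reached for sorted input
```

A tracker could then query `pose_at` at arbitrary timestamps between the selected frames.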

本発明の実施例は、さらに、コンピュータプログラム命令が記憶される、コンピュータ可読記憶媒体を提案し、前記コンピュータプログラム命令は、プロセッサによって実行されるとき、上記の方法を実現する。コンピュータ可読記憶媒体は、不揮発性コンピュータ可読記憶媒体であり得る。 An embodiment of the present invention further proposes a computer-readable storage medium on which computer program instructions are stored, said computer program instructions, when executed by a processor, realizing the above method. A computer-readable storage medium may be a non-volatile computer-readable storage medium.

An embodiment of the present invention further proposes an electronic device comprising a processor and a memory configured to store processor-executable instructions, wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.

An embodiment of the present invention further provides a computer program product comprising computer-readable code; when the computer-readable code runs on a device, a processor in the device executes instructions for implementing the image processing method according to any of the above embodiments.

An embodiment of the present invention further provides another computer program product for storing computer-readable instructions which, when executed, cause a computer to perform the operations of the image processing method according to any of the above embodiments.

電子機器は、端末、サーバまたはその他の形態の機器として提供できる。 An electronic device may be provided as a terminal, server, or other form of device.

In this and other embodiments of the present invention, a "part" may be part of a circuit, part of a processor, part of a program or software, and so on; it may of course also be a unit, and may be modular or non-modular.

図８は、本発明の実施例による電子機器８００のブロック図を示す。例えば、電子機器８００は、携帯電話、コンピュータ、デジタル放送端末、メッセージングデバイス、ゲームコンソール、タブレットデバイス、医療機器、フィットネス機器、携帯情報端末などの端末であり得る。 FIG. 8 shows a block diagram of an electronic device 800 according to an embodiment of the invention. For example, electronic device 800 can be a terminal such as a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical equipment, fitness equipment, personal digital assistant, or the like.

Referring to FIG. 8, the electronic device 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.

処理コンポーネント８０２は、一般的に、ディスプレイ、電話の呼び出し、データ通信、カメラ操作及び記録操作に関する操作などの、電子機器８００の全般的な操作を制御する。処理コンポーネント８０２は、前記方法のステップのすべてまたは一部を完成するために、１つまたは複数のプロセッサ８２０を備えて命令を実行することができる。加えて、処理コンポーネント８０２は、１つまたは複数のモジュールを備えて、処理コンポーネント８０２と他のコンポーネントとの相互作用を容易にすることができる。例えば、処理コンポーネント８０２は、マルチメディアモジュールを備えて、マルチメディアコンポーネント８０８と、処理コンポーネント８０２との相互作用を容易にすることができる。 The processing component 802 generally controls the general operation of the electronic device 800, such as operations related to display, phone calls, data communications, camera operation and recording operations. The processing component 802 can comprise one or more processors 820 to execute instructions to complete all or part of the method steps. Additionally, processing component 802 can comprise one or more modules to facilitate interaction between processing component 802 and other components. For example, processing component 802 can comprise a multimedia module to facilitate interaction between multimedia component 808 and processing component 802 .

メモリ８０４は、機器８００における操作をサポートするために、様々なタイプのデータを記憶するように構成される。これらのデータの例には、電子機器８００で動作する、任意のアプリケーションまたは方法の命令、連絡先データ、電話帳データ、メッセージ、写真、ビデオ等が含まれる。メモリ８０４は、スタティックランダムアクセスメモリ（ＳＲＡＭ）、電気的に消去可能なプログラム可能な読み取り専用メモリ（ＥＥＰＲＯＭ）、消去可能なプログラム可能な読み取り専用メモリ（ＥＰＲＯＭ）、プログラム可能な読み取り専用メモリ（ＰＲＯＭ）、読み取り専用メモリ（ＲＯＭ）、磁気メモリ、フラッシュメモリ、磁気ディスクまたは光ディスクなど、あらゆるタイプの揮発性または不揮発性ストレージデバイス、またはそれらの組み合わせで実装することができる。 Memory 804 is configured to store various types of data to support operations on device 800 . Examples of these data include instructions for any application or method, contact data, phonebook data, messages, photos, videos, etc., running on electronic device 800 . Memory 804 can be static random access memory (SRAM), electrically erasable programmable read only memory (EEPROM), erasable programmable read only memory (EPROM), programmable read only memory (PROM). , read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk, or any combination thereof.

電力コンポーネント８０６は、電子機器８００の様々なコンポーネントに電力を提供する。電力コンポーネント８０６は、電力管理システム、１つまたは複数の電源、及び電子機器８００のために、電力を生成、管理及び割り当てに関連付けられる、他のコンポーネントを含み得る。 Power component 806 provides power to various components of electronic device 800 . Power components 806 may include a power management system, one or more power sources, and other components associated with generating, managing, and allocating power for electronic device 800 .

マルチメディアコンポーネント８０８は、前記電子機器８００とユーザとの間に、出力インターフェースを提供するスクリーンを含む。いくつかの実施例において、スクリーンは、液晶ディスプレイ（ＬＣＤ）およびタッチパネル（ＴＰ）を含み得る。スクリーンにタッチパネルが含まれる場合、スクリーンは、ユーザからの入力信号を受信するためのタッチスクリーンとして実現されることができる。タッチパネルは、タッチ、スワイプ及びタッチパネルにおけるジェスチャを検知するための１つまたは複数のタッチセンサを含む。前記タッチセンサは、タッチまたはスワイプの操作の境界を感知するだけでなく、前記タッチまたはスワイプ動作に関する、持続時間及び圧力も検知することができる。いくつかの実施例において、マルチメディアコンポーネント８０８は、１つのフロントカメラ及び／またはリアカメラを備える。電子機器８００が、撮影モードまたはビデオモードなどの動作モードにいるとき、フロントカメラ及び／またはリアカメラは、外部のマルチメディアデータを受信し得る。各フロントカメラ及びリアカメラは、固定光学レンズシステムであり得、または焦点距離と光学ズーム機能を有することがあり得る。 Multimedia component 808 includes a screen that provides an output interface between electronic device 800 and a user. In some examples, the screen can include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen can be implemented as a touch screen for receiving input signals from the user. A touch panel includes one or more touch sensors for detecting touches, swipes, and gestures on the touch panel. The touch sensor can not only sense the boundaries of a touch or swipe operation, but also the duration and pressure associated with the touch or swipe action. In some examples, multimedia component 808 includes one front camera and/or one rear camera. When electronic device 800 is in an operational mode, such as photography mode or video mode, the front and/or rear cameras may receive external multimedia data. Each front and rear camera may be a fixed optical lens system or may have a focal length and optical zoom capability.

The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) that is configured to receive external audio signals when the electronic device 800 is in an operating mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 also includes a speaker configured to output audio signals.

Ｉ／Ｏインターフェース８１２は、処理コンポーネント８０２と周辺インターフェースモジュールとの間にインターフェースを提供し、前記周辺インターフェースモジュールは、キーボード、クリックホイール、ボタンなどであってもよい。これらのボタンは、ホームボタン、ボリュームボタン、スタートボタン、ロックボタンを含み得るが、これらに限定されない。 I/O interface 812 provides an interface between processing component 802 and peripheral interface modules, which may be keyboards, click wheels, buttons, and the like. These buttons may include, but are not limited to, home button, volume button, start button, lock button.

センサコンポーネント８１４は、電子機器８００に各態様の状態評価を提供するための１つまたは複数のセンサを備える。例えば、センサコンポーネント８１４は、電子機器８００のオン／オフ状態、電子機器８００のディスプレイとキーパッドなどのコンポーネントの、相対的な位置を検知することができ、センサコンポーネント８１４は、電子機器８００または電子機器８００の１つのコンポーネントの位置の変化、ユーザと電子機器８００との接触の有無、電子機器８００の方位または加速／減速、及び電子機器８００の温度の変化も検知することができる。センサコンポーネント８１４は、近接センサを備えることができ、物理的接触なしに近くの物体の存在を検知するように構成される。センサコンポーネント８１４は、さらに、ＣＭＯＳまたはＣＣＤ画像センサなどの光センサを備えることもでき、イメージングアプリケーションのために使用される。いくつかの実施例において、当該センサコンポーネント８１４は、さらに、加速度センサ、ジャイロスコープセンサ、磁気センサ、圧力センサまたは温度センサを含み得る。 Sensor component 814 comprises one or more sensors for providing status assessments of aspects to electronic device 800 . For example, the sensor component 814 can detect the on/off state of the electronic device 800, the relative positions of components such as the display and keypad of the electronic device 800, and the sensor component 814 can sense the electronic device 800 or electronic Changes in the position of one component of device 800, presence or absence of contact between the user and electronic device 800, orientation or acceleration/deceleration of electronic device 800, and changes in temperature of electronic device 800 can also be detected. Sensor component 814 can comprise a proximity sensor and is configured to detect the presence of nearby objects without physical contact. Sensor component 814 may also comprise an optical sensor, such as a CMOS or CCD image sensor, used for imaging applications. In some examples, the sensor component 814 may further include an acceleration sensor, gyroscope sensor, magnetic sensor, pressure sensor, or temperature sensor.

通信コンポーネント８１６は、電子機器８００と他の機器の間の有線、または無線方式の通信を容易にするように構成される。電子機器８００は、ＷｉＦｉ、２Ｇまたは３Ｇ、またはそれらの組み合わせなどの通信規格に基づく無線ネットワークにアクセスすることができる。一例示的な実施例において、通信コンポーネント８１６は、放送チャンネルを介して、外部放送管理システムからの放送信号または放送関連情報を受信する。一例示的な実施例において、前記通信コンポーネント８１６は、さらに、短距離通信を促進するために、近距離通信（ＮＦＣ）モジュールを備える。例えば、ＮＦＣモジュールは、無線周波数認識（ＲＦＩＤ）技術、赤外線データ協会（ＩｒＤＡ）技術、超広帯域（ＵＷＢ）技術、ブルートゥース（ＢＴ）技術及び他の技術に基づいて実現されることができる。 Communications component 816 is configured to facilitate wired or wireless communications between electronic device 800 and other devices. Electronic device 800 may access wireless networks based on communication standards such as WiFi, 2G or 3G, or combinations thereof. In one exemplary embodiment, communication component 816 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further comprises a Near Field Communication (NFC) module to facilitate short-range communication. For example, the NFC module can be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology and other technologies.

In an exemplary embodiment, the electronic device 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements in order to perform the above method.

例示的な実施例において、さらに、コンピュータプログラム命令を含むメモリ８０４などの、不揮発性コンピュータ可読記憶媒体を提供し、前記コンピュータプログラム命令は、電子機器８００のプロセッサ８２０によって実行されて、上記の方法を完成することができる。 An exemplary embodiment further provides a non-volatile computer-readable storage medium, such as memory 804, containing computer program instructions, which are executed by processor 820 of electronic device 800 to perform the above method. can be completed.

FIG. 9 shows a block diagram of an electronic device 1900 according to an embodiment of the present invention. For example, the electronic device 1900 may be provided as a server. Referring to FIG. 9, the electronic device 1900 includes a processing component 1922, which in turn includes one or more processors, and memory resources represented by a memory 1932 for storing instructions executable by the processing component 1922, such as application programs. An application program stored in the memory 1932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 1922 is configured to execute the instructions to perform the above method.

電子機器１９００は、さらに、電子機器１９００の電源管理を実行するように構成される、１つの電力コンポーネント１９２６と、電子機器１９００をネットワークに接続させるように構成される、１つの有線または無線ネットワークインターフェース１９５０と、１つの入力／出力（Ｉ／Ｏ）インターフェース１９５８とを含み得る。電子機器１９００は、メモリ１９３２に記憶されたＷｉｎｄｏｗｓＳｅｒｖｅｒＴＭ、ＭａｃＯＳＸＴＭ、ＵｎｉｘＴＭ、ＬｉｎｕｘＴＭ、ＦｒｅｅＢＳＤＴＭまたは類似したものなどの操作システムに基づいて操作されることができる。 The electronic device 1900 further includes one power component 1926 configured to perform power management of the electronic device 1900 and one wired or wireless network interface configured to connect the electronic device 1900 to a network. 1950 and one input/output (I/O) interface 1958 . Electronic device 1900 can be operated based on an operating system such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like, stored in memory 1932 .

例示的な実施例において、さらに、コンピュータプログラム命令を含むメモリ１９３２などの、揮発性コンピュータ可読記憶媒体を提供し、前記コンピュータプログラム命令は、電子機器１９００の処理コンポーネント１９２２によって実行されて、上記の方法を完成することができる。 An exemplary embodiment also provides a volatile computer-readable storage medium, such as memory 1932, containing computer program instructions, which are executed by processing component 1922 of electronic device 1900 to perform the method described above. can be completed.

本発明は、システム、方法及び／またはコンピュータプログラム製品であり得る。コンピュータプログラム製品は、プロセッサに本発明の様々な態様を実現させるために使用される、コンピュータ可読プログラム命令がロードされる、コンピュータ可読記憶媒体を含み得る。 The invention can be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium loaded with computer-readable program instructions used to implement various aspects of the present invention on a processor.

A computer-readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanical encoding device such as a punch card or raised structures in a groove on which instructions are stored, and any suitable combination of the foregoing. A computer-readable storage medium, as used herein, is not to be construed as a transient signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (for example, light pulses through a fiber-optic cable), or electrical signals transmitted through a wire.

The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within that computing/processing device.

The computer program instructions used to carry out the operations of the present disclosure may be component instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In scenarios involving a remote computer, the remote computer may connect to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may connect to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized using state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.

本明細書では、本発明の実施例による方法、装置（システム）、及びコンピュータプログラム製品のフローチャート及び／またはブロック図を参照して本発明の様々な態様を説明する。フローチャート及び／またはブロック図の各ブロック、及びフローチャート及び／またはブロック図内の各ブロックの組み合わせは、コンピュータ可読プログラム命令によって実現されることを理解されたい。 Various aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

これらのコンピュータ可読プログラム命令は、汎用コンピュータ、固有コンピュータ、または他のプログラマブルデータ処理装置のプロセッサに提供することができ、それにより、デバイスが作成され、これらの命令が、コンピュータ、または他のプログラマブルデータ処理装置のプロセッサによって実行されるとき、フローチャート及び／またはブロック図内の１つまたは複数のブロックの指定される機能／アクションを実現させる。これらのコンピュータ可読プログラム命令を、コンピュータ可読記憶媒体に記憶することもあり得、これらの命令は、コンピュータ、プログラマブルデータ処理装置及び／または他の機器を特定の方式で作業するようにし、従って、命令が記憶されるコンピュータ可読媒体は、フローチャート及び／またはブロック図内の１つまたは複数のブロックの指定される機能／アクションを実現する様々な態様の命令を含む製造品を含む。 These computer readable program instructions may be provided to a processor of a general purpose computer, a proprietary computer, or other programmable data processing apparatus, thereby creating a device, by which these instructions may be read by the computer or other programmable data processing apparatus. When executed by the processor of the processing unit, it implements the specified functions/actions of one or more blocks in the flowchart illustrations and/or block diagrams. These computer readable program instructions may be stored on a computer readable storage medium, and these instructions cause the computer, programmable data processing device, and/or other apparatus to operate in a particular manner; A computer-readable medium on which is stored includes various aspects of instructions that implement the specified functions/actions of one or more blocks in the flowcharts and/or block diagrams.

The computer-readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, whereby the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/actions specified in one or more blocks of the flowcharts and/or block diagrams.

図面におけるプロセス図及びブロック図は、本発明の複数の実施例によるシステム、方法及びコンピュータプログラム製品の実現可能なアーキテクチャ、機能、及び操作を示す。この点について、フローチャートまたはブロック図内の各ブロックは、１つのモジュール、プログラムセグメント、または命令の一部を表すことができ、前記モジュール、プログラムセグメント、または命令の一部は、１つまたは複数の指定される論理機能を実現するために使用される実行可能な命令を含む。いくつかの代替実現において、ブロックのマークされる機能は、図面でマークされる順序とは異なる順序で発生することもできる。例えば、関する機能によって、２つの連続するブロックは、実際に基本的に並行して実行でき、時には逆の順序で実行できる。ブロック図及び／またはフローチャート中の各ブロック、及びブロック図及び／またはフローチャートのブロックの組み合わせは、指定される機能またはアクションを実行する、専用のハードウェアベースのシステムによって実現されるか、または、ハードウェアとコンピュータ命令の組み合わせを使用して、実現されることもできることを留意する必要がある。 The process diagrams and block diagrams in the figures illustrate possible architectures, functionality, and operation of systems, methods and computer program products according to several embodiments of the present invention. In this regard, each block in a flowchart or block diagram can represent a portion of a module, program segment, or instruction, wherein said module, program segment, or portion of instruction is one or more Contains executable instructions used to implement specified logic functions. In some alternative implementations, the marked functions of the blocks may occur out of the order marked in the figures. For example, depending on the functionality involved, two consecutive blocks can actually be executed essentially in parallel, sometimes in reverse order. Each block in the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, may be implemented by, or may be, a dedicated hardware-based system that performs the specified functions or actions. Note that it can also be implemented using a combination of software and computer instructions.

The computer program product may be implemented in hardware, software, or a combination thereof. In one exemplary embodiment, the computer program product is embodied as a computer storage medium; in another exemplary embodiment, the computer program product is embodied as a software product, such as a software development kit (SDK).

以上、本発明の各実施例を説明したが、以上の説明は、例示的なものに過ぎず、網羅的ではなく、開示された各実施例に限定されない。説明される各実施例の範囲及び思想から逸脱してない場合は、当業者にとって、多くの修正及び変更は明らかである。本明細書で使用される用語の選択は、各実施例の原理、実際の適用、または市場における技術の改善を最もよく説明するか、または、当業者が、本明細書で開示される各実施例を理解することができるようにすることを意図する。 Although embodiments of the present invention have been described above, the above description is illustrative only and is not exhaustive and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of each described embodiment. The choice of terminology used herein is such that it best describes the principle, practical application, or technical improvement in the market of each embodiment, or allows those of ordinary skill in the art to understand each implementation disclosed herein. The intention is to make the example understandable.

Claims

A pose detection method comprising:
obtaining a target image;
obtaining continuous depth information and position information of a target object in the target image according to the target image; and
determining a pose of the target object according to the continuous depth information and the position information.

The pose detection method according to claim 1, wherein obtaining the continuous depth information and the position information of the target object in the target image according to the target image comprises:
passing the target image through a first neural network model to obtain the continuous depth information and the position information of the target object in the target image,
wherein the first neural network model is obtained by training with first training data and second training data,
the first training data is a training image containing a training object, and
the second training data includes continuous depth information of the training object and position information of the training object.

The pose detection method according to claim 1 or 2, wherein determining the pose of the target object according to the continuous depth information and the position information comprises:
passing the continuous depth information and the position information through a second neural network model to obtain the pose of the target object,
wherein the second neural network model is obtained by training with second training data and third training data,
the second training data includes continuous depth information of a training object and position information of the training object, and
the third training data includes the pose of the training object.

The pose detection method according to claim 2 or 3, wherein the second training data is generated according to third training data, the third training data including a pose of the training object, and
generating the second training data according to the third training data comprises:
obtaining discrete depth information of the training object and position information of the training object according to the pose of the training object in the third training data;
processing at least a portion of the discrete depth information to obtain continuous depth information of the training object; and
generating the second training data according to the continuous depth information of the training object and the position information of the training object.

The pose detection method according to claim 4, wherein processing at least a portion of the discrete depth information to obtain the continuous depth information of the training object comprises:
obtaining at least one connection corresponding to at least a portion of the discrete depth information;
determining continuous depth sub-information of the at least one connection according to the discrete depth information corresponding to the connection; and
aggregating the at least one piece of continuous depth sub-information to obtain the continuous depth information of the training object.

The pose detection method according to claim 5, wherein determining the continuous depth sub-information of the at least one connection according to the discrete depth information corresponding to the connection comprises:
obtaining first continuous depth sub-information of at least one point on the connection via linear interpolation according to the discrete depth information corresponding to the connection;
determining a connection range corresponding to the at least one connection;
determining second continuous depth sub-information of at least one point within the connection range corresponding to the connection according to the first continuous depth sub-information; and
obtaining the continuous depth sub-information corresponding to the connection according to the first continuous depth sub-information and/or the second continuous depth sub-information.
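The linear interpolation step recited above can be illustrated with a minimal sketch. It assumes that a connection is the straight segment between two keypoints with known 2D positions and discrete depths; the function name `interpolate_connection_depth`, its parameters, and the even sampling are hypothetical choices made for illustration and are not taken from the specification.

```python
def interpolate_connection_depth(p0, p1, d0, d1, num_points):
    """Return [((x, y), depth), ...] sampled evenly along the segment p0 -> p1.

    p0, p1: 2D positions of the two keypoints joined by the connection (assumed).
    d0, d1: discrete depths of those keypoints.
    """
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    samples = []
    for i in range(num_points):
        t = i / (num_points - 1)          # 0.0 at p0, 1.0 at p1
        x = p0[0] + t * (p1[0] - p0[0])   # position along the connection
        y = p0[1] + t * (p1[1] - p0[1])
        depth = d0 + t * (d1 - d0)        # linearly interpolated depth
        samples.append(((x, y), depth))
    return samples


# Example: a horizontal connection whose depth rises from 2.0 to 4.0.
samples = interpolate_connection_depth((0.0, 0.0), (4.0, 0.0), 2.0, 4.0, 5)
```

Each returned depth plays the role of the first continuous depth sub-information for one point on the connection.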

The pose detection method according to claim 6, wherein determining the second continuous depth sub-information of at least one point within the connection range corresponding to the connection according to the first continuous depth sub-information comprises:
if the connection range is within a preset range of the discrete depth information corresponding to the connection, using the discrete depth information corresponding to the connection as the second continuous depth sub-information of at least one point within the connection range; and
if the connection range is outside the preset range of the discrete depth information corresponding to the connection, obtaining the second continuous depth sub-information of at least one point within the connection range according to the first continuous depth sub-information of the point on the connection that is closest to that point.
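One possible reading of the two cases above is sketched below. It assumes that the "preset range" is a circle of radius `preset_radius` around each keypoint, and that the nearest point on the connection is found by projecting onto the segment between the two keypoints. The function name, the parameters, and this geometric interpretation are assumptions made for illustration; the claim itself does not fix them.

```python
import math


def depth_in_connection_range(point, p0, p1, d0, d1, preset_radius):
    """Depth for a point lying in the range around the connection p0 -> p1."""
    # Case 1 (assumed reading): the point lies within the preset range of a
    # keypoint, so that keypoint's discrete depth is used directly.
    dist0 = math.dist(point, p0)
    dist1 = math.dist(point, p1)
    if min(dist0, dist1) <= preset_radius:
        return d0 if dist0 <= dist1 else d1

    # Case 2: otherwise use the linearly interpolated depth at the point on
    # the connection that is closest to the query point.
    vx, vy = p1[0] - p0[0], p1[1] - p0[1]
    seg_len_sq = vx * vx + vy * vy
    if seg_len_sq == 0.0:
        return d0  # degenerate connection: both keypoints coincide
    t = ((point[0] - p0[0]) * vx + (point[1] - p0[1]) * vy) / seg_len_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return d0 + t * (d1 - d0)


# Example: a point near the middle of a connection from depth 1.0 to depth 3.0.
mid_depth = depth_in_connection_range((5.0, 1.0), (0.0, 0.0), (10.0, 0.0), 1.0, 3.0, 2.0)
```

Evaluating this function for every pixel in the connection range, and then combining the results over all connections, would yield a dense depth map of the kind the preceding claims call continuous depth information.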

A video processing method, comprising:
performing image acquisition on a current scene to obtain an acquired video;
selecting, from the acquired video, at least two frames of target images that contain a target object; and
performing pose detection on the target object in the at least two frames of target images via the pose detection method according to any one of claims 1 to 7, to determine at least two poses of the target object in the acquired video.

The video processing method according to claim 8, further comprising:
obtaining a continuous pose of the target object according to the at least two poses of the target object and the times of the frames in the acquired video; and
tracking the target object according to the continuous pose of the target object.
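The claim above does not state how the continuous pose is derived from the per-frame poses and frame times. As a minimal sketch under the assumption that a pose is a list of 3D joint coordinates and that linear interpolation in time is acceptable, an intermediate pose could be estimated as follows; `interpolate_pose` and its parameters are hypothetical names.

```python
def interpolate_pose(pose_a, time_a, pose_b, time_b, time_query):
    """Linearly interpolate joint coordinates between two timestamped poses.

    pose_a, pose_b: lists of (x, y, z) joints in the same joint order (assumed).
    time_a, time_b: timestamps of the frames the poses were detected in.
    """
    if time_b == time_a:
        raise ValueError("the two frames must have different timestamps")
    if len(pose_a) != len(pose_b):
        raise ValueError("poses must contain the same number of joints")
    w = (time_query - time_a) / (time_b - time_a)  # 0.0 at frame A, 1.0 at frame B
    return [
        tuple(a + w * (b - a) for a, b in zip(joint_a, joint_b))
        for joint_a, joint_b in zip(pose_a, pose_b)
    ]


# Example: one joint tracked across two frames, queried halfway between them.
halfway = interpolate_pose([(0.0, 0.0, 1.0)], 0.0, [(2.0, 4.0, 3.0)], 1.0, 0.5)
```

Chaining such interpolations over consecutive frame pairs gives a pose at any time within the video, which a tracker could then consume.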

A pose detection device, comprising:
a target image acquisition unit configured to acquire a target image;
an information acquisition unit configured to acquire continuous depth information and position information of a target object in the target image according to the target image; and
a pose determination unit configured to determine a pose of the target object according to the continuous depth information and the position information.

A video processing device, comprising:
an image acquisition unit configured to perform image acquisition on a current scene to obtain an acquired video;
a selection unit configured to select, from the acquired video, at least two frames of target images that contain a target object; and
a pose acquisition unit configured to perform pose detection on the target object in the at least two frames of target images via the pose detection method according to any one of claims 1 to 7, to determine at least two poses of the target object in the acquired video.

An electronic device, comprising:
a processor; and
a memory configured to store processor-executable instructions,
wherein the processor is configured to invoke the instructions stored in the memory to perform the pose detection method according to any one of claims 1 to 9.

A computer-readable storage medium storing computer program instructions,
wherein the computer program instructions, when executed by a processor, implement the pose detection method according to any one of claims 1 to 9.

A computer program product comprising a computer program or instructions which, when run on a computer, cause the computer to perform the pose detection method according to any one of claims 1 to 9.

A computer program comprising computer-readable code, wherein, when the computer-readable code runs in an electronic device, a processor in the electronic device executes the code to implement the pose detection method according to any one of claims 1 to 9.