JP2022520498A

JP2022520498A - Image processing methods, devices, storage media and electronic devices

Info

Publication number: JP2022520498A
Application number: JP2021557461A
Authority: JP
Inventors: ユエリアオ; フェイワン; イエンジエチェン; チェンチエン; スーリウ
Original assignee: シャンハイセンスタイムリンガンインテリジェントテクノロジーカンパニーリミテッド
Priority date: 2019-12-30
Filing date: 2020-09-22
Publication date: 2022-03-30
Anticipated expiration: 2040-09-22
Also published as: WO2021135424A1; JP7105383B2; CN111104925B; KR102432204B1; CN111104925A; KR20210136138A

Abstract

Embodiments of the present invention disclose an image processing method, an apparatus, a storage medium, and an electronic device. The method includes: extracting feature data of a first image; determining, based on the feature data, each interaction key point and the center point of each target in the first image, where an interaction key point is a point on a connecting line within a preset range of the midpoint of that line, and the connecting line joins the center points of the two targets in one interaction action; determining at least two offsets based on the feature data, where each offset represents the offset between the interaction key point of an interaction action and the center point of one target in that interaction action; and determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets. [Selected drawing] Fig. 1

Description

[Cross-reference to related applications]
This application is filed on the basis of Chinese Patent Application No. 201911404450.6, filed with the Chinese Patent Office on December 30, 2019, and claims priority to that application, the entire contents of which are incorporated herein by reference.
[Technical field]
The present invention relates to image processing technology, and specifically to an image processing method, an apparatus, a storage medium, and an electronic device.

To detect the interaction action relationships between people and objects in an image, a detector is usually first used to detect the people and objects in the image. The people and objects whose confidence exceeds a certain threshold are selected and paired to form person-object pairs, and a relationship classification network then classifies each person-object pair and outputs an action relationship category.

This process considers only detection confidence and ignores how likely a person and an object are to be interacting. As a result, people or objects that actually take part in an interaction may be missed, that is, person-object pairs with a real interaction action relationship may be lost, while a large number of person-object pairs with no real interaction action relationship are generated. Moreover, an image normally contains very few people and objects that actually interact. If M people and N objects are detected in an image, the above approach generates M × N person-object pairs, and the relationship classification network must determine the action relationship category for every pair, which adds unnecessary processing and resource consumption.

Embodiments of the present invention provide an image processing method, an apparatus, a storage medium, and an electronic device.

An embodiment of the present invention provides an image processing method. The method includes: extracting feature data of a first image; determining, based on the feature data, each interaction key point and the center point of each target in the first image, where an interaction key point is a point on a connecting line within a preset range of the midpoint of that line, and the connecting line joins the center points of the two targets in one interaction action; determining at least two offsets based on the feature data, where each offset represents the offset between the interaction key point of an interaction action and the center point of one target in that interaction action; and determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.
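The geometry behind an interaction key point and its two offsets can be illustrated with a minimal sketch. It assumes the key point is taken exactly at the midpoint of the connecting line (the text allows any point within a preset range of the midpoint) and that an offset is stored as center minus key point; the sign convention and the names `midpoint` and `offset_to` are illustrative assumptions, not taken from the patent.

```python
from typing import Tuple

Point = Tuple[float, float]


def midpoint(a: Point, b: Point) -> Point:
    """Midpoint of the connecting line between two target center points."""
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def offset_to(key_point: Point, center: Point) -> Point:
    """Offset from the key point to a center (assumed sign: center - key point)."""
    return (center[0] - key_point[0], center[1] - key_point[1])


# Hypothetical center points of a person and an object in one interaction action.
person_center = (40.0, 60.0)
object_center = (100.0, 20.0)

key_point = midpoint(person_center, object_center)
offset_person = offset_to(key_point, person_center)
offset_object = offset_to(key_point, object_center)

print(key_point, offset_person, offset_object)
```

Under this convention, adding an offset back to the key point recovers the corresponding center point, which is what allows a key point to be linked to its two targets.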

In some exemplary embodiments of the present invention, determining each interaction key point and the center point of each target in the first image based on the feature data includes: determining, based on the feature data, the center point of each target in the first image and the confidence of each target; and determining, based on the feature data, the interaction key points in the first image and, for each interaction key point, the confidence of each interaction action category. Determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets then includes: determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.

In some exemplary embodiments of the present invention, determining the center point and the confidence of each target in the first image based on the feature data includes: determining, based on the feature data, the center point and category of each target in the first image and the confidence that each target belongs to each category. Determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point then includes: determining the interaction relationships between targets in the first image based on the center point and category of each target, the interaction key points, the at least two offsets, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point.

In some exemplary embodiments of the present invention, determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point includes: for each interaction key point, determining the two offsets corresponding to that interaction key point; determining the two predicted center points corresponding to the interaction key point from the interaction key point and its two offsets; determining the two targets corresponding to each interaction key point from the center point of each target and the two predicted center points of that interaction key point; and determining the interaction relationships between targets in the first image from the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.
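The steps from key point to matched targets can be sketched as follows. This is only an illustration under stated assumptions: offsets are added to the key point to obtain predicted center points, and each predicted center point is matched to the nearest detected center point. The nearest-neighbor rule and the function names are hypothetical stand-ins for the matching step, not the patent's definition.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def predicted_centers(key_point: Point, offsets: Sequence[Point]) -> List[Point]:
    """Add each offset to the key point (assumed convention: center = key point + offset)."""
    return [(key_point[0] + dx, key_point[1] + dy) for dx, dy in offsets]


def nearest_target(predicted: Point, centers: Sequence[Point]) -> int:
    """Index of the detected center point closest to a predicted center point."""
    return min(range(len(centers)), key=lambda i: math.dist(predicted, centers[i]))


# Hypothetical detector output: three target center points.
detected_centers = [(41.0, 59.0), (98.0, 22.0), (200.0, 200.0)]
# Hypothetical interaction key point and its two offsets.
key_point = (70.0, 40.0)
offsets = [(-30.0, 20.0), (30.0, -20.0)]

predicted = predicted_centers(key_point, offsets)
matched = [nearest_target(p, detected_centers) for p in predicted]
print(predicted, matched)
```

Only the two targets indicated by the key point are paired, so no exhaustive pairing of every detected person with every detected object is needed.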

In some exemplary embodiments of the present invention, determining the interaction relationships between targets in the first image from the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point includes: for one interaction key point, multiplying the confidence of one preset interaction action category corresponding to the interaction key point by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets belongs to that preset interaction action category; in response to the first confidence exceeding a confidence threshold, determining that the interaction relationship between the two targets belongs to the preset interaction action category; and in response to the first confidence not exceeding the confidence threshold, determining that the interaction relationship between the two targets does not belong to the preset interaction action category.
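The confidence test described above amounts to a product and a comparison. The sketch below assumes all confidences are plain floats in [0, 1]; the threshold value and the function names are illustrative assumptions.

```python
def first_confidence(action_conf: float, target_conf_a: float, target_conf_b: float) -> float:
    """Product of one action-category confidence and the two target confidences."""
    return action_conf * target_conf_a * target_conf_b


def belongs_to_category(action_conf: float, target_conf_a: float,
                        target_conf_b: float, threshold: float) -> bool:
    """True only if the first confidence strictly exceeds the threshold."""
    return first_confidence(action_conf, target_conf_a, target_conf_b) > threshold


# Hypothetical values: two action categories scored at the same key point.
THRESHOLD = 0.3  # assumed value, not specified in the source
strong = belongs_to_category(0.9, 0.8, 0.7, THRESHOLD)  # 0.9 * 0.8 * 0.7 is about 0.504
weak = belongs_to_category(0.2, 0.8, 0.7, THRESHOLD)    # 0.2 * 0.8 * 0.7 is about 0.112
print(strong, weak)
```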

In some exemplary embodiments of the present invention, the method further includes: after determining that the interaction relationship between the two targets corresponding to an interaction key point does not belong to any of the preset interaction action categories, determining that there is no interaction relationship between those two targets.

In some exemplary embodiments of the present invention, determining the two targets corresponding to each interaction key point from the center point of each target and the two predicted center points of each interaction key point includes: for one predicted center point, determining the distance between the center point of each target and the predicted center point; and taking a target whose center point is closer to the predicted center point than a preset distance threshold as a target corresponding to the interaction key point to which the predicted center point belongs.
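The distance-threshold rule can be sketched as a simple filter. The sketch assumes Euclidean distance in image coordinates, which the text does not specify; the threshold values and the name `targets_near` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def targets_near(predicted: Point, centers: Sequence[Point],
                 distance_threshold: float) -> List[int]:
    """Indices of targets whose center lies closer to the predicted center than the threshold."""
    return [i for i, c in enumerate(centers)
            if math.dist(predicted, c) < distance_threshold]


# Hypothetical detected center points and one predicted center point.
centers = [(41.0, 59.0), (98.0, 22.0), (45.0, 60.0)]
predicted = (40.0, 60.0)

close = targets_near(predicted, centers, 3.0)   # only the first center is within 3 pixels
wider = targets_near(predicted, centers, 6.0)   # the third center (distance 5) also qualifies
print(close, wider)
```

A looser threshold can admit more than one candidate target, so a practical implementation would likely need a tie-breaking rule; the source text does not describe one here.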

In some exemplary embodiments of the present invention, determining the center point of each target in the first image based on the feature data includes: downsampling the feature data to obtain a heat map of the first image; and determining, from the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection box of each target. After the center point of each target in the first image has been determined based on the feature data, the image processing method further includes: correcting the position of the center point of each target having an interaction relationship in the first image according to the position offset of that center point, to obtain the corrected position of the center point; and determining the detection box of each target having an interaction relationship in the first image from the corrected position of its center point and the height and width of its detection box.
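The correction and box reconstruction can be sketched as below. This assumes the center point, its position offset, and the box size are all expressed in the same coordinate system, and that the box is returned as (x_min, y_min, x_max, y_max); any rescaling from heat-map resolution back to image resolution is left out because the text does not describe it. The function names are illustrative.

```python
from typing import Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def correct_center(center: Point, position_offset: Point) -> Point:
    """Shift a coarse center point by its predicted position offset."""
    return (center[0] + position_offset[0], center[1] + position_offset[1])


def detection_box(center: Point, width: float, height: float) -> Box:
    """Build a detection box centered on the corrected center point."""
    half_w, half_h = width / 2.0, height / 2.0
    return (center[0] - half_w, center[1] - half_h,
            center[0] + half_w, center[1] + half_h)


# Hypothetical values for one target that takes part in an interaction.
coarse_center = (12.0, 7.0)
position_offset = (0.5, 0.25)

corrected = correct_center(coarse_center, position_offset)
box = detection_box(corrected, width=6.0, height=4.0)
print(corrected, box)
```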

In some exemplary embodiments of the present invention, the image processing method is performed by a neural network obtained by training with sample images. Each sample image is labeled with the detection boxes of targets having an interaction relationship. The labeled center points and labeled interaction key points of those targets are determined from the labeled detection boxes, and the labeled offsets are determined from the labeled center points and labeled interaction key points.
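One way the training labels might be derived from a pair of labeled detection boxes is sketched below. The assumptions are loud ones: boxes are (x_min, y_min, x_max, y_max), the labeled key point is placed exactly at the midpoint between the two box centers, and offsets are stored as center minus key point. The function names and the dictionary layout are hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def box_center(box: Box) -> Point:
    """Center point of a labeled detection box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def build_labels(box_a: Box, box_b: Box) -> Dict[str, object]:
    """Derive center points, key point, and offsets for one labeled interaction."""
    center_a, center_b = box_center(box_a), box_center(box_b)
    # Assumption: key point at the exact midpoint of the connecting line.
    key_point = ((center_a[0] + center_b[0]) / 2.0, (center_a[1] + center_b[1]) / 2.0)
    offsets = [
        (center_a[0] - key_point[0], center_a[1] - key_point[1]),
        (center_b[0] - key_point[0], center_b[1] - key_point[1]),
    ]
    return {"centers": [center_a, center_b], "key_point": key_point, "offsets": offsets}


# Hypothetical labeled boxes for a person and an object that interact.
labels = build_labels((10.0, 20.0, 30.0, 60.0), (60.0, 10.0, 100.0, 30.0))
print(labels)
```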

In some exemplary embodiments of the present invention, training the neural network with a sample image includes: extracting feature data of the sample image using the neural network; downsampling the feature data of the sample image using the neural network to obtain a heat map of the sample image; predicting, using the neural network and based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection box of each target in the sample image; predicting at least two offsets using the neural network based on the feature data of the sample image; predicting the interaction relationships between targets in the sample image based on the center point of each target, the interaction key points, and the at least two offsets in the sample image; and adjusting the network parameter values of the neural network according to the predicted position offsets, the predicted center points and predicted detection box heights and widths of the targets having an interaction relationship in the sample image, the predicted interaction key points corresponding to those targets and their corresponding predicted offsets, the labeled position offsets, and the detection boxes of the targets having an interaction relationship labeled in the sample image.

An embodiment of the present invention further provides an image processing apparatus. The apparatus includes an extraction unit, a first determination unit, a second determination unit, and a third determination unit, where:
the extraction unit is configured to extract feature data of a first image;
the first determination unit is configured to determine, based on the feature data extracted by the extraction unit, each interaction key point and the center point of each target in the first image, where an interaction key point is a point on a connecting line within a preset range of the midpoint of that line, and the connecting line joins the center points of the two targets in one interaction action;
the second determination unit is configured to determine at least two offsets based on the feature data extracted by the extraction unit, where each offset represents the offset between the interaction key point of an interaction action and the center point of one target in that interaction action; and
the third determination unit is configured to determine the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.

In some exemplary embodiments of the present invention, the first determination unit is configured to determine, based on the feature data, the center point of each target in the first image and the confidence of each target, and to determine, based on the feature data, the interaction key points in the first image and, for each interaction key point, the confidence of each interaction action category.
The third determination unit is configured to determine the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.

In some exemplary embodiments of the present invention, the first determination unit is configured to determine, based on the feature data, the center point and category of each target in the first image and the confidence that each target belongs to each preset category.
The third determination unit is configured to determine the interaction relationships between targets in the first image based on the center point and category of each target, the interaction key points, the at least two offsets, the confidence that each target belongs to each preset category, and the confidence of each preset interaction action category corresponding to each interaction key point.

In some exemplary embodiments of the present invention, the third determination unit is configured to: for each interaction key point, determine the two offsets corresponding to that interaction key point; determine the two predicted center points corresponding to the interaction key point from the interaction key point and its two offsets; determine the two targets corresponding to each interaction key point from the center point of each target and the two predicted center points of that interaction key point; and determine the interaction relationships between targets in the first image from the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.

In some exemplary embodiments of the present invention, the third determination unit is configured to: for one interaction key point, multiply the confidence of one preset interaction action category corresponding to the interaction key point by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets belongs to that interaction action category; in response to the first confidence exceeding a confidence threshold, determine that the interaction relationship between the two targets belongs to the preset interaction action category; and in response to the first confidence not exceeding the confidence threshold, determine that the interaction relationship between the two targets does not belong to the preset interaction action category.

In some exemplary embodiments of the present invention, the third determination unit is further configured to determine, after determining that the interaction relationship between the two targets corresponding to an interaction key point does not belong to any of the preset interaction action categories, that there is no interaction relationship between those two targets.

In some exemplary embodiments of the present invention, the third determination unit is configured to: for one predicted center point, determine the distance between the center point of each target and the predicted center point; and take a target whose center point is closer to the predicted center point than a preset distance threshold as a target corresponding to the interaction key point to which the predicted center point belongs.

In some exemplary embodiments of the present invention, the first determination unit is configured to downsample the feature data to obtain a heat map of the first image and to determine, from the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection box of each target. The first determination unit is further configured to, after the center point of each target in the first image has been determined based on the feature data: correct the position of the center point of each target having an interaction relationship in the first image according to the position offset of that center point, to obtain the corrected position of the center point; and determine the detection box of each target having an interaction relationship in the first image from the corrected position of its center point and the height and width of its detection box.

In some exemplary embodiments of the present invention, each functional unit of the image processing apparatus is implemented by a neural network obtained by training with sample images. Each sample image is labeled with the detection boxes of targets having an interaction relationship. The labeled center points and labeled interaction key points of those targets are determined from the labeled detection boxes, and the labeled offsets are determined from the labeled center points and labeled interaction key points.

In some exemplary embodiments of the present invention, the apparatus further includes a training unit configured to train the neural network with a sample image. Specifically, the training unit is configured to: extract feature data of the sample image using the neural network; downsample the feature data of the sample image using the neural network to obtain a heat map of the sample image; predict, using the neural network and based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection box of each target in the sample image; predict at least two offsets using the neural network based on the feature data of the sample image; predict the interaction relationships between targets in the sample image based on the center point of each target, the interaction key points, and the at least two offsets in the sample image; and adjust the network parameter values of the neural network according to the predicted position offsets, the predicted center points and predicted detection box heights and widths of the targets having an interaction relationship in the sample image, the predicted interaction key points corresponding to those targets and their corresponding predicted offsets, the labeled position offsets, and the detection boxes of the targets having an interaction relationship labeled in the sample image.

An embodiment of the present invention further provides a computer-readable storage medium storing a computer program. When the program is executed by a processor, the steps of the method described in the embodiments of the present invention are implemented.

An embodiment of the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable by the processor. When the processor executes the program, the steps of the method described in the embodiments of the present invention are implemented.

An embodiment of the present invention further provides a computer program comprising computer-readable code. When the computer-readable code is run on an electronic device, a processor of the electronic device executes the steps of the method described in the embodiments of the present invention.

The embodiments of the present invention provide an image processing method, an apparatus, a storage medium, and an electronic device. The method includes: extracting feature data of a first image; determining, based on the feature data, each interaction key point of the first image and the center point of each target, where one interaction key point is a point on a connecting line within a preset range from the midpoint of that connecting line, and the connecting line is the line between the center points of the two targets in one interaction action; determining at least two offsets based on the feature data, where one offset represents the offset between the interaction key point in one interaction action and the center point of one target in that interaction action; and determining the interaction relationship between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.

With the technical solution of the embodiments of the present invention, an interaction key point is defined for each interaction action, at least two offsets associated with the interaction key point are determined, and the interaction relationship between targets in the first image is then determined based on the center point of each target, the interaction key points, and the at least two offsets. There is therefore no need to generate person-object pairs, and the problem of missing person-object pairs that have an actual interaction relationship, which arises when interaction action detection is performed on person-object pairs, is avoided. Furthermore, compared with the conventional approach of first detecting persons and objects, then pairing the persons with the objects, and then classifying each person-object pair with a relationship classification network, this embodiment greatly increases detection speed and improves detection efficiency.

An exemplary flowchart of an image processing method according to an embodiment of the present invention.
A schematic diagram of an application of the image processing method according to an embodiment of the present invention.
A schematic diagram of another application of the image processing method according to an embodiment of the present invention.
An exemplary flowchart of a neural network training method in the image processing method according to an embodiment of the present invention.
A first schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
A second schematic structural diagram of an image processing apparatus according to an embodiment of the present invention.
A schematic structural diagram of the hardware configuration of an electronic device according to an embodiment of the present invention.

The present invention is described in further detail below with reference to the drawings and specific embodiments.

An embodiment of the present invention provides an image processing method. FIG. 1 is an exemplary flowchart of the image processing method according to an embodiment of the present invention. As shown in FIG. 1, the image processing method includes the following steps.

In step 101, feature data of a first image is extracted.

In step 102, each interaction key point of the first image and the center point of each target are determined based on the feature data. One interaction key point is a point on a connecting line within a preset range from the midpoint of that connecting line, and the connecting line is the line between the center points of the two targets in one interaction action.

In step 103, at least two offsets are determined based on the feature data. One offset represents the offset between the interaction key point in one interaction action and the center point of one target in that interaction action.

In step 104, the interaction relationship between targets in the first image is determined based on the center point of each target, the interaction key points, and the at least two offsets.

In this embodiment, the first image may include a plurality of targets. There may be no interaction relationship among these targets, or the targets may include at least one group of targets having an interaction relationship. Targets having an interaction relationship are, specifically, at least two targets, and, by way of example, the at least two targets include at least one target person. For example, two targets having an interaction relationship may be two target persons, or one target person and one target object. It can be understood that the at least two targets having an interaction relationship may specifically be two targets involved in an interaction action, and the interaction action may be either a direct interaction action or an implicit interaction action. As one example, if a target person in the first image is holding a cigarette in hand, a direct action relationship can be considered to exist between the target person and the cigarette, which is the target object. As another example, if a target person in the first image is hitting a ball, with the target person performing the hitting motion and the ball in the air below the target person's hand, an implicit action relationship can be considered to exist between the target person and the ball, which is the target object.

In the image processing method according to the embodiment of the present invention, when determining whether targets in an image have an interaction relationship, the step of determining the target center points and the interaction key points (the point detection step) and the step of determining the offsets (the point matching step) can be executed in parallel. The targets having an interaction relationship and their interaction action category are then finally determined from the determined offsets, center points, and interaction key points, which improves the efficiency of interaction relationship detection.

In an exemplary embodiment of the present invention, for step 101, extracting the feature data of the first image includes extracting the feature data of the first image through a deep neural network model. By way of example, the first image is fed into the deep neural network model as input data to obtain the feature data of the first image. It can be understood that the deep neural network model may include a plurality of convolutional layers, and the feature data of the first image is obtained by applying the convolutional layers to the first image in sequence.

In this embodiment, step 102 can be executed through a first branch network obtained by pre-training; that is, the center point of each target and each interaction key point are determined by the first branch network based on the feature data. It can be understood that the feature data of the first image is fed into the first branch network as input data to obtain the center point of each target and each interaction key point in the first image. For example, if all targets in the first image are target persons, the feature data is processed by the first branch network to obtain the center point of each target person and each interaction key point. As another example, if the targets in the first image include target persons and target objects, the feature data is processed by the first branch network to obtain the center points of the target persons, the center points of the target objects, and each interaction key point.

In some embodiments, after the center point of a target is determined, the first branch network regresses the length and width of the target's detection box, and the detection box is determined from the center point of the target together with that length and width. As shown in FIG. 2, the first image includes two target persons and two target objects (the two target objects are two balls). To distinguish the two kinds of center point, the center point of a target person is denoted the first center point, and the center point of a target object is denoted the second center point.

In some embodiments, an interaction key point is a point on the connecting line between the center points of the two targets in one interaction action, lying within a preset range from the midpoint of that connecting line. As one example, the interaction key point may be the midpoint of the line connecting the center points of the two targets in one interaction action. As shown in FIG. 2, one interaction key point may be the midpoint of the line connecting the first center point of the target person and the second center point of the target object in one interaction action.
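The definition above can be illustrated with a minimal Python sketch. The function name `interaction_keypoint`, the parameter `t`, and the sample coordinates are hypothetical and not taken from the patent; `t = 0.5` gives the midpoint, and values of `t` near 0.5 give other points on the connecting line close to the midpoint.

```python
def interaction_keypoint(center_a, center_b, t=0.5):
    """Return the point at fraction t along the line from center_a to center_b.

    t = 0.5 is the midpoint of the connecting line (hypothetical helper).
    """
    ax, ay = center_a
    bx, by = center_b
    return (ax + t * (bx - ax), ay + t * (by - ay))


# Illustrative coordinates only: a person center and a ball center.
person_center = (40.0, 60.0)
ball_center = (80.0, 20.0)
print(interaction_keypoint(person_center, ball_center))
```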

In this embodiment, step 103 can be executed through a second branch network obtained by pre-training; that is, at least two offsets can be determined by the second branch network based on the feature data. Here, one offset represents the offset between the interaction key point in one interaction action and the center point of one target in that interaction action. It can be understood that the feature data of the first image is fed into the second branch network as input data to obtain at least two offsets for each point in the first image.

In practical applications, the at least two offsets corresponding to each point can be represented by an offset matrix. Based on each interaction key point determined in step 102, the at least two offsets corresponding to that interaction key point can be determined. In some embodiments, the at least two offsets corresponding to each interaction key point are determined from the coordinates of the interaction key point and the offset matrix covering every point.
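A minimal Python sketch of this lookup follows. The storage layout is an assumption made purely for illustration: the patent does not specify how the offset matrix is stored, so here `offset_map[y][x]` holds a hypothetical 4-tuple `(dx1, dy1, dx2, dy2)` for the point `(x, y)`.

```python
def offsets_at(offset_map, keypoint):
    """Read the two offsets stored at an interaction key point.

    offset_map[y][x] is assumed (not specified by the source) to hold
    (dx1, dy1, dx2, dy2) for the point (x, y).
    """
    x, y = keypoint
    dx1, dy1, dx2, dy2 = offset_map[y][x]
    return (dx1, dy1), (dx2, dy2)


# Illustrative 4x4 map with a single non-zero entry at (x=1, y=2).
H, W = 4, 4
offset_map = [[(0.0, 0.0, 0.0, 0.0) for _ in range(W)] for _ in range(H)]
offset_map[2][1] = (-1.0, 1.0, 2.0, -1.0)

first_offset, second_offset = offsets_at(offset_map, (1, 2))
print(first_offset, second_offset)
```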

As shown in FIG. 2, by way of example, one offset represents the offset between the interaction key point and the first center point in an interaction action, and the other offset represents the offset between the interaction key point and the second center point in that interaction action. To distinguish the two, the offset between the interaction key point and the first center point is denoted the first offset, and the offset between the interaction key point and the second center point is denoted the second offset. In other examples, the two targets may instead be denoted the first target and the second target, in which case the first offset represents the offset between the interaction key point and the center point of the first target in the interaction action, and the second offset represents the offset between the interaction key point and the center point of the second target.

In this embodiment, for step 104, determining the interaction relationship between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets includes: for each interaction key point, determining the two offsets corresponding to that interaction key point; determining the two predicted center points corresponding to the interaction key point from the interaction key point and its two offsets; determining the two targets corresponding to each interaction key point from the center point of each target and the two predicted center points corresponding to each interaction key point; and determining the interaction relationship between targets in the first image from the two targets corresponding to each interaction key point.

In this embodiment, the at least two offsets determined in step 103 are used to determine the at least two targets involved in an interaction action (that is, having an interaction relationship). The center points and interaction key points determined in step 102 do not by themselves indicate which targets are involved in an interaction action. For this reason, this embodiment determines the two offsets corresponding to each interaction key point and then determines the two predicted center points corresponding to that interaction key point from the interaction key point and its two offsets.

By way of example, take an arbitrary interaction key point, denoted here the first interaction key point. A first position can be determined from the position of the first interaction key point and one of its corresponding offsets (for example, the first offset). In theory, the first position can serve as the position of the center point (for example, the first center point) of one target matched to the first interaction key point; the first position is denoted the first predicted center point. Similarly, a second position can be determined from the position of the first interaction key point and its other corresponding offset (for example, the second offset); the second position is denoted the second predicted center point.
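The computation of the two predicted center points can be sketched in Python as below. The sign convention is an assumption: the sketch takes each offset to be the vector from the interaction key point to the target center, so a predicted center is the key point plus the offset. The function name and sample values are hypothetical.

```python
def predicted_centers(keypoint, first_offset, second_offset):
    """Return the first and second predicted center points.

    Assumes (not stated in the source) that each offset points from the
    interaction key point toward the corresponding target center.
    """
    kx, ky = keypoint
    first = (kx + first_offset[0], ky + first_offset[1])
    second = (kx + second_offset[0], ky + second_offset[1])
    return first, second


# Illustrative values only.
first_pred, second_pred = predicted_centers(
    (60.0, 40.0), (-20.0, 20.0), (20.0, -20.0)
)
print(first_pred, second_pred)
```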

Further, a target whose center point lies within a preset distance threshold of an obtained predicted center point is taken as a target corresponding to the interaction key point to which that predicted center point belongs. By way of example, if the distance between the center point of a first target and the first predicted center point is smaller than the preset distance threshold, and the distance between the center point of a second target and the second predicted center point is smaller than the preset distance threshold, then the first target and the second target are the two targets corresponding to the first interaction key point. It can be understood that more than one target center point may lie within the preset distance threshold of a given predicted center point, so one interaction key point may correspond to two or more targets.
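The threshold-based matching can be sketched as follows. This is a minimal illustration using Euclidean distance, which is an assumption since the source does not name the distance measure; `match_targets`, the center list, and the threshold value are hypothetical.

```python
import math


def match_targets(predicted_center, target_centers, threshold):
    """Return indices of targets whose center is within threshold of the prediction.

    Euclidean distance is assumed; more than one index may be returned.
    """
    return [
        i
        for i, center in enumerate(target_centers)
        if math.dist(predicted_center, center) < threshold
    ]


# Illustrative detected centers: index 0 is a person, index 1 a ball, index 2 unrelated.
centers = [(40.0, 60.0), (80.0, 20.0), (10.0, 10.0)]
print(match_targets((41.0, 59.0), centers, threshold=5.0))
print(match_targets((79.0, 21.0), centers, threshold=5.0))
```

Returning a list rather than a single index reflects the remark above that several target centers may fall within the threshold of one predicted center point.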

In this embodiment, the interaction relationship between the at least two targets corresponding to an interaction key point can be determined based on the confidence of each preset interaction action category for that interaction key point. It can be understood that when the first branch network processes the feature data to obtain each interaction key point in the first image, it can also obtain the confidence of each preset interaction action category for each interaction key point, and the interaction relationship between the at least two targets can be determined from these confidences.

With the technical solution of the embodiments of the present invention, an interaction key point is defined for each interaction action, at least two offsets associated with the interaction key point are determined, and the interaction relationship between targets in the first image is then determined based on the center point of each target, the interaction key points, and the at least two offsets. There is therefore no need to generate person-object pairs, and the problem of missing person-object pairs that have an actual interaction relationship, which arises when interaction action detection is performed on person-object pairs, is avoided. Because this embodiment obtains the targets having an interaction relationship directly, it greatly increases detection speed and improves detection efficiency compared with the conventional approach of classifying each person-object pair with a relationship classification network.

Each step of the image processing method shown in FIG. 1 is described in detail below.

In an exemplary embodiment of the present invention, for step 102, determining the center point of each target in the first image based on the feature data includes: downsampling the feature data to obtain a heatmap of the first image; and determining, from the heatmap, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection box of each target. After the center point of each target in the first image is determined based on the feature data, the image processing method further includes: correcting the position of the center point of each target having an interaction relationship in the first image according to the position offset of that center point, to obtain the corrected position of the center point; and determining the detection box of each target having an interaction relationship in the first image from the corrected position of its center point and the height and width of its detection box.

In this embodiment, the feature data of the first image is downsampled. The downsampling may be, for example, an image reduction applied to the feature map containing the feature data, that is, a reduction of the size of the feature map. As a result, the points in the heatmap obtained after downsampling do not correspond one-to-one with the points in the first image. For example, suppose the first image is 128x128 and the center point of a target person in the first image is at (10, 10). Because the heatmap is obtained by downsampling, downsampling by a factor of 4 to 32x32 maps the center point of the target person to (10/4, 10/4) = (2.5, 2.5). Since point coordinates in the heatmap are integers, the center point predicted in the heatmap is the point obtained by discarding the fractional part of the coordinates, that is, (2, 2). In other words, downsampling introduces a position offset in the position of the target person's center point.
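The arithmetic in this example can be reproduced with a short Python sketch. The function name `to_heatmap` and the name `stride` for the downsampling factor are hypothetical; the sketch simply divides by the factor, truncates toward the integer cell, and reports the fractional remainder as the position offset.

```python
import math


def to_heatmap(center, stride):
    """Map an image-space center to its heatmap cell and position offset.

    Returns (cell, offset), where cell has integer coordinates and offset is
    the fractional part lost by truncation (hypothetical helper).
    """
    scaled = (center[0] / stride, center[1] / stride)
    cell = (math.floor(scaled[0]), math.floor(scaled[1]))
    offset = (scaled[0] - cell[0], scaled[1] - cell[1])
    return cell, offset


# The example from the text: center (10, 10), 128x128 image, 4x downsampling.
cell, offset = to_heatmap((10, 10), stride=4)
print(cell, offset)  # (2, 2) (0.5, 0.5)
```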

The feature data can therefore be processed by the first branch network. Specifically, the feature map containing the feature data is first downsampled to obtain a heatmap, and the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection box of each target are then determined from the heatmap. It can be understood that the feature data is used as the input data of the first branch network. After the heatmap is obtained by downsampling the feature data, the first branch network determines, based on the heatmap, the position offset (offset) of each point in the first image, the center point of each target in the first image, the height and width [height, width] of the detection box of each target, the confidence that each target belongs to each category, each interaction key point in the first image, and the confidence that each interaction key point belongs to each preset interaction action category.

In some embodiments, after the position offset of each point in the first image is determined based on the feature data, the position of the center point of a target having an interaction relationship can be corrected based on the position offset of that center point. By way of example, the corrected position of the target's center point is obtained by adding the corresponding position offset to the obtained center point. The detection box of the target is then obtained from the corrected position of its center point and the height and width of its detection box, and the detection boxes of the targets having an interaction relationship are output.
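A minimal Python sketch of the correction and box construction follows. The addition of the position offset comes from the text; the box representation `(x_min, y_min, x_max, y_max)` centered on the corrected point, and the names `corrected_center` and `detection_box`, are assumptions for illustration.

```python
def corrected_center(center, position_offset):
    """Add the position offset to the predicted center point."""
    return (center[0] + position_offset[0], center[1] + position_offset[1])


def detection_box(center, height, width):
    """Build a box (x_min, y_min, x_max, y_max) centered on center.

    The corner-based representation is an assumption, not from the source.
    """
    cx, cy = center
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


# Illustrative values: heatmap cell (2, 2) with position offset (0.5, 0.5).
center = corrected_center((2, 2), (0.5, 0.5))
box = detection_box(center, height=4.0, width=2.0)
print(center, box)
```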

By way of example, referring to FIG. 2, the first center point shown in FIG. 2 is the corrected position. The vertical dotted line through the first center point indicates the height of the detection box, and the horizontal dotted line through the first center point indicates the width of the detection box.

In an exemplary embodiment of the present invention, for step 102, determining each interaction key point of the first image and the center point of each target based on the feature data includes: determining, based on the feature data, the center point of each target in the first image and the confidence of each target; and determining, based on the feature data, the interaction key points in the first image and the confidence of each preset interaction action category for each interaction key point.

Determining the interaction relationship between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets then includes determining the interaction relationship between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category for each interaction key point.

In this embodiment, the feature data can be processed through a first branch network. Illustratively, convolution processing can be performed on the feature data through a plurality of convolutional layers of the first branch network to obtain the center point of each target in the first image and the confidence of each target, where the confidence of a target may be the confidence that the target is present in the first image. Correspondingly, convolution processing can also be performed on the feature data through a plurality of convolutional layers of the first branch network to obtain each interaction key point in the first image and the confidence of each preset interaction action category corresponding to each interaction key point, where a preset interaction action category may be any interaction action category set in advance, for example, a smoking interaction action or a ball-hitting interaction action. Further, the interaction relationship between the targets in the first image is determined based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.

On this basis, in an exemplary embodiment of the present invention, determining the center point of each target in the first image and the confidence of each target based on the feature data includes: determining, based on the feature data, the center point of each target in the first image and its category, and the confidence that each target belongs to each category. Determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point includes: determining the interaction relationship between the targets in the first image based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point.

In this embodiment, the feature data can be processed through the first branch network. Illustratively, convolution processing can be performed on the feature data through a plurality of convolutional layers of the first branch network to obtain the center point of each target in the first image and its category, and the confidence that each target belongs to each category. Here, the category to which a target in the first image belongs may be any category, such as person, vehicle or ball, and the confidence that a target belongs to a category is the confidence that the target in the first image belongs to that category, that is, the confidence that a target of a specific category is present at a specific position in the first image. In this embodiment, the interaction relationship between the targets in the first image is determined based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point.

In an exemplary embodiment of the present invention, determining the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point includes: for one interaction key point, determining the two offsets corresponding to the interaction key point; determining two predicted center points corresponding to the interaction key point according to the interaction key point and its two corresponding offsets; determining two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point; and determining the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point.

In this embodiment, taking an arbitrary interaction key point (referred to here as the first interaction key point) as an example, a first position can be determined based on the position of the first interaction key point and one offset corresponding to the first interaction key point (for example, a first offset); this first position is referred to as the first predicted center point. Similarly, a second position can be determined based on the position of the first interaction key point and the other offset corresponding to the first interaction key point (for example, a second offset); this second position is referred to as the second predicted center point.
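As a hedged sketch of this step, the two predicted center points can be computed by adding each offset to the key point position. The function name `predicted_centers` is hypothetical, and the sign convention (each offset points from the interaction key point toward a target center) is an assumption; the description only states that the positions are determined from the key point and the offsets.

```python
from typing import Tuple

Point = Tuple[float, float]


def predicted_centers(key_point: Point,
                      first_offset: Point,
                      second_offset: Point) -> Tuple[Point, Point]:
    """Return the first and second predicted center points.

    Assumption: each offset is a vector from the interaction key point
    to the center point of one of the two interacting targets.
    """
    kx, ky = key_point
    first = (kx + first_offset[0], ky + first_offset[1])
    second = (kx + second_offset[0], ky + second_offset[1])
    return first, second


first_pred, second_pred = predicted_centers((50.0, 40.0), (-20.0, 5.0), (20.0, -5.0))
```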

Further, the two targets corresponding to each interaction key point are determined based on the center point of each target and the two predicted center points corresponding to each interaction key point, and the interaction relationship between the targets in the first image is determined according to the two targets corresponding to each interaction key point, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point.

In an exemplary embodiment of the present invention, determining the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point includes: for one predicted center point, determining the distance between the center point of each target and the predicted center point; and taking a target whose center point is at a distance from the predicted center point smaller than a preset distance threshold as a target corresponding to the interaction key point to which the predicted center point corresponds.

In this embodiment, a target whose center point is at a distance from an acquired predicted center point smaller than the preset distance threshold is taken as a target corresponding to the interaction key point to which that predicted center point corresponds. Illustratively, if the distance between the center point of a first target and the first predicted center point is smaller than the preset distance threshold, and the distance between the center point of a second target and the second predicted center point is smaller than the preset distance threshold, the first target and the second target are the two targets corresponding to the first interaction key point. It can be understood that more than one target center point may lie within the preset distance threshold of a given predicted center point, that is, one interaction key point may correspond to two or more targets. Further, the interaction relationship between the targets in the first image is determined according to the at least two targets corresponding to each interaction key point, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point.
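The distance test can be illustrated as follows. This is a sketch only: `targets_near` is a hypothetical name, and Euclidean distance is an assumption, since the description does not state which distance measure is used.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def targets_near(predicted_center: Point,
                 target_centers: Sequence[Point],
                 distance_threshold: float) -> List[int]:
    """Return indices of targets whose center point lies closer to the
    predicted center point than the preset distance threshold.

    Assumption: Euclidean distance. More than one index may be returned.
    """
    return [
        index
        for index, center in enumerate(target_centers)
        if math.dist(center, predicted_center) < distance_threshold
    ]


centers = [(30.0, 44.0), (70.0, 36.0), (31.0, 45.0)]
near_first = targets_near((30.0, 45.0), centers, distance_threshold=2.0)
near_second = targets_near((70.0, 35.0), centers, distance_threshold=2.0)
```

Here two target centers fall within the threshold of the first predicted center point, which corresponds to the case where one interaction key point has more than two associated targets.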

In an exemplary embodiment of the present invention, determining the interaction relationship between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point includes: for one interaction key point, multiplying the confidence of one preset interaction action category corresponding to the interaction key point by the confidences that the two targets corresponding to the interaction key point belong to their corresponding categories, to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to that interaction action category, and the corresponding categories are the categories to which the two targets belong when the interaction between them belongs to the preset interaction action category (for example, if the preset action category is volleyball, the corresponding category of one target is person and the corresponding category of the other target is ball; if the preset action category is making a phone call, the corresponding category of one target is person and the corresponding category of the other target is phone); in response to the first confidence exceeding a confidence threshold, determining that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interaction action category; and in response to the first confidence not exceeding the confidence threshold, determining that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interaction action category.
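The confidence computation above amounts to a product followed by a threshold comparison. A minimal sketch, with hypothetical function names and illustrative confidence values chosen for this example only:

```python
def first_confidence(action_confidence: float,
                     target_a_confidence: float,
                     target_b_confidence: float) -> float:
    """Multiply the confidence of one preset interaction action category
    by the confidences that the two targets belong to the categories
    that correspond to that action (e.g. person and phone)."""
    return action_confidence * target_a_confidence * target_b_confidence


def belongs_to_category(action_confidence: float,
                        target_a_confidence: float,
                        target_b_confidence: float,
                        confidence_threshold: float) -> bool:
    """True only when the first confidence exceeds the threshold."""
    score = first_confidence(action_confidence,
                             target_a_confidence,
                             target_b_confidence)
    return score > confidence_threshold


# Illustrative values, not taken from the description.
score = first_confidence(0.5, 0.5, 0.5)
is_phone_call = belongs_to_category(0.5, 0.5, 0.5, confidence_threshold=0.1)
```

A first confidence exactly equal to the threshold does not exceed it, so the pair is then treated as not belonging to the category.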

In an exemplary embodiment of the present invention, the image processing method further includes: after determining that the interaction relationship between the two targets corresponding to one interaction key point does not belong to any of the preset interaction action categories, determining that there is no interaction relationship between the two targets corresponding to the interaction key point.

In this embodiment, one interaction key point corresponds to at least two targets. In the process of determining the interaction relationships among a plurality of targets, the above technical solution can first be used to determine the interaction relationship between two of the targets, that is, to determine whether the interaction relationship between those two targets belongs to a preset interaction action category corresponding to the interaction key point. For example, if three targets correspond to one interaction key point, denoted target 1, target 2 and target 3, the above technical solution can be used to determine the interaction relationship between target 1 and target 2, between target 2 and target 3, and between target 3 and target 1, respectively.
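The pairwise evaluation in the three-target example can be enumerated with the standard library. This is a sketch under the assumption that every unordered pair of the associated targets is checked once; `target_pairs` is a hypothetical helper name.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def target_pairs(targets: Sequence[str]) -> List[Tuple[str, str]]:
    """Enumerate every unordered pair of targets associated with one
    interaction key point, so each pair can be evaluated separately."""
    return list(combinations(targets, 2))


pairs = target_pairs(["target 1", "target 2", "target 3"])
```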

FIG. 3 is a schematic diagram of another application of the image processing method according to an embodiment of the present invention. As shown in FIG. 3, the neural network may include a feature extraction network, a first branch network and a second branch network. The feature extraction network is used to perform feature extraction on the input image to obtain feature data. The first branch network is used to downsample the feature data to obtain heat maps, then determine the center point of each target and each interaction key point in the input image according to the heat maps, and to obtain the position offset (offset) of each point, the height and width [height, width] of the detection box of each target, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point. The second branch network is used to process the feature data to obtain at least two offsets for each point in the input image, where one offset represents the offset between the interaction key point in one interaction action and the center point of one target in that interaction action.

In one embodiment, the feature map containing the feature data is downsampled through the first branch network to obtain heat maps. Taking the case where the targets in the input image of this example include a target person and a target object, and in order to distinguish the two, the center point of a target person is referred to as a first center point and the center point of a target object is referred to as a second center point. A first heat map containing the first center points, a second heat map containing the second center points, and a third heat map containing the interaction key points can then be obtained. That is, the output data of the first branch network may include the first heat map, the second heat map, the third heat map, the position offset of each point in the input image, and the heights and widths of the detection boxes of the target persons and target objects.
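The description does not state how points are read out of a heat map. One common approach, shown here purely as an assumption, is to keep cells that are strict local maxima in their 3x3 neighborhood and whose score reaches a threshold. The function `heatmap_peaks` and the parameter `score_threshold` are hypothetical.

```python
from typing import List, Sequence, Tuple


def heatmap_peaks(heatmap: Sequence[Sequence[float]],
                  score_threshold: float) -> List[Tuple[int, int, float]]:
    """Return (x, y, score) for every cell that is a strict local maximum
    of its 3x3 neighborhood and scores at least score_threshold.

    Assumption: peak picking is one possible readout, not necessarily
    the one used in the embodiment. Flat plateaus yield no peak here.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(rows):
        for x in range(cols):
            score = heatmap[y][x]
            if score < score_threshold:
                continue
            neighbours = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(rows, y + 2))
                for nx in range(max(0, x - 1), min(cols, x + 2))
                if (ny, nx) != (y, x)
            ]
            if all(score > other for other in neighbours):
                peaks.append((x, y, score))
    return peaks


# Toy 3x4 heat map with two clear peaks.
toy_heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0],
    [0.0, 0.1, 0.0, 0.7],
]
peaks = heatmap_peaks(toy_heatmap, score_threshold=0.5)
```

Applying the same readout to the first, second and third heat maps would give candidate first center points, second center points and interaction key points, respectively.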

Specifically, the center point of each target and its category, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point can also be obtained through the first branch network.

In one embodiment, the feature map containing the feature data is processed through the second branch network to obtain the two offsets corresponding to each interaction key point. To distinguish the two, the offset between the interaction key point and the first center point of the target person in the interaction action is referred to as the first offset, and the offset between the interaction key point and the second center point of the target object in the interaction action is referred to as the second offset.

According to one interaction key point and the first offset and second offset corresponding to that interaction key point, the two predicted center points corresponding to the interaction key point (referred to as the first predicted center point and the second predicted center point, respectively) are determined. For the first predicted center point, the distance between each first center point and the first predicted center point is determined, and the first center points whose distance from the first predicted center point is smaller than the preset distance threshold are identified. Correspondingly, for the second predicted center point, the distance between each second center point and the second predicted center point is determined, and the second center points whose distance from the second predicted center point is smaller than the preset distance threshold are identified.

For each of the two interaction key points in FIG. 3, the confidence of a preset interaction action category corresponding to the interaction key point is multiplied by the confidence of the target person and the confidence of the target object corresponding to the interaction key point to obtain a first confidence. If the first confidence exceeds the confidence threshold, the interaction relationship between the target person and the target object corresponding to the interaction key point is determined to belong to the preset interaction action category corresponding to the interaction key point; if the first confidence does not exceed the confidence threshold, the interaction relationship between the target person and the target object corresponding to the interaction key point is determined not to belong to that preset interaction action category.

In this example, the positions of the first center point of the target person and the second center point of the target object are corrected based on the position offset of each point in the input image output by the first branch network, to obtain the corrected position of the first center point of the target person having an interaction relationship and the corrected position of the second center point of the target object. The detection boxes of the targets having an interaction relationship in the input image are then determined according to the corrected position of the first center point of the target person and the height and width [height, width] of its detection box, and the corrected position of the second center point of the target object and the height and width [height, width] of its detection box. The output of the neural network includes the corrected position of the first center point of the target person and its corresponding detection box, the corrected position of the second center point of the target object and its corresponding detection box, and the interaction relationship between the target person and the target object (that is, the interaction action category). For targets in the input image that have no interaction relationship, no detection box is output.

In an exemplary embodiment of the present invention, the image processing method of this embodiment is performed by a neural network obtained by training with sample images. The detection boxes of targets having an interaction relationship are marked in each sample image. The marked center points of the targets having an interaction relationship in the sample image (that is, the centers of the detection boxes of the targets) and the marked interaction key points (the midpoints of the lines connecting the centers of the detection boxes of the targets having an interaction relationship) are determined according to the marked detection boxes, and the marked offsets are determined according to the size of the sample image and the size of the heat map determined from the sample image. On this basis, the embodiments of the present invention further provide a training method for the neural network. FIG. 4 is an exemplary flowchart of the training method for the neural network in the image processing method according to an embodiment of the present invention. As shown in FIG. 4, the method includes the following steps.
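The derivation of the marked center points and the marked interaction key point from a pair of marked detection boxes can be sketched as follows. The box layout `(x_min, y_min, x_max, y_max)` and the function names are assumptions made for this illustration.

```python
from typing import Tuple

Point = Tuple[float, float]
# Assumed box layout: (x_min, y_min, x_max, y_max).
BoxCoords = Tuple[float, float, float, float]


def box_center(box: BoxCoords) -> Point:
    """Marked center point: the center of a marked detection box."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def interaction_key_point(box_a: BoxCoords, box_b: BoxCoords) -> Point:
    """Marked interaction key point: the midpoint of the line connecting
    the centers of the two marked detection boxes."""
    (ax, ay), (bx, by) = box_center(box_a), box_center(box_b)
    return ((ax + bx) / 2, (ay + by) / 2)


person_box = (10.0, 20.0, 30.0, 60.0)
object_box = (50.0, 40.0, 70.0, 60.0)
key_point = interaction_key_point(person_box, object_box)
```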

In step 201, the neural network is used to extract feature data of the sample image.

In step 202, the neural network is used to downsample the feature data of the sample image to obtain a heat map of the sample image.

In step 203, the neural network is used to predict, based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection box of each target in the sample image.

In step 204, the neural network is used to predict at least two offsets based on the feature data of the sample image.

In step 205, the interaction relationship between the targets in the sample image is predicted based on the center point of each target in the sample image, the interaction key points in the sample image, and the at least two offsets in the sample image.

In step 206, the network parameter values of the neural network are adjusted according to the predicted position offsets, the predicted center points and predicted detection box heights and widths of the targets having an interaction relationship in the sample image, the predicted interaction key points corresponding to the targets having an interaction relationship in the sample image and their corresponding predicted offsets, the marked position offsets, and the detection boxes of the targets having an interaction relationship marked in the sample image.

For details of steps 201 to 205 of this embodiment, reference may be made to the embodiments above; they are not repeated here.

In step 206 of this embodiment, in some examples, for the first branch network of the neural network, a loss function can be determined according to the predicted center points of the targets having an interaction relationship in the sample image, the predicted heights and widths of the detection boxes, the predicted interaction key points, and the marked detection boxes and marked position offsets of the targets having an interaction relationship, and the network parameters of the first branch network can be adjusted based on this loss function.

In some examples, for the second branch network of the neural network, a loss function can be determined according to the predicted offsets and the marked offsets corresponding to the interaction key points, and the network parameters of the second branch network can be adjusted based on this loss function.
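The description does not specify the form of this loss function. As one possible instance, and strictly as an assumption, a mean absolute (L1) error between predicted and marked offsets could be used; `l1_offset_loss` is a hypothetical name.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def l1_offset_loss(predicted: Sequence[Point], marked: Sequence[Point]) -> float:
    """Mean L1 distance between predicted and marked offsets.

    Assumption: L1 is chosen only for illustration; the description
    states that a loss is determined from the predicted and marked
    offsets without naming its form.
    """
    if len(predicted) != len(marked) or not predicted:
        raise ValueError("offset lists must be non-empty and of equal length")
    total = sum(
        abs(px - mx) + abs(py - my)
        for (px, py), (mx, my) in zip(predicted, marked)
    )
    return total / len(predicted)


loss = l1_offset_loss([(1.0, 2.0), (3.0, 4.0)], [(1.5, 2.0), (3.0, 3.0)])
```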

In some examples, a loss function is determined based on the predicted position offsets and the marked position offsets. This loss function is used to correct the position offsets introduced by downsampling the feature map containing the feature data, minimizing the loss caused by downsampling so that the acquired position offset (offset) of each point is more accurate. On this basis, the network parameters of the first branch network are adjusted through this loss function.
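To make the role of the position offset concrete, the following sketch computes a marked position offset as the fractional part that is lost when an image coordinate is mapped to an integer heat map cell. This formulation is an assumption consistent with the statement that the marked offset depends on the sample image size and the heat map size; the function name and parameters are hypothetical.

```python
import math
from typing import Tuple


def marked_position_offset(point: Tuple[float, float],
                           image_size: Tuple[int, int],
                           heatmap_size: Tuple[int, int]
                           ) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Map an image point to its heat map cell and the offset lost by
    rounding down.

    Assumption: the downsampling stride is image_size / heatmap_size per
    axis, and the offset is expressed in heat map cell units.
    Sizes are given as (width, height).
    """
    stride_x = image_size[0] / heatmap_size[0]
    stride_y = image_size[1] / heatmap_size[1]
    fx, fy = point[0] / stride_x, point[1] / stride_y
    cell = (math.floor(fx), math.floor(fy))
    return cell, (fx - cell[0], fy - cell[1])


cell, offset = marked_position_offset((101.0, 50.0), (512, 512), (128, 128))
```

Under this assumption, adding the offset back to the cell coordinates and multiplying by the stride recovers the original image position.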

In this embodiment, the network parameter values of the neural network can be adjusted using the parameter adjustment methods of the examples described above.

The embodiments of the present invention further provide an image processing apparatus. FIG. 5 is a first schematic structural diagram of the image processing apparatus according to an embodiment of the present invention. As shown in FIG. 5, the apparatus includes an extraction unit 41, a first determination unit 42, a second determination unit 43 and a third determination unit 44, where:
the extraction unit 41 is configured to extract feature data of a first image;
the first determination unit 42 is configured to determine each interaction key point and the center point of each target in the first image based on the feature data extracted by the extraction unit 41, where one interaction key point is a point on a connecting line within a preset range from the midpoint of the connecting line, and the connecting line is the line connecting the center points of the two targets in one interaction action;
the second determination unit 43 is configured to determine at least two offsets based on the feature data extracted by the extraction unit 41, where one offset represents the offset between the interaction key point in one interaction action and the center point of one target in that interaction action; and
the third determination unit 44 is configured to determine the interaction relationship between the targets in the first image based on the center point of each target, the interaction key points and the at least two offsets.

In an exemplary embodiment of the present invention, the first determination unit 42 is configured to determine, based on the feature data, the center point of each target in the first image and the confidence of each target, and to determine, based on the feature data, the interaction key points in the first image and the confidence of each interaction action category corresponding to each interaction key point.
The third determination unit 44 is configured to determine the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.

In an exemplary embodiment of the present invention, the first determination unit 42 is configured to determine, based on the feature data, the center point of each target in the first image and its category, and the confidence that each target belongs to each preset category.
The third determination unit 44 is configured to determine the interaction relationships between targets in the first image based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each preset category, and the confidence of each preset interaction action category corresponding to each interaction key point.

In an exemplary embodiment of the present invention, the third determination unit 44 is configured to: for one interaction key point, determine the two offsets corresponding to that interaction key point; determine the two predicted center points corresponding to the interaction key point according to the interaction key point and its two offsets; determine the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point; and determine the interaction relationships between targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.
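The step from an interaction key point and its two offsets to two predicted center points can be sketched as follows. This assumes each offset is a `(dx, dy)` vector pointing from the key point to a target center; the sign convention and the function name `predicted_centers` are assumptions, not taken from the text.

```python
def predicted_centers(keypoint, offset_a, offset_b):
    """Add each offset to the interaction key point to get two predicted centers.

    Assumption: offsets point from the key point toward the target centers.
    """
    kx, ky = keypoint
    center_a = (kx + offset_a[0], ky + offset_a[1])
    center_b = (kx + offset_b[0], ky + offset_b[1])
    return center_a, center_b


centers = predicted_centers((7.0, 3.0), (-5.0, -1.0), (5.0, 1.0))
```

Each predicted center is then compared with the detected target centers to decide which two targets take part in the interaction.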

In an exemplary embodiment of the present invention, the third determination unit 44 is configured to: for one interaction key point, multiply the confidence of one preset interaction action category corresponding to the interaction key point by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, where the first confidence is the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to that interaction action category; in response to the first confidence exceeding a confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interaction action category; and in response to the first confidence not exceeding the confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interaction action category.
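The multiplication and thresholding described above can be sketched in a few lines. The category names (`"ride"`, `"hold"`), the numeric values, and the function names are hypothetical and chosen only for illustration.

```python
def first_confidence(action_conf, target_conf_a, target_conf_b):
    """Product of one action-category confidence and the two target confidences."""
    return action_conf * target_conf_a * target_conf_b


def accepted_categories(action_confs, target_conf_a, target_conf_b, threshold):
    """Return {category: first confidence} for categories that exceed the threshold."""
    accepted = {}
    for category, action_conf in action_confs.items():
        score = first_confidence(action_conf, target_conf_a, target_conf_b)
        if score > threshold:
            accepted[category] = score
    return accepted


result = accepted_categories({"ride": 0.9, "hold": 0.2}, 0.8, 0.5, threshold=0.3)
```

An empty result corresponds to the case in which the pair of targets belongs to none of the preset interaction action categories.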

In an exemplary embodiment of the present invention, the third determination unit 44 is further configured to, after determining that the interaction relationship between the two targets corresponding to one interaction key point does not belong to any of the preset interaction action categories, determine that there is no interaction relationship between the two targets corresponding to that interaction key point.

In an exemplary embodiment of the present invention, the third determination unit 44 is configured to: for one predicted center point, determine the distance between the center point of each target and the predicted center point; and use a target whose center point lies closer to the predicted center point than a preset distance threshold as the target corresponding to the interaction key point to which that predicted center point corresponds.
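A minimal sketch of this distance test, assuming Euclidean distance (the text does not name the metric) and a hypothetical function name `match_targets`:

```python
import math


def match_targets(predicted_center, target_centers, distance_threshold):
    """Return indices of targets whose center is closer than the threshold.

    Assumption: Euclidean distance in the same coordinate frame for all points.
    """
    px, py = predicted_center
    return [
        index
        for index, (tx, ty) in enumerate(target_centers)
        if math.hypot(tx - px, ty - py) < distance_threshold
    ]


matches = match_targets((2.0, 2.0), [(2.5, 2.0), (12.0, 4.0), (30.0, 30.0)], 1.0)
```

If several targets pass the test, some tie-breaking rule (for example, choosing the nearest) would be needed; the text does not specify one, so the sketch simply returns all candidates.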

In an exemplary embodiment of the present invention, the first determination unit 42 is configured to downsample the feature data to obtain a heat map of the first image, and to determine, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection box of each target. The first determination unit 42 is further configured to, after the center point of each target in the first image has been determined based on the feature data, correct the positions of the center points of the targets having an interaction relationship in the first image according to the position offsets of those center points, thereby obtaining the corrected center-point positions, and to determine the detection boxes of the targets having an interaction relationship in the first image according to the corrected center-point positions and the heights and widths of their detection boxes.
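The correction and box construction can be sketched as below. It assumes the correction is additive, that the center, offset, width, and height share one coordinate frame, and that boxes are expressed as `(x_min, y_min, x_max, y_max)`. These conventions and the name `detection_box` are assumptions made for illustration.

```python
def detection_box(center, position_offset, width, height):
    """Correct a center point by its position offset and build the detection box."""
    cx = center[0] + position_offset[0]
    cy = center[1] + position_offset[1]
    half_w, half_h = width / 2.0, height / 2.0
    # Box corners are placed symmetrically around the corrected center.
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


box = detection_box((25, 14), (0.25, 0.5), width=4.0, height=2.0)
```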

In an exemplary embodiment of the present invention, each functional unit of the image processing apparatus is implemented by a neural network obtained by training with sample images. The detection boxes of the targets having an interaction relationship are marked in each sample image. The marked center points and marked interaction key points of the targets having an interaction relationship in the sample image are determined from the marked detection boxes, and the marked offsets are determined from the marked center points and marked interaction key points of the targets having an interaction relationship.
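How the marked quantities might be derived from two marked detection boxes is sketched below. It assumes the interaction key point is placed exactly at the midpoint of the line connecting the two centers (the text allows any point within a preset range of the midpoint) and that offsets point from the key point to each center. The box layout `(x_min, y_min, x_max, y_max)` and the function names are hypothetical.

```python
def box_center(box):
    """Center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def marked_keypoint_and_offsets(box_a, box_b):
    """Derive the marked key point (midpoint assumption) and the two marked offsets."""
    center_a, center_b = box_center(box_a), box_center(box_b)
    keypoint = ((center_a[0] + center_b[0]) / 2.0, (center_a[1] + center_b[1]) / 2.0)
    offset_a = (center_a[0] - keypoint[0], center_a[1] - keypoint[1])
    offset_b = (center_b[0] - keypoint[0], center_b[1] - keypoint[1])
    return keypoint, offset_a, offset_b


keypoint, offset_a, offset_b = marked_keypoint_and_offsets((0, 0, 4, 4), (10, 2, 14, 6))
```

With the midpoint assumption the two offsets are equal in magnitude and opposite in direction; a key point chosen elsewhere within the preset range would break that symmetry.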

In an exemplary embodiment of the present invention, as shown in FIG. 6, the apparatus further includes a training unit 45 configured to train the neural network using sample images. Specifically, the training unit 45 is configured to: extract feature data of a sample image using the neural network; downsample the feature data of the sample image using the neural network to obtain a heat map of the sample image; predict, using the neural network and based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection box of each target in the sample image; predict at least two offsets using the neural network based on the feature data of the sample image; predict the interaction relationships between targets in the sample image based on the center point of each target in the sample image, the interaction key points in the sample image, and the at least two offsets in the sample image; and adjust the network parameter values of the neural network according to the predicted position offsets, the predicted center points and predicted detection-box heights and widths of the targets having an interaction relationship in the sample image, the predicted interaction key points corresponding to those targets and their corresponding predicted offsets, the marked position offsets, and the detection boxes of the targets having an interaction relationship marked in the sample image.

In the embodiments of the present invention, the extraction unit 41, the first determination unit 42, the second determination unit 43, the third determination unit 44, and the training unit 45 of the apparatus can all be implemented in practice by a central processing unit (CPU), a digital signal processor (DSP), a microcontroller unit (MCU), or a field-programmable gate array (FPGA) of the apparatus.

It should be noted that the image processing performed by the image processing apparatus of the above embodiments is described using the above division into program modules only as an example. In practice, the processing may be assigned to different program modules as needed; that is, the internal structure of the apparatus may be divided into different program modules to complete all or part of the processing described above. In addition, the apparatus embodiments provided above belong to the same concept as the image processing method embodiments; for the specific implementation process, refer to the method embodiments, which are not repeated here.

An embodiment of the present invention further provides an electronic device. FIG. 7 is a schematic structural diagram of the hardware configuration of the electronic device according to an embodiment of the present invention. As shown in FIG. 7, the electronic device includes a memory 52, a processor 51, and a computer program that is stored in the memory 52 and executable by the processor 51. When executing the program, the processor 51 performs the steps of the image processing method described in the embodiments of the present invention.

By way of example, the components of the electronic device are coupled through a bus system 53. It should be understood that the bus system 53 is used to implement connection and communication between these components. In addition to a data bus, the bus system 53 further includes a power bus, a control bus, and a status signal bus. For clarity of description, however, all of these buses are labeled as the bus system 53 in FIG. 7.

It should be understood that the memory 52 may be volatile memory or non-volatile memory, or may include both. The non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), ferromagnetic random access memory (FRAM), flash memory, magnetic memory, a compact disc, or compact disc read-only memory (CD-ROM); the magnetic memory may be magnetic disk memory or magnetic tape memory. The volatile memory may be random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM can be used, such as static random access memory (SRAM), synchronous static random access memory (SSRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDRSDRAM), enhanced synchronous dynamic random access memory (ESDRAM), SyncLink dynamic random access memory (SLDRAM), and Direct Rambus random access memory (DRRAM). The memory 52 described in the embodiments of the present invention is intended to include, without being limited to, these and any other suitable types of memory.

The methods disclosed in the above embodiments of the present invention may be applied to, or implemented by, the processor 51. The processor 51 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above methods can be completed by an integrated logic circuit of hardware in the processor 51 or by instructions in the form of software. The processor 51 may be a general-purpose processor, a DSP, another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 51 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in connection with the embodiments of the present invention may be executed and completed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium, and the storage medium is located in the memory 52; the processor 51 reads the information in the memory 52 and completes the steps of the above methods in combination with its hardware.

In an exemplary embodiment, the electronic device may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), programmable logic devices (PLDs), complex programmable logic devices (CPLDs), FPGAs, general-purpose processors, controllers, MCUs, microprocessors, or other electronic components, in order to perform the above methods.

In an exemplary embodiment, an embodiment of the present invention further provides a non-volatile computer-readable storage medium, such as the memory 52 containing computer program instructions, where the computer program can be executed by the processor 51 of the electronic device to complete the above methods. The computer storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or CD-ROM, or may be any of various devices that include one or any combination of these memories.

An embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the image processing method described in the embodiments of the present invention.

An embodiment of the present invention provides a computer program including computer-readable code that, when executed by an electronic device, causes a processor of the electronic device to perform the steps of the image processing method described in the embodiments of the present invention.

The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily, provided there is no conflict, to obtain new method embodiments.

The technical features disclosed in the several product embodiments provided in the present application may be combined arbitrarily, provided there is no conflict, to obtain new product embodiments.

The features disclosed in the several method or device embodiments provided in the present application may be combined arbitrarily, provided there is no conflict, to obtain new method embodiments or device embodiments.

It should be understood that the devices and methods disclosed in the several embodiments provided in the present application may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection between the components shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or of another form.

The units described above as separate components may or may not be physically separate, and the components shown as units may or may not be physical units. They may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the technical solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may all be integrated into one processing unit, each unit may be used separately as a single unit, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.

A person skilled in the art will understand that all or some of the steps of the above method embodiments may be completed by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a mobile storage device, ROM, RAM, magnetic memory, or an optical disc.

Alternatively, if the integrated unit of the present invention is implemented in the form of a software functional module and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solutions of the embodiments of the present invention, that is, the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present disclosure. The storage medium includes various media capable of storing program code, such as removable storage, ROM, RAM, magnetic memory, or an optical disc.

The above are merely specific embodiments of the present invention, and the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present invention shall fall within the scope of protection of the present disclosure. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims

An image processing method, comprising:
extracting feature data of a first image;
determining, based on the feature data, each interaction key point and the center point of each target in the first image, wherein one interaction key point is a point on a connecting line within a preset range from the midpoint of the connecting line, and the connecting line is the line connecting the center points of the two targets in one interaction action;
determining at least two offsets based on the feature data, wherein one offset represents the offset between the interaction key point of one interaction action and the center point of one target in that interaction action; and
determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.

The image processing method according to claim 1, wherein
determining each interaction key point and the center point of each target in the first image based on the feature data comprises:
determining the center point of each target in the first image and the confidence of each target based on the feature data; and
determining, based on the feature data, the interaction key points in the first image and the confidence of each preset interaction action category corresponding to each interaction key point; and
determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets comprises:
determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point.

The image processing method according to claim 2, wherein
determining the center point of each target in the first image and the confidence of each target based on the feature data comprises:
determining, based on the feature data, the center point of each target in the first image and its category, and the confidence that each target belongs to each category; and
determining the interaction relationships between targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point comprises:
determining the interaction relationships between targets in the first image based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each category, and the confidence of each preset interaction action category corresponding to each interaction key point.

The first image is based on the center point of each target, the interaction key point, the at least two offsets, the reliability of each target, and the reliability of each preset interaction action category corresponding to each interaction key point. Determining the interaction relationships between the targets within
Determining two offsets corresponding to the interaction keypoint for one interaction keypoint,
Determining the two prediction center points corresponding to the interaction keypoint according to the interaction keypoint and the two offsets corresponding to the interaction keypoint.
Determining the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point.
The interaction relationship between the targets in the first image is determined according to the two targets corresponding to each interaction key point, the reliability of each target, and the reliability of each preset interaction action category corresponding to each interaction key point. To do, including,
The image processing method according to claim 2 or 3.
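By way of non-limiting illustration, the step of obtaining the two predicted center points from one interaction key point and its two offsets might be sketched as follows. The function name `predicted_centers` is hypothetical, and the sketch assumes that each offset is a `(dx, dy)` vector pointing from the interaction key point toward a target center; the claims do not fix the sign convention.

```python
def predicted_centers(keypoint, offsets):
    """Return the two predicted center points for one interaction key point.

    Assumption (not stated in the claims): each offset is a (dx, dy) vector
    from the interaction key point to a target center, so a predicted center
    is simply key point + offset.
    """
    if len(offsets) != 2:
        raise ValueError("exactly two offsets are expected per interaction key point")
    kx, ky = keypoint
    return [(kx + dx, ky + dy) for dx, dy in offsets]


# Usage: a key point at (50, 40) with one offset toward each of the two targets.
centers = predicted_centers((50.0, 40.0), [(-20.0, 5.0), (20.0, -5.0)])
```

Each predicted center point is then compared with the detected target center points to find the two targets taking part in the interaction.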

Determining the interaction relationships between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point includes:
for one interaction key point, multiplying the confidence of one preset interaction action category corresponding to the interaction key point by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, the first confidence being the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interaction action category;
in response to the first confidence exceeding a confidence threshold, determining that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interaction action category; and
in response to the first confidence not exceeding the confidence threshold, determining that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interaction action category,
The image processing method according to claim 4.

The image processing method further includes:
after determining that the interaction relationship between the two targets corresponding to one interaction key point does not belong to any of the preset interaction action categories, determining that there is no interaction relationship between the two targets corresponding to the interaction key point,
The image processing method according to claim 5.
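By way of non-limiting illustration, the confidence test described in the two preceding claims might be sketched as follows. The function name `interaction_categories`, the category names, and the numeric values are hypothetical; only the product of the three confidences and the comparison with a threshold come from the claims.

```python
def interaction_categories(action_confidences, conf_a, conf_b, threshold):
    """Return the preset action categories accepted for one interaction key point.

    action_confidences maps a category name to the confidence predicted at the
    key point; conf_a and conf_b are the confidences of the two matched targets.
    The first confidence of a category is the product of the three values, and
    the category is kept only if that product exceeds the threshold.
    """
    accepted = {}
    for category, action_conf in action_confidences.items():
        first_confidence = action_conf * conf_a * conf_b
        if first_confidence > threshold:
            accepted[category] = first_confidence
    # An empty result means that no preset category applies, i.e. the two
    # targets are treated as having no interaction relationship.
    return accepted


# Usage with made-up values: "ride" passes (0.9 * 0.8 * 0.5 = 0.36 > 0.3),
# "hold" does not (0.2 * 0.8 * 0.5 = 0.08).
result = interaction_categories({"ride": 0.9, "hold": 0.2}, 0.8, 0.5, 0.3)
```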

Determining the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point includes:
for one predicted center point, determining the distance between the center point of each target and the predicted center point; and
taking a target whose center point lies at a distance from the predicted center point smaller than a preset distance threshold as a target corresponding to the interaction key point to which the predicted center point corresponds,
The image processing method according to any one of claims 4 to 6.
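By way of non-limiting illustration, the distance test of the preceding claim might be sketched as follows. The function name `targets_near` is hypothetical, and the use of Euclidean distance is an assumption, since the claim does not name a distance measure.

```python
from math import hypot


def targets_near(predicted_center, target_centers, distance_threshold):
    """Return indices of targets whose center lies within the threshold.

    Assumption: distance is Euclidean. A target is kept when the distance
    between its center point and the predicted center point is strictly
    smaller than the preset distance threshold.
    """
    px, py = predicted_center
    return [
        index
        for index, (cx, cy) in enumerate(target_centers)
        if hypot(cx - px, cy - py) < distance_threshold
    ]


# Usage with made-up coordinates: only the first detected center is close
# enough to the predicted center (30, 45).
matched = targets_near((30.0, 45.0), [(30.0, 44.0), (70.0, 36.0), (200.0, 200.0)], 5.0)
```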

Determining the center point of each target in the first image based on the feature data includes:
downsampling the feature data to obtain a heat map of the first image; and
determining, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection box of each target,
and, after determining the center point of each target in the first image based on the feature data, the image processing method further includes:
correcting the position of the center point of each target having an interaction relationship in the first image according to the position offset of that center point, to obtain the corrected position of the center point of the target having an interaction relationship in the first image; and
determining the detection box of each target having an interaction relationship in the first image according to the corrected position of its center point and the height and width of its detection box,
The image processing method according to any one of claims 1 to 7.
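By way of non-limiting illustration, the correction of a center point and the construction of its detection box might be sketched as follows. The function name `detection_box` is hypothetical, and the sketch assumes that the center point, the position offset, and the box size are all expressed in the same coordinate frame and that the box is centered on the corrected position; any rescaling between heat map and image resolution is left out.

```python
def detection_box(center, position_offset, size):
    """Correct a center point and return (corrected_center, box).

    Assumptions: the corrected position is center + position_offset, and the
    detection box is centered on the corrected position with the given
    (width, height). The box is returned as (x_min, y_min, x_max, y_max).
    """
    cx = center[0] + position_offset[0]
    cy = center[1] + position_offset[1]
    width, height = size
    box = (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)
    return (cx, cy), box


# Usage with made-up values: a center at (30, 45), a sub-pixel position
# offset of (0.25, 0.5), and a 10 x 20 detection box.
corrected, box = detection_box((30, 45), (0.25, 0.5), (10, 20))
```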

The image processing method is performed by a neural network, the neural network is obtained by training with sample images, and each sample image is marked with the detection boxes of targets having an interaction relationship; the marked center points and the marked interaction key point of the targets having an interaction relationship in the sample image are determined according to the marked detection boxes, and the marked offsets are determined according to the marked center points of the targets having an interaction relationship and the marked interaction key point,
The image processing method according to claim 8.
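By way of non-limiting illustration, the derivation of training labels from two marked detection boxes might be sketched as follows. The function names are hypothetical. The sketch assumes that a box is given as `(x_min, y_min, x_max, y_max)`, that the marked center point is the geometric center of the box, that the marked interaction key point is taken exactly at the midpoint of the line connecting the two centers (one admissible choice within the preset range around the midpoint), and that each marked offset points from the key point to a center.

```python
def box_center(box):
    """Geometric center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, (y_min + y_max) / 2)


def marked_labels(box_a, box_b):
    """Return (center_a, center_b, key_point, offsets) for one marked interaction.

    Assumptions: the key point is the exact midpoint of the line connecting
    the two centers, and each offset is center minus key point.
    """
    center_a = box_center(box_a)
    center_b = box_center(box_b)
    key_point = ((center_a[0] + center_b[0]) / 2, (center_a[1] + center_b[1]) / 2)
    offsets = [
        (center_a[0] - key_point[0], center_a[1] - key_point[1]),
        (center_b[0] - key_point[0], center_b[1] - key_point[1]),
    ]
    return center_a, center_b, key_point, offsets


# Usage with two made-up marked detection boxes.
center_a, center_b, key_point, offsets = marked_labels((20, 30, 40, 60), (60, 20, 80, 50))
```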

Training the neural network with the sample image includes:
extracting the feature data of the sample image using the neural network;
downsampling the feature data of the sample image using the neural network to obtain a heat map of the sample image;
predicting, using the neural network and based on the heat map of the sample image, the position offset of each point in the sample image, each interaction key point in the sample image, the center point of each target in the sample image, and the height and width of the detection box of each target in the sample image;
predicting at least two offsets using the neural network based on the feature data of the sample image;
predicting the interaction relationships between targets in the sample image based on the center point of each target in the sample image, the interaction key points in the sample image, and the at least two offsets in the sample image; and
adjusting the network parameter values of the neural network according to the predicted position offsets, the predicted center points and the predicted detection box heights and widths of the targets having an interaction relationship in the sample image, the predicted interaction key points corresponding to the targets having an interaction relationship in the sample image and their corresponding predicted offsets, the marked position offsets, and the marked detection boxes of the targets having an interaction relationship in the sample image,
The image processing method according to claim 9.
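By way of non-limiting illustration, one quantity that could drive the parameter adjustment is a discrepancy between predicted and marked offsets. The sketch below uses a mean absolute (L1) difference; this choice of loss and the function name `l1_offset_loss` are assumptions made for illustration, since the claim only requires that the parameters be adjusted according to the predicted and marked values.

```python
def l1_offset_loss(predicted, marked):
    """Mean absolute difference between predicted and marked (dx, dy) offsets.

    Assumption: an L1 loss is used here purely as an example of comparing
    predictions with marked values; the claim does not name a loss function.
    """
    if not predicted or len(predicted) != len(marked):
        raise ValueError("predicted and marked must be non-empty and of equal length")
    total = sum(
        abs(px - mx) + abs(py - my)
        for (px, py), (mx, my) in zip(predicted, marked)
    )
    # Two coordinates per offset, so divide by 2 * number of offsets.
    return total / (2 * len(predicted))


# Usage with made-up values: three of the four coordinates are off by 1.
loss = l1_offset_loss([(-19.0, 5.0), (21.0, -4.0)], [(-20.0, 5.0), (20.0, -5.0)])
```

A training loop would reduce such a value, together with the analogous terms for center points, position offsets, and detection box sizes, by updating the network parameters.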

An image processing apparatus,
comprising an extraction unit, a first determination unit, a second determination unit, and a third determination unit, wherein
the extraction unit is configured to extract the feature data of a first image;
the first determination unit is configured to determine each interaction key point and the center point of each target in the first image based on the feature data extracted by the extraction unit, wherein one interaction key point is a point on a connecting line within a preset range from the midpoint of the connecting line, and the connecting line is the line connecting the center points of two targets in one interaction action;
the second determination unit is configured to determine at least two offsets based on the feature data extracted by the extraction unit, wherein one offset represents the offset between the interaction key point in one interaction action and the center point of one target in that interaction action; and
the third determination unit is configured to determine the interaction relationships between the targets in the first image based on the center point of each target, the interaction key points, and the at least two offsets.

The first determination unit is configured to determine the center point of each target in the first image and the confidence of each target based on the feature data, and to determine the interaction key points in the first image and the confidence of each interaction action category corresponding to each interaction key point based on the feature data, and
the third determination unit is configured to determine the interaction relationships between the targets in the first image based on the center point of each target, the interaction key points, the at least two offsets, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point,
The image processing apparatus according to claim 11.

The first determination unit is configured to determine, based on the feature data, the center point of each target in the first image and its category, and the confidence that each target belongs to each preset category, and
the third determination unit is configured to determine the interaction relationships between the targets in the first image based on the center point of each target and its category, the interaction key points, the at least two offsets, the confidence that each target belongs to each preset category, and the confidence of each preset interaction action category corresponding to each interaction key point,
The image processing apparatus according to claim 12.

The third determination unit is configured to: determine, for one interaction key point, the two offsets corresponding to the interaction key point; determine the two predicted center points corresponding to the interaction key point according to the interaction key point and the two offsets corresponding to the interaction key point; determine the two targets corresponding to each interaction key point according to the center point of each target and the two predicted center points corresponding to each interaction key point; and determine the interaction relationships between the targets in the first image according to the two targets corresponding to each interaction key point, the confidence of each target, and the confidence of each preset interaction action category corresponding to each interaction key point,
The image processing apparatus according to claim 12 or 13.

The third determination unit is configured to: for one interaction key point, multiply the confidence of one preset interaction action category corresponding to the interaction key point by the confidences of the two targets corresponding to the interaction key point to obtain a first confidence, the first confidence being the confidence that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interaction action category; in response to the first confidence exceeding a confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point belongs to the preset interaction action category; and in response to the first confidence not exceeding the confidence threshold, determine that the interaction relationship between the two targets corresponding to the interaction key point does not belong to the preset interaction action category,
The image processing apparatus according to claim 14.

The third determination unit is further configured to determine, after determining that the interaction relationship between the two targets corresponding to one interaction key point does not belong to any of the preset interaction action categories, that there is no interaction relationship between the two targets corresponding to the interaction key point,
The image processing apparatus according to claim 15.

The third determination unit is configured to determine, for one predicted center point, the distance between the center point of each target and the predicted center point, and to take a target whose center point lies at a distance from the predicted center point smaller than a preset distance threshold as a target corresponding to the interaction key point to which the predicted center point corresponds,
The image processing apparatus according to any one of claims 14 to 16.

The first determination unit is configured to downsample the feature data to obtain a heat map of the first image, and to determine, according to the heat map, the position offset of each point in the first image, the center point of each target in the first image, and the height and width of the detection box of each target; and the first determination unit is further configured, after determining the center point of each target in the first image based on the feature data, to correct the position of the center point of each target having an interaction relationship in the first image according to the position offset of that center point, thereby obtaining the corrected position of the center point of the target having an interaction relationship in the first image, and to determine the detection box of each target having an interaction relationship in the first image according to the corrected position of its center point and the height and width of its detection box,
The image processing apparatus according to any one of claims 11 to 17.

A computer-readable storage medium storing a computer program,
wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 10.

An electronic device,
comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 10 when executing the program.

A computer program comprising computer-readable code,
wherein, when the computer-readable code is run in an electronic device, a processor of the electronic device executes the method according to any one of claims 1 to 10.