JP2018088057A

JP2018088057A - Image recognition device and image recognition method

Info

Publication number: JP2018088057A
Application number: JP2016230130A
Authority: JP
Inventors: 文平田路; Bunpei Taji
Original assignee: Konica Minolta Inc
Current assignee: Konica Minolta Inc
Priority date: 2016-11-28
Filing date: 2016-11-28
Publication date: 2018-06-07

Abstract

PROBLEM TO BE SOLVED: To shorten the time needed for image recognition processing when image recognition is performed on time-series images by combining a cascade-type processing part with deep learning. SOLUTION: A cascade-type processing part 15 performs image recognition processing using deep learning at each stage of a cascade structure. A stage number determination part 19 determines, from among the stages of the cascade structure, one or more stages that perform the image recognition processing. The one or more stages determined by the stage number determination part 19 perform the image recognition processing on time-series images (for example, video). SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for performing image recognition using deep learning.

Applications of deep learning are being studied in various fields. For example, Non-Patent Document 1 discloses a technique that analyzes an image of a person using deep learning to estimate the positions of the person's joints, and detects the person's posture from those joint positions. The technique of Non-Patent Document 1 estimates joint positions by combining a cascade detector with deep learning. The detector has a cascade structure in order to improve the accuracy of the joint position estimates: the first stage of the cascade detector roughly estimates the joint positions, and each subsequent stage estimates the joint positions on the basis of the positions estimated by the preceding stage. In this way, each stage of the cascade detector performs image recognition processing for estimating joint positions, and deep learning is used for this processing.

Besides cascade detectors, cascade classifiers and the like can also be combined with deep learning. For example, when a specific person is to be detected, a cascade classifier classifies people into that specific person and all other people. In this specification, these are collectively referred to as a cascade processing unit.

Non-Patent Document 1: "DeepPose: Human Pose Estimation via Deep Neural Networks", [online], [retrieved November 17, 2016], Internet <URL: http://www.cv-foundation.org/openaccess/content_cvpr_2014/papers/Toshev_DeepPose_Human_Pose_2014_CVPR_paper.pdf>

Deep learning involves a large amount of computation and therefore takes time. A cascade processing unit performs image recognition processing in a plurality of stages. Consequently, in an image recognition apparatus that performs image recognition processing by combining a cascade processing unit with deep learning, the image recognition processing takes a long time. For time-series images (for example, a moving image), such a long processing time is a problem. For example, when detecting the posture of a moving person, a large discrepancy may arise between the detected posture and the person's current posture.

An object of the present invention is to provide an image recognition apparatus and an image recognition method capable of shortening the time required for image recognition processing when image recognition is performed on time-series images by combining a cascade processing unit with deep learning.

An image recognition apparatus according to a first aspect of the present invention includes: a processing unit that has a cascade structure and performs image recognition processing using deep learning at each stage of the cascade structure; and a determination unit that determines, from among the stages of the cascade structure, one or more stages that perform the image recognition processing. The one or more stages determined by the determination unit perform the image recognition processing on time-series images.

The determination unit determines one or more stages that perform image recognition processing using deep learning. Therefore, instead of all stages of the cascade structure always performing image recognition processing using deep learning, a smaller number of stages can perform it. Thus, the image recognition apparatus according to the first aspect of the present invention can shorten the time required for image recognition processing when image recognition is performed on time-series images by combining a cascade processing unit with deep learning.
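The idea of running only the stages chosen by the determination unit can be sketched as follows. This is a minimal illustration, not the patent's implementation: each stage is assumed to be a plain callable standing in for a deep network, the names `run_selected_stages` and `seed` are hypothetical, and the returned value is the output of the last selected stage (for a three-stage cascade with stages 1 and 2 selected, that is stage 2). The `seed` argument stands in for the preceding stage's result when the first selected stage is not stage 1.

```python
from typing import Callable, List, Optional, Sequence

# A stage receives the input image and the previous stage's result (None if
# there is none) and returns its own result. Real stages would be deep
# networks; here they are arbitrary callables (an assumption for illustration).
Stage = Callable[[object, Optional[object]], object]


def run_selected_stages(image: object,
                        stages: Sequence[Stage],
                        selected: Sequence[int],
                        seed: Optional[object] = None) -> object:
    """Run only the stages whose 1-based indices are in `selected`.

    Stages run in cascade order, each receiving the previous selected stage's
    result. `seed` stands in for the preceding stage's output when the first
    selected stage is not stage 1. Returns the last selected stage's result.
    """
    if not selected:
        raise ValueError("at least one stage must be selected")
    result = seed
    for index in sorted(selected):
        result = stages[index - 1](image, result)
    return result


# Usage with toy stages that record which stages actually ran.
trace: List[int] = []


def make_stage(n: int) -> Stage:
    def stage(image: object, previous: Optional[object]) -> object:
        trace.append(n)
        return (previous or 0) + n
    return stage


toy_stages = [make_stage(n) for n in (1, 2, 3)]
print(run_selected_stages("frame", toy_stages, [1, 2, 3]), trace)  # 6 [1, 2, 3]
```

With `selected=[3]`, only one stage executes, which is where the time saving described above comes from.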

Time-series images are images arranged in the order in which they were captured, for example, a moving image or images captured at regular time intervals.

The image recognition processing is, for example, processing that estimates the position of an object or processing that detects an object. An example of processing that estimates the position of an object is estimating the positions of a person's joints in order to detect the person's posture. Examples of processing that detects an object are detecting a specific person among many people, and, for autonomous driving, detecting people among the objects present ahead on the road.

In the above configuration, the processing unit outputs the result of the image recognition processing performed on the time-series images by the last of the one or more stages, and the determination unit determines the one or more stages on the basis of that result.

The last stage is explained using a three-stage cascade structure as an example. When the first, second, and third stages are selected, the last stage is the third stage. When only the third stage is selected, the last stage is the third stage. When the first and second stages are selected, the last stage is the second stage. When only the first stage is selected, the last stage is the first stage.

Determining the stages "on the basis of the result of the image recognition processing" means, for example when real-time performance is important, determining them according to whether the processing unit was able to output the result in real time. When the processing unit cannot output the result in real time, the determination unit decides to reduce the number of stages used for the image recognition processing so that the processing unit can output the result in real time. When the processing unit can output the result in real time, the determination unit decides to increase the number of stages used for the image recognition processing in order to improve recognition accuracy in addition to achieving real-time performance.
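The real-time rule above can be sketched as a small feedback policy. This is a hypothetical illustration: "real time" is assumed here to mean that the result was produced within one frame interval, and the step size of one stage per frame, the function name `next_stage_count`, and its parameters are all assumptions, not part of the source.

```python
def next_stage_count(current: int, elapsed_s: float, frame_interval_s: float,
                     max_stages: int = 3) -> int:
    """Hypothetical policy for choosing how many stages to use next frame.

    'Real time' is assumed to mean the result was produced within one frame
    interval. Missed deadline -> one stage fewer (never below 1);
    met deadline -> one stage more (never above max_stages).
    """
    if elapsed_s > frame_interval_s:
        return max(1, current - 1)
    return min(max_stages, current + 1)


# At 30 fps the budget is about 33 ms per frame.
print(next_stage_count(3, 0.050, 1 / 30))  # 2: too slow, drop a stage
print(next_stage_count(2, 0.020, 1 / 30))  # 3: on time, add a stage back
```

Near the deadline this simple rule can oscillate between two stage counts; a practical system might add hysteresis, but the source does not say.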

In the above configuration, the processing unit outputs the result of the image recognition processing performed on the time-series images by the last of the one or more stages; the image recognition apparatus further includes a judgment unit that judges the reliability of the result on the basis of the result; and the determination unit determines the one or more stages on the basis of the reliability.

The more stages the determination unit selects, the higher the reliability of the result of the image recognition processing. The fewer stages it selects, the faster the result is calculated, which enables the processing unit to output the result in real time.

Determining the stages "on the basis of the reliability" means, for example, that the determination unit decides to reduce the number of stages as the reliability becomes higher, and to increase the number of stages as the reliability becomes lower. This balances the speed at which the result is calculated against the reliability of the result.

The level of reliability can be judged using thresholds. With one threshold, reliability can be divided into two levels: high and not high. Adding thresholds divides the reliability more finely. For example, with two thresholds, reliability can be divided into three levels: high, medium, and low.
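The threshold banding above, combined with the rule that fewer stages are used as reliability rises, can be sketched as follows. This is a minimal sketch assuming reliability is a single number compared against sorted thresholds; the function names and the per-band stage counts (for example 3 / 2 / 1 for low / medium / high) are hypothetical choices for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def reliability_level(reliability: float, thresholds: Sequence[float]) -> int:
    """Return 0 (lowest band) .. len(thresholds) (highest band).

    A value equal to a threshold falls into the higher band (assumption).
    """
    return bisect_right(sorted(thresholds), reliability)


def stages_for(reliability: float, thresholds: Sequence[float],
               stage_counts: Sequence[int]) -> int:
    """Map each reliability band to a number of stages; stage_counts[0] is
    used for the lowest band, so it should be the largest count."""
    if len(stage_counts) != len(thresholds) + 1:
        raise ValueError("need exactly one stage count per reliability band")
    return stage_counts[reliability_level(reliability, thresholds)]


# One threshold: not high -> all 3 stages, high -> final stage only.
print(stages_for(0.9, [0.7], [3, 1]))          # 1
# Two thresholds: low / medium / high -> 3 / 2 / 1 stages.
print(stages_for(0.5, [0.4, 0.7], [3, 2, 1]))  # 2
```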

In the above configuration, the processing unit calculates an estimated position of an object through the image recognition processing. Among the plurality of images constituting the time-series images, the judgment unit takes the image on which the processing unit has just performed the image recognition processing as a first image, and an image on which the processing unit performed the image recognition processing before the first image as a second image. The judgment unit then calculates the distance between the estimated position of the object in the first image and the estimated position of the object in the second image, and judges the reliability according to the magnitude of that distance.

Assume that the first image and the second image were captured at close times. If the estimated positions of the object in the first image and in the second image are both accurate, these estimated positions should be very close (or identical). Therefore, the judgment unit judges, for example, that the reliability of the estimated position of the object in the first image is high if the distance between the two estimated positions is smaller than a predetermined threshold. Conversely, if the distance is equal to or greater than the predetermined threshold, the judgment unit judges that the reliability of the estimated position of the object in the first image is not high.
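The distance test above can be written in a few lines. In this sketch, positions are assumed to be 2-D pixel coordinates and the Euclidean distance is assumed (the source does not name the metric); `is_reliable` is a hypothetical name. As in the text, a distance equal to the threshold counts as "not high".

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def is_reliable(first: Point, second: Point, threshold: float) -> bool:
    """Judge the first image's estimate as highly reliable when it lies closer
    than `threshold` to the estimate from the earlier second image.
    A distance equal to or greater than the threshold counts as not high."""
    return math.dist(first, second) < threshold


print(is_reliable((120.0, 80.0), (123.0, 84.0), 6.0))  # True  (distance 5)
print(is_reliable((120.0, 80.0), (123.0, 84.0), 5.0))  # False (5 is not < 5)
```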

In the above configuration, the apparatus further includes an estimation unit that, when the earliest of the one or more stages is not the first stage of the cascade, estimates the result that the stage immediately preceding that earliest stage would output, on the basis of past results output by that preceding stage. The earliest of the one or more stages performs the image recognition processing using the result estimated by the estimation unit.

In a cascade structure, each stage performs image recognition processing using the result output by the immediately preceding stage. For example, the second stage uses the result output by the first stage, and the third stage uses the result output by the second stage. If the earliest of the one or more stages determined by the determination unit is not the first stage, no result from the immediately preceding stage exists. For example, suppose that only the third stage is selected in a three-stage cascade structure. The third stage needs the result output by the second stage, but because only the third stage is selected, the second stage outputs no result. The estimation unit therefore estimates the result of the second stage on the basis of results that the second stage output in the past. For example, when the result is an estimated position of an object, the estimation unit estimates it using extrapolation, object tracking, deep learning, or the like.
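Of the estimation methods listed above, linear extrapolation is the simplest to illustrate. The sketch below assumes the estimation unit keeps the preceding stage's past outputs as `(time, (x, y))` pairs and predicts its output at the current time from the last two of them, assuming constant velocity. The function name and data layout are hypothetical; object tracking or a learned predictor would replace this function in the other variants the text mentions.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def extrapolate_position(history: Sequence[Tuple[float, Point]],
                         t: float) -> Point:
    """Predict the skipped stage's output at time t by linear extrapolation.

    `history` holds (time, (x, y)) pairs of that stage's past outputs, oldest
    first. With a single past output, that output is simply reused.
    """
    if not history:
        raise ValueError("no past results to extrapolate from")
    if len(history) == 1:
        return history[-1][1]
    (t1, (x1, y1)), (t2, (x2, y2)) = history[-2], history[-1]
    if t2 == t1:
        return (x2, y2)
    s = (t - t2) / (t2 - t1)  # how many past intervals ahead t lies
    return (x2 + (x2 - x1) * s, y2 + (y2 - y1) * s)


# Second-stage outputs at frames 0 and 10; predict the output at frame 15.
print(extrapolate_position([(0, (0.0, 0.0)), (10, (10.0, 20.0))], 15))  # (15.0, 30.0)
```

The predicted position would then be passed to the first selected stage in place of the missing result.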

An image recognition method according to a second aspect of the present invention uses a processing unit that has a cascade structure and performs image recognition processing using deep learning at each stage of the cascade structure. The method includes: a first step of determining, from among the stages of the cascade structure, one or more stages that perform the image recognition processing; and a second step in which the one or more stages determined in the first step perform the image recognition processing on time-series images.

The image recognition method according to the second aspect of the present invention defines the image recognition apparatus according to the first aspect from the viewpoint of a method, and has the same operational effects as that apparatus.

According to the present invention, the time required for image recognition processing can be shortened when image recognition is performed on time-series images by combining a cascade processing unit with deep learning.

FIG. 1 is a block diagram showing the configuration of the image recognition apparatus according to the embodiment.
FIG. 2 is a schematic diagram explaining the all-stage selection mode.
FIG. 3 is a schematic diagram explaining the final-stage selection mode.
FIG. 4 is a flowchart (part 1) explaining the operation of the image recognition apparatus according to the embodiment.
FIG. 5 is a flowchart (part 2) explaining the operation of the image recognition apparatus according to the embodiment.
FIG. 6 is a schematic diagram showing an example of the k-th frame.
FIG. 7 is a schematic diagram showing an example of an input image Im-1.
FIG. 8 is a schematic diagram showing an example of the relationship between an estimated position P1-1 of the left shoulder joint and the input image Im-1.
FIG. 9 is a schematic diagram showing an example of the input image Im-1 on which a rectangular region is set.
FIG. 10 is a schematic diagram showing an example of a second-stage cutout image Im2-1.
FIG. 11 is a schematic diagram showing an example of the relationship between an estimated position P2-1 of the left shoulder joint and the second-stage cutout image Im2-1.
FIG. 12 is a schematic diagram showing an example of the input image Im-1 on which a rectangular region is set.
FIG. 13 is a schematic diagram showing an example of a third-stage cutout image Im3-1.
FIG. 14 is a schematic diagram showing an example of the relationship between an estimated position P3-1 of the left shoulder joint and the third-stage cutout image Im3-1.
FIG. 15 is a schematic diagram showing an example of the (k+10)-th frame.
FIG. 16 is a schematic diagram showing an example of an input image Im-2.
FIG. 17 is a schematic diagram showing estimated positions P2-2 to P2-5 of the left shoulder joint calculated by the second-stage processing unit using second-stage cutout images Im2-2 to Im2-5.
FIG. 18 is a schematic diagram showing an example of the input image Im-2 on which a rectangular region is set.
FIG. 19 is a schematic diagram showing an example of a third-stage cutout image Im3-2.
FIG. 20 is a schematic diagram showing an example of the relationship between an estimated position P3-2 of the left shoulder joint and the third-stage cutout image Im3-2.
FIG. 21 is a schematic diagram explaining a two-stage selection mode.
FIG. 22 is a flowchart explaining the operation in the two-stage selection mode.
FIG. 23 is a schematic diagram showing an example of the input image Im-2 on which a rectangular region is set.
FIG. 24 is a schematic diagram showing an example of a second-stage cutout image Im2-6.
FIG. 25 is a schematic diagram showing an example of the relationship between an estimated position P2-7 of the left shoulder joint and the second-stage cutout image Im2-6.
FIG. 26 is an explanatory diagram explaining a first mode executed in a second modification.
FIG. 27 is an explanatory diagram explaining a second mode executed in the second modification.
FIG. 28 is an explanatory diagram explaining a third mode executed in the second modification.

Embodiments of the present invention are described in detail below with reference to the drawings. In each drawing, components given the same reference sign are the same component, and descriptions of components already explained are omitted. In this specification, a reference sign without a suffix is used when referring to items generically (for example, input image Im), and a reference sign with a suffix is used when referring to an individual item (for example, input image Im-1).

FIG. 1 is a block diagram showing the configuration of an image recognition apparatus 1 according to the embodiment. The image recognition apparatus 1 includes, as functional blocks, a person detection unit 11, an input image generation unit 13, a cascade processing unit 15, a reliability determination unit 17, a stage number determination unit 19, and an estimation unit 12. The image recognition apparatus 1 is implemented by hardware (a CPU (Central Processing Unit), RAM (Random Access Memory), ROM (Read Only Memory), an HDD (Hard Disk Drive), and the like) together with software.

The person detection unit 11 performs image processing on each frame of a moving image captured by an imaging unit 3 and, if a person appears in the frame, detects that person. A moving image is a specific example of time-series images, which are images arranged in the order in which they were captured; images captured at regular time intervals are also time-series images. The person detection unit 11 can detect a person using a known person detection technique, such as person detection based on HOG (Histogram of Oriented Gradients) features. The imaging unit 3 is, for example, a digital visible-light camera or a digital infrared camera.

The input image generation unit 13 generates an input image Im, which is a cutout image obtained by cutting out a rectangular region containing the person from the frame. The input image generation unit 13 inputs the input image Im to the cascade processing unit 15.

The cascade processing unit 15 has a cascade structure and performs image recognition processing using deep learning at each stage of the cascade structure. "Cascade processing unit 15" is a general term covering cascade detectors and cascade classifiers. In the embodiment, the cascade processing unit 15 is used to calculate estimated positions of predetermined joints of a person (for example, the left shoulder, left elbow, left wrist, left hip, left knee, left ankle, right shoulder, right elbow, right wrist, right hip, right knee, and right ankle joints).

The calculation of estimated joint positions using deep learning is briefly described below. Because the procedure is the same for each of the predetermined joints, the left shoulder joint is used as an example. The image recognition apparatus 1 learns to recognize the left shoulder joint using a large number of images of people in which the left shoulder joint appears (including images in which the left shoulder joint is covered by clothing and images in which it is not) and a large number of images in which the left shoulder joint does not appear. On the basis of this learning, the cascade processing unit 15 estimates the position of the left shoulder joint of a person captured in a frame of the moving image. Deep learning uses a multilayer neural network (DNN: Deep Neural Network); one example of such a network is a CNN (Convolutional Neural Network).

The cascade processing unit 15 includes a first-stage processing unit 151, a second-stage processing unit 153, and a third-stage processing unit 155. The embodiment is described with three stages as an example, but any plural number of stages may be used.

The first-stage processing unit 151 performs image recognition processing using deep learning to roughly calculate an estimated joint position. The second-stage processing unit 153 performs image recognition processing using deep learning on the basis of the result output by the first-stage processing unit 151 (here, the estimated joint position calculated by the first-stage processing unit 151) and calculates an estimated joint position that is more accurate than the one calculated by the first-stage processing unit 151. The third-stage processing unit 155 performs image recognition processing using deep learning on the basis of the result output by the second-stage processing unit 153 (here, the estimated joint position calculated by the second-stage processing unit 153) and calculates an estimated joint position that is more accurate than the one calculated by the second-stage processing unit 153. Calculating estimated joint positions with a cascade processing unit 15 as described above is disclosed, for example, in Non-Patent Document 1.
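How a later stage can build on the previous stage's estimate is sketched below. The figure descriptions mention rectangular regions set on the input image Im and per-stage cutout images (Im2-1, Im3-1, and so on), so this sketch assumes, without the source confirming the details, that each later stage crops a square region centred on the previous estimate, runs its network on that crop (possibly resized to a fixed network input size), and maps its estimate back to input-image coordinates. The function names, the square shape, and the `net_size` parameter are hypothetical.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, width, height) in input-image pixels


def crop_box(center: Point, size: int, image_w: int, image_h: int) -> Box:
    """Square region of side `size` centred on the previous stage's estimate,
    shifted to stay inside the input image (assumes size <= image_w, image_h)."""
    half = size // 2
    left = min(max(int(round(center[0])) - half, 0), image_w - size)
    top = min(max(int(round(center[1])) - half, 0), image_h - size)
    return (left, top, size, size)


def to_image_coords(p: Point, box: Box, net_size: Optional[int] = None) -> Point:
    """Map a position estimated inside the (square) cutout back to input-image
    pixels. If the cutout was resized to net_size x net_size, undo the scaling."""
    scale = 1.0 if net_size is None else box[2] / net_size
    return (box[0] + p[0] * scale, box[1] + p[1] * scale)


box = crop_box((50.0, 95.0), 20, 200, 100)
print(box)                                              # (40, 80, 20, 20)
print(to_image_coords((10.0, 10.0), box, net_size=40))  # (45.0, 85.0)
```

Shrinking the crop at each stage is one way the later stages could concentrate on a smaller neighbourhood and refine the estimate; the source states only that each stage refines the previous stage's estimate.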

The image recognition apparatus 1 takes the estimated positions of the predetermined joints calculated by the third-stage processing unit 155 as the current estimated positions of those joints and determines the posture of the person on the basis of them. Because a known algorithm for determining a person's posture from joint positions can be used for this determination, its description is omitted. The image recognition apparatus 1 displays the result of the posture determination on a display unit 5.

The reliability determination unit 17 determines the reliability of the estimated joint positions. Because the procedure is the same for each of the predetermined joints, the reliability of the estimated position of the left shoulder joint is used as an example. The k-th frame is an example of the first image, and the (k-1)-th frame is an example of the second image. If the estimated position of the left shoulder joint in the k-th frame (its current estimated position) and the estimated position in the (k-1)-th frame (its past estimated position) are both accurate, these estimated positions should be very close (or identical). Therefore, the reliability determination unit 17 determines that the reliability of the estimated position of the left shoulder joint in the k-th frame is high if the distance between these estimated positions is smaller than a predetermined threshold. Conversely, if the distance is equal to or greater than the predetermined threshold, the reliability determination unit 17 determines that the reliability of the estimated position of the left shoulder joint in the k-th frame is not high.

If the reliability is not high, the cascade processing unit 15 needs to calculate the estimated position of the left shoulder joint using all of the first-stage processing unit 151 through the third-stage processing unit 155. Therefore, when the reliability is not high, the stage number determination unit 19 instructs the cascade processing unit 15 to execute the all-stage selection mode, in which the estimated joint positions are calculated using all of the first-stage processing unit 151, the second-stage processing unit 153, and the third-stage processing unit 155. FIG. 2 is a schematic diagram explaining the all-stage selection mode. The first-stage processing unit 151, the second-stage processing unit 153, and the third-stage processing unit 155 calculate estimated joint positions for the same input image Im.

Conversely, if the reliability is high, the cascade processing unit 15 can calculate the estimated position of the left shoulder joint with high accuracy even if the first-stage processing unit 151 and the second-stage processing unit 153 are skipped. Therefore, when the reliability is high, the stage number determination unit 19 instructs the cascade processing unit 15 to execute the final-stage selection mode, in which the first-stage processing unit 151 and the second-stage processing unit 153 are skipped and the estimated joint positions are calculated using only the third-stage processing unit 155. FIG. 3 is a schematic diagram explaining the final-stage selection mode. Because the first-stage processing unit 151 and the second-stage processing unit 153 are skipped in this mode, the input image Im is input only to the third-stage processing unit 155.

As described above, the stage number determination unit 19 selects either the all-stage selection mode shown in FIG. 2 or the final-stage selection mode shown in FIG. 3. That is, among the stages of the cascade processing unit 15, the stage number determination unit 19 determines the one or more stages that perform the image recognition processing for calculating the estimated position of the joint.
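The choice made by the stage number determination unit 19 amounts to a small mapping from the reliability judgment to the set of active stages. The following is a minimal Python sketch that assumes a three-stage cascade. The names `Mode`, `select_mode`, and `active_stages` are hypothetical and do not come from the source.

```python
from enum import Enum

class Mode(Enum):
    ALL_STAGES = "all-stage selection mode"      # FIG. 2: stages 1, 2 and 3 run
    FINAL_STAGE = "final-stage selection mode"   # FIG. 3: only stage 3 runs

def select_mode(reliability_is_high):
    """Pick the mode for the next frame.

    reliability_is_high is True/False, or None when no reliability has been
    calculated yet (e.g. right after start-up), which falls back to all stages.
    """
    return Mode.FINAL_STAGE if reliability_is_high else Mode.ALL_STAGES

def active_stages(mode, num_stages=3):
    """1-based indices of the cascade stages that perform recognition in `mode`."""
    if mode is Mode.FINAL_STAGE:
        return [num_stages]
    return list(range(1, num_stages + 1))
```

For example, `active_stages(select_mode(True))` gives `[3]`. Both `False` and `None` give `[1, 2, 3]`.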

Note that the reliability is not limited to the distance described above. Deep learning is known to be able to output the probability that an estimated position it has calculated is the actual position, and the reliability determination unit 17 can use this probability to calculate the reliability. For example, as described above, the first-stage processing unit 151, the second-stage processing unit 153, and the third-stage processing unit 155 each use deep learning to calculate the estimated position of the left shoulder joint in the k-th frame. At this time, the first-stage processing unit 151 also uses deep learning to calculate, for the k-th frame, the probability that the estimated position of the left shoulder joint it calculated is the actual position of the left shoulder joint (for example, 70%). Similarly, the second-stage processing unit 153 calculates the corresponding probability for its own estimate (for example, 80%), and the third-stage processing unit 155 does the same for its own estimate (for example, 90%). The reliability determination unit 17 multiplies these probabilities to obtain the reliability of the estimated position of the left shoulder joint (reliability = 70% × 80% × 90% = 50.4%).
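The probability-based reliability is a plain product of the per-stage probabilities. The following is a minimal sketch that assumes each stage reports its probability as a number between 0 and 1. The function name is hypothetical.

```python
import math

def reliability_from_stage_probabilities(probabilities):
    """Multiply the per-stage probabilities that each stage's estimate is the true joint position."""
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return math.prod(probabilities)   # math.prod requires Python 3.8+
```

With the example values 0.70, 0.80, and 0.90, it returns approximately 0.504, that is, 50.4%.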

The operation of the image recognition apparatus 1 according to the embodiment will now be described. FIGS. 4A and 4B are flowcharts for explaining this operation. In the embodiment, the algorithm for calculating the estimated position of a joint is the same for each of the predetermined joints described above. The calculation of the estimated position of the left shoulder joint is therefore described as an example.

Referring to FIGS. 1 and 4A, when the image recognition apparatus 1 starts operating, the reliability determination unit 17 has not yet calculated any reliability. Therefore, the stage number determination unit 19 selects the all-stage selection mode shown in FIG. 2.

The person detection unit 11 performs person-detection image processing in real time on each frame of the moving image (an example of time-series images) sent from the imaging unit 3 (step S1). When the person detection unit 11 does not detect a frame showing a person in step S1 (No in step S2), it repeats the processing of step S1.

For example, when no person appears in the first through (k-1)-th frames, the person detection unit 11 repeats the processing of step S1 and the determination of No in step S2 for each of those frames. Assume that a person 21 appears in the k-th frame shown in FIG. 5. FIG. 5 is a schematic diagram illustrating an example of the k-th frame. In step S1, the person detection unit 11 detects the person 21 in the k-th frame (Yes in step S2). The image recognition apparatus 1 then starts the process of calculating the estimated position of the left shoulder joint of the person 21.

The input image generation unit 13 cuts out a rectangular region containing the person 21 from the k-th frame and generates the input image Im-1 shown in FIG. 6 (step S3). FIG. 6 is a schematic diagram illustrating an example of the input image Im-1. Input images in general are referred to as the input image Im.

The input image generation unit 13 inputs the input image Im-1 to the first-stage processing unit 151, the second-stage processing unit 153, and the third-stage processing unit 155.

The first-stage processing unit 151 performs image recognition processing using deep learning on the input image Im-1 and calculates an estimated position P1-1 of the left shoulder joint (step S4). FIG. 7 is a schematic diagram illustrating an example of the relationship between the estimated position P1-1 of the left shoulder joint and the input image Im-1. Because the first-stage processing unit 151 calculates the estimated position P1-1 only roughly, the deviation between P1-1 and the actual position P0 of the left shoulder joint is relatively large. Estimated positions of the left shoulder joint calculated by the first-stage processing unit 151 are collectively referred to as the estimated position P1.

The cascade processing unit 15 stores information indicating the estimated position P1-1 of the left shoulder joint calculated by the first-stage processing unit 151. The first-stage processing unit 151 outputs this information (step S5).

The second-stage processing unit 153 sets a rectangular region in the input image Im-1. FIG. 8 is a schematic diagram illustrating an example of the input image Im-1 in which this rectangular region 23 has been set. The rectangular region 23 contains the estimated position P1-1 of the left shoulder joint shown in FIG. 7. The second-stage processing unit 153 cuts out the portion of the input image Im-1 enclosed by the rectangular region 23 and generates the cutout image shown in FIG. 9 (step S6). This image is referred to as the second-stage cutout image Im2-1. FIG. 9 is a schematic diagram illustrating an example of the second-stage cutout image Im2-1. The second-stage cutout image Im2-1 is smaller in area than the input image Im-1 and shows the part of the person 21 that includes the left shoulder. Second-stage cutout images in general are referred to as the second-stage cutout image Im2.
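Steps S6 and S9 both cut a rectangle out of the input image around the previous stage's estimate. The following is a minimal sketch of that cropping. It makes three assumptions: the box size is fixed per stage, a box that would cross the image border is shifted inward rather than shrunk (the source does not say how borders are handled), and images are plain lists of pixel rows. All names are hypothetical.

```python
def crop_box(center, box_w, box_h, img_w, img_h):
    """Return (x0, y0, x1, y1) of a box_w x box_h rectangle around center, kept inside the image."""
    cx, cy = center
    box_w = min(box_w, img_w)
    box_h = min(box_h, img_h)
    x0 = int(round(cx - box_w / 2))
    y0 = int(round(cy - box_h / 2))
    # Shift the box back inside the image instead of shrinking it
    x0 = max(0, min(x0, img_w - box_w))
    y0 = max(0, min(y0, img_h - box_h))
    return x0, y0, x0 + box_w, y0 + box_h

def crop(image, box):
    """Cut the box (x0, y0, x1, y1) out of an image given as a list of pixel rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]
```

For example, a 40 x 30 box around (50, 40) in a 200 x 100 image is `(30, 25, 70, 55)`. With NumPy arrays, the equivalent crop would be `image[y0:y1, x0:x1]`.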

The second-stage processing unit 153 performs image recognition processing using deep learning on the second-stage cutout image Im2-1 and calculates an estimated position P2-1 of the left shoulder joint (step S7). FIG. 10 is a schematic diagram illustrating an example of the relationship between the estimated position P2-1 and the second-stage cutout image Im2-1. Because the second-stage processing unit 153 calculates P2-1 using the estimated position P1-1 calculated by the first-stage processing unit 151 (FIG. 7), P2-1 is close to the actual position P0 of the left shoulder joint. Estimated positions of the left shoulder joint calculated by the second-stage processing unit 153 are collectively referred to as the estimated position P2.

The cascade processing unit 15 stores information indicating the estimated position P2-1 calculated by the second-stage processing unit 153. The second-stage processing unit 153 outputs this information (step S8).

The third-stage processing unit 155 sets a rectangular region in the input image Im-1. FIG. 11 is a schematic diagram illustrating an example of the input image Im-1 in which this rectangular region 25 has been set. The rectangular region 25 contains the estimated position P2-1 of the left shoulder joint shown in FIG. 10 and is smaller in area than the rectangular region 23 (FIG. 8). The third-stage processing unit 155 cuts out the portion of the input image Im-1 enclosed by the rectangular region 25 and generates the cutout image shown in FIG. 12 (step S9). This image is referred to as the third-stage cutout image Im3-1. FIG. 12 is a schematic diagram illustrating an example of the third-stage cutout image Im3-1. The third-stage cutout image Im3-1 is smaller in area than the second-stage cutout image Im2-1 and shows the part of the person 21 that includes the left shoulder. Third-stage cutout images in general are referred to as the third-stage cutout image Im3.

The third-stage processing unit 155 performs image recognition processing using deep learning on the third-stage cutout image Im3-1 and calculates an estimated position P3-1 of the left shoulder joint (step S10). FIG. 13 is a schematic diagram illustrating an example of the relationship between the estimated position P3-1 and the third-stage cutout image Im3-1. Because the third-stage processing unit 155 calculates P3-1 using the estimated position P2-1 calculated by the second-stage processing unit 153 (FIG. 10), P3-1 is even closer to the actual position P0 of the left shoulder joint. In FIG. 13, P3-1 and P0 almost coincide. Estimated positions of the left shoulder joint calculated by the third-stage processing unit 155 are collectively referred to as the estimated position P3.
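Steps S4 to S10 together make up the all-stage selection mode. The sketch below chains them under the following assumptions:

- Each stage's deep-learning model is replaced by an arbitrary callable that returns a position in the coordinates of the image it receives.
- Each later stage crops a square of fixed side from the input image (not from the previous crop) around the previous estimate.
- Each result is converted back into input-image coordinates.

The names are hypothetical.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Image = List[List[float]]

def crop_around(image: Image, center: Point, size: int):
    """Square crop of side `size` around center, shifted to stay inside the image.
    Returns the crop and the (x0, y0) offset of its top-left corner."""
    h, w = len(image), len(image[0])
    size = min(size, w, h)
    x0 = max(0, min(int(round(center[0] - size / 2)), w - size))
    y0 = max(0, min(int(round(center[1] - size / 2)), h - size))
    return [row[x0:x0 + size] for row in image[y0:y0 + size]], (x0, y0)

def run_all_stages(image: Image,
                   first_stage: Callable[[Image], Point],
                   refiners: List[Tuple[Callable[[Image], Point], int]]) -> List[Point]:
    """All-stage selection mode: stage 1 sees the whole input image; each later stage
    crops the input image around the previous estimate and refines it."""
    estimates = [first_stage(image)]          # rough estimate (P1)
    for model, size in refiners:              # (model, crop side) for stages 2, 3, ...
        patch, (x0, y0) = crop_around(image, estimates[-1], size)
        px, py = model(patch)                 # position inside the crop
        estimates.append((x0 + px, y0 + py))  # back to input-image coordinates
    return estimates
```

Passing shrinking crop sides (for example 40 and then 20 pixels) mirrors the relationship between the rectangular regions 23 and 25, where each later stage looks at a smaller area.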

The cascade processing unit 15 stores information indicating the estimated position P3-1 calculated by the third-stage processing unit 155. The third-stage processing unit 155 outputs this information (step S11). The estimated position P3-1 is the estimated position of the left shoulder joint in the k-th frame shown in FIG. 5, and the reliability determination unit 17 stores this information. At this point, the estimated position of the left shoulder joint in the k-th frame is the current estimated position of the left shoulder joint.

The reliability determination unit 17 determines whether the reliability can be calculated (step S12). Calculating the reliability requires both the current estimated position and a past estimated position of the left shoulder joint. Here, the current estimated position is the one in the k-th frame, and the past estimated position is the one in the (k-1)-th frame. As described above, the person 21 does not appear in any frame before the k-th frame, so no estimated position of the left shoulder joint has been calculated for the (k-1)-th frame. The reliability determination unit 17 therefore determines that the reliability cannot be calculated (No in step S12). The person detection unit 11 then performs step S1 on the next frame, the (k+1)-th frame.
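Step S12 only needs to know whether an estimate exists for the preceding frame. The following is a minimal sketch of that bookkeeping. It assumes that the reliability determination unit 17 keeps the final-stage estimates of the last two processed frames and requires them to come from consecutive frames. The class and method names are hypothetical.

```python
from collections import deque

class EstimateHistory:
    """Keeps the final-stage joint estimates of the most recent frames."""

    def __init__(self, maxlen=2):
        self._frames = deque(maxlen=maxlen)   # items: (frame_index, (x, y))

    def push(self, frame_index, position):
        self._frames.append((frame_index, position))

    def can_compute_reliability(self):
        """Step S12 (sketch): needs the current estimate and one from the immediately preceding frame."""
        if len(self._frames) < 2:
            return False
        (prev_idx, _), (cur_idx, _) = self._frames[-2], self._frames[-1]
        return cur_idx == prev_idx + 1

    def current_and_past(self):
        """Return (current, past) positions; call only when can_compute_reliability() is True."""
        return self._frames[-1][1], self._frames[-2][1]
```

After the k-th frame, only one estimate is stored, so the check fails, as in the No branch of step S12. Once the (k+1)-th frame is pushed, it succeeds.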

Assume that the person 21 also appears in the (k+1)-th frame. The person detection unit 11 detects the person 21 in the (k+1)-th frame (Yes in step S2). The input image generation unit 13 performs step S3, and the cascade processing unit 15 performs the subsequent processing up to step S11. The information output in step S11 indicates the estimated position of the left shoulder joint in the (k+1)-th frame, and the reliability determination unit 17 stores it. At this point, the estimated position in the (k+1)-th frame is the current estimated position of the left shoulder joint, and the estimated position in the k-th frame is the past estimated position.

The reliability determination unit 17 determines whether the reliability can be calculated (step S12). It now stores information indicating both the current and the past estimated positions of the left shoulder joint, so it determines that the reliability can be calculated (Yes in step S12).

The reliability determination unit 17 calculates the reliability of the current estimated position of the left shoulder joint from two positions: the current estimated position (here, the estimated position in the (k+1)-th frame) and the past estimated position (here, the estimated position in the k-th frame). It then determines whether that reliability is high (step S13).
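The distance-based reliability used in step S13 compares how far the estimate moved between consecutive frames against a single threshold. The following is a minimal sketch that assumes Euclidean distance in pixels and a strict comparison. The threshold value is an assumption, because the source does not give one, and the function name is hypothetical.

```python
import math

def is_reliable(current, past, threshold):
    """Step S13 (sketch): the current estimate counts as highly reliable when the joint
    moved less than `threshold` pixels since the previous frame."""
    return math.dist(current, past) < threshold   # math.dist requires Python 3.8+
```

For example, a move from (100, 100) to (103, 104) is 5 pixels. It is judged reliable with a threshold of 6 but not with a threshold of 5.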

When the reliability of the current estimated position of the left shoulder joint is not high (No in step S13), the stage number determination unit 19 selects the all-stage selection mode shown in FIG. 2. The image recognition apparatus 1 therefore performs steps S1 to S13 on the next frame (here, the (k+2)-th frame).

When the reliability of the current estimated position of the left shoulder joint is high (Yes in step S13), the stage number determination unit 19 selects the final-stage selection mode shown in FIG. 3. The cascade processing unit 15 then executes the final-stage selection mode on the next frame. For example, suppose the reliability determination unit 17 determines that the reliability of the current estimated position is not high for each of the (k+1)-th to (k+8)-th frames (No in step S13), so that the cascade processing unit 15 executes the all-stage selection mode for each of these frames. Suppose it then determines that the reliability is high for the (k+9)-th frame (Yes in step S13). In that case, the stage number determination unit 19 selects the final-stage selection mode shown in FIG. 3 for the (k+10)-th frame.

Referring to FIGS. 1 and 4B, the person detection unit 11 performs person-detection image processing in real time on the (k+10)-th frame shown in FIG. 14 (step S14). FIG. 14 is a schematic diagram illustrating an example of the (k+10)-th frame, in which the person 21 appears. In step S14, the person detection unit 11 detects the person 21 in the (k+10)-th frame (Yes in step S15). If the person detection unit 11 could not detect the person 21 in the (k+10)-th frame in step S14 (No in step S15), the person 21 would be outside the imaging range of the imaging unit 3. The stage number determination unit 19 would then switch from the final-stage selection mode to the all-stage selection mode, and the image recognition apparatus 1 would perform steps S1 to S13 on the next frame (here, the (k+11)-th frame).

When the person detection unit 11 detects the person 21 in the (k+10)-th frame (Yes in step S15), the input image generation unit 13 cuts out a rectangular region containing the person 21 from the (k+10)-th frame. It then generates the input image Im-2 shown in FIG. 15 (step S16). FIG. 15 is a schematic diagram illustrating an example of the input image Im-2.

The input image generation unit 13 inputs the input image Im-2 to the third-stage processing unit 155. To calculate the estimated position P3 of the left shoulder joint, the third-stage processing unit 155 first needs a third-stage cutout image Im3. The third-stage cutout image Im3 is the cutout image that the third-stage processing unit 155 uses to calculate the estimated position P3, such as the third-stage cutout image Im3-1 shown in FIGS. 12 and 13.

Creating the third-stage cutout image Im3 requires the estimated position P2 of the left shoulder joint calculated by the second-stage processing unit 153 (FIG. 2). In the final-stage selection mode shown in FIG. 3, however, the second-stage processing unit 153 is skipped, so the estimated position P2 for the (k+10)-th frame is not calculated.

Therefore, the estimated position P2 for the (k+10)-th frame is derived from the estimated positions P2 that the second-stage processing unit 153 calculated from the second-stage cutout images Im2 in the most recent several frames. Here, the (k+9)-th to (k+6)-th frames are used as an example of the most recent frames. FIG. 16 is a schematic diagram showing the estimated positions P2-2 to P2-5 of the left shoulder joint. The second-stage processing unit 153 calculated these from the second-stage cutout images Im2-2 to Im2-5, which were cut out from the (k+9)-th to (k+6)-th frames, respectively.

The estimation unit 12 uses the estimated positions P2-2 to P2-5 shown in FIG. 16 to calculate an estimated position P2-6 of the left shoulder joint in the (k+10)-th frame (step S17). Any calculation method may be used, such as extrapolation, object tracking, or deep learning.
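Step S17 leaves the prediction method open. To illustrate the extrapolation option, the sketch below fits an ordinary least-squares straight line per coordinate to the recent P2 estimates and evaluates it one frame ahead. Linear fitting is just one choice made here, not the source's method. The sketch assumes the estimates come from consecutive frames and are passed oldest first ((k+6)-th to (k+9)-th). This is the reverse of the order in which FIG. 16 lists them. The function name is hypothetical.

```python
def extrapolate_linear(history):
    """Predict the position in the next frame from positions in consecutive frames (oldest first),
    using an ordinary least-squares line per coordinate."""
    n = len(history)
    if n == 0:
        raise ValueError("need at least one past estimate")
    if n == 1:
        return tuple(history[0])           # no motion information: hold the last position
    ts = range(n)                          # frame offsets 0 .. n-1
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in ts)
    prediction = []
    for axis in (0, 1):
        vals = [p[axis] for p in history]
        v_mean = sum(vals) / n
        slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vals)) / denom
        prediction.append(v_mean + slope * (n - t_mean))   # evaluate the line at t = n
    return tuple(prediction)
```

For estimates moving steadily by (+2, +1) per frame, such as (10, 20), (12, 21), (14, 22), (16, 23), the predicted P2-6 is (18.0, 24.0). The third stage would then crop around that point in step S18.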

After step S17, the third-stage processing unit 155 sets a rectangular region in the input image Im-2. FIG. 17 is a schematic diagram illustrating an example of the input image Im-2 in which this rectangular region 27 has been set. The rectangular region 27 contains the estimated position P2-6 of the left shoulder joint calculated in step S17. The third-stage processing unit 155 cuts out the portion of the input image Im-2 (FIG. 17) enclosed by the rectangular region 27 and generates the third-stage cutout image Im3-2 shown in FIG. 18 (step S18). FIG. 18 is a schematic diagram illustrating an example of the third-stage cutout image Im3-2, which shows the part of the person 21 that includes the left shoulder.

The third-stage processing unit 155 performs image recognition processing using deep learning on the third-stage cutout image Im3-2 and calculates an estimated position P3-2 of the left shoulder joint (step S19). FIG. 19 is a schematic diagram illustrating an example of the relationship between the estimated position P3-2 calculated in step S19 and the third-stage cutout image Im3-2.

The cascade processing unit 15 stores information indicating the estimated position P3-2 calculated by the third-stage processing unit 155. The third-stage processing unit 155 outputs this information (step S20). The estimated position P3-2 is the estimated position of the left shoulder joint in the (k+10)-th frame shown in FIG. 14, and the reliability determination unit 17 stores this information. At this point, the estimated position in the (k+10)-th frame is the current estimated position of the left shoulder joint.

The reliability determination unit 17 calculates the reliability of the current estimated position of the left shoulder joint from two positions: the current estimated position (here, the estimated position in the (k+10)-th frame) and the past estimated position (here, the estimated position in the (k+9)-th frame). It then determines whether that reliability is high (step S21).

When the reliability of the current estimated position of the left shoulder joint is high (Yes in step S21), the stage number determination unit 19 selects the final-stage selection mode shown in FIG. 3. The image recognition apparatus 1 then performs steps S14 to S21 on the next frame (here, the (k+11)-th frame).

When the reliability of the current estimated position of the left shoulder joint is not high (No in step S21), the stage number determination unit 19 selects the all-stage selection mode shown in FIG. 2. The image recognition apparatus 1 then performs steps S1 to S13 on the next frame (here, the (k+11)-th frame).
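Taken together, steps S2, S12, S13, S15, and S21 decide which mode is used for the next frame. The sketch below replays that decision over a sequence of frames. It assumes each frame is summarized by two values: whether a person was detected, and whether the reliability was high (`None` when it could not be calculated). The names are hypothetical.

```python
ALL_STAGES, FINAL_STAGE = "all-stage", "final-stage"

def mode_schedule(frames):
    """Return the mode used for each frame.

    Each item of `frames` is (person_detected, reliability_high); reliability_high is
    None when no reliability could be calculated for that frame (No in step S12).
    """
    modes = []
    mode = ALL_STAGES                 # no reliability exists when processing starts
    for person_detected, reliability_high in frames:
        modes.append(mode)
        if not person_detected:
            mode = ALL_STAGES         # S2: No (keep searching) or S15: No (person lost)
        elif reliability_high:
            mode = FINAL_STAGE        # S13 or S21: Yes
        else:
            mode = ALL_STAGES         # S13 or S21: No, or reliability not calculable
    return modes
```

Replaying the example above works as follows. The k-th frame has no reliability, the (k+1)-th to (k+8)-th frames are not reliable, and the (k+9)-th frame is reliable. Under these inputs, the (k+10)-th frame is the first one processed in the final-stage selection mode. In this simplified form, the transition does not depend on the current mode. An undetected person or a reliability that is not high always leads back to the all-stage selection mode, and a high reliability always leads to the final-stage selection mode.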

The main effects of the embodiment are as follows. The stage number determination unit 19 selects either the all-stage selection mode shown in FIG. 2 or the final-stage selection mode shown in FIG. 3. When the final-stage selection mode is selected, only the third-stage processing unit 155 performs image recognition processing using deep learning. Thus, in the image recognition apparatus 1 according to the embodiment, image recognition processing using deep learning is not always performed by every stage of the cascade processing unit 15; fewer stages can perform it. Therefore, when image recognition is performed on a moving image (an example of time-series images) by combining the cascade processing unit 15 with deep learning, the image recognition apparatus 1 according to the embodiment can shorten the time required for the image recognition processing.

Modifications of the image recognition apparatus 1 according to the embodiment will now be described. The embodiment uses one threshold to judge the reliability, whereas the first modification uses two: a first threshold and a second threshold, where the first threshold is smaller than the second threshold.

In the first modification, the reliability determination unit 17 shown in FIG. 1 classifies the reliability of the current estimated position of the left shoulder joint by the distance between the current and past estimated positions of the left shoulder joint:

- If the distance is smaller than the first threshold, the reliability is high.
- If the distance is equal to or greater than the first threshold and smaller than the second threshold, the reliability is medium.
- If the distance is greater than the second threshold, the reliability is low.

When the reliability is determined to be low, the stage number determination unit 19 instructs the cascade processing unit 15 to execute the all-stage selection mode (FIG. 2). The all-stage selection mode of the first modification differs from that of the embodiment in one respect: in step S13 shown in FIG. 4A, the reliability determination unit 17 may determine the reliability to be high, medium, or low.

When the reliability is determined to be high, the stage number determination unit 19 instructs the cascade processing unit 15 to execute the final-stage selection mode (FIG. 3). The final-stage selection mode of the first modification differs from that of the embodiment in one respect: in step S21 shown in FIG. 4B, the reliability determination unit 17 may determine the reliability to be high, medium, or low.

When the reliability is determined to be medium, the stage number determination unit 19 instructs the cascade processing unit 15 to execute a two-stage selection mode. The two-stage selection mode is a mode in which the first-stage processing unit 151 is skipped and the estimated position of a joint is calculated using the second-stage processing unit 153 and the third-stage processing unit 155. FIG. 20 is a schematic diagram for explaining the two-stage selection mode. Because the first-stage processing unit 151 is skipped in this mode, the input image Im is input to the second-stage processing unit 153 and the third-stage processing unit 155.
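The three-level judgment of the first modification and the resulting choice of stages can be sketched together as follows, assuming Euclidean distance and the three-stage cascade of the embodiment. The source leaves a distance exactly equal to the second threshold unassigned. This sketch treats that case as low, which is an assumption. All names are hypothetical.

```python
import math

# Stages that run for each reliability level in the first modification (three-stage cascade)
STAGES_FOR_LEVEL = {
    "high": [3],          # final-stage selection mode (FIG. 3)
    "medium": [2, 3],     # two-stage selection mode (FIG. 20)
    "low": [1, 2, 3],     # all-stage selection mode (FIG. 2)
}

def classify_reliability(current, past, first_threshold, second_threshold):
    """Three-level reliability from the distance between the current and past estimates."""
    if not first_threshold < second_threshold:
        raise ValueError("the first threshold must be smaller than the second")
    d = math.dist(current, past)
    if d < first_threshold:
        return "high"
    if d < second_threshold:
        return "medium"
    return "low"   # assumption: a distance equal to the second threshold also counts as low

def stages_for_next_frame(current, past, first_threshold, second_threshold):
    """Cascade stages to run on the next frame, given the current and past estimates."""
    return STAGES_FOR_LEVEL[classify_reliability(current, past, first_threshold, second_threshold)]
```

For a move of 5 pixels, the sketch returns `[3]` with thresholds (6, 10), `[2, 3]` with thresholds (4, 10), and `[1, 2, 3]` with thresholds (2, 5).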

FIG. 21 is a flowchart explaining the operation in the two-stage selection mode. Steps S14 to S16 are the same as steps S14 to S16 shown in FIG. 4B.

The input image generation unit 13 inputs the input image Im-2 shown in FIG. 15 to the second-stage processing unit 153 and the third-stage processing unit 155. Before the second-stage processing unit 153 can calculate the estimated position P2 of the left shoulder joint, the second-stage cutout image Im2 must be created. The second-stage cutout image Im2 is the cutout image that the second-stage processing unit 153 uses to calculate the estimated position P2 of the left shoulder joint, such as the second-stage cutout image Im2-1 shown in FIGS. 9 and 10. Creating the second-stage cutout image Im2 requires the estimated position P1 of the left shoulder joint calculated by the first-stage processing unit 151. In the two-stage selection mode, however, the first-stage processing unit 151 is omitted, so the estimated position P1 of the left shoulder joint in the (k+10)th frame is not calculated.

The estimation unit 12 therefore calculates the estimated position P1 of the left shoulder joint for the current frame (step S22) from the estimated positions P1 that the first-stage processing unit 151 calculated from the input images Im of the most recent few frames (for example, the (k+6)th to (k+9)th frames). This calculation uses the same methods (extrapolation, object tracking, deep learning, etc.) as the calculation of the estimated position P2-6 of the left shoulder joint shown in FIG. 16.
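Of the methods listed for step S22, extrapolation is the simplest to illustrate. The sketch below makes two assumptions that the source does not state: it uses a least-squares straight-line fit per coordinate, and it treats the history as consecutive frames. It predicts P1 for the frame following the last entry. The function name `extrapolate_next` is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def extrapolate_next(history: Sequence[Point]) -> Point:
    """Predict the next-frame position from past estimates, oldest first.

    Fits v = a + b * t by least squares to each coordinate over frame
    index t = 0..n-1 and evaluates the fitted line at t = n.
    """
    n = len(history)
    if n == 0:
        raise ValueError("need at least one past estimate")
    if n == 1:
        return history[0]  # no motion information: hold the last position
    ts = range(n)
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in ts)
    predicted = []
    for axis in (0, 1):
        vals = [p[axis] for p in history]
        v_mean = sum(vals) / n
        slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vals)) / denom
        predicted.append(v_mean + slope * (n - t_mean))
    return (predicted[0], predicted[1])
```

For example, if the first-stage estimates for the (k+6)th to (k+9)th frames were (10, 20), (12, 21), (14, 22), and (16, 23), this sketch would predict (18.0, 24.0) for the (k+10)th frame.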

After step S22, the second-stage processing unit 153 sets a rectangular area in the input image Im-2. FIG. 22 is a schematic diagram illustrating an example of the input image Im-2 in which the rectangular area 29 has been set. The rectangular area 29 contains the estimated position P1 of the left shoulder joint calculated in step S22.

The second-stage processing unit 153 cuts out the portion of the input image Im-2 shown in FIG. 22 that is enclosed by the rectangular area 29, and generates the second-stage cutout image Im2-6 (step S23). FIG. 23 is a schematic diagram illustrating an example of the second-stage cutout image Im2-6, which shows a part of the person 21 including the left shoulder.
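The source does not specify the size or placement of the rectangular area 29, only that it contains P1. The following sketch of steps S23 and S24 therefore rests on assumptions. It uses a fixed-size square centered on P1 and shifts it back inside the image when P1 lies near a border. If the stage-2 network reports its estimate in cutout coordinates, that estimate must be offset back into input-image coordinates. All names here are hypothetical, and the image is modeled as a list of pixel rows.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive


def crop_box(center: Tuple[float, float], size: int,
             width: int, height: int) -> Box:
    """Square box of side `size` centered on `center`, shifted to stay inside the image."""
    if size > width or size > height:
        raise ValueError("crop size exceeds image size")
    half = size // 2
    x0 = int(round(center[0])) - half
    y0 = int(round(center[1])) - half
    # Clamp so the whole box lies within [0, width) x [0, height).
    x0 = max(0, min(x0, width - size))
    y0 = max(0, min(y0, height - size))
    return (x0, y0, x0 + size, y0 + size)


def crop(image: Sequence[Sequence[int]], box: Box) -> List[List[int]]:
    """Cut the boxed region out of a row-major image (list of pixel rows)."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def to_image_coords(p: Tuple[float, float], box: Box) -> Tuple[float, float]:
    """Map a position estimated inside the cutout back to input-image coordinates."""
    return (p[0] + box[0], p[1] + box[1])
```

With a 640 x 480 frame and a 64-pixel box, an estimate P1 at (100, 50) yields the box (68, 18, 132, 82), while an estimate at (5, 5) is clamped to (0, 0, 64, 64).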

The second-stage processing unit 153 performs image recognition processing using deep learning on the second-stage cutout image Im2-6 shown in FIG. 23 and calculates the estimated position P2-7 of the left shoulder joint (step S24). FIG. 24 is a schematic diagram illustrating an example of the relationship between the estimated position P2-7 of the left shoulder joint calculated in step S24 and the second-stage cutout image Im2-6.

The cascade processing unit 15 stores information indicating the estimated position P2-7 of the left shoulder joint calculated in step S24, and the second-stage processing unit 153 outputs that information (step S25).

Steps S9 to S11 are the same as steps S9 to S11 shown in FIG. 4A. The reliability determination unit 17 shown in FIG. 1 judges the reliability using the current estimated position of the left shoulder joint (the information output in step S11) and a past estimated position of the left shoulder joint (step S26). When the reliability determination unit 17 judges the reliability to be high, the stage number determination unit 19 selects the final-stage selection mode for the next frame. When the reliability is judged to be medium, the stage number determination unit 19 selects the two-stage selection mode for the next frame. When the reliability is judged to be low, the stage number determination unit 19 selects the all-stage selection mode for the next frame.
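Claim 4 states that the reliability is judged from the distance between the estimated positions in two images. Building on that, the sketch below maps the distance to three levels and then to the mode for the next frame. Several parts are assumptions: that a smaller frame-to-frame jump means higher reliability, the two pixel thresholds (the source gives no values), and the names `judge_reliability`, `next_mode`, and `NEXT_MODE`.

```python
import math
from typing import Tuple

Point = Tuple[float, float]

# Hypothetical pixel thresholds; the source does not give values.
NEAR_PX = 5.0
FAR_PX = 20.0

# Mode selected for the next frame per reliability level (first modification).
NEXT_MODE = {"high": "final", "medium": "two_stage", "low": "all"}


def judge_reliability(current: Point, previous: Point,
                      near: float = NEAR_PX, far: float = FAR_PX) -> str:
    """Three-level reliability from the jump between two successive estimates.

    Assumes a small displacement between frames indicates a trustworthy estimate.
    """
    d = math.dist(current, previous)
    if d <= near:
        return "high"
    if d <= far:
        return "medium"
    return "low"


def next_mode(current: Point, previous: Point) -> str:
    """Mode the stage number determination step would select for the next frame."""
    return NEXT_MODE[judge_reliability(current, previous)]
```

Under these thresholds, a jump of 5 pixels keeps the cheap final-stage mode, while a jump of 50 pixels falls back to running all stages.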

A second modification will now be described. As shown in FIGS. 2 and 3, the image recognition apparatus 1 according to the embodiment uses information indicating the estimated joint position calculated by the third-stage processing unit 155 as the output of the cascade processing unit 15 and treats it as the current estimated position of the joint. In the second modification, any one of a first mode, a second mode, and a third mode can be selected. In the first mode, information indicating the estimated joint position calculated by the first-stage processing unit 151 is used as the output of the cascade processing unit 15 and as the current estimated position of the joint. In the second mode, the estimated joint position calculated by the second-stage processing unit 153 is used in the same way. In the third mode, the estimated joint position calculated by the third-stage processing unit 155 is used in the same way.

FIGS. 25, 26, and 27 are explanatory diagrams illustrating the first, second, and third modes, respectively, executed in the second modification. In the second modification, the cascade processing unit 15 shown in FIG. 1 includes switches S1 and S2, which are software switches. The switch S1 switches the output of the first-stage processing unit 151 between the input of the second-stage processing unit 153 and the output of the cascade processing unit 15. The switch S2 switches the output of the second-stage processing unit 153 between the input of the third-stage processing unit 155 and the output of the cascade processing unit 15.

Referring to FIG. 25, in the first mode, the stage number determination unit 19 controls the switch S1 to connect the output of the first-stage processing unit 151 to the output of the cascade processing unit 15. In FIG. 25, the switch S2 connects the output of the second-stage processing unit 153 to the input of the third-stage processing unit 155, but it may instead connect the output of the second-stage processing unit 153 to the output of the cascade processing unit 15. The setting of S2 does not matter here, because the second stage receives no input in this mode. In other words, in the first mode, the estimated joint position calculated by the first-stage processing unit 151 shown in FIG. 2 becomes the output of the cascade processing unit 15.

Referring to FIG. 26, in the second mode, the stage number determination unit 19 controls the switch S1 to connect the output of the first-stage processing unit 151 to the input of the second-stage processing unit 153. It also controls the switch S2 to connect the output of the second-stage processing unit 153 to the output of the cascade processing unit 15. In other words, in the second mode, the estimated joint position calculated by the second-stage processing unit 153 shown in FIG. 2 becomes the output of the cascade processing unit 15.

Referring to FIG. 27, in the third mode, the stage number determination unit 19 controls the switch S1 to connect the output of the first-stage processing unit 151 to the input of the second-stage processing unit 153. It also controls the switch S2 to connect the output of the second-stage processing unit 153 to the input of the third-stage processing unit 155. In other words, in the third mode, the estimated joint position calculated by the third-stage processing unit 155 shown in FIG. 2 becomes the output of the cascade processing unit 15; that is, the third mode is the all-stage selection mode.
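The switch arrangement of FIGS. 25 to 27 can be modeled as two booleans that decide where each stage's output is routed. The following is a minimal Python sketch, assuming stages are plain callables as in a software implementation. The names `Switches`, `MODE_SWITCHES`, and `run_with_switches` are hypothetical. Unlike the first modification, this arrangement always starts at stage 1 and truncates the trailing stages.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, float]
Stage = Callable[[object, Optional[Point]], Point]


@dataclass(frozen=True)
class Switches:
    s1_to_stage2: bool  # True: S1 routes stage 1 -> stage 2; False: stage 1 -> cascade output
    s2_to_stage3: bool  # True: S2 routes stage 2 -> stage 3; False: stage 2 -> cascade output


MODE_SWITCHES = {
    1: Switches(s1_to_stage2=False, s2_to_stage3=True),  # FIG. 25 (S2 setting is irrelevant)
    2: Switches(s1_to_stage2=True, s2_to_stage3=False),  # FIG. 26
    3: Switches(s1_to_stage2=True, s2_to_stage3=True),   # FIG. 27 (all stages)
}


def run_with_switches(stages: Sequence[Stage], sw: Switches,
                      image: object) -> Tuple[Point, int]:
    """Return (cascade output, number of stages that actually ran)."""
    p = stages[0](image, None)
    if not sw.s1_to_stage2:
        return p, 1  # stage 1 output goes straight to the cascade output
    p = stages[1](image, p)
    if not sw.s2_to_stage3:
        return p, 2  # stage 2 output goes straight to the cascade output
    return stages[2](image, p), 3
```

The second element of the returned tuple makes the speed/accuracy trade-off visible: fewer stages run in the first mode than in the third.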

In the first mode shown in FIG. 25, only the first-stage processing unit 151 calculates the estimated joint position; the second-stage processing unit 153 and the third-stage processing unit 155 are omitted. The current estimated position of the joint is therefore less reliable, but it is calculated quickly.

In the third mode shown in FIG. 27, the first-stage processing unit 151, the second-stage processing unit 153, and the third-stage processing unit 155 all calculate estimated joint positions. The current estimated position of the joint is therefore highly reliable, but it is calculated slowly.

In the second mode shown in FIG. 26, the first-stage processing unit 151 and the second-stage processing unit 153 calculate estimated joint positions, and the third-stage processing unit 155 is omitted. The reliability of the current estimated position of the joint is therefore higher than in the first mode but lower than in the third mode, and its calculation is slower than in the first mode but faster than in the third mode.

When the moving image (FIG. 1) input to the image recognition apparatus 1 has a high resolution, it is difficult to calculate the current estimated position of the joint in real time in the third mode shown in FIG. 27. To allow real-time calculation, the stage number determination unit 19 selects one of the first to third modes according to the resolution of the moving image. Two resolution thresholds are used: a first threshold and a second threshold, where the first threshold is smaller than the second threshold.

The stage number determination unit 19 selects the third mode shown in FIG. 27 when the resolution of the moving image input to the image recognition apparatus 1 is smaller than the first threshold. It selects the second mode shown in FIG. 26 when the resolution is equal to or greater than the first threshold and smaller than the second threshold. It selects the first mode shown in FIG. 25 when the resolution is greater than the second threshold.
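The threshold rule above reduces to a short decision function. The sketch below rests on two assumptions. First, "resolution" is measured as the frame's pixel count. Second, a resolution exactly equal to the second threshold, a case the text leaves open, falls into the first mode. The function name and the example threshold values are hypothetical.

```python
def select_mode_by_resolution(width: int, height: int,
                              first_threshold: int, second_threshold: int) -> int:
    """Return 3, 2, or 1 (third, second, or first mode) from the frame resolution."""
    if first_threshold >= second_threshold:
        raise ValueError("first threshold must be smaller than the second")
    pixels = width * height  # assumption: resolution measured as pixel count
    if pixels < first_threshold:
        return 3  # low resolution: all three stages can keep up in real time
    if pixels < second_threshold:
        return 2  # first threshold <= resolution < second threshold
    return 1      # resolution >= second threshold (equality case is an assumption)
```

For instance, with hypothetical thresholds of 640 x 480 and 1920 x 1080 pixels, a 1280 x 720 stream would run in the second mode and a 3840 x 2160 stream in the first.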

Alternatively, the user may be allowed to select any one of the first, second, and third modes. When the user enters an input selecting one of these modes into the image recognition apparatus 1, the stage number determination unit 19 selects that mode.

DESCRIPTION OF SYMBOLS
1 Image recognition apparatus
12 Estimation unit
15 Cascade processing unit (an example of a processing unit)
17 Reliability determination unit (an example of a judgment unit)
19 Stage number determination unit (an example of a determination unit)
21 Person
23, 25, 27 Rectangular area
Im Input image
Im2 Second-stage cutout image
Im3 Third-stage cutout image
P0 Actual position of the left shoulder joint
P1 Estimated position of the left shoulder joint calculated by the first-stage processing unit
P2 Estimated position of the left shoulder joint calculated by the second-stage processing unit
P3 Estimated position of the left shoulder joint calculated by the third-stage processing unit

Claims

An image recognition apparatus comprising: a processing unit that has a cascade structure and performs image recognition processing using deep learning at each stage of the cascade structure; and
a determination unit that determines, among the stages of the cascade structure, one or more stages that perform the image recognition processing;
wherein the one or more stages determined by the determination unit perform the image recognition processing on a time-series image.

The image recognition apparatus according to claim 1, wherein the processing unit outputs a result of the image recognition processing performed on the time-series image by the last stage among the one or more stages, and
the determination unit determines the one or more stages based on the result.

The image recognition apparatus according to claim 1, wherein the processing unit outputs a result of the image recognition processing performed on the time-series image by the last stage among the one or more stages,
the image recognition apparatus further comprises a judgment unit that judges the reliability of the result based on the result, and
the determination unit determines the one or more stages based on the reliability.

The image recognition apparatus according to claim 3, wherein the processing unit calculates an estimated position of an object by the image recognition processing, and
the judgment unit takes, as a first image, an image on which the processing unit has performed the image recognition processing among the plurality of images constituting the time-series image, and, as a second image, an image on which the processing unit performed the image recognition processing before the first image; calculates the distance between the estimated position of the object in the first image and the estimated position of the object in the second image; and judges the reliability according to the distance.

The image recognition apparatus according to claim 1, further comprising an estimation unit that, when the first stage among the one or more stages is not the first stage of the cascade structure, estimates the result to be output by the stage positioned immediately before that first stage, based on past results output by that immediately preceding stage,
wherein that first stage performs the image recognition processing using the result estimated by the estimation unit.

An image recognition method using a processing unit that has a cascade structure and performs image recognition processing using deep learning at each stage of the cascade structure, the method comprising:
a first step of determining, among the stages of the cascade structure, one or more stages that perform the image recognition processing; and
a second step in which the one or more stages determined in the first step perform the image recognition processing on a time-series image.