JP2021039740A

JP2021039740A - Pedestrian re-identification device and method

Info

Publication number: JP2021039740A
Application number: JP2020121896A
Authority: JP
Inventors: Menyuan Ju; xin yu Guo; An-Shin Lee; Lan Chen; Keisuke Yamatani; Seiya Kojima; Toshiki Sakai
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2019-09-02
Filing date: 2020-07-16
Publication date: 2021-03-11
Also published as: CN112446258A

Abstract

To provide a pedestrian re-identification device, method, and computer storage medium that improve the accuracy of pedestrian re-identification in scenes where a pedestrian's posture changes significantly or the pedestrian is identified by different cameras.
SOLUTION: A pedestrian re-identification device 100 comprises: a pedestrian detection unit configured to detect pedestrians in each video frame of a video sequence; a feature extraction unit configured to extract appearance features of each pedestrian detected in the video frames; and a pedestrian identification unit configured to match the extracted appearance features of each pedestrian with target appearance features of a target pedestrian and to identify the target pedestrian in the video frames based on the matching result. The target appearance features of the target pedestrian are generated based on initial appearance features and most recent appearance features of the target pedestrian.
SELECTED DRAWING: Figure 3

Description

The present disclosure relates to image processing, and specifically to a pedestrian re-identification device, a pedestrian re-identification method, and a computer storage medium.

Pedestrian re-identification is a technique for determining whether a specific target pedestrian is present by analyzing image sequences or video sequences from multiple cameras whose fields of view may or may not overlap. It differs from ordinary pedestrian tracking. Re-identification can recognize a specific target pedestrian across image or video sequences captured by different cameras, which enables long-term tracking and surveillance of that pedestrian. It therefore has strong application prospects in surveillance and criminal investigation.

Currently, pedestrian re-identification is generally performed by matching the features of each pedestrian detected in an image or video sequence against the features of known target pedestrians in a pre-built target pedestrian library. This determines whether a detected pedestrian corresponds to a known target pedestrian. The results, however, are often poor, for three main reasons:
- Differences in shooting angle, field of view, and lighting conditions between cameras can make the same pedestrian's posture and appearance differ greatly across images captured by different cameras.
- Pedestrians are highly mobile, so their posture and appearance also change somewhat over time.
- Each target pedestrian in the pre-built library is represented by a single fixed feature, so misidentification easily occurs when the same pedestrian's posture or appearance changes.

According to one aspect of the present disclosure, there is provided a pedestrian re-identification device including the following units. A pedestrian detection unit is configured to detect pedestrians in each video frame of a video sequence. A feature extraction unit is configured to extract an appearance feature of each pedestrian detected in the video frames. A pedestrian identification unit is configured to match the extracted appearance feature of each pedestrian with a target appearance feature of a target pedestrian, and to identify the target pedestrian in the video frames based on the matching result. The target appearance feature of the target pedestrian is generated based on an initial appearance feature and a recent appearance feature of the target pedestrian.

According to another aspect of the present disclosure, there is provided a pedestrian re-identification device comprising a processor and a memory storing computer program instructions. When executed by the processor, the instructions cause the processor to perform the following steps: detecting pedestrians in each video frame of a video sequence; extracting an appearance feature of each pedestrian detected in the video frames; and matching the extracted appearance feature of each pedestrian with a target appearance feature of a target pedestrian and identifying the target pedestrian in the video frames based on the matching result. The target appearance feature of the target pedestrian is generated based on an initial appearance feature and a recent appearance feature of the target pedestrian.

According to another aspect of the present disclosure, there is provided a pedestrian re-identification method including the following steps: detecting pedestrians in each video frame of a video sequence; extracting an appearance feature of each pedestrian detected in the video frames; and matching the extracted appearance feature of each pedestrian with a target appearance feature of a target pedestrian and identifying the target pedestrian in the video frames based on the matching result. The target appearance feature of the target pedestrian is generated based on an initial appearance feature and a recent appearance feature of the target pedestrian.

According to another aspect of the present disclosure, there is provided a computer-readable storage medium storing computer program instructions. When executed by a processor, the instructions perform the pedestrian re-identification method described above.

The above device, method, and computer-readable storage medium of the present disclosure improve the accuracy of pedestrian re-identification in two cases: when a pedestrian's posture changes significantly, and when the pedestrian is identified by different cameras.

These and/or other aspects and advantages of the present disclosure will become clearer and easier to understand from the following detailed description of the embodiments, taken together with the accompanying drawings.
Figure 1 shows an example of misidentification when a pedestrian is re-identified based on the initial appearance features of the target pedestrian.
Figure 2 shows an example of misidentification when a pedestrian is re-identified based on the recent appearance features of the target pedestrian.
Figure 3 shows a block diagram of a configuration example of the pedestrian re-identification device according to an embodiment of the present disclosure.
Figure 4 shows a schematic diagram of re-identifying a pedestrian based on the target appearance features of the target pedestrian, in the pedestrian re-identification device according to an embodiment of the present disclosure.
Figure 5 shows a hardware block diagram of an example computing device for implementing the pedestrian re-identification device according to an embodiment of the present disclosure.
Figure 6 shows a schematic flowchart of the pedestrian re-identification method according to an embodiment of the present disclosure.
Figure 7 shows a schematic diagram of verification results for a conventional pedestrian re-identification method and for the method according to an embodiment of the present disclosure.

To help those skilled in the art better understand the present disclosure, it is described in further detail below with reference to the accompanying drawings and specific embodiments.

The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those skilled in the art from these embodiments without creative effort fall within the scope of protection of the present disclosure.

As described above, existing pedestrian re-identification generally matches the features of each pedestrian detected in an image or video sequence against the features of known target pedestrians in a pre-built target pedestrian library. This determines whether a detected pedestrian corresponds to a known target pedestrian. Conventional re-identification methods are prone to misidentifying pedestrians for various reasons, such as changes in a pedestrian's posture. The problem is worst when each target pedestrian in the library used for feature matching is represented by a single feature. A large change in the posture or appearance of the same pedestrian may then cause that pedestrian to be misidentified as another pedestrian, and the misidentification may persist. Examples of misidentification that can occur during pedestrian re-identification are described below with reference to Figures 1 and 2.

Figure 1 shows an example of pedestrian misidentification that occurs when feature matching is based on the initial appearance features of a target pedestrian. In this case, the features of a target pedestrian at its first appearance are stored in the target pedestrian library as the initial appearance features for that target pedestrian ID. All subsequent matching is performed against these initial appearance features.

As shown in Figure 1, three images (consecutive or non-consecutive) of the same pedestrian are captured by cameras whose fields of view may or may not overlap. In the first image, the pedestrian is identified as ID1, and the features in that image become the initial appearance features of target pedestrian ID1 for subsequent matching. The pedestrian's posture, the shooting angle, and so on then change significantly. As a result, the same pedestrian is misidentified as ID2 in the second image and as ID3 in the third image. Because the initial appearance features used for matching never change, this approach cannot cope well with large changes in pedestrian posture or imaging angle.

Figure 2 shows an example of pedestrian misidentification that occurs when feature matching is based on the recent appearance features of a target pedestrian. In this case, the features of a target pedestrian at its most recent appearance are stored in the target pedestrian library as the recent appearance features for that target pedestrian ID. The features from its previous appearance are deleted, and subsequent matching is performed against these recent appearance features.

As shown in Figure 2, three images of the same pedestrian are captured by cameras whose fields of view may or may not overlap. In the first image, the pedestrian is identified as ID1, and the features from this appearance may serve as the recent appearance features of target pedestrian ID1. In the second image, the pedestrian is misidentified as ID2 for some reason, such as pedestrians overlapping one another. The features from this appearance then become the recent appearance features of ID2 and replace the previously stored features. In the third image, the erroneous result of identifying target pedestrian ID1 as ID2 persists.

Matching on recent appearance features can therefore cope to some extent with changes in pedestrian posture and imaging angle. However, once a misidentification occurs (for example, because pedestrians overlap), it tends to persist uncorrected.

To address this, the present disclosure constructs the features of a target pedestrian by jointly considering its initial appearance features and its recent appearance features. The features of detected pedestrians are then matched against these constructed features. This yields sufficiently accurate identification results even in scenes where a pedestrian's posture changes significantly or the pedestrian is identified by different cameras.

A pedestrian re-identification device according to an embodiment of the present disclosure is described below with reference to Figure 3. Figure 3 shows a block diagram of a configuration example of the pedestrian re-identification device 100 according to an embodiment of the present disclosure. As shown in Figure 3, the pedestrian re-identification device 100 may include a pedestrian detection unit 110, a feature extraction unit 120, and a pedestrian identification unit 130. The main functions of each unit of the pedestrian re-identification device 100 are described below.
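The division of labour among units 110, 120 and 130 can be sketched as a small Python pipeline. This is a minimal sketch, not the patented implementation. Three details are assumptions made for illustration: the class name `ReIdPipeline`, the `(x0, y0, x1, y1)` box convention, and passing each unit in as a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive; hypothetical convention


@dataclass
class ReIdPipeline:
    """Hypothetical wiring of the three units as injected callables."""
    detect: Callable[[np.ndarray], List[Box]]      # stands in for pedestrian detection unit 110
    extract: Callable[[np.ndarray], np.ndarray]    # stands in for feature extraction unit 120
    identify: Callable[[np.ndarray], str]          # stands in for pedestrian identification unit 130

    def process(self, frames: Sequence[np.ndarray]) -> List[List[Tuple[Box, str]]]:
        """Return, for every frame, a list of (box, identity) pairs."""
        results = []
        for frame in frames:
            per_frame = []
            for (x0, y0, x1, y1) in self.detect(frame):
                crop = frame[y0:y1, x0:x1]        # sub-image region of one detected pedestrian
                feature = self.extract(crop)      # appearance feature of that pedestrian
                per_frame.append(((x0, y0, x1, y1), self.identify(feature)))
            results.append(per_frame)
        return results
```

Because each unit is injected as a callable, a different detector, feature extractor, or matcher can be swapped in without changing the frame loop.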

The pedestrian detection unit 110 may detect pedestrians in each video frame of a video sequence. As described above, the pedestrian re-identification technique of the present disclosure differs from ordinary pedestrian tracking. It identifies a specific target pedestrian in image or video sequences captured by different cameras, enabling long-term tracking and monitoring of that target pedestrian. More specifically, the video sequence is the sequence to be analyzed, in which the presence or absence of the target pedestrian must be determined. The target pedestrian was previously captured in other video frames, from which its initial or recent appearance features were extracted. The analyzed sequence and those earlier frames may be captured by different cameras, or by the same camera at different times.

Specifically, the video sequence analyzed by the pedestrian detection unit 110 may include one or more video frames. These may be consecutive or non-consecutive frames captured by a single camera or by multiple cameras whose fields of view may or may not overlap. The pedestrian detection unit 110 may detect pedestrians in each video frame using any suitable image detection technique in the art; the present disclosure is not limited in this respect.

For example, the pedestrian detection unit 110 may apply processing such as foreground segmentation, edge extraction, and motion detection to each video frame. This identifies the sub-image region corresponding to each pedestrian appearing in that frame. Such a region may be represented, for example, by a rectangle circumscribing the contour of the detected pedestrian's body. Alternatively, the pedestrian detection unit 110 may use a pedestrian detection classifier trained in advance with a machine learning method, such as a neural network or a support vector machine. The classifier detects pedestrians in each video frame and thereby determines the position of each pedestrian appearing in the frame.
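As an illustration of the foreground-segmentation and motion-detection route, the sketch below works in three steps. It subtracts a static background frame, thresholds the difference, and returns the circumscribing rectangle of each 4-connected foreground blob. It is a deliberately simple stand-in that assumes a fixed camera and a known background image. The function name `detect_moving_regions` and the `diff_threshold` and `min_area` parameters are hypothetical. A practical system would more likely use a trained detector, as the paragraph also mentions.

```python
from collections import deque
from typing import List, Tuple

import numpy as np


def detect_moving_regions(background: np.ndarray, frame: np.ndarray,
                          diff_threshold: float = 30.0,
                          min_area: int = 4) -> List[Tuple[int, int, int, int]]:
    """Return end-exclusive rectangles (x0, y0, x1, y1) around foreground blobs."""
    # Foreground mask: pixels that differ enough from the (assumed static) background.
    diff = np.abs(frame.astype(np.float32) - background.astype(np.float32))
    if diff.ndim == 3:
        diff = diff.mean(axis=2)  # collapse color channels to one difference value
    mask = diff > diff_threshold

    h, w = mask.shape
    visited = np.zeros_like(mask, dtype=bool)
    boxes = []
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or visited[y, x]:
                continue
            # Breadth-first search over one 4-connected foreground component.
            queue = deque([(y, x)])
            visited[y, x] = True
            ys, xs = [], []
            while queue:
                cy, cx = queue.popleft()
                ys.append(cy)
                xs.append(cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            if len(ys) >= min_area:  # drop tiny noise blobs
                boxes.append((min(xs), min(ys), max(xs) + 1, max(ys) + 1))
    return boxes
```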

The feature extraction unit 120 may extract appearance features for each pedestrian detected in the video frames. In the present disclosure, appearance features may include any features reflecting a pedestrian's appearance, such as color features, texture features, shape features, and facial features; the present disclosure is not limited in this respect. Methods for extracting these features are well known in the art and are not described here.
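Color features are the simplest of the appearance features listed. The sketch below builds an L1-normalised per-channel color histogram. It assumes an RGB crop of one detected pedestrian, stored as an `H x W x 3` uint8 array. The function name and the choice of 8 bins per channel are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np


def color_histogram_feature(crop: np.ndarray, bins: int = 8) -> np.ndarray:
    """Concatenate per-channel intensity histograms of an RGB crop, L1-normalised."""
    if crop.ndim != 3 or crop.shape[2] != 3:
        raise ValueError("expected an H x W x 3 crop")
    parts = []
    for c in range(3):
        # Equal-width bins over the 8-bit intensity range [0, 256).
        hist, _ = np.histogram(crop[:, :, c], bins=bins, range=(0, 256))
        parts.append(hist.astype(np.float64))
    feature = np.concatenate(parts)
    total = feature.sum()
    return feature / total if total > 0 else feature
```

L1 normalisation makes features from crops of different sizes comparable, and it is also what a histogram distance such as the Bhattacharyya distance expects.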

The pedestrian identification unit 130 may match the appearance features of each detected pedestrian with the target appearance features of a target pedestrian, and identify the target pedestrian in the video frames based on the matching result. Specifically, it may compare the appearance features of each detected pedestrian with those of the target pedestrian. Based on the similarity between the two, it determines whether a pedestrian detected in a video frame corresponds to the target pedestrian, thereby re-identifying the pedestrian.

Optionally, there may be multiple target pedestrians. In that case, the pedestrian identification unit 130 may match the appearance features of each detected pedestrian against the constructed appearance features of each target pedestrian. Based on these similarities, it determines which target pedestrian, if any, the detected pedestrian corresponds to.

The target appearance features of target pedestrians may be included in a pre-built pedestrian appearance feature library. This library may be stored locally on the pedestrian re-identification device or on a server accessible to it. Alternatively, the target appearance features of a target pedestrian need not be stored in the library in advance; they may instead be generated from a photograph and input externally as needed. The present disclosure does not limit how the target appearance features of a target pedestrian are stored or obtained.

As described above, each single-feature approach has a weakness:
- Identification based only on a target pedestrian's initial appearance features cannot cope well with changes in the pedestrian's posture or imaging angle.
- Identification based only on its recent appearance features makes a misidentification hard to correct once it occurs.

In the present disclosure, the target appearance features of a target pedestrian are generated from both its initial and its recent appearance features. The generated features therefore reflect both the pedestrian's intrinsic appearance characteristics and the changes in its appearance and posture, which ensures accurate re-identification. The initial appearance features may include appearance features extracted in advance from initial video frames containing the target pedestrian. The recent appearance features may include appearance features extracted in advance from recent video frames containing the target pedestrian. The initial and recent appearance features may be fused to generate the target pedestrian's target appearance features, against which the features of detected pedestrians are matched. The target appearance features on which the pedestrian identification unit 130 bases its feature matching are described below.
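One way to keep both kinds of feature available is a per-target store in which the first few features are frozen and the latest few roll over. This is a hypothetical data structure: the class `TargetGallery` and the `n_initial` and `n_recent` sizes are assumptions. The disclosure only states that initial and recent appearance features are extracted from initial and recent video frames.

```python
from collections import deque
from typing import Deque, Dict, List

import numpy as np


class TargetGallery:
    """Hypothetical per-target store: the first features are kept, recent ones roll over."""

    def __init__(self, n_initial: int = 5, n_recent: int = 5):
        self.n_initial = n_initial
        self.n_recent = n_recent
        self.initial: Dict[str, List[np.ndarray]] = {}
        self.recent: Dict[str, Deque[np.ndarray]] = {}

    def add(self, target_id: str, feature: np.ndarray) -> None:
        init = self.initial.setdefault(target_id, [])
        if len(init) < self.n_initial:
            init.append(feature)  # initial features are never overwritten afterwards
        # A bounded deque silently discards the oldest recent feature when full.
        self.recent.setdefault(target_id, deque(maxlen=self.n_recent)).append(feature)

    def initial_features(self, target_id: str) -> np.ndarray:
        return np.stack(self.initial[target_id])

    def recent_features(self, target_id: str) -> np.ndarray:
        return np.stack(list(self.recent[target_id]))
```

Early on, the same feature may sit in both stores. Once more than `n_initial` features have been added, the initial store stops changing while the recent store keeps following the pedestrian.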

According to one embodiment of the present disclosure, the initial and recent appearance features of a target pedestrian may be weighted and fused according to preset weights to obtain its target appearance features. Such weighted fusion can be performed in various ways; for completeness, two exemplary methods are described below.

In one example, the numbers of initial video frames and recent video frames of the target pedestrian used for weighted fusion are first determined from the preset weights. The preset weights express the relative contribution of the initial and recent appearance features to the fusion. They may therefore be used to set the number of recent video frames relative to the number of initial video frames. The preset weights may be obtained in various ways, for example as empirical values or through machine learning; the present disclosure is not limited in this respect.

A weighted average of the target pedestrian's appearance features over the selected initial and recent video frames may then be computed. Specifically, appearance features are extracted from the sub-image regions containing the target pedestrian in the selected frames, using the color, texture, shape, facial, or other feature extraction methods mentioned above. The average of these features may then be taken as the target appearance features of the target pedestrian. For example, if the weights of the initial and recent appearance features are 0.3:0.7, 3 initial and 7 recent video frames may be used for weighted fusion. The average of the target pedestrian's appearance features over these 10 video frames then serves as its target appearance features.
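The first fusion method translates the weights into frame counts and then averages. This minimal sketch assumes per-frame feature vectors stacked row-wise in NumPy arrays, oldest first. The names `frame_counts_from_weights` and `fuse_by_frame_count` are hypothetical, and the default budget of 10 frames is chosen to mirror the 0.3:0.7 example. Because the frame counts are proportional to the weights, a plain mean over the selected frames gives each group an overall share equal to its weight.

```python
import numpy as np


def frame_counts_from_weights(w_initial: float, w_recent: float, total_frames: int = 10):
    """Split a frame budget between initial and recent frames in proportion to the weights."""
    n_initial = round(total_frames * w_initial / (w_initial + w_recent))
    return n_initial, total_frames - n_initial


def fuse_by_frame_count(initial_feats: np.ndarray, recent_feats: np.ndarray,
                        w_initial: float, w_recent: float,
                        total_frames: int = 10) -> np.ndarray:
    """Average the earliest n_initial initial features and the latest n_recent recent ones."""
    n_initial, n_recent = frame_counts_from_weights(w_initial, w_recent, total_frames)
    if n_initial > len(initial_feats) or n_recent > len(recent_feats):
        raise ValueError("not enough stored frames for the requested split")
    # An explicit start index ensures n_recent == 0 selects no recent frames.
    selected = np.concatenate([initial_feats[:n_initial],
                               recent_feats[len(recent_feats) - n_recent:]])
    return selected.mean(axis=0)
```

Take `initial_feats` of shape `(5, d)` and `recent_feats` of shape `(10, d)` with weights 0.3 and 0.7. The function then averages the first 3 initial rows and the last 7 recent rows.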

In another example, a predetermined number of initial and recent video frames of the target pedestrian are obtained first, for example 5 of each. During weighted fusion, different weights are then applied to the appearance features extracted from the initial and recent video frames. For example, suppose the weights of the initial and recent appearance features are 0.8:0.2. The target appearance features may then be computed as 0.8 times the mean of the target pedestrian's initial appearance features over the 5 initial video frames, plus 0.2 times the mean of its recent appearance features over the 5 recent video frames.
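The second method instead fixes the frame counts and applies the weights to the group means. The sketch below assumes the same row-wise NumPy layout (one feature vector per frame). The function name is hypothetical, and the default weights simply echo the 0.8:0.2 example.

```python
import numpy as np


def fuse_weighted_means(initial_feats: np.ndarray, recent_feats: np.ndarray,
                        w_initial: float = 0.8, w_recent: float = 0.2) -> np.ndarray:
    """Weighted sum of the mean initial feature and the mean recent feature."""
    # Each group is averaged over its own frames first, then the preset weights apply.
    return w_initial * initial_feats.mean(axis=0) + w_recent * recent_feats.mean(axis=0)
```

Unlike the frame-count method, the two groups may hold any number of frames, since each is averaged before weighting.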

The weights and frame counts in the above examples are merely illustrative. The initial appearance features over any number of initial video frames may be weighted and fused with the recent appearance features over any number of recent video frames, according to preset weights. The choice depends on the actual computation speed and accuracy requirements.

According to another embodiment of the present disclosure, the initial and recent appearance features of the target pedestrian may be concatenated (cascaded), and a convolution operation applied to the concatenated features. This fuses the two into the target appearance features of the target pedestrian. Various methods can be used for this concatenation-based fusion; for completeness, one exemplary method is described below. First, the initial and recent appearance features are concatenated along the channel dimension. A convolution operation, for example a 1×1 convolution layer, is then applied to the concatenated features. This fuses them and compresses their dimension to at most the dimension before concatenation. The dimension reduction lowers the computational cost while still fusing the features of every channel.
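A 1×1 convolution is a linear map over channels applied independently at every spatial position, so the concatenate-then-convolve step can be sketched in plain NumPy. The sketch assumes the initial and recent appearance features are `C x H x W` feature maps. The function name and the explicitly supplied `weight` matrix are assumptions; in practice the 1×1 kernel would be learned together with the network.

```python
from typing import Optional

import numpy as np


def cascade_and_fuse(initial_map: np.ndarray, recent_map: np.ndarray,
                     weight: np.ndarray, bias: Optional[np.ndarray] = None) -> np.ndarray:
    """Concatenate two (C, H, W) maps on the channel axis and apply a 1x1 convolution.

    `weight` has shape (C_out, 2C) with C_out <= C; `bias`, if given, has shape (C_out,).
    """
    if initial_map.shape != recent_map.shape:
        raise ValueError("initial and recent feature maps must have the same shape")
    channels = initial_map.shape[0]
    if weight.shape[1] != 2 * channels or weight.shape[0] > channels:
        raise ValueError("weight must be (C_out, 2C) with C_out <= C")
    stacked = np.concatenate([initial_map, recent_map], axis=0)   # (2C, H, W)
    # A 1x1 convolution mixes channels at each pixel: out[o] = sum_c W[o, c] * in[c].
    fused = np.einsum("oc,chw->ohw", weight, stacked)             # (C_out, H, W)
    if bias is not None:
        fused = fused + bias[:, None, None]
    return fused
```

Setting `C_out = C` restores the pre-concatenation dimension, and a smaller `C_out` compresses further.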

The above describes an example of the target appearance features of a target pedestrian on which re-identification by the pedestrian identification unit 130 according to the embodiments of the present disclosure is based. The pedestrian identification operation of the pedestrian identification unit 130 is described in detail below. As described above, the pedestrian identification unit 130 may match the appearance features of each detected pedestrian against the target appearance features of the target pedestrian and identify the target pedestrian in the video frames based on the similarity between them. Specifically, the pedestrian identification unit 130 may calculate a feature distance between the appearance features of each detected pedestrian and the target appearance features of the target pedestrian, for example the Manhattan distance, the Euclidean distance, or the Bhattacharyya distance, and compare it with a preset threshold to determine whether the detected pedestrian corresponds to the target pedestrian.
Optionally, there may be multiple target pedestrians. In this case, the pedestrian identification unit 130 may calculate, for a detected pedestrian, the feature distance to each target pedestrian, compare each feature distance with the preset threshold, find the smallest among the feature distances that fall below the threshold, and identify the detected pedestrian as the target pedestrian corresponding to that smallest distance. The pedestrian identification unit 130 may output a pedestrian re-identification result based on the matching result. For example, if the detected pedestrian corresponds to a target pedestrian, the detected pedestrian is associated with that target pedestrian's identity; if the detected pedestrian cannot be matched to any target pedestrian, a new identity is assigned to the detected pedestrian.
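The threshold-and-nearest-target rule described above can be sketched as follows. The distance functions are the standard definitions; the Bhattacharyya variant assumes the appearance features are normalized histograms. The function names, the dictionary-based target library, and the caller-supplied `new_id` are illustrative assumptions rather than part of the disclosure.

```python
import math
from typing import Dict, Optional, Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya distance, assuming p and q are normalized histograms."""
    coefficient = sum(math.sqrt(x * y) for x, y in zip(p, q))
    return -math.log(coefficient) if coefficient > 0 else math.inf


def match_target(feature: Sequence[float], targets: Dict[str, Sequence[float]],
                 threshold: float, distance=euclidean) -> Optional[str]:
    """Return the ID of the nearest target whose distance is below threshold, else None."""
    best_id, best_distance = None, threshold
    for target_id, target_feature in targets.items():
        d = distance(feature, target_feature)
        if d < best_distance:  # strictly below the threshold and the best so far
            best_id, best_distance = target_id, d
    return best_id


def reidentify(feature: Sequence[float], targets: Dict[str, Sequence[float]],
               threshold: float, new_id: str, distance=euclidean) -> str:
    """Associate with the matched target identity, or fall back to a new identity."""
    matched = match_target(feature, targets, threshold, distance)
    return matched if matched is not None else new_id


targets = {"ID1": [0.0, 0.0], "ID2": [3.0, 4.0]}
print(reidentify([0.5, 0.0], targets, 1.0, "ID3"))    # ID1
print(reidentify([10.0, 10.0], targets, 1.0, "ID3"))  # ID3 (no target below the threshold)
```

With a single target the same function reduces to the plain threshold test; with several targets it returns the closest one among those below the threshold.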

An example in which the pedestrian identification unit 130 according to the embodiments of the present disclosure re-identifies a pedestrian based on the target appearance features of a target pedestrian is described below with reference to FIG. 4. As shown in FIG. 4, the first image on the left is an initial video frame of target pedestrian ID1, and the second image in the center is a recent video frame of target pedestrian ID1. The initial appearance features and the recent appearance features of ID1 are extracted from the initial video frame and the recent video frame, respectively, and fused into the target appearance features of ID1. These target appearance features not only track the latest posture and appearance changes of the target pedestrian but also retain the pedestrian's initial appearance characteristics. Consequently, as shown in the third image on the right of FIG. 4, the pedestrian can be accurately identified as ID1 based on the target appearance features even after the pedestrian's posture and appearance have changed significantly.

Optionally, the pedestrian re-identification device 100 may further include a feature update unit (not shown). The feature update unit may update the target appearance features of a target pedestrian based on the identification result of the pedestrian identification unit 130. For example, when the pedestrian identification unit 130 determines that a pedestrian detected in a video frame corresponds to a target pedestrian and associates the detected pedestrian with that target pedestrian's identity, the feature update unit may take the appearance features of the detected pedestrian in that video frame as the newest recent appearance features of the target pedestrian. It may then either fuse the target pedestrian's initial appearance features with these newest recent appearance features to generate new target appearance features, or send the newest recent appearance features to a server that maintains the target pedestrian's target appearance features, so that the server updates them based on the received features. Either way, the updated features can follow changes in the target pedestrian's posture and appearance in real time.
Further, when the pedestrian identification unit 130 determines that a pedestrian detected in a video frame cannot be matched to any target pedestrian and assigns a new identity to it, the feature update unit may register this pedestrian as a new target pedestrian and construct its target appearance features for subsequent feature matching.
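A minimal sketch of a local feature update unit follows. The disclosure leaves the fusion rule open; this sketch assumes the weighted fusion `alpha * initial + (1 - alpha) * recent`, and the class and method names are hypothetical. The server-side alternative, in which the newest recent features are sent to a server that maintains the target features, is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


def weighted_fuse(initial: List[float], recent: List[float], alpha: float) -> List[float]:
    """Assumed fusion rule: alpha * initial + (1 - alpha) * recent."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(initial, recent)]


@dataclass
class TargetRecord:
    initial: List[float]  # kept fixed once the target is registered
    recent: List[float]   # replaced by every newly matched observation
    target: List[float]   # the fused feature used for matching


class FeatureUpdater:
    """Hypothetical local feature update unit keyed by pedestrian identity."""

    def __init__(self, alpha: float = 0.5):
        self.alpha = alpha
        self.records: Dict[str, TargetRecord] = {}

    def register(self, pid: str, feature: List[float]) -> None:
        # A new target's first observation serves as both its initial and recent feature.
        self.records[pid] = TargetRecord(list(feature), list(feature), list(feature))

    def update(self, pid: str, feature: List[float]) -> None:
        record = self.records[pid]
        record.recent = list(feature)
        record.target = weighted_fuse(record.initial, record.recent, self.alpha)

    def on_result(self, pid: str, feature: List[float], matched: bool) -> None:
        """Apply one identification result: refresh a matched target or register a new one."""
        if matched:
            self.update(pid, feature)
        else:
            self.register(pid, feature)


updater = FeatureUpdater(alpha=0.5)
updater.on_result("ID1", [1.0, 1.0], matched=False)  # unmatched: new target ID1
updater.on_result("ID1", [3.0, 5.0], matched=True)   # matched: refresh recent, re-fuse
print(updater.records["ID1"].target)  # [2.0, 3.0]
```

Keeping the initial feature fixed while only the recent feature changes is what lets the fused target follow posture changes without drifting away from the pedestrian's original appearance.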

As described above, the pedestrian re-identification device 100 according to the embodiments of the present disclosure identifies a target pedestrian using both the initial and the recent appearance features of the target pedestrian. It can therefore effectively follow recent changes in the target pedestrian's posture and appearance while correcting errors based on the target pedestrian's initial features, improving re-identification accuracy in scenes where a pedestrian's posture changes significantly or where the pedestrian is captured by different cameras.

The block diagrams used in the description of the above embodiments show blocks in functional units. These functional blocks (components) are realized by any combination of hardware and/or software, and the means for realizing each functional block is not particularly limited. That is, each functional block may be realized by a single physically and/or logically combined device, or by two or more physically and/or logically separate devices connected directly and/or indirectly (for example, by wire and/or wirelessly).

FIG. 5 is a hardware block diagram showing an example of a computer apparatus of the pedestrian re-identification device according to the embodiments of the present disclosure. The pedestrian re-identification device 100 described above may be physically configured as a computer apparatus including a processor 501, a memory 502, a storage 503, a communication device 504, an input device 505, an output device 506, a bus 507, and the like.

In the following description, the term "device" can be read as circuit, device, unit, or the like. The hardware configuration of the pedestrian re-identification device 100 may include one or more of each device shown in the figure, or may omit some of the devices.

For example, although only one processor 501 is shown, there may be multiple processors. Processing may be executed by a single processor, or by one or more processors simultaneously, sequentially, or in some other manner. The processor 501 may be implemented with one or more chips.

Each function of the pedestrian re-identification device 100 is realized by loading predetermined software (a program) onto hardware such as the processor 501 and the memory 502, so that the processor 501 performs computations and controls communication by the communication device 504 and the reading and/or writing of data in the memory 502 and the storage 503.

The processor 501, for example, runs an operating system to control the entire computer. The processor 501 may be a central processing unit (CPU) including an interface with peripheral devices, a control device, an arithmetic unit, registers, and the like. For example, the pedestrian detection unit 110, the feature extraction unit 120, the pedestrian identification unit 130, and so on may be realized by the processor 501.

The processor 501 also reads programs (program code), software modules, and data from the storage 503 and/or the communication device 504 into the memory 502, and executes various processes according to them. The program used is one that causes a computer to execute at least part of the operations described in the above embodiments.

The memory 502 is a computer-readable recording medium and may consist of at least one of, for example, ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), RAM (Random Access Memory), or another suitable storage medium. The memory 502 may also be called a register, a cache, or main memory (main storage device). The memory 502 can store executable programs (program code), software modules, and the like for carrying out the pedestrian re-identification method according to an embodiment of the present disclosure.

The storage 503 is a computer-readable recording medium and may consist of at least one of, for example, a flexible disk, a floppy (registered trademark) disk, a magneto-optical disk (e.g., a CD-ROM (Compact Disc ROM)), a digital versatile disc, a Blu-ray (registered trademark) disc, a removable disk, a hard disk drive, a smart card, a flash memory device (e.g., a card, a stick, or a key drive), a magnetic strip, a database, a server, or another suitable storage medium. The storage 503 may also be called an auxiliary storage device.

The communication device 504 is hardware (a transmitting/receiving device) for communication between computers over a wired and/or wireless network, and is also called, for example, a network device, a network controller, a network card, or a communication module. To realize frequency division duplex (FDD) and/or time division duplex (TDD), the communication device 504 may include, for example, a high-frequency switch, a duplexer, a filter, and a frequency synthesizer. For example, the video sequence to be analyzed may be received via the communication device 504.

The input device 505 is an input device that accepts input from the outside (for example, a keyboard, a mouse, a microphone, a switch, a button, or a sensor). The output device 506 is an output device that produces output to the outside (for example, a display, a speaker, or an LED lamp). The input device 505 and the output device 506 may be integrated (for example, as a touch panel).

The devices such as the processor 501 and the memory 502 are connected by the bus 507 for communicating information. The bus 507 may be a single bus or may consist of different buses between devices.

The pedestrian re-identification device 100 may also include hardware such as a microprocessor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), a PLD (Programmable Logic Device), or an FPGA (Field Programmable Gate Array), and some or all of the functional blocks may be realized by that hardware. For example, the processor 501 may be implemented with at least one of these types of hardware.

A flowchart of the pedestrian re-identification method according to the embodiments of the present disclosure is described below with reference to FIG. 6.

As shown in FIG. 6, in step S601, pedestrians are detected in each video frame of a video sequence. As described above, unlike ordinary pedestrian tracking, the pedestrian re-identification technique according to the present disclosure identifies a specific target pedestrian in image sequences or video sequences captured by different cameras, enabling long-term tracking and monitoring of that pedestrian. The video sequence here is the target video sequence to be analyzed, in which the presence or absence of the target pedestrian must be determined. This video sequence, and the video frames in which the target pedestrian was previously captured and from which its initial or recent appearance features were extracted, may be captured by different cameras, or by the same camera at different times. The video sequence may include one or more video frames, which may be consecutive or non-consecutive frames captured by a single camera or by multiple cameras with overlapping or non-overlapping fields of view.

In this step, pedestrians can be detected in each video frame of the video sequence using any suitable image detection technique in the art; the present disclosure is not limited in this respect. For example, processing such as foreground segmentation, edge extraction, and motion detection may be performed on each video frame to identify the sub-image region corresponding to each pedestrian appearing in that frame; such a sub-image region may be represented, for example, by a rectangular box circumscribing the detected contour of the pedestrian's body. Alternatively, a pedestrian detection classifier trained in advance with a machine learning method such as a neural network or a support vector machine may be applied to each video frame to determine the position of each pedestrian appearing in it.
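As a rough illustration of representing each detected pedestrian by a circumscribing rectangle, the sketch below takes a binary foreground mask (assumed to come from foreground segmentation or motion detection) and returns one bounding box per 4-connected foreground blob. Treating each blob as one pedestrian is a simplifying assumption; touching or overlapping pedestrians would need a trained detector. The function name `pedestrian_boxes` is hypothetical.

```python
from collections import deque
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def pedestrian_boxes(mask: List[List[int]]) -> List[Box]:
    """Return the circumscribing box of each 4-connected foreground blob, in scan order."""
    h = len(mask)
    w = len(mask[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x] or seen[y][x]:
                continue
            # Flood-fill one blob with BFS while tracking its extent.
            x0 = x1 = x
            y0 = y1 = y
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                x0, x1 = min(x0, cx), max(x1, cx)
                y0, y1 = min(y0, cy), max(y1, cy)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


mask = [
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
]
print(pedestrian_boxes(mask))  # [(0, 0, 1, 1), (4, 1, 4, 3)]
```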

In step S602, appearance features may be extracted for each pedestrian detected in the video frames. The appearance features may include features reflecting the pedestrian's appearance, such as color features, texture features, shape features, and facial features; the present disclosure is not limited in this respect. This step may use any feature extraction method well known in the art to extract the appearance features of each detected pedestrian, so a detailed description is omitted here.
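Since the disclosure leaves the feature extractor open, the following is only one example of a simple color feature: a normalized joint RGB histogram over the pixels of a pedestrian's sub-image region. The function name and the 8-bins-per-channel default are assumptions. Normalized histograms like this are also a natural input for the Bhattacharyya distance mentioned for the matching step.

```python
from typing import List, Sequence, Tuple


def color_histogram(pixels: Sequence[Tuple[int, int, int]], bins: int = 8) -> List[float]:
    """Normalized joint RGB histogram with `bins` levels per channel (bins**3 entries).

    Pixel channels are assumed to be integers in 0..255.
    """
    hist = [0.0] * bins ** 3
    for r, g, b in pixels:
        # Quantize each channel into `bins` levels and combine into one flat index.
        index = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[index] += 1.0
    total = sum(hist)
    return [v / total for v in hist] if total else hist


crop = [(0, 0, 0), (255, 255, 255)]  # a toy two-pixel "pedestrian region"
h = color_histogram(crop, bins=2)
print(h)  # [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.5]
```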

In step S603, the appearance features of each detected pedestrian are matched against the target appearance features of the target pedestrian, and the target pedestrian is identified in the video frames based on the matching result. In this step, pedestrian re-identification may be performed by matching the appearance features of each detected pedestrian against the target appearance features of the target pedestrian and determining, based on their similarity, whether the pedestrian detected in the video frame corresponds to the target pedestrian. Optionally, there may be multiple target pedestrians; correspondingly, in this step the appearance features of each detected pedestrian may be matched against the constructed appearance features of each target pedestrian, and based on their similarity it may be determined which target pedestrian, if any, the detected pedestrian corresponds to. It will be understood that the target appearance features of the target pedestrians may be contained in a pre-built pedestrian appearance feature library, which may be stored locally on the pedestrian re-identification device or on a server accessible to it.
Alternatively, the target appearance features of the target pedestrian need not be stored in the pedestrian appearance feature library in advance; they may be generated from a photograph and input externally as needed.

In this step, the target appearance features on which feature matching is based are generated from both the initial and the recent appearance features of the target pedestrian, so they can fully reflect both the target pedestrian's inherent appearance characteristics and the changes in its appearance and posture, ensuring re-identification accuracy. The initial appearance features may include appearance features extracted in advance from an initial video frame containing the target pedestrian, and the recent appearance features may include appearance features extracted in advance from a recent video frame containing the target pedestrian. The initial and recent appearance features may be fused to generate the target appearance features of the target pedestrian, against which the features of detected pedestrians are then matched. Because these target appearance features are the same as those described above for the pedestrian identification unit 130 of the pedestrian re-identification device 100, they are described only briefly below.

According to one embodiment of the present disclosure, the initial and recent appearance features of the target pedestrian may be fused by weighting them with preset weights, yielding the target appearance features of the target pedestrian. The embodiments of the present disclosure may perform this weighted fusion in various ways. In one example, the number of initial video frames and the number of recent video frames of the target pedestrian used for the weighted fusion are first determined based on the preset weights; the weighted average of the target pedestrian's appearance features over the determined numbers of initial and recent video frames is then calculated. In another example, a predetermined number of initial and recent video frames of the target pedestrian are first acquired; during the weighted fusion, the appearance features extracted from the initial video frames and from the recent video frames are given different weights, and the target appearance features of the target pedestrian are obtained as their weighted average.
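The two weighted-fusion examples above can be sketched as follows. How the preset weight maps to frame counts in the first example is not specified, so this sketch assumes `round(w_initial * total_frames)` initial frames plus the remainder taken from the most recent frames, averaged with equal per-frame weights so that the frame counts carry the weighting. All names are hypothetical.

```python
from typing import List, Sequence

Feature = List[float]


def mean_feature(features: Sequence[Sequence[float]]) -> Feature:
    """Element-wise mean of equally weighted feature vectors."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def fuse_by_frame_count(initial: Sequence[Feature], recent: Sequence[Feature],
                        w_initial: float, total_frames: int) -> Feature:
    """Example 1 (assumed reading): the preset weight decides how many frames each side contributes."""
    n_initial = max(1, min(len(initial), round(w_initial * total_frames)))
    n_recent = max(1, min(len(recent), total_frames - n_initial))
    # Earliest initial frames plus latest recent frames, averaged equally.
    selected = list(initial[:n_initial]) + list(recent[-n_recent:])
    return mean_feature(selected)


def fuse_by_group_weight(initial: Sequence[Feature], recent: Sequence[Feature],
                         w_initial: float) -> Feature:
    """Example 2: a fixed number of frames per side, with a different weight per side."""
    initial_mean, recent_mean = mean_feature(initial), mean_feature(recent)
    return [w_initial * a + (1 - w_initial) * b for a, b in zip(initial_mean, recent_mean)]


initial_frames = [[0.0, 0.0], [2.0, 2.0]]  # features from the earliest frames
recent_frames = [[4.0, 4.0], [8.0, 8.0]]   # features from the latest frames
print(fuse_by_frame_count(initial_frames, recent_frames, 0.5, 2))  # [4.0, 4.0]
print(fuse_by_group_weight(initial_frames, recent_frames, 0.25))   # [4.75, 4.75]
```

A larger `w_initial` anchors the target more firmly to the pedestrian's first appearance, while a smaller one lets it follow recent posture and clothing changes more quickly.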

According to another embodiment of the present disclosure, the initial and recent appearance features of the target pedestrian may be concatenated (cascaded), and a convolution operation may be applied to the concatenated features, so that the two are fused into the target appearance features of the target pedestrian.

Returning to step S603, the specific operation of re-identifying pedestrians based on the above target appearance features is described in detail below. For example, a feature distance between the appearance features of each detected pedestrian and the target appearance features of the target pedestrian, such as the Manhattan distance, the Euclidean distance, or the Bhattacharyya distance, may be calculated and compared with a preset threshold to determine whether the detected pedestrian corresponds to the target pedestrian. Optionally, there may be multiple target pedestrians. In this case, the feature distance from the detected pedestrian to each target pedestrian is calculated and compared with the preset threshold, the smallest among the distances below the threshold is found, and the detected pedestrian is identified as the target pedestrian corresponding to that smallest distance. A pedestrian re-identification result may then be output based on the matching result.
For example, if the detected pedestrian corresponds to a target pedestrian, the detected pedestrian is associated with that target pedestrian's identity; if the detected pedestrian cannot be matched to any target pedestrian, a new identity is assigned to the detected pedestrian.

Optionally, the pedestrian re-identification method further includes a step of updating the target appearance features of the target pedestrian based on the identification result. For example, when it is determined as described above that a pedestrian detected in a video frame corresponds to a target pedestrian and the detected pedestrian is associated with that target pedestrian's identity, the appearance features of the detected pedestrian in the video frame may be taken as the newest recent appearance features of the target pedestrian and fused with the target pedestrian's initial appearance features to generate new target appearance features, so that the updated features follow recent changes in the target pedestrian's posture and appearance in real time. Likewise, when it is determined that a pedestrian detected in a video frame cannot be matched to any target pedestrian and a new identity is assigned to it, target appearance features for this new target pedestrian may be constructed for subsequent feature matching.

As described above, the pedestrian re-identification method according to the embodiments of the present disclosure identifies a target pedestrian using both the initial and the recent appearance features of the target pedestrian. It can therefore effectively follow recent changes in the target pedestrian's posture and appearance while correcting errors based on the target pedestrian's initial features, improving re-identification accuracy in scenes where a pedestrian's posture changes significantly or where the pedestrian is captured by different cameras.

In addition to the methods and devices described above, embodiments of the present disclosure may also include a computer-readable storage medium storing computer program instructions that are executable by a processor to cause the processor to perform the pedestrian re-identification method described above.

Result verification
The following shows the results of verifying three pedestrian re-identification methods: (1) re-identification based on the initial appearance features of the target pedestrian, (2) re-identification based on the recent appearance features of the target pedestrian, and (3) re-identification based on the target appearance features of the target pedestrian. Specifically, in this exemplary verification, a target pedestrian library containing the appearance features of multiple target pedestrians is built in advance; pedestrians are detected in one or more video frames of the video sequence to be analyzed, and target pedestrians are identified in those frames by calculating the feature distance between the appearance features of each detected pedestrian and those of the constructed target pedestrians. The re-identification accuracy of the three methods is evaluated with the following two metrics.

Rank1: After the appearance features of a detected pedestrian have been matched against those of each target pedestrian, this metric gives the probability that the best-ranked target pedestrian (that is, the target pedestrian with the smallest feature distance) is the correct result. It is expressed by the following equation.

(Equation 1)
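Equation 1 itself is not reproduced in this text. Under the usual reading of Rank-1 (the fraction of queries whose nearest target, by feature distance, has the correct identity), a sketch might look like the following; the data layout and the function name `rank1` are assumptions, and Euclidean distance is used via `math.dist`.

```python
import math
from typing import Dict, List, Sequence, Tuple


def rank1(queries: Sequence[Tuple[List[float], str]], targets: Dict[str, List[float]]) -> float:
    """Fraction of (feature, true_id) queries whose nearest target is the true identity."""
    hits = 0
    for feature, true_id in queries:
        # The best-ranked target is the one at the smallest feature distance.
        nearest = min(targets, key=lambda tid: math.dist(feature, targets[tid]))
        hits += nearest == true_id
    return hits / len(queries)


targets = {"A": [0.0, 0.0], "B": [10.0, 0.0]}
queries = [([1.0, 0.0], "A"), ([9.0, 0.0], "B"), ([6.0, 0.0], "A")]
print(rank1(queries, targets))  # 2 of 3 correct: the third query lies nearer to B
```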

Correct answer rate: This metric is the average probability that the same pedestrian keeps the same ID throughout the image sequence. It is expressed by the following equation.

(Equation 2)
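Equation 2 is likewise not reproduced. One plausible reading of "the average probability that the same pedestrian keeps the same ID" is sketched below: for each ground-truth pedestrian, the fraction of its detections assigned its most frequent predicted ID, averaged over pedestrians. Both this interpretation and the function name `id_consistency` are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def id_consistency(assignments: Sequence[Tuple[str, str]]) -> float:
    """assignments: (ground_truth_person, predicted_id) for each detection."""
    per_person: Dict[str, List[str]] = defaultdict(list)
    for person, predicted in assignments:
        per_person[person].append(predicted)
    rates = []
    for predictions in per_person.values():
        # Share of this person's detections that carry their dominant predicted ID.
        _, count = Counter(predictions).most_common(1)[0]
        rates.append(count / len(predictions))
    return sum(rates) / len(rates)


# Person "p" flips from P12 to P7 once (compare FIG. 7(a)); person "q" stays P3.
assignments = [("p", "P12"), ("q", "P3"), ("p", "P12"), ("p", "P7"), ("q", "P3"), ("p", "P12")]
print(id_consistency(assignments))  # 0.875
```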

Table 1 shows the results of verifying the three pedestrian re-identification methods on the above two metrics.

(Table 1)

As shown in Table 1, the pedestrian re-identification method based on target appearance features according to the present disclosure achieves the best results on both metrics.

FIGS. 7(a) to 7(c) schematically show the results of re-identifying pedestrians with the three methods above. Specifically, FIG. 7(a) shows the result of re-identification based on the initial appearance features of the target pedestrian: taking the same circled pedestrian as an example, changes in camera angle and posture cause the pedestrian to be identified as P12 in one video frame and as P7 in another. FIG. 7(b) shows the result of re-identification based on the recent appearance features of the target pedestrian: taking the same circled pedestrian as an example, the pedestrian is misidentified, switching from P17 to P15, and from then on the P15 misidentification persists and cannot be corrected. FIG. 7(c) shows the result of re-identification based on the target appearance features of the target pedestrian: because these features are generated from both the initial and the recent appearance features, correct identification results are obtained across different camera angles and pedestrian postures.

These results show that, compared with re-identifying the pedestrian based only on the initial appearance feature or only on the recent appearance feature of the target pedestrian, the pedestrian re-identification method based on the target appearance feature of the present invention improves re-identification accuracy in scenes where the pedestrian's posture changes significantly or where the pedestrian is captured by different cameras.

Although the basic principles of the present disclosure have been described in connection with specific embodiments, the merits, advantages, effects, and the like mentioned in the present disclosure are merely examples and not limitations, and they should not be considered as necessarily provided by every embodiment of the present disclosure. Furthermore, the specific details disclosed above serve only as examples and to aid understanding, and are not limiting; the present disclosure is not limited to being implemented using those specific details.

As used herein, the phrase "based on" does not mean "based only on" unless expressly stated otherwise. In other words, "based on" means both "based only on" and "based at least on".

Any reference herein to elements using designations such as "first" and "second" does not generally limit the quantity or order of those elements. These designations may be used herein as a convenient way of distinguishing between two or more elements. Thus, a reference to first and second elements does not mean that only two elements may be employed, or that the first element must in some way precede the second element.

The term "determining" as used herein may encompass a wide variety of actions. For example, "determining" may be regarded as including calculating, computing, processing, deriving, investigating, looking up (e.g., searching a table, a database, or another data structure), ascertaining, and the like. "Determining" may also be regarded as including receiving (e.g., receiving information), transmitting (e.g., transmitting information), input, output, accessing (e.g., accessing data in a memory), and the like. "Determining" may further be regarded as including resolving, selecting, choosing, establishing, comparing, and the like. In other words, "determining" may be regarded as "determining" some action.

The terms "connected" and "coupled", and any variants thereof, mean any direct or indirect connection or coupling between two or more elements, and may include the presence of one or more intermediate elements between two elements that are "connected" or "coupled" to each other. The coupling or connection between elements may be physical, logical, or a combination of both. For example, "connected" may be read as "accessed". As used herein, two elements can be considered to be "connected" or "coupled" to each other by using one or more wires, cables, and/or printed electrical connections, and, as some non-limiting and non-exhaustive examples, by using electromagnetic energy such as electromagnetic energy having wavelengths in the radio frequency region, the microwave region, and/or the optical (both visible and invisible) region.

Where "including", "comprising", and variations thereof are used in this specification or the claims, these terms are intended to be inclusive, in the same manner as the term "having". Furthermore, the term "or" as used in this specification or the claims is not intended to be an exclusive OR.

The aspects/embodiments described herein may be used alone, in combination, or switched between during execution. The order of the processing procedures, sequences, flowcharts, and the like of the aspects/embodiments described herein may be changed as long as no contradiction arises. For example, the methods described herein present the elements of various steps in an exemplary order and are not limited to the specific order presented.

In the devices and methods of the present disclosure, software, whether referred to as software, firmware, middleware, microcode, or hardware description language, or by any other name, should be interpreted broadly to mean commands, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, executable files, threads of execution, procedures, functions, and the like.

Software, instructions, information, and the like may also be transmitted and received via a transmission medium. For example, when software is transmitted from a website, server, or other remote source using wired technology (coaxial cable, optical fiber cable, twisted pair, digital subscriber line (DSL), etc.) and/or wireless technology (infrared, microwave, etc.), these wired and/or wireless technologies are included within the definition of transmission medium.

Although the present invention has been described in detail above, it is clear to those skilled in the art that the present invention is not limited to the embodiments described herein. The present invention can be implemented with modifications and variations without departing from the spirit and scope of the present invention as defined by the claims. Accordingly, the description herein is for illustrative purposes only and does not have any limiting meaning with respect to the present invention.

Claims

1. A pedestrian re-identification device comprising:
a pedestrian detection unit configured to detect pedestrians in each video frame of a video sequence;
a feature extraction unit configured to extract an appearance feature of each pedestrian detected in the video frames; and
a pedestrian identification unit configured to match the extracted appearance feature of each pedestrian against a target appearance feature of a target pedestrian and to identify the target pedestrian in the video frames based on the matching result,
wherein the target appearance feature of the target pedestrian is generated based on an initial appearance feature and a recent appearance feature of the target pedestrian.

2. The pedestrian re-identification device according to claim 1, wherein the initial appearance feature includes an appearance feature extracted in advance from an initial video frame containing the target pedestrian, and the recent appearance feature includes an appearance feature extracted in advance from a recent video frame containing the target pedestrian.

3. The pedestrian re-identification device according to claim 2, wherein the target appearance feature of the target pedestrian is generated by fusing the initial appearance feature and the recent appearance feature of the target pedestrian.

4. The pedestrian re-identification device according to claim 3, wherein the initial appearance feature and the recent appearance feature of the target pedestrian are fused by weighting based on a preset weight.

5. The pedestrian re-identification device according to claim 4, wherein fusing the initial appearance feature and the recent appearance feature of the target pedestrian by weighting based on the preset weight includes: determining, based on the preset weight, the number of initial video frames and the number of recent video frames of the target pedestrian to be used in the weighted fusion; and calculating a weighted average of the appearance features of the target pedestrian in the determined number of initial video frames and the determined number of recent video frames.
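Claim 5 can be read as realizing the preset weight through the number of frames drawn from each pool. The sketch below takes that reading; the window size, the rounding rule, and the names `frame_counts` and `fuse_weighted` are hypothetical, and other readings (for example, per-pool means combined with an explicit weight) are equally compatible with the claim text.

```python
import numpy as np


def frame_counts(w_initial: float, window: int) -> tuple[int, int]:
    """Split a hypothetical window of `window` frames between the initial and
    recent pools in proportion to the preset weight `w_initial`."""
    n_init = int(round(w_initial * window))
    n_init = min(max(n_init, 0), window)  # clamp to [0, window]
    return n_init, window - n_init


def fuse_weighted(initial_feats, recent_feats, w_initial=0.4, window=5):
    """Average the selected initial and recent appearance features.

    initial_feats: 1-D feature arrays, oldest first (assumed layout).
    recent_feats: 1-D feature arrays, newest last (assumed layout).
    """
    n_init, n_recent = frame_counts(w_initial, window)
    chosen = list(initial_feats[:n_init])
    if n_recent > 0:  # guard: a slice of [-0:] would return the whole list
        chosen += list(recent_feats[-n_recent:])
    if not chosen:
        raise ValueError("no frames selected for fusion")
    # Each selected frame counts equally, so the initial pool contributes
    # n_init / len(chosen) of the fused feature and the recent pool the rest.
    return np.stack(chosen).mean(axis=0)
```

With `w_initial=0.4` and `window=5`, and when both pools hold enough frames, two initial frames and three recent frames are averaged, so the initial view contributes 40% of the fused feature.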

6. The pedestrian re-identification device according to claim 3, wherein the initial appearance feature and the recent appearance feature of the target pedestrian are fused by concatenation, by concatenating (cascade-connecting) the initial appearance feature and the recent appearance feature of the target pedestrian and performing a convolution operation on the concatenated appearance feature.
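The claim does not specify how the features are concatenated or what convolution is applied. A minimal sketch, assuming the two D-dimensional features are stacked as two channels and passed through a single 1-D convolution with an odd kernel and zero "same" padding; the kernel values stand in for weights that would be learned in practice, and `concat_conv_fuse` is a hypothetical name.

```python
import numpy as np


def concat_conv_fuse(initial, recent, kernel, bias=0.0):
    """Concatenate two D-dim features as 2 channels, then apply a 1-D convolution.

    kernel: array of shape (2, k) with k odd (hypothetical, learned in practice).
    Returns a D-dim fused feature computed with zero 'same' padding.
    """
    x = np.stack([np.asarray(initial, float), np.asarray(recent, float)])  # (2, D)
    kernel = np.asarray(kernel, dtype=float)
    channels, k = kernel.shape
    if channels != x.shape[0] or k % 2 != 1:
        raise ValueError("kernel must have shape (2, k) with odd k")
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))  # zero-pad along the feature axis only
    d = x.shape[1]
    out = np.empty(d)
    for i in range(d):
        # Cross-correlation over both channels, as deep-learning "conv" layers do.
        out[i] = np.sum(xp[:, i:i + k] * kernel) + bias
    return out
```

A kernel of `[[0, 0.5, 0], [0, 0.5, 0]]` reduces to an element-wise average of the two features; wider nonzero kernels also mix neighboring feature dimensions, which is what distinguishes this route from the plain weighted fusion of claims 4 and 5.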

7. The pedestrian re-identification device according to claim 2, wherein the pedestrian identification unit is further configured to determine whether a pedestrian detected in the video frame corresponds to the target pedestrian based on the similarity between the extracted appearance feature of that pedestrian and the target appearance feature.
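The claim leaves the similarity measure and decision rule open. A minimal sketch, assuming cosine similarity and an illustrative acceptance threshold of 0.7 (both assumptions), where the best-scoring detection above the threshold is taken as the target:

```python
import numpy as np


def cosine(a, b) -> float:
    """Cosine similarity between two 1-D feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def find_target(pedestrian_feats, target_feat, threshold=0.7):
    """Return the index of the detection that best matches the target, or None
    if no detection reaches the (hypothetical) similarity threshold."""
    best_idx, best_sim = None, threshold
    for idx, feat in enumerate(pedestrian_feats):
        sim = cosine(feat, target_feat)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    return best_idx
```

Returning `None` rather than the least-bad candidate reflects the "corresponds or not" decision in the claim: a frame in which the target is absent should not force a match.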

8. The pedestrian re-identification device according to claim 7, further comprising a feature update unit configured to, when the pedestrian detected in the video frame corresponds to the target pedestrian, set the appearance feature of the detected pedestrian in that video frame as the newest recent appearance feature of the target pedestrian.
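The update rule only requires somewhere to keep the newest recent feature while the initial features stay fixed. A minimal sketch, assuming the recent features live in a bounded buffer; the `TargetTrack` class, the `max_recent` size, and seeding the buffer with the last initial feature are illustrative assumptions, not details from the source.

```python
from collections import deque

import numpy as np


class TargetTrack:
    """Hypothetical container for one target pedestrian's stored features."""

    def __init__(self, initial_feats, max_recent=5):
        # Initial features are kept as-is and never overwritten.
        self.initial = [np.asarray(f, dtype=float) for f in initial_feats]
        # Bounded buffer: appending beyond max_recent drops the oldest entry.
        # Seeded with the last initial feature (assumption) so `newest` is defined.
        self.recent = deque(self.initial[-1:], maxlen=max_recent)

    def update(self, matched_feat):
        """Call only when a detection was judged to be the target."""
        self.recent.append(np.asarray(matched_feat, dtype=float))

    @property
    def newest(self):
        """The newest recent appearance feature."""
        return self.recent[-1]
```

Keeping the initial features outside the buffer is what lets a fused target feature recover from a bad update, the failure mode FIG. 7(b) attributes to relying on recent features alone.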

9. The pedestrian re-identification device according to claim 2, wherein the video sequence, and the initial video frame and the recent video frame from which the target appearance feature of the target pedestrian is derived, are captured by different cameras or by the same camera at different times.

10. A pedestrian re-identification device comprising a processor and a memory storing computer program instructions, wherein the computer program instructions, when executed by the processor, cause the processor to execute:
a step of detecting pedestrians in each video frame of a video sequence;
a step of extracting an appearance feature of each pedestrian detected in the video frames; and
a step of matching the extracted appearance feature of each pedestrian against a target appearance feature of a target pedestrian and identifying the target pedestrian in the video frames based on the matching result,
wherein the target appearance feature of the target pedestrian is generated based on an initial appearance feature and a recent appearance feature of the target pedestrian.
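Putting the claimed steps together, a per-frame loop might look like the sketch below. It assumes hypothetical `detect` and `extract` callables standing in for the detection and feature extraction units, a fixed 50/50 weighted fusion, cosine similarity with an illustrative 0.7 threshold, and an update rule that replaces the recent feature with the best match; none of these choices is specified by the claims.

```python
import numpy as np


def reidentify(frames, detect, extract, initial_feat, recent_feat,
               w_initial=0.5, threshold=0.7):
    """Yield (frame_index, box) for each frame in which the target is found.

    detect(frame) -> iterable of boxes and extract(frame, box) -> 1-D feature
    are hypothetical stand-ins for the detection and feature extraction units.
    """
    initial = np.asarray(initial_feat, dtype=float)  # fixed for the whole run
    recent = np.asarray(recent_feat, dtype=float)    # refreshed on every match
    for t, frame in enumerate(frames):
        # Target appearance feature: weighted fusion of initial and recent.
        target = w_initial * initial + (1 - w_initial) * recent
        best_box, best_sim, best_feat = None, threshold, None
        for box in detect(frame):
            feat = np.asarray(extract(frame, box), dtype=float)
            sim = feat @ target / (np.linalg.norm(feat) * np.linalg.norm(target))
            if sim >= best_sim:
                best_box, best_sim, best_feat = box, sim, feat
        if best_box is not None:
            recent = best_feat  # matched feature becomes the newest recent feature
            yield t, best_box
```

For testing, a "frame" can simply be a dict mapping box labels to precomputed features, with `detect=lambda f: list(f)` and `extract=lambda f, b: f[b]`.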