TWI698805B

TWI698805B - System and method for detecting and tracking people

Info

Publication number: TWI698805B
Application number: TW107136190A
Authority: TW
Inventors: 邱敬淳; 王才沛; 張家瑋; 柳恆崧; 吳玉善
Original assignee: 中華電信股份有限公司
Priority date: 2018-10-15
Filing date: 2018-10-15
Publication date: 2020-07-11
Also published as: TW202016798A

Abstract

This invention discloses a system and method for detecting and tracking people in frames provided by a fisheye camera. The system comprises a mask generation module that generates a plurality of detection masks of different sizes in different regions of the frame; a foreground-background segmentation module that segments an image from the frame into foreground and background regions; a mask screening module that applies the plurality of detection masks to the foreground region to select the detection masks of candidate persons; a mask image extraction module that applies the detection masks of the candidate persons to the original image to extract mask images of the candidate persons; and a mask image classification module that classifies each candidate mask image with the classifier corresponding to its detection mask, thereby obtaining the mask images that actually contain a person.

Description

Person detection and tracking system and method

This disclosure relates to person detection and tracking technology and, more specifically, to a system and method for detecting and tracking people in frames provided by a fisheye camera.

With the rapid development of cameras, network technology, and artificial intelligence, applications of intelligent video surveillance have grown substantially. The cameras used in video surveillance are mainly projective cameras and fisheye cameras, and the great majority are projective cameras. Their imaging mechanism resembles the human visual system, but their main drawback is a limited field of view. A site usually needs several cameras to cover it, which increases hardware cost and data volume and adds the complexity of mutual calibration and information integration.

Compared with projective cameras, the main advantage of an overhead (top-view) fisheye camera is its 360-degree coverage. As few as one camera can fully cover a fairly large space without blind spots, which is an advantage in both hardware cost and ease of use, so it is an important option for many indoor surveillance applications. In addition, when an overhead fisheye camera is used to track or recognize moving objects (including people), a further advantage is that mutual occlusion is reduced, so higher accuracy can be achieved in localization, counting, tracking, and similar tasks.

However, far fewer techniques exist for detecting or tracking people in overhead fisheye images than in projective-camera images. Apart from the smaller amount of available data, the main challenge is that a person's orientation and shape vary with position in the image. Existing techniques fall into two categories. The first performs only foreground/background separation and treats every foreground blob as a person to be tracked, without verifying whether the blob is actually a person, so it has difficulty in environments with several people or with other moving foreground objects. The second first rectifies or dewarps the image and then applies person detection methods designed for conventional images, but rectification or dewarping can usually handle only the region near the outer edge of the image, introduces distortion, and still cannot handle the changed appearance of people in the center of the image (directly below the camera).

In view of the above, the conventional techniques still have many shortcomings and are in urgent need of improvement.

To solve the above and other problems, this disclosure provides a system and method for person detection and tracking that can be applied to frames provided by a fisheye camera.

The person detection and tracking system of this disclosure includes: a foreground-background segmentation module that segments an image from the frame into a foreground region and a background region; a mask generation module that generates a plurality of detection masks of different sizes in different regions of the frame; a mask screening module that applies the plurality of detection masks to the foreground region to select the detection masks of candidate persons; a mask image extraction module that applies the detection masks of the candidate persons to the image to extract mask images of the candidate persons; and a mask image classification module that classifies the mask image of each candidate person with the classifier corresponding to that candidate's detection mask, so as to obtain the mask images that actually contain a person.

The person detection and tracking system of this disclosure further includes a person tracking unit comprising a mask feature comparison module, a person sequence update module, and a specific clothing recognition module. The mask feature comparison module compares the image features computed by the mask image classification module against the mask images of subsequent images and, when the comparison matches, adds the matching mask to the tracked person sequence.

The classifier in the person detection and tracking system of this disclosure has a training phase and a classification phase. In the training phase, the image is divided into a plurality of radial regions according to the distance from the center of the image to the center of each detection mask, and a classifier model is built for the detection masks of each region. In the classification phase, the classifier corresponding to the region in which a candidate person's detection mask lies is selected and applied to that candidate's mask image, so as to identify the mask images that actually contain a person.

The person detection and tracking method of this disclosure includes: capturing an image from the frame and segmenting the image into a foreground region and a background region; applying a plurality of detection masks to the foreground region to select the detection masks of candidate persons; applying the detection masks of the candidate persons to the image to extract mask images of the candidate persons; and classifying the mask image of each candidate person with the classifier corresponding to that candidate's detection mask, so as to obtain the mask images that actually contain a person.

The person detection and tracking method of this disclosure further includes computing image features of the mask image of the candidate person and comparing those features against the mask images of subsequent images, so that a matching mask is added to the tracked person sequence.

In the person detection and tracking method of this disclosure, the plurality of detection masks are generated by placing a person model at different positions in the scene and computing the extent the person model occupies in the image, or the depth of the person model in the image, thereby generating a plurality of detection masks of different sizes in different regions of the frame.

This disclosure therefore uses a plurality of detection masks for person detection. The detection masks themselves encode how a person's orientation and size vary with position, and they already account for occlusion caused by fixed objects in the scene, so detection can run in real time. In addition, this disclosure trains a separate classifier model for each radial-distance region of the image, so people whose appearance differs from region to region can be detected effectively. Furthermore, the detected persons can be linked into sequences, which enables real-time tracking and trajectory recording with higher accuracy and practicality than the conventional techniques.

1‧‧‧Frame image

100‧‧‧Fisheye camera

11‧‧‧Foreground region

12‧‧‧Detection mask

2‧‧‧Person detection and tracking system

200‧‧‧Person mask setting unit

210‧‧‧Scene setting module

220‧‧‧Mask generation module

300‧‧‧Person detection unit

310‧‧‧Image capture module

320‧‧‧Foreground-background segmentation module

330‧‧‧Mask screening module

340‧‧‧Mask image extraction module

350‧‧‧Mask image classification module

400‧‧‧Person tracking unit

410‧‧‧Mask feature comparison module

420‧‧‧Person sequence update module

430‧‧‧Specific clothing recognition module

501~519‧‧‧Blocks

A~C‧‧‧Persons

E, F‧‧‧Fixed objects

S201~S206‧‧‧Steps

Figure 1A is a schematic configuration diagram of an embodiment of the person detection and tracking system of this disclosure; Figure 1B is a schematic configuration diagram of a specific embodiment of the system; Figure 2 is a schematic flowchart of an embodiment of the person detection and tracking method of this disclosure; Figure 3 is a block diagram of a specific embodiment of the method; Figure 4 is a schematic diagram of the scene viewed by the fisheye camera; Figure 5 is a schematic diagram of a frame from the fisheye camera; Figure 6 is a schematic diagram of the foreground region of an image captured from a fisheye camera frame; Figure 7A is a schematic diagram of the detection masks; Figure 7B is a schematic diagram of the mask images of the detection masks; Figure 8 is a schematic diagram of the detection masks applied to the foreground region; Figure 9 is a schematic diagram of the screened detection masks; Figure 10 is a schematic diagram of the radial regions of a fisheye camera frame; and Figures 11A to 11D show actual frames captured by a fisheye camera and the application of the detection masks.

The following specific embodiments illustrate how this disclosure may be implemented; those skilled in the art can readily appreciate its other advantages and effects from the content disclosed herein. The structures, proportions, sizes, and the like shown in the accompanying drawings serve only to complement the content of the specification for the understanding of those skilled in the art, and are not intended to limit the conditions under which this disclosure can be implemented. Any modification, change, or adjustment that does not affect the effects this disclosure can produce or the objectives it can achieve therefore still falls within the scope of the technical content disclosed herein.

Referring to Figures 1A and 1B, the person detection and tracking system 2 of this disclosure includes a person mask setting unit 200, a person detection unit 300, and a person tracking unit 400.

The person mask setting unit 200 includes a scene setting module 210 and a mask generation module 220. The scene setting module 210 lets the user set the fisheye camera parameters, the scene extent, the positions and sizes of fixed objects in the scene, and the real-world size parameters of a person. The mask generation module 220 generates a plurality of detection masks of different sizes in different regions of the fisheye camera frame. Based on the parameters and information entered by the user, it creates a person model from the configured real-world person size and places that model at different positions in the scene to compute the extent the model occupies in the image, or the depth of the model in the image. In other words, from the configured camera parameters, scene extent, fixed-object positions and sizes, and person size, the depth of the person model in the image can be computed, and detection masks of different sizes are generated for different regions of the frame accordingly.
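As a rough illustration of why mask size depends on position, the sketch below projects the feet and head of an upright person model into the image of a downward-facing fisheye camera. The equidistant projection model (r = f·θ), the function name, and all parameter names are assumptions made for this example; the source does not specify a camera model.

```python
import math

def person_radial_extent(ground_dist, cam_height, person_height, focal_px):
    """Return (r_feet, r_head): image radii in pixels of a person's feet and head.

    Assumption (not from the source): an equidistant fisheye model,
    r = focal_px * theta, with theta measured from the downward optical axis.
    ground_dist is the horizontal distance from the point under the camera.
    """
    if not 0 < person_height < cam_height:
        raise ValueError("person_height must be positive and below cam_height")
    theta_feet = math.atan2(ground_dist, cam_height)
    theta_head = math.atan2(ground_dist, cam_height - person_height)
    return focal_px * theta_feet, focal_px * theta_head

# Radial length of a hypothetical mask at several floor distances (meters).
for d in (0.5, 1.0, 3.0, 6.0):
    r_feet, r_head = person_radial_extent(d, cam_height=3.0,
                                          person_height=1.7, focal_px=300.0)
    print(f"d={d:.1f} m  radial length={r_head - r_feet:.1f} px")
```

The head always projects farther from the image center than the feet, and the radial length changes with distance, which is the kind of position-dependent size variation the detection masks are meant to encode.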

In addition, the mask generation module 220 computes the depth (distance from the camera) of the person model and of the scene objects at each pixel position of the image and, by comparing these depths, obtains the part of each detection mask that is actually visible (not occluded by scene objects). The mask generation module 220 thus produces a set of detection masks, where the information for each detection mask includes its extent in the original image, the original-image coordinates of each pixel it covers, the actually visible region within the covered image area, and the real-world coordinates of the mask. As a result, partial occlusion of people by scene objects can be handled efficiently in the subsequent person detection stage without any additional decision logic.
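The depth comparison described above can be sketched as a per-pixel test. This is a minimal illustration under assumed data structures (2D lists of depths, with `None` marking pixels outside the person model); the source does not prescribe a representation.

```python
def visible_region(person_depth, scene_depth):
    """Return a 2D boolean grid that is True where the person model is visible.

    person_depth: 2D list of the person model's depth per pixel, or None
                  where the model does not cover the pixel (assumed encoding).
    scene_depth:  2D list, same shape, depth of the nearest fixed scene object.
    A pixel is visible when the person model is closer to the camera than
    the scene object at that pixel.
    """
    return [
        [p is not None and p < s for p, s in zip(p_row, s_row)]
        for p_row, s_row in zip(person_depth, scene_depth)
    ]

person = [[None, 2.0],
          [2.5, 2.5]]
scene = [[3.0, 3.0],
         [2.0, 3.0]]   # a shelf at depth 2.0 hides the lower-left pixel
print(visible_region(person, scene))
```

Because this grid is computed once at setup time and stored with each mask, the detection stage can restrict its computations to visible pixels without re-evaluating occlusion per frame.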

The person detection unit 300 includes an image capture module 310, a foreground-background segmentation module 320, a mask screening module 330, a mask image extraction module 340, and a mask image classification module 350. The image capture module 310 captures an image from the frame recorded by the fisheye camera, and the foreground-background segmentation module 320 segments the image into a foreground region and a background region, thereby updating the background model and computing the foreground region of the frame.

The mask screening module 330 applies the plurality of detection masks generated by the person mask setting unit 200 to the foreground region of the image, computes the proportion of foreground contained within each detection mask, and selects the masks whose foreground proportion reaches a given threshold, that is, the detection masks that may contain a person (also called a candidate person).
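The screening step can be sketched as follows. The data layout (a mask as a list of pixel coordinates, the foreground as a 2D list of 0/1 values) and the threshold value of 0.5 are assumptions for illustration only.

```python
def foreground_ratio(mask_pixels, foreground):
    """Fraction of a mask's pixels that lie in the foreground region.

    mask_pixels: list of (row, col) pixels covered by the mask; in practice
                 this could be limited to the mask's visible pixels.
    foreground:  2D list with 1 for foreground and 0 for background.
    """
    if not mask_pixels:
        return 0.0
    hits = sum(1 for r, c in mask_pixels if foreground[r][c])
    return hits / len(mask_pixels)

def screen_masks(masks, foreground, threshold=0.5):
    """Return the ids of masks whose foreground ratio reaches the threshold."""
    return [mask_id for mask_id, pixels in masks.items()
            if foreground_ratio(pixels, foreground) >= threshold]

foreground = [[0, 1, 1],
              [0, 1, 1],
              [0, 0, 0]]
masks = {
    "a": [(0, 1), (0, 2), (1, 1), (1, 2)],  # fully foreground
    "b": [(0, 0), (1, 0), (2, 0), (2, 1)],  # fully background
    "c": [(1, 1), (2, 1)],                  # half foreground
}
print(screen_masks(masks, foreground))
```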

The mask image extraction module 340 applies each candidate detection mask to the original image to extract the candidate's mask image, that is, the part of the original image covered by that detection mask. The extracted image may be rectified to an upright orientation.

The mask image classification module 350 computes image features of each candidate mask image and then uses the classifier corresponding to the candidate's detection mask to score the likelihood that the mask image actually shows a person, thereby identifying the mask images that actually contain a person. In detail, the classifier is trained on mask images that an operator has manually labeled in advance as person or non-person. In the training phase, the fisheye image is divided into several radial regions (segmented by the ratio of the distance between the mask center and the image center to the radius of the fisheye image), and a separate classifier model is built for each region from the training mask images whose mask centers lie in that region. In the actual classification phase, each mask is classified with the classifier model of the region in which it lies. In other words, a classifier model is built for each of the radial regions defined by the distance from the image center to the center of each detection mask, so that in the classification phase the classifier for a candidate mask image is selected according to the region in which the candidate's detection mask lies, and the mask images that actually contain a person are identified.
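Selecting the region-specific classifier can be sketched as below. The source states only that regions are segmented by the ratio of center distance to image radius; equal-width bands, the function name, and the parameter names are assumptions for this example.

```python
import math

def radial_region(mask_center, image_center, image_radius, n_regions):
    """Index of the radial region (0 = innermost) containing a mask center.

    Assumption: the regions are equal-width bands of the ratio
    distance / image_radius; the source does not fix the band boundaries.
    """
    dist = math.hypot(mask_center[0] - image_center[0],
                      mask_center[1] - image_center[1])
    ratio = min(dist / image_radius, 1.0)
    # Clamp so that a center exactly on the rim falls in the outermost band.
    return min(int(ratio * n_regions), n_regions - 1)

# One hypothetical classifier (any callable) per region, chosen at run time.
classifiers = {0: lambda feats: 0.9, 1: lambda feats: 0.6, 2: lambda feats: 0.3}
region = radial_region((700, 500), (500, 500), 500, n_regions=3)
score = classifiers[region]([0.1, 0.2])
print(region, score)
```

The same index function would be used in both phases: to assign each labeled training image to a region's training set, and to pick the model for a candidate at classification time.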

The person tracking unit 400 includes a mask feature comparison module 410, a person sequence update module 420, and a specific clothing recognition module 430. The mask feature comparison module 410 compares the image features computed by the mask image classification module 350 against the mask images of subsequent images and, when the comparison matches, adds the matching mask to the tracked person sequence. In detail, the person tracking unit 400 maintains a set of records containing all person sequences being tracked, where each person sequence holds the trajectory of one person from entering the scene to leaving it. For each new camera frame, the person detection unit 300 generates and passes to the person tracking unit 400 a set of detection masks and/or mask images together with the likelihood score that each detection mask contains an actual person (masks whose score is below a preset threshold are excluded). The mask feature comparison module 410 compares these masks with the person sequences being tracked and, based on appearance similarity, positional agreement (the last position of the person sequence versus the real-world coordinates of the mask), and the likelihood that the mask image is a person, determines the best-matching mask for each person sequence. If the degree of match reaches a preset threshold, the person sequence update module 420 appends the mask position to the person sequence and updates the sequence's appearance and position information with those of the mask. A person sequence that cannot be linked to any mask selected by the person detection unit 300 keeps its position and appearance information unchanged for the time being and continues to be tracked over a certain number of frames; if it cannot be linked to a mask in any of those frames, the person is judged to have left the scene and tracking stops. The person sequence update module 420 is also responsible for initializing person sequences: among the high-likelihood masks selected by the person detection unit 300, a mask that cannot be linked to an existing sequence and whose person likelihood is higher than that of its neighboring masks is used to initialize a new person sequence. During scene setup the user may also designate parts of the scene as entrance regions and restrict sequence initialization to those regions. The specific clothing recognition module 430 labels people wearing specific clothing (such as uniformed store staff); the clothing appearance model used for recognition is built during system initialization and setup from manually selected mask images showing that clothing.
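The three matching cues named above (appearance similarity, positional agreement, person likelihood) could be combined into a single score as in the following sketch. The weighted sum, the histogram-intersection similarity, the weights, and the dictionary layout are all assumptions for illustration; the source does not specify how the cues are combined.

```python
import math

def match_score(track, det, w_app=0.4, w_pos=0.4, w_person=0.2, max_dist=1.5):
    """Score in [0, 1] for linking a detection to a tracked person sequence.

    track: {"hist": normalized appearance histogram, "pos": last (x, y) in meters}
    det:   {"hist": ..., "pos": real-world (x, y), "score": person likelihood}
    Weights and max_dist are illustrative assumptions.
    """
    # Histogram intersection: 1.0 for identical normalized histograms.
    appearance = sum(min(a, b) for a, b in zip(track["hist"], det["hist"]))
    # Positional agreement falls linearly to 0 at max_dist meters.
    position = max(0.0, 1.0 - math.dist(track["pos"], det["pos"]) / max_dist)
    return w_app * appearance + w_pos * position + w_person * det["score"]

def best_match(track, detections, min_score=0.6):
    """Return the best-scoring detection, or None if none reaches min_score."""
    if not detections:
        return None
    best = max(detections, key=lambda d: match_score(track, d))
    return best if match_score(track, best) >= min_score else None

track = {"hist": [0.5, 0.5], "pos": (1.0, 1.0)}
near = {"hist": [0.5, 0.5], "pos": (1.0, 1.3), "score": 0.9}
far = {"hist": [0.1, 0.9], "pos": (4.0, 4.0), "score": 0.9}
print(best_match(track, [far, near]) is near)
```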

Next, the person detection and tracking method of this disclosure is outlined with reference to Figure 2. In step S201, an image is captured from the frame. In step S202, the image is segmented into a foreground region and a background region. In step S203, a plurality of detection masks are applied to the foreground region to select the detection masks that may contain a person (candidate person). In step S204, the candidate detection masks are applied to the original image to extract the candidate mask images. In step S205, the classifier corresponding to each candidate detection mask is used to classify the candidate mask image. In step S206, the mask images that actually contain a person are obtained.

In step S203, the plurality of detection masks are generated by placing a person model at different positions in the scene and computing the extent the person model occupies in the image, or the depth of the person model in the image, thereby generating a plurality of detection masks of different sizes in different regions of the frame.

In step S205, during the training phase of the classifier, a classifier model is built for each of the radial regions defined by the distance from the center of the image to the center of each detection mask. During the classification phase, the classifier for a candidate mask image is selected according to the region in which the candidate's detection mask lies, and the mask images that actually contain a person are identified. After step S206, the method further includes computing image features of the candidate mask image and comparing those features against the mask images of subsequent images, so that a matching mask is added to the tracked person sequence.

This disclosure therefore first uses the detection masks to select, from the foreground region of the image, the detection masks that may contain a person (candidate person), extracts the corresponding candidate mask images from the original image, and then scores those mask images with a classifier to identify the ones that actually contain a person. The sizes of the detection masks can be computed in advance from the scene information, and the classifiers are likewise trained in advance. The classifier is selected according to the distance between the detection mask and the image center and produces the score indicating how likely the detection mask is to contain a person.

Referring next to Figure 3, a block diagram of a specific embodiment of the person detection and tracking method of this disclosure, the method is described below together with the schematic diagrams of Figures 4 to 10 and the actual frames of Figures 11A to 11D.

Blocks 501 to 502: the fisheye camera 100 shown in Figure 4 captures the scene below it at a vertically downward angle. The scene contains two fixed objects E and F (such as store shelves) and three persons A, B, and C. Block 503: in the initial setup stage, the operator first uses the scene setting module to set the camera parameters and the real-world person size parameters. Based on the frame image 1, as shown in Figure 5 or 11A, the operator can also use the scene setting module to set the scene extent and the positions and sizes of the fixed objects in the scene, as shown in Figure 11B. The mask generation module then automatically computes the depth information of the scene, as shown in Figure 11C.

Block 504: the mask generation module automatically computes the length and width in the image of a person at different positions in the scene and generates the detection masks to be used on the image. Figure 7A shows the positions and extents in the original image of the detection masks corresponding to the positions of the three persons A, B, and C. The mask shape actually used (the dashed ellipses in the figure), as well as the number or density of masks, can be set by the operator. For each detection mask, the mask generation module also computes the depth of the person in the image and compares it with the depth of the scene objects to determine the actually visible region of the mask, as shown in Figure 7B.

Blocks 505 to 507: the image capture module captures the frame recorded by the fisheye camera and passes it to the foreground-background segmentation module, which automatically builds and updates a background image model. The foreground-background segmentation module computes the difference between the frame image 1 and the background image model and produces a grayscale or binary image that marks the foreground region 11 in the frame image 1, as shown in Figure 6.
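The source does not name a particular background modeling algorithm. As one simple possibility, the sketch below uses a running-average background model and a fixed difference threshold to produce a binary foreground image; the update rate `alpha` and the threshold are assumed values.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new grayscale frame into the background model.

    Running average (an assumed choice, not the source's stated method):
    new_bg = (1 - alpha) * old_bg + alpha * frame, per pixel.
    """
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]

def foreground_mask(background, frame, diff_threshold=30):
    """Binary image: 1 where the frame differs from the background model."""
    return [[1 if abs(f - b) > diff_threshold else 0
             for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(background, frame)]

background = [[100.0, 100.0],
              [100.0, 100.0]]
frame = [[100, 180],
         [95, 20]]
print(foreground_mask(background, frame))
background = update_background(background, frame)
```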

Block 508: all detection masks are applied to the foreground region, and the proportion of foreground within each detection mask is computed. As shown in Figure 8, the detection masks 12 are overlaid on the foreground region 11 in the frame image 1, and the detection masks whose foreground proportion reaches a threshold are then selected, as shown in Figure 9. Figure 11D shows several detection masks applied to the image of an actual frame. The detection masks that pass the screening are then applied to the original image to produce rectified mask images.

Blocks 509, 510, and 512: the mask image classification module computes the image features of the mask images of the detection masks that passed the screening and then uses the trained classifiers to score the likelihood that each mask image actually shows a person. This score is used to distinguish whether a mask image is likely a person (candidate person) or some other non-person foreground object, and mask images whose score is below a preset threshold are excluded. Specifically, the image of the fisheye camera frame is divided into several radial regions (segmented by the ratio of the distance between the detection mask center and the image center to the image radius, as indicated by the dashed lines in Figure 10, where the squares may be background). A separate classifier model is built for each region to account for how a person's appearance differs with radial distance, and in the actual classification phase each mask image is classified with the classifier model of the region in which its detection mask lies. In addition, block 511: the specific clothing recognition module compares the appearance features of a person sequence with the specific clothing features set by the operator and, if the degree of match exceeds a threshold, labels the sequence as a person wearing the specific clothing (such as a uniformed store clerk).

Block 513: the mask feature comparison module computes, for each retained mask image, the feature values used for tracking, including appearance (color, texture, and so on) and position in the real world, and compares these features with those of the person sequences being tracked to compute their similarity.
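
One possible way to turn appearance and position into a single similarity value is sketched below. The text does not specify the measures used, so histogram intersection for appearance, an exponential falloff for position, and the `weight` and `scale` parameters are all assumptions chosen for illustration.

```python
import math
from typing import Sequence, Tuple


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] between two histograms that each sum to 1."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def position_similarity(p1: Tuple[float, float], p2: Tuple[float, float],
                        scale: float) -> float:
    """Map real-world distance to (0, 1]; `scale` is a hypothetical tuning value."""
    return math.exp(-math.dist(p1, p2) / scale)


def match_score(mask_hist: Sequence[float], mask_pos: Tuple[float, float],
                track_hist: Sequence[float], track_pos: Tuple[float, float],
                weight: float = 0.5, scale: float = 1.0) -> float:
    """Weighted mix of appearance and position similarity (weights are assumed)."""
    appearance = histogram_intersection(mask_hist, track_hist)
    position = position_similarity(mask_pos, track_pos, scale)
    return weight * appearance + (1.0 - weight) * position


# A mask identical to the tracked sequence scores 1.0; differences lower the score.
print(match_score([0.5, 0.5], (0.0, 0.0), [0.5, 0.5], (0.0, 0.0)))
print(match_score([0.5, 0.5], (0.0, 0.0), [1.0, 0.0], (1.0, 0.0)))
```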

Blocks 514 to 519: for each person sequence being tracked, the mask with the best degree of match is selected. If this degree of match reaches a preset threshold, the person sequence update module adds the position of that detection mask to the person sequence (extending its trajectory record) and updates the sequence's appearance and position information with those of the detection mask. A person sequence that cannot be linked to any selected detection mask keeps its position and appearance information unchanged for the time being and continues to be tracked over a certain number of frames; if it cannot be linked to a detection mask in any of those frames, the sequence is judged to have left the scene and tracking stops. A detection mask that cannot be linked to an existing person sequence is used to initialize a new person sequence if its person likelihood score reaches a threshold and is higher than that of the other neighboring detection masks.
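
The link-or-miss bookkeeping described above might look like the following sketch. The `Track` and `Detection` classes, the `missed` counter, and the inverse-distance score are hypothetical. The sketch also overwrites the stored appearance with the linked mask's appearance and does not prevent two sequences from linking to the same mask, since the text does not say how either case is handled.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class Detection:
    position: Tuple[float, float]      # real-world position of the mask
    appearance: Tuple[float, ...]      # appearance features of the mask image


@dataclass
class Track:
    position: Tuple[float, float]
    appearance: Tuple[float, ...]
    trajectory: List[Tuple[float, float]] = field(default_factory=list)
    missed: int = 0                    # consecutive frames without a linked mask


def update_tracks(tracks: List[Track], detections: List[Detection],
                  score: Callable[[Track, Detection], float],
                  match_threshold: float, max_missed: int):
    """Link each track to its best-scoring mask, or count a miss.

    Returns (surviving tracks, indices of masks linked to some track).
    """
    linked: Set[int] = set()
    survivors: List[Track] = []
    for track in tracks:
        best_index, best_score = None, match_threshold
        for i, det in enumerate(detections):
            s = score(track, det)
            if s >= best_score:
                best_index, best_score = i, s
        if best_index is not None:
            det = detections[best_index]
            track.trajectory.append(det.position)   # extend the trajectory record
            track.position, track.appearance = det.position, det.appearance
            track.missed = 0
            linked.add(best_index)
        else:
            track.missed += 1                       # keep old position and appearance
        if track.missed <= max_missed:
            survivors.append(track)                 # otherwise: left the scene
    return survivors, linked


def inverse_distance(track: Track, det: Detection) -> float:
    """Toy match score that depends on position only."""
    dx = track.position[0] - det.position[0]
    dy = track.position[1] - det.position[1]
    return 1.0 / (1.0 + (dx * dx + dy * dy) ** 0.5)


tracks = [Track(position=(0.0, 0.0), appearance=(1.0,))]
tracks, linked = update_tracks(tracks, [Detection((0.5, 0.0), (0.9,))],
                               inverse_distance, match_threshold=0.5, max_missed=1)
print(len(tracks), linked, tracks[0].trajectory)
```

Calling `update_tracks` with an empty detection list on later frames increments `missed`; once it exceeds `max_missed`, the sequence is dropped.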

In summary, the person detection and tracking system and method of this case can be roughly divided into three parts: precomputing and generating the detection masks, computing the likelihood (or score) that a mask image is a person, and person tracking, which links the detection masks selected in adjacent frames into person sequences.

First, the operator sets the camera parameters, the scene extent, the position and size of fixed objects in the scene, and the person size parameters. A three-dimensional person model is then placed at different positions in the scene, and the extent of the person model in the image is computed through the camera's imaging mechanism, which gives the height and width of the person mask at different positions in the image. The relative depth of the person model and the scene objects is computed at each pixel position of the image, which gives the part of the person mask region that is actually visible (not occluded by scene objects). These steps produce a series of person detection masks. The information for each mask includes its extent in the original image, the original-image coordinates of each pixel of the mask image, the actually visible area within the mask image, and the mask's coordinates in the real world. With this approach, partial occlusion of people by scene objects can be handled efficiently in the later person detection stage without any additional checks.
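
The mask precomputation can be illustrated with a simplified sketch. The text does not name the camera's imaging model, so this example assumes an ideal equidistant fisheye projection (r = f·θ) with the optical axis along +z, samples the person model as a handful of points, and uses axis-aligned bounds instead of rectified masks. All function names and numeric values are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


def project_equidistant(point: Point3D, focal: float,
                        center: Tuple[float, float]) -> Tuple[float, float]:
    """Project a camera-frame point (optical axis = +z) with r = focal * theta."""
    x, y, z = point
    theta = math.atan2(math.hypot(x, y), z)   # angle from the optical axis
    phi = math.atan2(y, x)                    # direction around the axis
    r = focal * theta
    return (center[0] + r * math.cos(phi), center[1] + r * math.sin(phi))


def mask_bounds(model_points: Sequence[Point3D], focal: float,
                center: Tuple[float, float]) -> Tuple[float, float, float, float]:
    """Image bounds (u_min, v_min, u_max, v_max) of the projected person model."""
    projected = [project_equidistant(p, focal, center) for p in model_points]
    us = [u for u, _ in projected]
    vs = [v for _, v in projected]
    return (min(us), min(vs), max(us), max(vs))


def visible_pixels(person_depth: Sequence[Sequence[Optional[float]]],
                   scene_depth: Sequence[Sequence[float]]) -> List[List[bool]]:
    """True where the person model is nearer to the camera than the scene object.

    `None` in `person_depth` means the model does not cover that pixel.
    """
    return [
        [p is not None and p < s for p, s in zip(p_row, s_row)]
        for p_row, s_row in zip(person_depth, scene_depth)
    ]


def person_points(x: float, height: float = 1.7,
                  camera_height: float = 3.0) -> List[Point3D]:
    """Feet-to-head samples of a person at floor offset x under a ceiling camera."""
    return [(x, 0.0, camera_height - h * height / 4) for h in range(5)]


# The projected extent changes with the person's position in the scene.
for x in (1.0, 3.0):
    print(x, mask_bounds(person_points(x), focal=300.0, center=(500.0, 500.0)))
```

Because the visible area is stored with each mask ahead of time, the detection stage only needs to look it up, which matches the claim that no additional occlusion checks are needed later.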

Second, for each frame captured by the fisheye camera, foreground-background segmentation is performed first to update the background model and compute the foreground region of the frame. The precomputed person detection masks are then applied to the foreground region of the image, the proportion of foreground within each mask is computed, and the masks whose foreground ratio reaches a given threshold are selected and applied to the original image to produce rectified mask region images. Image features are computed for the selected mask images, and a trained person classifier then produces a score for the likelihood that each mask image is a person. Depending on the radial region in which the mask lies in the original image (segmented by the ratio of the distance between the mask center and the image center to the radius of the fisheye camera image), the classifier uses the classifier model trained on the person image features of that region. This addresses the problem that a person's shape in a fisheye camera image varies with position.

Finally, each person sequence contains the trajectory of one person from entering the scene to leaving it. For each new camera frame, the masks selected as possible persons (together with their likelihood scores of actually being a person) are compared with the person sequences being tracked. Based on appearance similarity, positional agreement, and the likelihood that the mask image is a person, the mask with the best degree of match for each person sequence is determined. If this degree of match reaches a preset threshold, the mask position is added to the person sequence, and the sequence's appearance and position information is updated with those of the mask. A person sequence that cannot be linked to any selected mask keeps its position and appearance information unchanged for the time being and continues to be tracked over a certain number of frames; if it cannot be linked to a mask in any of those frames, the sequence is judged to have left the scene and tracking stops. To initialize person sequences, masks with a high person likelihood are selected first; if such a mask cannot be linked to an existing sequence and its person likelihood is higher than that of the other neighboring masks, it is used to initialize a new person sequence. The user may also designate parts of the scene as scene entrance areas during scene setup and restrict the initialization of person sequences to those areas.
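
The initialization rule, including the optional entrance-area restriction, can be sketched as follows. The rectangular `Region` type, the `neighbor_radius` parameter, and the strict comparison against neighbors are assumptions; the text does not define how neighboring masks or entrance areas are represented.

```python
import math
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical entrance area: (x_min, y_min, x_max, y_max) in scene coordinates.
Region = Tuple[float, float, float, float]


def in_any_region(position: Tuple[float, float], regions: Sequence[Region]) -> bool:
    """True if the position lies inside at least one rectangular region."""
    x, y = position
    return any(x0 <= x <= x1 and y0 <= y <= y1 for x0, y0, x1, y1 in regions)


def new_track_seeds(positions: Sequence[Tuple[float, float]],
                    scores: Sequence[float],
                    linked: Set[int],
                    score_threshold: float,
                    neighbor_radius: float,
                    entrances: Optional[Sequence[Region]] = None) -> List[int]:
    """Indices of unlinked masks that may start a new person sequence."""
    seeds = []
    for i, (pos, s) in enumerate(zip(positions, scores)):
        if i in linked or s < score_threshold:
            continue
        if entrances is not None and not in_any_region(pos, entrances):
            continue
        # The mask must score strictly higher than every neighboring mask.
        beats_neighbors = all(
            s > scores[j]
            for j, other in enumerate(positions)
            if j != i and math.dist(pos, other) <= neighbor_radius
        )
        if beats_neighbors:
            seeds.append(i)
    return seeds


positions = [(0.0, 0.0), (0.3, 0.0), (5.0, 5.0), (9.0, 9.0)]
scores = [0.9, 0.7, 0.8, 0.95]
linked = {3}  # mask 3 is already linked to an existing sequence

print(new_track_seeds(positions, scores, linked, 0.6, 1.0))                          # [0, 2]
print(new_track_seeds(positions, scores, linked, 0.6, 1.0, [(4.0, 4.0, 6.0, 6.0)]))  # [2]
```

Mask 1 is suppressed because its neighbor, mask 0, scores higher; with an entrance area set, mask 0 is also rejected because it lies outside that area.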

The above embodiments merely illustrate the effects of this case and are not intended to limit it. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of this case. The scope of protection of this case should therefore be as set out in the claims below.

220‧‧‧Mask generation module

330‧‧‧Mask screening module

340‧‧‧Mask image capture module

Claims

A system for person detection and tracking, for use with frames provided by a fisheye camera, the system comprising: a foreground-background segmentation module that divides an image from the frame into a foreground region and a background region; a mask generation module that generates a plurality of detection masks of different sizes in different areas of the frame; a mask screening module that applies the plurality of detection masks to the foreground region to select the detection masks of candidate objects; a mask image capture module that applies the detection mask of a candidate object to the image to extract the mask image of the candidate object; and a mask image classification module that classifies the mask image of the candidate object using the classifier corresponding to the detection mask of the candidate object, to obtain the mask images that are actually persons.

The system of claim 1, further comprising a person mask setting unit that includes a scene setting module and the mask generation module, wherein the scene setting module is used to set camera parameters, the scene extent, position and size information of fixed objects in the scene, or actual size parameters of a person.

The system of claim 2, wherein the mask generation module generates a person model according to the set actual size parameters of a person and places the person model at different positions in the scene to compute the extent the person model occupies in the image, or computes the depth of field of the person model in the image according to the set camera parameters, scene extent, position and size information of fixed objects in the scene, or actual size parameters of a person, thereby generating the plurality of detection masks of different sizes in different areas of the frame.

The system of claim 1, further comprising a person detection unit that includes an image capture module, the foreground-background segmentation module, the mask screening module, the mask image capture module, and the mask image classification module, wherein the mask image classification module further computes image features of the mask image of the candidate object.

The system of claim 4, wherein, in a training stage, the classifier builds classifier models of the corresponding detection masks for a plurality of radial regions defined from the center of the image to the center of each detection mask, and, in a classification stage, a corresponding classifier is selected for the mask image of the candidate object according to the region in which the candidate object's detection mask lies, to classify the mask images that are actually persons.

The system of claim 4, further comprising a person tracking unit that includes a mask feature comparison module, a person sequence update module, and a specific clothing recognition module, wherein the mask feature comparison module compares the mask images of subsequent images according to the image features computed by the mask image classification module, so that a mask image is added to the tracked person sequence when the comparison matches.

A method for person detection and tracking, for use with frames provided by a fisheye camera, the method comprising: capturing an image from the frame and dividing the image into a foreground region and a background region; applying a plurality of detection masks to the foreground region to select the detection masks of candidate objects; applying the detection mask of a candidate object to the image to extract the mask image of the candidate object; and classifying the mask image of the candidate object using the classifier corresponding to the detection mask of the candidate object, to obtain the mask images that are actually persons.

The method of claim 7, wherein generating the plurality of detection masks comprises: generating a person model according to the set actual size parameters of a person and placing the person model at different positions in the scene to compute the extent the person model occupies in the image, or computing the depth of field of the person model in the image according to the set camera parameters, scene extent, position and size information of fixed objects in the scene, or actual size parameters of a person, thereby generating a plurality of detection masks of different sizes in different areas of the frame.

The method of claim 7, wherein, in a training stage, the classifier builds classifier models of the corresponding detection masks for a plurality of radial regions defined from the center of the image to the center of each detection mask, and, in a classification stage, a corresponding classifier is selected for the mask image of the candidate object according to the region in which the candidate object's detection mask lies, to classify the mask images that are actually persons.

The method of claim 7, further comprising computing image features of the mask image of the candidate object and comparing the mask images of subsequent images according to the image features, so that a mask image is added to the tracked person sequence when the comparison matches.