TW202143167A - System and method for visitor interest extent analysis - Google Patents

System and method for visitor interest extent analysis

Info

Publication number
TW202143167A
Authority
TW
Taiwan
Prior art keywords
image
data
head
visitor
image data
Prior art date
Application number
TW110116950A
Other languages
Chinese (zh)
Other versions
TWI802881B (en)
Inventor
黃基雲
邱德正
Original Assignee
普安科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 普安科技股份有限公司
Publication of TW202143167A
Application granted
Publication of TWI802881B

Landscapes

  • Image Processing (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

A system for analyzing a visitor's degree of interest, and a method thereof, are disclosed. The system and method analyze image data to determine the targets a visitor may be looking at and, given the positions of objects in the environment, infer what the visitor is interested in and the degree of interest. Through deep learning with an algorithm under the convolutional neural network (CNN) architecture, the system and method recognize the head direction of the visitor in the image data. Further, by means of a virtual light source placed behind the visitor's head, the gazing area at which the visitor may be looking is obtained. When the position of a target falls within a sub-area representing a certain probability inside the gazing area, it is determined that the visitor has a degree of interest in the target, meaning the visitor may be looking at it.

Description

System and method for analyzing a visitor's degree of interest

The present invention discloses a method for estimating which object the people/customers in image data are looking at. In particular, it is a detection method for image data based on the convolutional neural network (CNN) architecture. Rather than tracking the face and eyeball direction across image data recorded by two camera lenses, the method identifies which object people are looking at using only the image data recorded by a single camera lens.

Please refer to FIG. 1A, which illustrates a gaze tracking determination method implemented in the prior art. FIG. 1A shows a person in an environment gazing at an object 50 in front of him. Cameras are set up at two different locations in this environment, referred to as Camera 1 (Cam 1, dotted circle) 60 and Camera 2 (Cam 2, dotted circle) 70. The two cameras photograph people entering the environment and the objects 50 within it; the face images captured by the two cameras Cam 1 60 and Cam 2 70 at a certain point in time are shown in FIG. 1B. Based on the information in the captured face images, the two cameras Cam 1 60 and Cam 2 70 compute the facial central line (not shown) and the direction of the eyeballs within the eyes (not shown) in order to obtain the direction of the person's gazing line 40, and thereby estimate whether the target of the person's gaze is the object 50.

For the prior art of FIG. 1A to obtain the person's gazing line 40 and the object 50 being looked at in the image data, the following conditions must be met:

(a) Under normal circumstances, two cameras must be used to implement this prior art; it cannot be achieved if only one camera is installed in the environment. Alternatively, a 3D camera may be used: since a 3D camera is equipped with two or more lenses, it can achieve the same effect as two general cameras. However, whether two general cameras or one 3D camera is used, a higher camera cost is required to implement this prior art.

(b) Because this prior art needs to recognize the eyes, which occupy only a small part of the face, and the direction of the eyeballs within them, the resolution of the image data must exceed a certain threshold in order to identify the person's facial central line and eyeball deviation angle. Moreover, if the person's face in the image data is occluded, or the image resolution is insufficient, so that either the facial central line or the eyeball deviation angle cannot be determined, the target the person is looking at cannot be calculated.

(c) A larger image data storage space is required to store the higher-resolution image data. Because this prior art requires higher-resolution image data, the size of each piece of image data increases, and the same amount of data requires more storage space, thereby increasing the cost of storage.

In addition, because the computation required to analyze image data with this prior art is complicated, it either takes more time or requires higher hardware cost to save time. In other words, using lower-cost hardware means long processing times and poor performance; shortening the time and increasing performance requires investing in higher hardware cost.

The above description makes clear the shortcomings of the prior art represented by FIG. 1A and FIG. 1B. Therefore, the present invention proposes a technical method that solves these shortcomings, reducing the required hardware cost while also increasing the overall performance of the system.

One object of the present invention is to provide a visitor interest level analysis system for analyzing a visitor's degree of interest in at least one item. The system includes: at least one image capture device, set up at a location, for capturing image data of the location, in which a first head image of the visitor is recorded; and an image analysis server, connected to the at least one image capture device, for computing and analyzing the image data from the at least one image capture device. The image analysis server further includes: a data processing center for executing an image analysis application, which determines a first head direction corresponding to the first head image according to a first feature mapping obtained from the first head image, and calculates a first projection area following the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other related data required or produced by the data processing center during operation. The first projection area is calculated using a virtual light source placed behind the visitor's head position, which projects light in a direction consistent with the first head direction to form the simulated first projection area. When the first projection area covers the position of the at least one item, the visitor has a degree of interest in the at least one item.

Another object of the present invention is to provide a visitor interest level analysis system, connected to at least one image capture module and receiving image data from the at least one image capture module, for analyzing a visitor's degree of interest in at least one target. The system includes: a data processing center, executing an image analysis application, which determines a first head direction corresponding to a first head image in the image data according to a first feature mapping obtained from the first head image, and calculates a first projection area following the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other related data required or produced by the data processing center during operation. The first projection area is calculated using a virtual light source placed behind the visitor's head position, which projects light in a direction consistent with the first head direction to form the simulated first projection area. When the first projection area covers the position of the at least one target, the visitor has a degree of interest in the at least one target.

A further object of the present invention is to provide a method for analyzing a visitor's degree of interest. The method is executed by an image analysis server to determine a visitor's degree of interest in at least one target, and includes: providing an image analysis application in the image analysis server; the image analysis server obtaining image data; the image analysis server detecting in the image data a first head image having a first head feature; the image analysis server analyzing the first head image and determining a first head direction corresponding to the first head image through a first feature mapping obtained from the analysis; the image analysis server calculating the position of the first head image in a three-dimensional space and computing a simulated first projection area based on that position, the first head direction, and a virtual light source; and the image analysis server determining, according to the coverage of the first projection area and the position of the at least one target, whether the visitor corresponding to the first head image has a degree of interest in the at least one target.

Please refer to FIG. 2. According to an embodiment of the present invention, FIG. 2 shows an Image Data Capture and Analysis System (IDCAS) 100. The system 100 includes a plurality of image capture devices 400A to 400N, a cloud storage unit 150, and an image analysis server 300. The plurality of image capture devices 400A to 400N and the cloud storage unit 150 transmit signals and data to each other through a network 180 or a transmission line, and the cloud storage unit 150 and the image analysis server 300 transmit signals and data to each other through a network 190 or a transmission line.

According to an embodiment of the present invention, the plurality of image capture devices 400A to 400N in FIG. 2 may be any type of general camera or IP camera, for example: a tripod head camera whose elevation angle, rotation, and zoom can be remotely controlled, a dome camera, an infrared camera, a fisheye camera, a 3D camera, and so on. The plurality of image capture devices 400A to 400N are set up at different sites and capture image data of the environment at each site in a continuous recording mode, a scheduled recording mode, or a motion detection mode, transmitting the image data through the network 180 or a transmission line to the cloud storage unit 150 for storage. The network 180 may be a local area network (LAN), a wide area network (WAN), the Internet, or a wireless network. The format of the image data captured by the image capture devices 400A to 400N may be an audio-visual format defined by the Moving Picture Experts Group (MPEG), such as MPEG-4 or MPEG-2, or another audio-visual format such as Audio Video Interleave (AVI) or Real Media Variable Bitrate (RMVB). The specification of the image data may be, for example, a resolution of 1920x1080 pixels at 30 frames per second (FPS), but is not limited to this specification; image data of any pixel/frame-rate specification may be used. The continuous recording mode means the camera records continuously all day, day and night. The scheduled recording mode means the camera is set to record only during certain periods, for example: recording during 09:00-20:00 (business hours) and not at other times, or recording during the two periods 11:00-14:00 and 17:00-21:00 and not at other times. The motion detection mode means the camera's recording function is triggered only when the camera detects object movement in its environment; the recording time may be a preset period or may last until no further object movement is detected. According to another embodiment of the present invention, before transmitting the image data to the cloud storage unit 150 for storage, the plurality of image capture devices 400A to 400N may first pass part of the image data through an image processing device (not shown) for preliminary processing; the data is then transmitted to the cloud storage unit 150, and the image analysis server 300 downloads it from the cloud storage unit 150 for further processing. Because image data processing occupies considerable resources of the graphics processing unit (GPU) 370 of the image analysis server 300, this approach reduces the amount of image data transmitted to the image analysis server 300 and lightens its workload, increasing the overall performance of the system 100. According to an embodiment of the present invention, the aforementioned image processing device (not shown) may be another data server, a personal computer (PC), a notebook PC, a tablet PC, or another image analysis server 300.
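As an illustration of the motion-detection recording mode described above, the following is a minimal sketch using OpenCV background subtraction; the camera index, motion threshold, and 10-second post-motion window are illustrative assumptions, not values taken from the patent.

```python
# Minimal sketch of motion-triggered recording: record only while motion
# is detected, plus a short post-motion window.
import cv2
import time

cap = cv2.VideoCapture(0)                      # hypothetical camera source
subtractor = cv2.createBackgroundSubtractorMOG2()
recording_until = 0.0                          # timestamp until which to record

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)             # nonzero pixels = moving objects
    if cv2.countNonZero(mask) > 500:           # motion threshold (assumed)
        recording_until = time.time() + 10.0   # preset post-motion window
    if time.time() < recording_until:
        # here the frame would be written to storage or sent on to
        # the cloud storage unit 150
        pass

cap.release()
```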

According to an embodiment of the present invention, the cloud storage unit 150 in FIG. 2 refers to a cloud data storage environment built by a cloud provider; the cloud provider may be, for example: Google, Amazon, Alibaba, DropBox, or another vendor providing a cloud data environment. The cloud storage unit 150 provides sufficient data storage space for the image capture devices 400A to 400N in FIG. 2 for long-term storage of image data, while the internal storage space of the image capture devices 400A to 400N is used only for temporary storage. According to another embodiment of the present invention, the cloud storage unit 150 may be omitted, and the image capture devices 400A to 400N in FIG. 2 may transmit the image data directly to the image analysis server 300 for processing.

Please refer to FIG. 2. According to an embodiment of the present invention, the image analysis server 300 in FIG. 2 downloads, through a network 190, the image data that the image capture devices 400A to 400N transmitted to the cloud storage unit 150, and then performs a series of image analysis and processing steps. The network 190 may be a local area network (LAN), a wide area network (WAN), the Internet, or a wireless network. Before performing the above image analysis and processing, the image analysis server 300 must first undergo deep learning to train itself to recognize specific features in images; the deep learning is fundamentally performed with an algorithm based on the convolutional neural network (CNN) architecture. The image analysis server 300 reads data sets of defined categories from a connected database 450 and, through the CNN algorithm, learns the features representing each data set; for example, it "learns" the features of a "face" in image data from the data set defining "face". After the training/learning phase is completed, the image analysis server 300 is able to recognize whether other image data contains the specific feature.
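As an illustration of this training step, the following is a minimal sketch of CNN training on a labeled data set, assuming PyTorch; the network shape, the two-class "face"/"not face" setup, and the random stand-in batch are illustrative assumptions rather than the patent's actual configuration.

```python
# Minimal CNN training sketch for learning features from a labeled data set.
import torch
import torch.nn as nn

class FeatureCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(          # convolutional feature extractor
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):                        # x: (N, 3, 64, 64)
        f = self.features(x)                     # learned feature maps
        return self.classifier(f.flatten(1))

model = FeatureCNN(num_classes=2)                # e.g. "face" vs. "not face"
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One illustrative training step on a random batch standing in for a data set.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))
loss = loss_fn(model(images), labels)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```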

Please refer to FIG. 3A, which is a block diagram of the basic architecture of the hardware 480 of the image analysis server 300 in FIG. 2, according to an embodiment of the present invention. The basic architecture of the hardware 480 of the image analysis server 300 includes: a central processing unit (CPU) 310, a read-only memory (ROM) 340, a dynamic random access memory (DRAM) 320, a storage interface controller 350, a physical storage device array (PSD Array) 380, a non-volatile memory (NVRAM) 330, a network interface controller (NIC) 360, and a graphics processing unit (GPU) 370. These units transmit messages and data to one another through one or more buses 390. The central processing unit (CPU) 310 and the graphics processing unit (GPU) 370 may be two independent units, or may be integrated into one chip or one software/hardware module to form a data processing center. The physical storage device array (PSD Array) 380 further comprises a plurality of physical storage devices (PSDs) 385, for example: hard disk drives (HDDs), solid state disks (SSDs), or other physical storage devices 385 that achieve the same storage function.

In FIG. 3A, the central processing unit 310 (or the data processing center formed by integrating the central processing unit (CPU) 310 with the graphics processing unit (GPU) 370) is a core unit of the image analysis server 300, used to execute data processing procedures among the hardware, the operating system, and the applications. The central processing unit 310 (or the data processing center) may be a PowerPC, an x86, or a CPU of any architecture. The read-only memory 340 is used to store the basic input/output system (BIOS) and/or other programs run when the image analysis server 300 boots. The dynamic random access memory (DRAM) 320 serves as temporary storage for CPU instructions and various image data; it can store image data from the cloud storage unit 150 while that data awaits processing by the central processing unit 310 and/or the graphics processing unit 370, or temporarily hold data already processed by the central processing unit 310 and/or the graphics processing unit 370 until it is stored into the physical storage device array 380 at an appropriate time or sent out through the network interface controller 360. The graphics processing unit 370 is another core unit of the image analysis server 300, used essentially to process graphics-related image data. The hardware of the graphics processing unit 370 is designed specifically for graphics and image processing, so it is generally much faster than the central processing unit 310 at processing image data and is suitable for handling large amounts of it. The non-volatile memory (NVRAM) 330 may be implemented with flash memory; it stores data on the execution status of I/O requests, for use in verification when an abnormal power-off occurs before the operations of an I/O request accessing the physical storage device array 380 have completed. The storage interface controller 350 is a storage interface used to store data processed by the central processing unit 310 and/or the graphics processing unit 370 into the physical storage device array 380, or to read related data from the physical storage device array 380 into the dynamic random access memory 320 for temporary storage while it awaits processing by the central processing unit 310 and/or the graphics processing unit 370. The communication protocol adopted by the storage interface controller 350 may be Fibre Channel (FC), Serial Attached SCSI (SAS), Serial ATA (SATA), or any applicable transmission protocol. The physical storage device array 380 is composed of a plurality of physical storage devices 385 and provides the image analysis server 300 with space to store data. According to another embodiment of the present invention, when the image analysis server 300 does not provide data storage space, the physical storage device array 380 may be omitted; in that case, the data is instead stored in the non-volatile memory (NVRAM) 330 or in an external storage device such as a JBOD. The network interface controller 360 connects externally to a network; it transmits data or messages processed by the central processing unit 310 and/or the graphics processing unit 370 to other devices on the network, or transfers data from other devices on the network into the dynamic random access memory 320 for temporary storage. According to another embodiment of the present invention, when the data processing capability of the central processing unit 310 is sufficient to handle external commands and image data processing at the same time, the data processing center includes only the central processing unit 310, and the graphics processing unit 370 may be omitted.

Please refer to FIG. 3B, which is a schematic diagram of the relationship between the software and hardware architecture of the image analysis server 300 in FIG. 2, according to an embodiment of the present invention. In FIG. 3B, the software architecture of the image analysis server 300 sits on top of the hardware 480, whose organization is as shown in FIG. 3A.

According to the embodiment shown in FIG. 3B, between the hardware 480 of the image analysis server 300 and the operating system (OS) 475 there is a hypervisor 470, also called a virtual machine monitor (VMM). The hypervisor 470 (or VMM) may be implemented in software, firmware, or hardware. The hypervisor 470 provides a virtual operating platform on which one or more operating systems (OSes) share the resources of the hardware 480; it can therefore be regarded as the "pre-operating system" of the operating system 475 in FIG. 3B, mainly used to coordinate and allocate the resources of the hardware 480 for use by the multiple operating systems running on the image analysis server 300. Without interrupting the operation of any operating system, the hypervisor 470 can automatically adjust (increase or decrease) the hardware resources available to each operating system, for example: allocated CPU time, memory space, network interfaces, hard disk storage space, and other hardware resources, so that the workloads of the operating systems remain close to balanced. Although only one operating system 475 is depicted in FIG. 3B, multiple operating systems may actually run on the hypervisor 470. According to another embodiment of the present invention, a second operating system may further run on the hypervisor 470. The programs in the second operating system mainly organize the multiple physical storage devices 385 in FIG. 3A into a plurality of data blocks and protect the data stored in those data blocks through a RAID (Redundant Array of Independent Disks) mechanism (for example: RAID 5 or RAID 6), preventing data loss caused by the failure of one or two of the physical storage devices 385. According to another embodiment of the present invention, if the image analysis server 300 requires only a single operating system, the hypervisor 470 may be omitted.

In FIG. 3B, the operating system 475 may be a common operating system, for example: Windows, Linux, or Solaris. The operating system 475 provides a multi-task, time-sharing operating environment that allows multiple applications and processes to execute simultaneously. In FIG. 3B, two functional blocks are built on the operating system 475, representing an image analysis application 460 and a cloud gateway service application 465. The image analysis application 460 is a software module, or a combined software/hardware module, executed by the central processing unit (CPU) 310 of FIG. 3A, the graphics processing unit (GPU) 370, or the integrated data processing center (corresponding to the hardware 480). The image analysis application 460 is used to detect and analyze the image data obtained by the image capture devices 400A to 400N in FIG. 2. After the image data is analyzed by the image analysis application 460, related information about the visitors and products captured in the image data can be obtained; combined with date, time, and other related data, and after appropriate computation, a variety of statistics useful to the user can be produced and output. The cloud gateway service application 465 is an intermediary service program for data transmission between the related applications executing on the operating system 475 and the external cloud storage unit 150; it accepts instructions from those applications to access files on the cloud storage unit 150. According to another embodiment of the present invention, when the system 100 does not need to store/analyze data through the cloud storage unit 150, the cloud gateway service application 465 may be omitted.

According to another embodiment of the present invention, the image analysis server 300 in FIG. 2 may also be located on the cloud storage unit 150; in other words, the image analysis server 300 may be a virtual machine (VM) provided by the supplier of the cloud storage unit 150. The operating system running on that virtual machine is likewise provided by the supplier of the cloud storage unit 150. The image analysis application 460 of FIG. 3B can execute in the operating system on the virtual machine; that is, after the user uploads the image analysis application 460 to the cloud storage unit 150, the image analysis is executed through the virtual machine. In this case, the database 450 used by the image analysis server 300 in the cloud may be one or more object files in the cloud storage unit 150. In this way, the image data that the image capture devices 400A to 400N transmit to the cloud storage unit 150 can immediately be analyzed and processed by the virtual machine, which is likewise located in the cloud, and the related data produced by the analysis can also be stored on the cloud storage unit 150 and transmitted outward as needed.

According to another embodiment of the present invention, the image analysis server 300 in FIG. 2 may also be integrated into the image capture devices 400 (i.e., 400A to 400N). In this case, the image analysis application 460 of FIG. 3B is moved into the image capture device 400 for execution. The image analysis application 460 further comprises a plurality of functional modules, being all or part of the functional modules shown in FIG. 4. According to an embodiment of the present invention, the image capture device 400 has both the function of recording image data and the function of analyzing/processing it. After the image capture device 400 analyzes/processes the image data, the complete or partial related information obtained is stored in an internal memory unit, for example: an SD card (Secure Digital Memory Card) or built-in flash memory. According to another embodiment of the present invention, after the image capture device 400 analyzes/processes the image data and obtains complete or partial related information, it transmits the analysis results through the network 180 to the cloud storage unit 150 for storage, or to the image analysis server 300 for further analysis and processing. The aforementioned complete or partial related information may be, for example: image data after face detection, image data after product detection, and so on.

Please refer to FIG. 4, which shows the functional modules further included in the image analysis application 460 of FIG. 3B. According to an embodiment of the present invention, the image analysis application 460 includes all of the following functional modules; according to another embodiment, it includes only some of the functional modules in FIG. 4. Although the functional modules in FIG. 4 are drawn as independent of one another, according to an embodiment of the present invention two or more functional modules may be integrated together in actual implementation. The functional modules in FIG. 4 are described as follows:

(A) People flow analysis module 405: detects the parts of the image data that bear human features and analyzes the trajectories along which people move, in order to count the number of people in the environment, assess the degree of crowding, or identify the directions in which visitors move through the environment. One application is to use the image data recorded by one or more image capture devices 400 installed in a shopping mall to count the number of shoppers in a given period and/or the attributes of the visitors (for example: age, gender, occupation, ...), including the directions in which visitors of different attributes move while shopping.

(B) Head detection module 410: detects the parts of the image data that bear human head features and analyzes the position and attributes of each head image. An attribute of a head image may be, for example, the head direction presented by the person's head, judged by analyzing the features of the head or face in the image data; this judgment of head direction is based on the "feature mapping" (also called the "feature vector") for head direction that the image analysis server 300 obtains after the training phase. In addition, the head detection module 410 can further determine other attributes of the head or face, for example: eyes open or closed, the emotion expressed, hair color, the visual geometry of the face, or other head-related attributes. The attribute data of head images, together with metadata tags (for example: happy, glasses, age range, membership data, ...), can help users quickly organize, search, or identify people. The head detection module 410 may require the image data to have a certain pixel size or a certain minimum resolution so that the features of the customer's face/head can be recognized.
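The head-direction judgment could, for example, be realized as a small CNN that maps a detected head crop to orientation angles. The following is a minimal sketch assuming PyTorch; regressing (roll, pitch, yaw) is one common convention, and the layer sizes are illustrative rather than the patent's exact formulation.

```python
# Minimal sketch of head-direction estimation from a cropped head image.
import torch
import torch.nn as nn

class HeadDirectionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 3)      # outputs (roll, pitch, yaw)

    def forward(self, x):
        f = self.backbone(x).flatten(1)   # the learned feature mapping
        return self.head(f)

net = HeadDirectionNet()
crop = torch.randn(1, 3, 64, 64)          # a detected head crop
roll, pitch, yaw = net(crop)[0]           # head direction estimate (untrained)
```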

(C) Visitor's ID identification module 415: compares the captured facial features of a visitor with the visitor data (facial features) already in the database in order to identify the visitor's identity (for example: name or code) and to obtain further data associated with that identity (for example: frequently purchased items, payment habits, etc.). To capture the visitor's facial features, the zoom-in function of the camera lens, or the function of changing the resolution of the image data during recording in real time, can help capture the facial features of a visitor in motion; these are compared against the facial feature data in the database to determine whether identical or similar facial data exists, making it possible to know whether this is the same person who entered the environment at different points in time. According to an embodiment of the present invention, for each visitor's ID the database records at least 10 (but not limited to this number) images of facial features from various angles, and uses these images to compare against and recognize the visitor's facial features in the image data captured by the image capture device 400. According to another embodiment of the present invention, if the image capture device 400 cannot read the features of the visitor's face, the visitor's ID identification module 415 identifies the visitor by the distinctive features exhibited by the parts of the visitor's body while walking in the images, for example: the swing of the visitor's arms and legs while walking, the walking posture, the length ratios between the head/back/limbs, and other appearance features unique to that particular visitor.
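One plausible realization of this comparison is matching a face embedding against the stored embeddings for each visitor's ID. The sketch below assumes NumPy, a 128-dimensional embedding, and a 0.8 cosine-similarity threshold, all illustrative; the embedding network itself is left abstract, and the random vectors merely stand in for real data.

```python
# Minimal sketch of visitor identification by embedding comparison.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Database: at least 10 embeddings per visitor ID, from various angles.
visitor_db = {
    "visitor_001": [np.random.randn(128) for _ in range(10)],
    "visitor_002": [np.random.randn(128) for _ in range(10)],
}

def identify(query: np.ndarray, threshold: float = 0.8):
    best_id, best_score = None, threshold
    for vid, embeddings in visitor_db.items():
        score = max(cosine(query, e) for e in embeddings)
        if score > best_score:
            best_id, best_score = vid, score
    return best_id                    # None means "unknown visitor"

print(identify(np.random.randn(128)))
```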

(D) Product detection module 420: detects the parts of the image data that bear product features and can, according to the settings, display frames around various or specific products. Generally speaking, the pixels/resolution of the captured image data must reach a certain minimum size before the product detection module 420 can recognize the features of the various products; then, according to the settings, the image of a product can be highlighted (framed).

(E) Gazing area analysis module 425: analyzes the extent of the area gazed at by an image identified as a "person" in the image data and its relative coordinate position. The gazing area analysis module 425 determines the direction in which the eyes are looking by judging the person's head direction. A light source is simulated and assumed to be located directly behind the person's head region in the image data; the light source is set to emit light in a direction consistent with the head direction, producing a projection of the light on a virtual plane in front of the head. The area covered by the light's projection is thereby inferred to be the area the visitor may be gazing at. Through this simulation, the projected (or gazed) area can be calculated, and the distance between the gazed area and the image capture device 400 can also be converted. A piece of image data may include one or more visitors, who may be (a) in motion while continuously gazing at a particular item, or (b) staying at some spot in the environment and gazing at a particular item. The gazing area analysis module 425 can handle the gazing behaviors of all these kinds of visitors presented in the image data and can calculate (estimate) the area each visitor is gazing at. Combined with the positions of the items in the environment, the items located within the visitor's gazing area can be known, and these items are very likely the products the visitor is interested in.
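The following is a minimal geometric sketch of this virtual-light-source simulation: a light source is placed a short distance behind the head, light is cast along the head direction, and the lit circle on a virtual plane in front of the head stands in for the gazing area. The back-off distance, plane distance, and cone half-angle are assumed parameters, not values from the patent.

```python
# Minimal sketch of the virtual-light-source projection of the gazing area.
import numpy as np

def gazing_area(head_pos, head_dir, plane_dist=2.0, back_off=0.3,
                half_angle_deg=15.0):
    d = np.asarray(head_dir, float)
    d /= np.linalg.norm(d)                             # unit head direction
    light = np.asarray(head_pos, float) - back_off * d # source behind the head
    center = light + (back_off + plane_dist) * d       # cone axis meets the plane
    radius = (back_off + plane_dist) * np.tan(np.radians(half_angle_deg))
    return center, radius                              # circular gazing area

def covers(center, radius, item_pos):
    # An item is "gazed at" when it falls inside the projected area.
    return np.linalg.norm(np.asarray(item_pos, float) - center) <= radius

c, r = gazing_area(head_pos=[0, 1.6, 0], head_dir=[0, 0, 1])
print(c, r, covers(c, r, item_pos=[0.3, 1.5, 2.0]))    # item inside the area
```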

(F) Gazing object analysis module 430: analyzes which item(s) the visitor in the image data is looking at. The gazing object analysis module 430 is a further application produced by combining the outputs of at least two of the head detection module 410, the visitor's ID identification module 415, the product detection module 420, and the gazing area analysis module 425. Besides analyzing which item the visitor in the image data is looking at, the gazing object analysis module 430 records various data such as the number of times a particular visitor gazes at a given item, the duration of each gaze, the visitor's gender, the visitor's age, and so on. In addition, the gazing object analysis module 430 also records other basic information as needed, such as date, time, location, temperature, humidity, and so on.

(G) Intensity of interest analysis module 435: using the various data supplied by the other modules, the intensity of interest analysis module 435 applies big data analysis to produce the statistics the user needs. These statistics allow the user to adjust product placement, change sales strategies, or change advertising/promotional content in order to improve overall revenue and operating performance. For example, analyzing the recorded data from a certain period and multiple sites (malls) can yield statistics such as "which products in a store most easily attract customers' attention", "which products attract less customer interest", "the age groups and genders whose attention a particular product attracts", "the degree of correlation between various products and the weather, the day of the week, and particular customer segments", and so on.
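As a sketch of the kind of statistic this module could output, the following aggregates a few illustrative gaze records with pandas; the column names and record layout are assumptions, not the patent's schema.

```python
# Minimal sketch of aggregating gaze records into attention statistics.
import pandas as pd

records = pd.DataFrame([
    {"item": "coffee", "visitor_age": 34, "gaze_seconds": 4.2, "weekday": "Sat"},
    {"item": "coffee", "visitor_age": 61, "gaze_seconds": 1.1, "weekday": "Sat"},
    {"item": "tea",    "visitor_age": 27, "gaze_seconds": 7.8, "weekday": "Mon"},
])

# "Which products most easily attract attention": total and mean gaze time.
attention = (records.groupby("item")["gaze_seconds"]
                     .agg(total="sum", average="mean", gazes="count"))
print(attention.sort_values("total", ascending=False))
```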

(H) Moving path analysis module 440: tracks, through one or more image capture devices 400, the moving paths (trajectories) of visitors who meet specific conditions (a specific group). By analyzing their moving paths and dwell times, it provides visitor behavior information for the user's reference. Based on this visitor behavior information, the user can adjust where items/products are displayed, or change how they are displayed, to attract the attention of visitors/customers. The specific conditions can be set by the user, for example: women between 40 and 50 years old, male elementary school students aged 10 to 12, men over 70 who use a cane, and so on.

(I) Visitor attribute analysis module 445: using the data sets of various predefined attributes in the database 450, the image analysis server 300 first performs deep learning in the training phase. After deep learning is completed, the visitor attribute analysis module 445 is responsible for judging the various attributes of visitors in the image data, for example: gender, age, occupation, and so on. As mentioned above, the present invention, based on the CNN architecture, performs deep learning on the data sets defined for various kinds of people, and further cross-analyzes and compares information such as the visitor's clothing features, movement features, and the categories of products they shop for, so that the visitor attribute analysis module 445 can improve the accuracy of its inferences about visitor attributes.

(J) Returning customer analysis module 455: using the historical visitor image data recorded in the database 450, the visitor's ID identification module 415 can recognize whether a visitor in the current image data has visited before. If the visitor is judged to be a returning customer, the database 450 holds historical statistics related to that returning customer, for example: historical shopping records. By combining these historical statistics with the information provided by the gazing object analysis module 430, such as the number of times and the duration the visitor gazed at different items, the returning customer analysis module 455 can further infer the probability/frequency with which the visitor will purchase a given item again, allowing the user to prepare in advance; for example, a mall can learn which products most customers are interested in and stock sufficient quantities early.

Please refer to FIG. 5. According to an embodiment of the present invention, FIG. 5 is a flowchart of the overall application concept of recognizing people and objects in image data according to the present invention. The flow is divided into two phases: first, a training phase helps the image analysis application 460 in the image analysis server 300 perform deep learning; afterwards, in the application phase, the image analysis application 460 in the image analysis server 300 can carry out various practical applications of image detection and recognition.

The training phase in FIG. 5 includes step 510 and step 520.

In step 510, the data sets of various human/object features already defined in the database 450 are used to help the image analysis server 300 perform deep learning. According to an embodiment of the present invention, a data set may consist of multiple images bearing various human/object features and presented at various angles, for example: a series of related pictures sharing the same feature. In the present invention, a "data set" is the collective term for multiple images that conform to the same subject and bear that subject's features; the subject may be, for example: boys, girls, the elderly, children, clothes, pants, hats, plants, animals, cars, tools, food, and so on, in all their various forms. The images of a given subject bear features conforming to that subject and may appear under different "rotational degrees of freedom in space". The "rotational degrees of freedom in space" comprise, for example, the three axial dimensions of roll, pitch, and yaw, also called the "rotational degrees of freedom in three-dimensional space". The values of the three dimensions representing the rotational degrees of freedom in space, i.e. roll, pitch, and yaw, are recorded in the "label" data of the image. In addition, the label data also records the subject defining the image, for example: boy, girl, elderly person, child, clothes, pants, hat, plant, animal, car, tool, food, and so on. Thus, the different values recorded in an image's label also represent the various aspects that the person/object in the image presents under different rotational degrees of freedom in three-dimensional space. For example: an image in a data set may show "a person's head turned 45 to the right (roll), elevation angle (pitch) 0, oblique angle (yaw) 0", where "person's head, turned right 45, elevation 0, oblique 0" is the "label" recorded for that image. The image analysis server 300 uses data sets of various subjects to perform deep learning through the convolutional neural network (CNN) architecture. After the deep learning training, the image analysis server 300 itself obtains the "feature mapping" of the learned subject; the image analysis server 300 is thereby able to recognize which subjects (people or objects) are featured in the image data captured by the image capture device 400. According to an embodiment of the present invention, the aforementioned data sets defining various human/object features are stored in the database 450 connected to the image analysis server 300, for use by the image analysis application 460 during the training phase. According to an embodiment of the present invention, the graphic files of the data sets may be graphic files of any imaging standard, such as Joint Photographic Experts Group (JPEG), Bitmap (BMP), PC Paintbrush Exchange (PCX), or Graphics Interchange Format (GIF).
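A minimal sketch of such a label, assuming a simple Python dataclass as the container (the patent does not specify a storage format):

```python
# Minimal sketch of the per-image label: the subject plus the three
# rotational degrees of freedom (roll, pitch, yaw).
from dataclasses import dataclass

@dataclass
class ImageLabel:
    subject: str   # e.g. "human head", "hat", "car"
    roll: float    # degrees
    pitch: float   # degrees
    yaw: float     # degrees

# The example from the text: head turned 45 to the right, level, no tilt.
label = ImageLabel(subject="human head", roll=45.0, pitch=0.0, yaw=0.0)
print(label)
```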

Step 520 tests the result of the deep learning in step 510. After the deep learning process of step 510, the image analysis application 460 has, through the obtained "feature mapping", learned the ability to recognize various "people/objects" in image data, but the quality of this recognition ability must still be tested. According to an embodiment of the present invention, the testing method may be to input test image data to verify whether the ability of the image analysis application 460 to recognize "people/objects" meets the requirements. Until the set evaluation criteria are passed, the image analysis application 460 repeats the deep learning process of step 510 to improve its ability to recognize the various "people/objects", until it can pass the set evaluation criteria.

Steps 510 and 520 above constitute the pre-training (training phase) carried out before the image analysis application 460 formally goes online. The outcome of this pre-training is closely tied to the recognition ability the image analysis application 460 shows in actual operation: if the training phase is done well, the image analysis application 460 has better image recognition capability and can output high-quality, correct results. Moreover, after steps 510 and 520 are executed, the image analysis application 460 has learned the "feature mapping" (also called the "feature vector") for recognizing the various person/object features in image data. Relying on the learned feature mapping, the image analysis application 460 no longer needs to compare incoming images against the data sets in the database 450 when actually online; instead, it detects or recognizes which person or object features an image contains according to the learned feature mapping.

The application phase in FIG. 5 comprises steps 530 to 570.

In step 530, the image analysis application 460 recognizes the image data coming from the image capture device 400; this step belongs to the application phase. According to an embodiment of the present invention, when the image capture device 400 is installed in a supermarket/store environment, the image data it captures may contain parts of shopping customers' faces/heads and bodies as well as the various products on the shelves. In step 530, the head detection module 410 and the product detection module 420 of the image analysis application 460 detect which parts of the image are shopping customers (including the clothes they wear) and which parts are products on the shelves. Basically, whether it is a shopping customer (face/head features) or a product on a shelf, its image must exceed a pixel threshold, for example 40x40 pixels, before the image analysis application 460 can detect and recognize it; the invention is not limited to this specification. The 40x40-pixel figure is only an example of the image-quality threshold; the threshold may be any other value, such as 50x50, 80x80, or 100x100 pixels, depending on the needs of the actual application. It follows that, according to the present technique, all shopping customers and shelf products appearing in one picture/frame of image data, regardless of their number, can be recognized by the image analysis application 460 at the same time, as long as each of them reaches the quality threshold.
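
The pixel-size gate described above can be expressed as a simple filter over detected bounding boxes; this is only a sketch, with the 40x40 default taken from the example threshold in the text:

```python
def detectable(boxes, min_side=40):
    """Keep only detections (customer heads or shelf products) whose
    bounding box is at least min_side x min_side pixels."""
    return [(x, y, w, h) for (x, y, w, h) in boxes
            if w >= min_side and h >= min_side]
```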

In step 540, the recognized person/object feature data are provided to the other functional modules of the image analysis application 460 for further application. According to an embodiment of the present invention, a person/object feature is a specific shape, color, size, or combination thereof in the image data that can be recognized as a person or an object; in addition, step 540 also computes the position and distance of each recognized person or object relative to the image capture device 400. The "further application" mentioned above refers to processing and analyzing the recognized person/object feature data with the various functional modules of the image analysis application 460 listed in FIG. 4, such as the people-flow analysis module 405, the visitor identity recognition module 415, the gazing area analysis module 425, the gazed item analysis module 430, the interest degree analysis module 435, the moving path analysis module 440, the visitor attribute analysis module 445, and the returning customer analysis module 455, so as to produce the data and/or information the user needs.

In step 550, the historical output results of the functional modules are recorded in the database 450. The output results recorded in the database 450 are not limited to image data; they may also be, for example, related data such as text, tables, and database entries, or a combination of image data and any related data. According to an embodiment of the present invention, when the present technique is applied to a supermarket or store, the recorded customers (each of whom can be treated as an ID), the product items they purchase or gaze at (are interested in), the customers' moving paths, and their shopping behavior are the essential base elements for further analysis. Besides these records of customers, products, moving paths, and consumption behavior, the image analysis application 460 can further combine them automatically with other reference information such as date/time, location, weather, temperature, humidity, holidays, and advertising promotions, to form additional reusable information.
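
A hypothetical record of the kind step 550 might store, combining one visit with the contextual reference information listed above (all field names and values are illustrative only; the ID follows the temporary-ID format described later in step 720):

```python
history_record = {
    "customer_id": "20200101-F0000",    # temporary ID (date + area)
    "product_item": "ITEM-0042",        # hypothetical product code
    "behavior": "gazed",                # e.g. gazed / purchased
    "datetime": "2020-01-01T10:15:00",
    "location": "store A, aisle 3",
    "weather": "sunny",
    "temperature_c": 22.5,
    "humidity_pct": 60,
    "promotion": "new-year sale",
}
```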

In step 560, the historical output data of the functional modules stored in the database 450 are further analyzed statistically. Records of a single day, a single person, or a single product can already provide some reference information; however, combining all the data into a big-data analysis reveals the trends and relationships among them. Step 560 thus performs statistical analysis on the historical records output by the functional modules to provide the user with all kinds of useful reference information, for example: products that easily arouse customers' interest, products that catch all customers' first glance, areas that do not easily attract customers' interest, the influence of promotional products on customers, customers' movement routes, the best placement of products, and many other statistical outputs.

In step 570, the results of the statistical analysis of step 560, possibly together with suggestions based on those results, are output as information for the user's reference. According to an embodiment of the present invention, the output information may take the form of reports, charts, narrative descriptions, or any combination of two or more of these. For example, the output information may be: "a certain item receives more attention from customers, together with the gender ratio and ages of those customers", "the correlation between items that easily attract customers' attention and promotional activities", "the ratio between a customer's first-glance favorite product and actual purchases of that product", "areas that customers rarely notice", "the degree of association between customers and promotional products", "customers' movement routes", "the best placement of a specific product", and so on. The output information may also provide improvement suggestions for the user, for example: "the restocking frequency and quantity for products that attract customers' attention", "which shelf positions for the products customers most often watch would increase sales", "how to arrange product placement on the shelves based on customers' movement routes", "which less popular products should be moved to other positions", and so on.

Please refer to FIG. 6, a flowchart of the training phase (steps 510 and 520) in FIG. 5. The flow of the training phase starts at step 610.

In step 610, the image analysis application 460 in the image analysis server 300 performs deep learning under the CNN architecture, using the input data sets of various person/object features. According to an embodiment of the present invention, the image data of a data set may be defined as "a female head presented at various angles" as in FIG. 7A, or as "various objects and food" as in FIGS. 7B and 7C. According to an embodiment of the present invention, the image data in FIG. 7A include at least the female head images numbered 1 to 91, each in its own head direction. Each of the images numbered 1-91 in FIG. 7A has a label that records its subject and the three head-direction dimensions roll, pitch, and yaw. The subject in the labels of images 1-91 is a female, while the different roll, pitch, and yaw values in each label represent the different poses the female face/head assumes in three-dimensional space under different rotational degrees of freedom. These "different poses under different rotational degrees of freedom in space" of the female in FIG. 7A can be called her "head directions". The different roll, pitch, and yaw values of each numbered image are one part of the information in that image's label; the subject (a female) is another part. For example, image No. 46 (in the thick-lined box) shows the face/head facing straight ahead without any elevation or deviation (that is, without any roll, pitch, or yaw); its roll, pitch, and yaw values are therefore defined as (0, 0, 0). Compared with image No. 46, the other images in FIG. 7A show the face/head with its elevation and deviation changed to different degrees, so their roll, pitch, and yaw values may be (-20, 10, 0), (5, -10, 30), (10, 20, -5), and so on. These values form part of each image's label and are used to distinguish the head-direction images formed by the different rolls, pitches, and yaws of the face/head. Values such as (0, 0, 0), (-20, 10, 0), (5, -10, 30), and (10, 20, -5) do not necessarily represent actual angles; they may instead represent the degree of difference from a reference image. After the deep learning process of step 610, the image analysis application 460 in the image analysis server 300 obtains the "feature mapping" (also called the "feature vector") of each learned subject, which is used to recognize persons or objects carrying the features of a specific subject.

The image data of the "objects and food" data sets in FIGS. 7B and 7C may be, for example, the various products in a store, such as fruit, clothes, pants, beverages, food, and so on. Likewise, in the data sets of the various products, each image of each product also carries a label with its own roll, pitch, and yaw values, so that among the many products the images present "the same product in different poses".

In step 620, test image data are used to verify the learning results of the image analysis application 460 from step 610. Although the image analysis application 460 should possess some ability to recognize the features of the learned subjects after the deep learning of step 610, whether its success rate meets the requirements must still be verified with test image data. The number of test images may be one quarter or one fifth of the number of images in the data set, though the invention is not limited to these ratios. According to an embodiment of the present invention, the correctness of the test process may still require the intervention of a human rater for the final judgment, who, for example, checks whether the recognition results are correct and whether the recognition rate reaches the required level. This verification history can be recorded for reference in future re-learning and in the actual recognition of image data.
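
A sketch of holding out test images in the proportions mentioned above (one quarter or one fifth of the data set); the shuffling policy is an assumption:

```python
import random

def split_data_set(images, test_fraction=0.2, seed=0):
    """Hold out e.g. one fifth (test_fraction=0.2) or one quarter
    (test_fraction=0.25) of a data set as test images for step 620."""
    images = list(images)
    random.Random(seed).shuffle(images)
    n_test = int(len(images) * test_fraction)
    return images[n_test:], images[:n_test]   # training part, test part
```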

In step 630, it is determined whether the learning results of the image analysis application 460 reach a predetermined standard. For example, the criterion may be that the error rate on the test image data must be below 10%; this 10% threshold is only an example and is not limiting, and in practice the user may set any desired threshold. Taking 10% as an example, an error rate below 10% means: if ten persons appear in a test image, the image analysis application 460 must correctly identify the face/head regions of at least nine of them and estimate the roll, pitch, and yaw each face/head presents. Only reaching a correct recognition rate of 90% or more on the test image data counts as passing; otherwise the standard is not met. The same criterion applies when the test subjects are objects. If the learning results of the image analysis application 460 reach the predetermined standard, the training phase ends; otherwise, step 640 is executed.

In step 640, for the subjects that did not meet the standard in the test, more related data sets are input and deep learning is performed again. After step 640 is completed, the flow returns to step 620, where test image data again verify whether the learning results of the image analysis application 460 reach the predetermined standard. Learning and verification are repeated in this way until the learning results reach the predetermined standard.

According to another embodiment of the present invention, when the judgment of step 630 is negative, step 640 does not input more related data sets for another round of deep learning; instead, another algorithm is chosen as the tool for recognizing the features of the various subjects in their various poses. According to yet another embodiment, when the judgment of step 630 is negative, the learning parameters are changed/adjusted in step 640 so that the learning results can come closer to the expected results. In short, the variables that can be changed to reach the predetermined standard include: (a) inputting more image data of the related data sets; (b) choosing another algorithm; (c) adjusting the learning parameters. The variables may be changed one at a time, or several at once, to change the learning results.

Please refer to FIGS. 8A to 8C, which illustrate step 620 of FIG. 6, verifying the learning results of the image analysis application 460 with test images. A frame of a test image is a picture composed of many pixels, containing the specific images the user wants the image analysis application 460 to recognize. After the deep learning training, the image analysis application 460 should be well able to recognize the features of the learned persons or objects in the pictures of the test images. Generally speaking, the image analysis application 460 should complete the recognition successfully under the following two conditions: (a) the feature image of the person or object to be recognized in the test image exceeds a certain pixel size; (b) where feature images of persons or objects overlap one another, the overlapping portions do not exceed a certain ratio that would impair recognizability. As shown in FIGS. 8A to 8C, after recognition by the image analysis application 460, boxes of one color mark the positions of human heads in the test pictures, while boxes of another color mark the positions of specific objects. Although the angles of the human heads differ between the test pictures of FIGS. 8A and 8B, a feature should be recognizable as long as its feature image exceeds a certain pixel size. The overall recognition rate on the test images must exceed the predetermined standard before the training phase can end.

Please refer to FIG. 9, which illustrates the detailed flow of step 540 in FIG. 5. According to an embodiment of the present invention, the flow of FIG. 9 is described with respect to one of the many possible applications of the present invention: "analyzing the products most likely to arouse the interest of shopping customers in a store". The flow of FIG. 9 starts at step 700.

In step 700, the image analysis server 300 obtains the image data to be analyzed. According to an embodiment of the present invention, the image data may be obtained as follows: the image capture device 400 first uploads the captured image data through the network 180 to the cloud data storage unit 150 for storage, and the image analysis server 300 then downloads the image data from the cloud data storage unit 150 through the network 190 for further analysis. According to another embodiment of the present invention, the image data may also be transmitted from the image capture device 400 directly to the image analysis server 300 through a network or a connection cable. The image analysis server 300 analyzing the image data in the present invention is not limited to one; there may be several. According to an embodiment of the present invention, a single image analysis server 300 is responsible for all image data analysis, in which case the whole flow of FIG. 9 is carried out in that same server 300. In another case, several image analysis servers 300 complete the image analysis together, and the flow of FIG. 9 is divided into parts, each performed by a different image analysis server 300. For example, the image data are first sent to a first image analysis server 300 for partial image processing and/or analysis, the partially processed/analyzed image data are then uploaded to the cloud data storage unit 150 for storage, and a second image analysis server 300 can download the image data from the cloud data storage unit 150 at any time for further processing and analysis. The first and second image analysis servers 300 may both be fully functional image analysis servers 300, or the two servers 300 (or one of them) may have only partial image processing and analysis functions. According to an embodiment of the present invention, after receiving the image data the image analysis server 300 must first determine whether the file format of the image data is acceptable; if not, the image data must first go through a file-format conversion step before subsequent processing.
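
The file-format gate at the end of step 700 might be sketched as follows; the set of acceptable suffixes is taken from the standards named earlier (JPEG, BMP, PCX, GIF) and is an assumption rather than an exhaustive list:

```python
import pathlib

ACCEPTED_SUFFIXES = {".jpg", ".jpeg", ".bmp", ".pcx", ".gif"}

def needs_conversion(file_path):
    """True when the received image data must first go through a
    file-format conversion step before further processing."""
    return pathlib.Path(file_path).suffix.lower() not in ACCEPTED_SUFFIXES
```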

In step 705, the parts of the image data that carry a subject are recognized; here a "subject" is a target with specific features that the image analysis server 300 can recognize after deep learning. Each picture of the image data may contain one or more targets with specific features; these may be targets with "face or head" features or targets with "item or product" features. According to an embodiment of the present invention, since each picture of the image data is composed of numerous pixels, step 705 must determine which "person"-featured targets and which "object"-featured targets a picture contains, and where each target is located in it. In other words, the main purpose of step 705 is to determine every "person"-featured target and every "object"-featured target in the picture, together with the range (pixel size) each occupies and its position in the picture. Generally, although the dynamic specification of image data consists of several to dozens of pictures (frames) per second, according to an embodiment of the present invention it is not necessary in practice to detect the "person"-featured and "object"-featured targets in every picture. Taking the most common specification of 30 frames per second as an example, one may choose, without being limited to it, to select one frame every 1/6 second and detect how many "person"-featured and "object"-featured targets that picture contains, together with their positions and ranges in the picture. In other words, under this assumption, step 705 must determine the "person"-featured and "object"-featured targets of 6 pictures per second. Detecting one frame every 1/6 second is only one embodiment of the present invention; the interval may actually take any other value. It must be noted that the detection of "persons" and/or "objects" in the picture in step 705 is not achieved by comparison against the database; rather, the work of recognizing the "persons" and/or "objects" in the picture is performed through the "feature mapping" learned during the deep learning of the training process.
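
Sampling one frame every 1/6 second from a 30-frames-per-second stream amounts to inspecting every 5th frame (6 frames per second); a small sketch, assuming frames are addressed by index:

```python
def frames_to_inspect(total_frames, fps=30, interval_s=1 / 6):
    """Indices of the frames on which step 705 runs detection:
    one frame per interval_s seconds, e.g. every 5th frame at 30 fps."""
    step = max(1, round(fps * interval_s))
    return range(0, total_frames, step)

# Example: list(frames_to_inspect(30)) -> [0, 5, 10, 15, 20, 25]
```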

After the image data are recognized in step 705, the recognized "person" and "object" targets can be handled in two parallel flows, "customers" and "products". Therefore, after step 705, targets recognized as persons (customers) proceed to step 710 and the steps following it, while targets recognized as objects (products) proceed to step 715 and the steps following it.

Continuing from the above, step 710 is the step entered by the targets recognized in the picture as persons (customers). Since the faces/heads appearing in a picture vary in size, a face/head occupying too small an area (too few pixels) may not be detected effectively. In an embodiment of the present invention, the detectable pixel size is 40x40 pixels or more. This 40x40-pixel threshold is only a reference implementation provided by the present invention and is not limiting; the lowest recognizable pixel threshold may be any other pixel size, depending on the capability of the software and hardware. After the targets recognized as persons are determined in step 710, according to an embodiment of the present invention, if the customers' identity (membership) data need not be maintained, the flow enters the two steps "detection of the customer's head direction" (step 730) and "estimation of the customer's head position" (step 735), which are described later. According to another embodiment of the present invention, if the customers' identity (membership) data need to be maintained, the flow enters step 720.

In step 720, identity recognition is performed on the customers in the picture. According to an embodiment of the present invention, the identities of all customers in the picture are compared one by one. People in a store may be strolling around looking at the merchandise, or they may stop and concentrate on certain products. According to an embodiment of the present invention, "customers" are those who stop and watch an item (product) for longer than a time threshold; the time threshold may be set to, for example, 2 seconds, 5 seconds, or 10 seconds, though it is not limited to these and may be any set length of time. Under this setting, people in the picture who do not satisfy the condition are not regarded as customers. According to another embodiment of the present invention, a person who is moving in the picture but whose gaze stays on the same item (product) for longer than a set time (for example, more than 2 seconds, though not limited to this) is also regarded as a customer whose identity is further recognized.
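
The dwell-time rule above reduces to a simple threshold test; a sketch, with the 2-second default taken from the example in the text:

```python
def qualifies_as_customer(seconds_gazing_same_item, threshold_s=2.0):
    """A person counts as a customer once their gaze has stayed on the
    same item longer than the threshold, whether they are standing
    still (first embodiment) or moving (second embodiment)."""
    return seconds_gazing_same_item > threshold_s
```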

In addition, step 720 must compare against the data of the database 450 in FIG. 2 to accomplish the identity recognition of customers; the data of the database 450 include the IDs of customers who have visited before and the facial-feature image data of each visiting customer, which include image data of different head directions. Through the facial-feature image data of each customer recorded in the database 450, the image analysis application 460 can recognize a customer's identity according to whether any recorded facial features match the facial features of the customer in the current picture. The customer ID may be the customer's membership ID or a temporary ID assigned by the image analysis application 460. A temporary ID may take a form such as 20200101-F0000, a numeric combination composed of the date, the shopping area, and similar data. When the identity comparison succeeds, the customer has visited before and is already recorded in the database 450, so no new record for the customer needs to be added. Conversely, when the identity comparison fails, the customer has never been recorded in the database 450, or the quality of the customer's head image is too poor for the comparison to proceed smoothly; in that case a temporary customer ID can be added to the database 450 to represent the customer's identity.

In step 725, if step 720 could not match the customer's facial-feature data in the database 450, a new customer ID and the image data recording the customer's facial features are added to the database 450, together with new records of the products the customer gazes at. If step 720 could match the customer's facial features in the database 450, new records of the gazed products are added for the matched customer ID; moreover, whether to add the image data of that customer's facial features to the database 450 can be decided according to the settings. In an embodiment of the present invention, besides the customer ID and the image data of the customer's facial features, step 725 also records information related to the gazed products, including the date/time, the name/code of the gazed product, the duration of the gaze, the number of gazes, and so on.

According to another embodiment of the present invention, steps 720 and 725 in FIG. 9 are both omitted to improve the overall performance of the image data capture and analysis system 100. In this case, the step of "comparing against the database 450 to recognize the customer ID" in step 720 is not executed, and every customer in the picture is treated as a new customer; likewise, the step of "recording in the database 450 the image data of the customer's face/head and the products gazed at" in step 725 is not executed. The related customer data are only kept temporarily in the dynamic random access memory (DRAM) 320, the non-volatile memory (NVRAM) 330, or other suitable storage space of the image analysis server 300, and are not stored permanently.

According to another embodiment of the present invention, besides the "identity recognition of the customers in the picture" of step 720, the attribute data of the customers are further recognized, for example gender, age, and occupation, though not limited to these. Please refer to the parts drawn with dashed lines in FIG. 9: taking the recognition of customers' gender, age, and occupation as an example, steps 722A, 722B, and 722C can be executed after step 720. These three steps are all drawn with dashed lines, meaning they can be selectively executed or omitted. In practice, recognizing more customer attributes is likely to lower the overall performance of the system 100, so the user can decide according to need whether to add attribute recognition.

In step 722A, the gender of the customers in the picture is recognized and recorded. As above, based on the CNN architecture of the present invention, after deep learning with gender-related data sets, the image analysis application 460 learns the "feature mapping" whose subject is gender and can accordingly recognize whether a customer in the picture is a young woman, an elderly woman, a young man, an elderly man, a girl, a boy, a toddler, and so on. Once a customer's gender is recognized, the gender data are added to that customer's gender field in the database 450. As above, step 722A is drawn with a dashed line, meaning it can be omitted.

In step 722B, the age of the customers in the picture is estimated and recorded. As above, based on the CNN architecture of the present invention, after deep learning with age-related data sets, the image analysis application 460 learns the "feature mapping" whose subject is age and can accordingly estimate the age of a customer in the picture. The more data sets of men/women/children of various ages the database 450 provides for deep learning during the training phase, the more accurate the customer ages estimated by the image analysis application 460 in step 722B will be. Once a customer's age is estimated, the age data are added to that customer's age field in the database 450. As above, step 722B is drawn with a dashed line, meaning it can be omitted.

In step 722C, the occupation of the customers in the picture is estimated and recorded. As above, based on the CNN architecture of the present invention, after deep learning with occupation-related data sets, the image analysis application 460 learns the "feature mapping" whose subject is occupation and can accordingly estimate the occupation of a customer in the picture. The more data sets of the clothing worn by various occupations, or of the tools used in their work, the database 450 provides for deep learning during the training phase, the closer to the truth the customer occupations estimated by the image analysis application 460 in step 722C will be. Once a customer's occupation is estimated, the occupation data are added to that customer's occupation field in the database 450. As above, step 722C is drawn with a dashed line, meaning it can be omitted.

In step 730, the head direction of the customers in the picture is detected. Following the description of step 710, after step 710 finishes, "detection of the customer's head direction" (step 730) and "estimation of the customer's head position" (step 735) are executed in parallel. According to an embodiment of the present invention, before step 730 is performed, the image analysis application 460 must first perform deep learning in the training phase of FIG. 5 with a data set such as that of FIG. 7A, to learn the "feature mapping" whose subject is head direction, and it then uses this feature mapping to determine the most likely head direction of each customer in the picture. Taking FIG. 7A as an example, the pictures numbered 1-91 present images of the same female in various head directions, each image with a corresponding label whose recorded information includes at least the subject (for example, a female head) and the subject's three values roll, pitch, and yaw. As stated above, the pictures numbered 1-91 present a female face/head in different "head directions". In actual application, the head direction of a customer in the picture (the rotational freedom in space formed by the head's roll, pitch, and yaw values) roughly corresponds to one of the learned situations 1-91, so the image analysis application 460 can use the "feature mapping" obtained through the deep learning of the past training phase to determine the roll, pitch, and yaw values corresponding to the customer's head direction in the picture. If the customer's head direction does not exactly match one of the pictures numbered 1-91, step 730 still makes the closest judgment of the customer's head direction according to the learned feature mapping and estimates its most likely roll, pitch, and yaw values. Furthermore, according to an embodiment of the present invention, step 730 can also compute the position and range of the customer's head in the picture and mark out the parts of the picture belonging to the customer's face/head. Where recognition is possible, if a picture includes ten customers, the head directions of all ten are detected, and the faces/heads of the ten customers are also marked in the picture, like the customers' faces/heads marked with thick colored boxes in FIG. 8A or FIG. 8B.
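
One way to realize the "closest judgment" described above is a nearest-neighbor lookup in the learned feature space; this is only a stand-in sketch for the CNN-based determination of step 730, and `pose_prototypes` (mapping (roll, pitch, yaw) tuples such as those of pictures 1-91 to feature vectors) is a hypothetical structure:

```python
import numpy as np

def estimate_head_direction(head_embedding, pose_prototypes):
    """Return the (roll, pitch, yaw) whose learned feature vector lies
    closest to the feature vector extracted from the head image."""
    return min(pose_prototypes,
               key=lambda rpy: np.linalg.norm(head_embedding
                                              - pose_prototypes[rpy]))
```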

In step 735, the position of a customer's face/head in the environment (store) is estimated from the picture. Before executing the application flow shown in FIG. 9, the present invention must first calibrate the image capture device 400 to obtain its related parameters (for example the intrinsic/extrinsic matrices). After the deep learning training, step 735 can then compute the distance and relative position between a customer in the picture and the image capture device 400 from the parameters of the image capture device 400 and the customer's image data in the picture. According to an embodiment of the present invention, the relative distance between a specific object and the image capture device 400 can be regarded as the distance between two points in space. The image capture device 400 is usually set as the origin, so its coordinates in 3D space are (0, 0, 0). If the coordinates of the specific object in 3D space can be computed, for example (100, 552, 211), then the distance of the specific object from the origin (the image capture device 400) is a computable length. According to another embodiment of the present invention, the origin of the 3D space may also be set at the three-way intersection of two adjacent store walls and the floor in the picture. In summary, the present invention first calibrates the image capture device 400 to learn the relative distance in 3D space between a specific point in a 2D picture and the image capture device 400 or a reference point (the set origin). Accordingly, step 735 can use the calibrated image capture device 400 to estimate the position of the customer's head in 3D space.
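
With the camera calibrated and set as the origin (0, 0, 0), the distance in step 735 is an ordinary Euclidean distance between two points; a minimal sketch:

```python
import math

def distance_from_origin(head_xyz, origin=(0.0, 0.0, 0.0)):
    """Distance between the estimated 3-D head position and the origin
    (the image capture device, or e.g. a wall/floor corner point)."""
    return math.dist(head_xyz, origin)

# Example with the coordinates from the text:
# distance_from_origin((100, 552, 211)) -> about 599.4
```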

In step 740, the customer's gazing area is inferred and the probability that each part of the gazing area contains the customer's gaze point is estimated. According to an embodiment of the present invention, after the detection of the customer's head direction in step 730 and the estimation of the customer's head position in step 735, the "head direction" (the three values roll, pitch, and yaw) of a customer in the picture and the position of the customer's head in 3D space are known. With these two pieces of information, the following simulation is performed: a virtual light source is placed directly behind the customer's head and projects light in the direction directly in front of the head; the projected light extends a set distance, or extends until it reaches an object (product); at that set distance, or at the position of the object (product), a simulated screen is assumed, on which the projected light forms a simulated projection. The projection area formed by this simulated projection is the customer's possible gazing area, such as the elliptical projection areas 815A and 815B shown in FIGS. 11A and 11B. According to an embodiment of the present invention, the projection area is further divided into a plurality of sub-projection areas, among which the sub-projection area with the highest probability is the range on which the customer's line of sight is most likely focused; it can also be inferred to be the range in the environment (store) that interests the customer most, such as the sub-projection area 820 in FIG. 11B. Taking FIG. 11B as an example, the percentage shown in the sub-projection area 820 is 95%, meaning the probability that the customer is gazing at the sub-projection area 820 is 95%. The other sub-projection areas 830 and 840, radiating ring-like outward from the sub-projection area 820, show different probability values, 65% and 15% respectively, likewise representing the probability that the customer is gazing at each of them. Generally speaking, people mostly look straight ahead and most often fixate objects directly in front of them. On this assumption, according to an embodiment of the present invention, a plurality of concentric sub-projection areas is drawn around the center point of the elliptical projection area as the reference point; the farther a sub-projection area is from the center point, the smaller its probability of being gazed at, as shown in FIG. 11B. According to another embodiment of the present invention, considering that people's eyeballs rotate so that their line of sight may not point straight ahead, the reference point for dividing the sub-projection areas is set off the center of the projection area, possibly to the left, the right, upward, downward, and so on. In step 740, every customer in the picture has a projection area of their own according to their head direction, and each customer's projection area represents that customer's gazing area.
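
A sketch of assigning a gaze probability by sub-projection area: the 95%/65%/15% figures follow the example of FIG. 11B, while the concentric "radii" (in normalized units of the projected ellipse) and the offset-based lookup are assumptions:

```python
def gaze_probability(offset_from_reference, ring_radii=(1.0, 2.0, 3.0),
                     ring_probabilities=(0.95, 0.65, 0.15)):
    """Probability that the customer's gaze falls on a target, based on
    which concentric sub-projection area the target lies in; targets
    outside the outermost ring get probability 0."""
    for radius, p in zip(ring_radii, ring_probabilities):
        if offset_from_reference <= radius:
            return p
    return 0.0
```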

Please refer to FIG. 11C. According to another embodiment of the present invention, suppose a customer turns left by an angle θ and looks in a direction (3). Using step 735 with the image capture device 400 set as the origin O, the distance between the center point H of the customer's head and the origin, together with the left-turn angle θ of the head relative to straight ahead, is computed. Next, a gazing point G is computed according to a rotation matrix formula, and then, along the direction of a first vector $\overrightarrow{GH}$ formed by the gazing point G and the head center H, an appropriate position P directly behind the head center H is derived as the fictitious projection point P of a virtual light source; in an embodiment of the present invention, this appropriate position P may lie behind the head at about three times the diameter of the simulated head sphere. The outline of the customer's head is modeled by the equation $(x-H_x)^2 + (y-H_y)^2 = r^2$, where $(H_x, H_y)$ is the center of the simulated head sphere and $r$ its radius. Then, through the tangent-point equations, the two tangent points L and R between the fictitious projection point P and the head outline modeled by $(x-H_x)^2 + (y-H_y)^2 = r^2$ are derived, forming two vectors: a second vector $\overrightarrow{PL}$ formed by the tangent point L and the fictitious projection point P, and a third vector $\overrightarrow{PR}$ formed by the tangent point R and the fictitious projection point P. Finally, from the intersections of the second vector $\overrightarrow{PL}$ and the third vector $\overrightarrow{PR}$ with the screen 810, a simulated projection area (the projection area 820) is formed.

According to another embodiment of the present invention, any required number of tangent points can be derived as needed to form a corresponding number of vectors, whose intersections with the screen 810 are then obtained to form an area (the projection area 820).

Please refer to FIG. 11D. According to another embodiment of the present invention, suppose a customer's head rolls right by an angle α, pitches up by an angle β, and yaws right by an angle γ, looking in a direction (4). Using step 735 with the image capture device 400 set as the origin O, the distance between the center point H of the customer's head and the origin, together with the head rotation angles relative to the head's straight-ahead direction (direction (4)), is computed. Next, a gazing point G is computed according to a rotation matrix formula, and then, along the direction of a fourth vector $\overrightarrow{GH}$ formed by the gazing point G and the head center H, a position directly behind the head center H at a set distance from it is derived as a fictitious projection point P; in an embodiment of the present invention, this position may lie behind the head at 3.5 times the diameter of the simulated head sphere. The outline of the customer's head is modeled by the equation $(x-H_x)^2 + (y-H_y)^2 + (z-H_z)^2 = r^2$, and through the tangent-point equations the two tangent points A and B between the fictitious projection point P and the head outline modeled by $(x-H_x)^2 + (y-H_y)^2 + (z-H_z)^2 = r^2$ are derived, forming two vectors: a fifth vector $\overrightarrow{PA}$ formed by the tangent point A and the fictitious projection point P, and a sixth vector $\overrightarrow{PB}$ formed by the tangent point B and the fictitious projection point P. Finally, from the intersections of the fifth vector $\overrightarrow{PA}$ and the sixth vector $\overrightarrow{PB}$ with a first virtual plane (the screen 810), a simulated projection area (the projection area 820) is formed.

According to another embodiment of the present invention, with reference to research in the field of Human Factors Engineering, when the head is stationary and does not rotate, the visual field of a single human eye spans roughly 120° to 140° vertically and about 150° horizontally, so the field of view shared by both eyes spans about 60° vertically and about 90° horizontally. To observe an object (target) more closely (which also means a higher degree of concentration on that object (target)), the viewing angle necessarily narrows further. Therefore, in one embodiment of the present invention, one or more virtual projection lines formed with an angle of 30° in the vertical direction and an angle of 30° in the horizontal direction define the common field of view of the visitor's two eyes, where the one or more virtual projection lines extend backward and converge at a point about 6r behind the visitor's head (r being the head radius), namely a virtual projection point P, as shown in FIG. 11C. From that point (the virtual projection point P), following the one or more virtual lines (for example, the fifth vector $\overrightarrow{PA}$ and the sixth vector $\overrightarrow{PB}$ shown in FIG. 11C), a cone-shaped simulated projection region is formed, and on one or more virtual planes (for example, the first virtual plane (screen 810) and a second virtual plane (screen 812) of FIG. 11D) one or more simulated projection areas of the visitor are formed (for example, the projection area 820 and a projection area 822 of FIG. 11D). If part or all of an item (target) lies within the cone-shaped simulated projection region, the item (target) is one the customer is paying attention to; moreover, the closer the position of the item (target) is to the center line (for example, a first center line formed by points H and O and a second center line formed by points H and G in FIG. 11C), the more the customer not only pays attention to the item (product) but does so to a higher degree.
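A minimal sketch of the cone test just described (names and coordinates are assumptions; the 30° full aperture follows the embodiment above): an item is classified as inside or outside the simulated projection cone by comparing the angle between the cone's center line and the ray from P to the item:

```python
import numpy as np

def angle_from_axis(P, axis_point, item_pos):
    """Angle (degrees) between the cone axis (P -> axis_point) and the
    ray P -> item_pos."""
    axis = axis_point - P
    ray = item_pos - P
    cosang = np.dot(axis, ray) / (np.linalg.norm(axis) * np.linalg.norm(ray))
    return np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))

def in_projection_cone(P, H, item_pos, aperture_deg=30.0):
    """True when the item lies inside the cone whose axis passes through
    the head center H and whose full aperture is aperture_deg."""
    return angle_from_axis(P, H, item_pos) <= aperture_deg / 2.0

P = np.array([0.0, 1.7, 2.6])          # virtual projection point (assumed)
H = np.array([0.0, 1.6, 2.0])          # head center on the cone's center line
item = np.array([0.2, 1.4, 0.3])       # a product on a shelf (assumed)
print(in_projection_cone(P, H, item))  # closer to the axis -> higher interest
```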

According to another embodiment of the present invention, the rotation matrix formula may be a two-dimensional rotation matrix formula as follows:

$$R(\theta)=\begin{bmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{bmatrix}.$$

According to another embodiment of the present invention, the rotation matrix formula may be a three-dimensional rotation matrix formula as follows:

$$R(\alpha,\beta,\gamma)=R_z(\gamma)\,R_y(\beta)\,R_x(\alpha)=\begin{bmatrix}\cos\gamma & -\sin\gamma & 0\\ \sin\gamma & \cos\gamma & 0\\ 0 & 0 & 1\end{bmatrix}\begin{bmatrix}\cos\beta & 0 & \sin\beta\\ 0 & 1 & 0\\ -\sin\beta & 0 & \cos\beta\end{bmatrix}\begin{bmatrix}1 & 0 & 0\\ 0 & \cos\alpha & -\sin\alpha\\ 0 & \sin\alpha & \cos\alpha\end{bmatrix},$$

where α, β, and γ are the roll, pitch, and yaw angles of the head, respectively.

In the above description, according to one embodiment of the present invention, a gazing area is inferred for customers only when they stop walking and the length of time they watch a given item (target) exceeds a set time threshold; the set time may be, for example, 2 seconds, though it is not limited thereto and may be any configured length of time. According to another embodiment of the present invention, if a customer is moving and his or her gaze remains on the same item (target) beyond the set time threshold (for example, more than 2 seconds, though not limited thereto), a gazing area is likewise inferred. Except where the customer's head direction cannot be recognized, every customer in the picture should yield an estimated gazing area (projection area) together with the probability of each of its sub-projection areas being looked at. The head direction may be unrecognizable because: (1) the customer's head image in the picture is incomplete, for example a large part of it is blocked by other customers or objects; (2) the head image region in the picture contains too few pixels for the head to be recognized. According to the technique disclosed in the present invention, even if a customer's head image in the picture does not include the eyes of the face (for example, a back-of-head image), the customer's head direction can still be recognized. According to another embodiment of the present invention, when the head image in the picture clearly shows the eye features of the face, the customer's gaze direction can be computed and referenced to further narrow the projection area and thereby increase the accuracy of the estimated gazing area; the gazing area narrowed after computing the gaze direction may look like the region 820E of FIG. 11B. After step 740 is executed, the flow jumps to step 750.
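The time-threshold gating described above can be sketched as follows (an illustration under assumed names: frames are sampled at known timestamps and each sample carries the customer's currently estimated gaze target):

```python
GAZE_THRESHOLD_S = 2.0   # configurable minimum dwell time (example value)

def dwell_events(samples):
    """samples: iterable of (timestamp_s, customer_id, target_id).
    Yields (customer_id, target_id, dwell_s) once a customer's gaze has
    stayed on the same target for at least GAZE_THRESHOLD_S seconds."""
    current = {}                          # customer_id -> (target_id, start_ts)
    fired = set()
    for ts, cid, target in samples:
        prev = current.get(cid)
        if prev is None or prev[0] != target:
            current[cid] = (target, ts)   # gaze moved: restart the clock
            fired.discard(cid)
            continue
        dwell = ts - prev[1]
        if dwell >= GAZE_THRESHOLD_S and cid not in fired:
            fired.add(cid)
            yield cid, target, dwell

stream = [(0.0, "C1", "shelfA"), (1.0, "C1", "shelfA"),
          (2.0, "C1", "shelfA"), (3.0, "C1", "shelfB")]
print(list(dwell_events(stream)))   # C1 watched shelfA for >= 2 s
```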

Returning to step 705 above: when step 705 recognizes a part of a specific picture of the image data that has specific features as a "thing (product)", the flow proceeds to step 715.

Step 715 is the step entered for targets in the picture recognized as "things (products)". The present invention is based on the CNN architecture: during the training phase, deep learning is performed on the data sets of the various "things (products)" in the database 450, after which the image analysis application 460 has learned the feature mappings whose subjects are the various "things (products)", so step 715 can identify the targets in the picture (frame) that are "things (products)". Except for unrecognizable products, all products in the picture should be identifiable. A product may be unrecognizable because: (1) its image data in the picture is incomplete, for example a large part of its image region is blocked by other customers or items; (2) its image region in the picture contains too few pixels for the product to be identified. Because the products contained in a picture occupy image regions of varying sizes, a product whose image region is too small may not be effectively recognized. In one embodiment of the present invention, the pixel size at which a product can be successfully detected and recognized is 40x40 pixels or larger. The 40x40-pixel threshold is only a reference implementation provided by the present invention and the invention is not limited thereto; in practice the lowest recognizable pixel threshold may be any other pixel size, depending on the capabilities of the software and hardware. Generally, a store carries a very large variety of products, so to recognize each kind of product correctly, the data set of every product must first be provided to the image analysis application 460 for deep learning during the training phase. In one embodiment of the present invention, the image data in the data set defining each product also presents the appearance of that product under different rotational degrees of freedom in three-dimensional space, as shown in FIG. 7B and FIG. 7C; the labels of the image data may even record the values of the three rotation angles, roll, pitch, and yaw, which represent the rotational degrees of freedom in space. After step 715 is executed, step 745 is executed.
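A sketch of the minimum-size gate just described (names assumed; the 40x40 threshold follows the embodiment above and is configurable):

```python
MIN_W, MIN_H = 40, 40   # example threshold from the embodiment above

def keep_detection(box, min_w=MIN_W, min_h=MIN_H):
    """box: (x, y, w, h) in pixels. Discard boxes too small to identify."""
    _, _, w, h = box
    return w >= min_w and h >= min_h

detections = [("cola", (120, 80, 64, 90)), ("gum", (300, 210, 18, 22))]
recognizable = [(name, box) for name, box in detections if keep_detection(box)]
print(recognizable)   # the 18x22-pixel "gum" box is dropped as unidentifiable
```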

According to another embodiment of the present invention, step 715 may be skipped; that is, the product-related part of the image data is not detected. For the product-related part, a method of entering product data is adopted instead, for the image analysis application 460 to reference and compare, the details of which are described later. In this embodiment, the product data sets need not be provided to the image analysis application 460 for deep learning during the training phase.

In step 745, the positions of products in the environment (store) are estimated or obtained. According to one embodiment of the present invention, step 745 estimates a product's position by recognizing it in the image data (picture). For the same reason as in step 735, before converting a specific point (object) in a 2D picture into a position in 3D space, the image capture device 400 must first be calibrated. Accordingly, in one embodiment of the present invention, the position of a product estimated in step 745 may be expressed in 3D space coordinates, for example (200, 20, 525), or in 2D plane coordinates, for example (550, 200).
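As a sketch of why calibration is required, the standard pinhole back-projection maps a pixel to a 3D point only once the camera intrinsics and a depth (or a known plane such as a shelf face) are available. The intrinsic values below are assumptions for illustration, not the patent's parameters:

```python
import numpy as np

# Assumed intrinsics from a prior calibration of image capture device 400:
# focal lengths (fx, fy) and principal point (cx, cy), all in pixels.
K = np.array([[800.0,   0.0, 640.0],
              [  0.0, 800.0, 360.0],
              [  0.0,   0.0,   1.0]])

def backproject(u, v, depth, K):
    """Map pixel (u, v) with a known depth along the optical axis to
    camera-frame 3D coordinates (X, Y, Z)."""
    x = (u - K[0, 2]) / K[0, 0]
    y = (v - K[1, 2]) / K[1, 1]
    return np.array([x * depth, y * depth, depth])

# A product detected at pixel (550, 200), assumed to sit on a shelf plane
# 2.1 m from the camera:
print(backproject(550, 200, 2.1, K))
```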

According to another embodiment of the present invention, step 745 may obtain the placement position of each product in the environment (store) where the image capture device 400 is located by referring to previously entered product data. The "previously entered product data" means that the placement positions of the various products have been set and entered into the database 450 in advance, for example which spatial point or range of the environment (store) each product occupies, or which position on which shelf in the store, and so on. In this case, the image analysis application 460 neither needs to recognize which products appear in the image data nor needs to estimate their positions by recognizing them. When it is necessary to know which products lie in a customer's gaze direction (or within the range of interest), the purpose is achieved simply by referring to the product position data previously entered into the database 450.
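A minimal sketch of such a pre-entered product-location table (all entries are assumed for illustration): positions may be stored either as 3D store coordinates or as aisle/shelf codes, as described above.

```python
# Pre-entered placement data keyed by product name; values are either
# 3D store coordinates or an "aisle-shelf" code, mirroring the text above.
PRODUCT_LOCATIONS = {
    "cola":    (200, 20, 525),    # 3D coordinates in the store frame
    "shampoo": "123-456",         # aisle 123, shelf 456
}

def products_near(position, radius, table=PRODUCT_LOCATIONS):
    """Return products whose stored 3D coordinates fall within `radius`
    of `position`; coded locations would need a separate code-to-coordinate map."""
    px, py, pz = position
    hits = []
    for name, loc in table.items():
        if isinstance(loc, tuple):
            dx, dy, dz = loc[0] - px, loc[1] - py, loc[2] - pz
            if (dx * dx + dy * dy + dz * dz) ** 0.5 <= radius:
                hits.append(name)
    return hits

print(products_near((210, 25, 520), radius=30))
```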

It must also be noted that in steps 735 and 745 of the present invention, neither the "estimation of the customer's head position" nor the "estimation of the product position" requires two camera lenses of two image capture devices 400 at the same time, regardless of which position in 3D space is being computed; a single camera lens inside a single image capture device 400 suffices. Although a typical 3D video camera appears outwardly to be a single camera, it internally contains at least two camera lenses. The technique of the present invention therefore does not need to rely on the multiple lenses of a 3D camera; with just one lens of an ordinary image capture device 400, the 3D position in the environment of each point (representing a person or an object) in a 2D picture can be inferred.

In step 750, the "customer gaze area and its probabilities" from step 740 and the "product positions" from step 745 are combined to estimate which products in the picture are covered by the customer's gaze area and their probability of receiving the customer's attention (interest). As shown in the flowchart of FIG. 9, the results of steps 745 and 740 are both output to step 750. After step 740 is executed, the possible coverage of the customer's gaze area in the picture and the probability that the gaze area contains the customer's line of sight are known; after step 745 is executed, the positions of the products in the environment (store) in 3D space are known. Based on the information provided by steps 740 and 745, step 750 can therefore infer which products fall within the customer's gaze area, that is, "which products the customer in the picture is looking at". The customer's gaze area is itself further divided into sub-areas of different probabilities. Accordingly, "which products the customer in the picture is looking at" can be expressed as "which products each sub-area of the customer's gaze area covers and their probability of being looked at", or as the products falling within the one or more highest-probability sub-areas of the customer's gaze area. If the customer's gaze area or one of its sub-areas covers two or more products, all of those products are included. According to another embodiment of the present invention, the product each customer in the picture is looking at is represented not by the product's item (category) but by the product's plane coordinates in the picture, or by its coordinates in space.
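A sketch of this matching step (the half-angle bands are assumptions; the probability values follow the example given later for regions 820, 830, and 840): each product is assigned the probability of the innermost sub-area that contains it, and products outside the outermost region are dropped.

```python
import numpy as np

# Sub-regions ordered from innermost to outermost, as (half_angle_deg, prob).
SUB_REGIONS = [(15.0, 0.95), (30.0, 0.65), (45.0, 0.15)]   # assumed angles

def gaze_probability(P, H, item_pos, regions=SUB_REGIONS):
    """Probability that the item is being looked at, taken from the
    innermost sub-region (around the center line P -> H) containing it."""
    axis = H - P
    ray = np.asarray(item_pos) - P
    cosang = np.dot(axis, ray) / (np.linalg.norm(axis) * np.linalg.norm(ray))
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))
    for half_angle, prob in regions:
        if angle <= half_angle:
            return prob
    return 0.0   # outside the outermost gaze region

P = np.array([0.0, 1.7, 2.6])
H = np.array([0.0, 1.6, 2.0])
for name, pos in [("cola", (0.1, 1.5, 0.4)), ("mop", (1.8, 0.4, 0.5))]:
    print(name, gaze_probability(P, H, pos))
```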

In step 755, the above data about customers and products is recorded in the database 450, and visualization effects are generated on the picture. According to one embodiment of the present invention, a "visualization effect" here means displaying in the picture, with text, numbers, or symbols, the recognized customer identity (ID) and the product being looked at. According to another embodiment, the visualization of a product in the picture may display the product's coordinate values instead of text, numbers, or symbols. In addition, the positions of the customer's face/head and of the products in the picture may be marked with boxes (or markers of other shapes) to highlight them, as shown in FIG. 8B. According to another embodiment, besides displaying the recognized customer ID and the watched product with text, numbers, or symbols, the visualization effect is achieved by increasing the brightness of the pixels in the regions of the customer's face/head and of the products. According to yet another embodiment, besides displaying the recognized customer ID and the watched product in the original picture with text, numbers, or symbols, the visualization effect is achieved by reducing the brightness of (darkening) the pixels in all regions of the picture other than the customer's face/head and the products, while the pixels of the customer's face/head and product regions retain their normal brightness. Also, please refer to FIG. 8C: the various products in FIG. 8C are not all placed fully facing forward, and some products are rotated by a certain angle. Nevertheless, after deep learning with the data sets in the database 450 during the training phase, the image analysis application 460 in step 755 can still recognize each product, mark it with a colored box, and add text or a symbol representing the product's name next to it in the picture.
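A sketch of two of the visualization effects described above using OpenCV (the image, boxes, and labels are placeholders, not the patented rendering): highlight boxes with ID/product text, and dimming everything outside the regions of interest.

```python
import cv2
import numpy as np

def annotate(img, regions):
    """regions: list of (x, y, w, h, label). Draw a highlight box and a
    text label for each face/head or product region."""
    for x, y, w, h, label in regions:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(img, label, (x, max(0, y - 6)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    return img

def dim_outside(img, regions, factor=0.35):
    """Darken every pixel outside the given regions, keeping the
    customer/product areas at normal brightness."""
    dimmed = (img * factor).astype(img.dtype)
    for x, y, w, h, _ in regions:
        dimmed[y:y + h, x:x + w] = img[y:y + h, x:x + w]
    return dimmed

frame = np.full((480, 640, 3), 128, np.uint8)        # stand-in for a video frame
boxes = [(100, 60, 80, 80, "F0103"), (400, 220, 60, 90, "cola")]
out = dim_outside(annotate(frame, boxes), boxes)
cv2.imwrite("annotated.jpg", out)
```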

Please refer to FIG. 10, which shows an example of recording customers and the products they look at in step 755. In step 755, the customer-related data, the product-related data, the time the customer kept looking at the product, and similar information obtained in the preceding steps are recorded (stored) in the database 450. According to one embodiment of the present invention, the related information recorded in the database 450 includes the fields shown in FIG. 10. The table of FIG. 10 contains a plurality of entries, each recording a customer's identity (ID) and related attribute data, for example gender, age, and/or occupation, as well as data about the product the customer looked at (was interested in), for example "product name", "product position", "continuous attention time", and "gaze level"; the "date/time" is also recorded for reference. The "continuous attention time" is the duration for which a customer paid attention to a given product; in principle this duration should exceed a set minimum gaze time, for example 2 seconds, though not limited thereto. The "product position" is the product's actual position in the environment (store) and may be expressed in three-dimensional coordinates of 3D space (as shown in FIG. 10) or by a predefined code (for example, 123-456 for shelf 456 in aisle 123). According to one embodiment of the present invention, a product's position, once obtained, can be used to help confirm the product's name. Whether customer attribute data such as "age", "gender", and occupation (not shown in FIG. 10) is recorded (stored) in the database 450 depends on the actual situation; if the function modules for recognizing such attribute data are not enabled when identifying the customer ID, the "age" and "gender" data of FIG. 10 may be omitted. The "gaze level" expresses the intensity of the customer's interest aroused by the product; it may be a level represented by a numeric value, for example 1 to 5 for five levels from weak to strong, though not limited thereto. The "gaze level" can be correlated with the "continuous attention time": a larger "continuous attention time" indicates that the product arouses a higher degree of customer interest; in addition, if the product is located in a higher-probability sub-area of the customer's gaze area, the "gaze level" is also raised.
When recording a customer's ID data, if the image analysis application 460 cannot find the corresponding customer data in the database 450, a temporary code is used instead, such as the "F0103" entry of FIG. 10. In that case, besides recording the temporary code as the customer's ID in step 755, the customer's face/head image data must be added to the database 450 so that the same customer can be identified the next time he or she shops at the store. According to another embodiment of the present invention, the spatial coordinates in the "product position" of FIG. 10 may also be replaced with plane coordinate data.
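A minimal sketch of a record such as the FIG. 10 rows (the field names are assumptions, and the gaze-level mapping from dwell time is one possible heuristic, not the patented rule):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple, Union

@dataclass
class GazeRecord:
    customer_id: str                  # known ID or a temporary code, e.g. "F0103"
    gender: Optional[str]             # optional attributes; omitted when the
    age: Optional[int]                # attribute-recognition modules are disabled
    product_name: str
    product_position: Union[Tuple[float, float, float], str]  # 3D or aisle-shelf code
    dwell_s: float                    # continuous attention time, >= the threshold
    gaze_level: int                   # interest intensity, e.g. 1 (weak) .. 5 (strong)
    timestamp: datetime

def level_from_dwell(dwell_s: float) -> int:
    """Example heuristic: longer dwell -> higher gaze level, capped at 5."""
    return min(5, 1 + int(dwell_s // 2))

rec = GazeRecord("F0103", None, None, "cola", (200, 20, 525),
                 dwell_s=6.4, gaze_level=level_from_dwell(6.4),
                 timestamp=datetime.now())
print(rec)
```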

In step 760, the various related data obtained in the above steps are updated into the database 450, and the recorded content is accumulated for statistical analysis. The amount of data analyzed may cover a period of records (for example, 1 hour or 1 day), such as the table shown in FIG. 10, or data accumulated over a longer period (for example, a week, a month, a quarter, or a year). Statistical results derived from different amounts of data each have their own significance, depending on the user's needs. The analysis items may include, for example: which products in the store easily arouse shoppers' interest; the gender and age groups of the shoppers whose interest certain products arouse; the products least likely to arouse shoppers' interest; where in the store a given product should be placed to best arouse shoppers' interest; and, among products of the same model but different colors, which color most easily arouses shoppers' interest, among other information useful to the user.
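A sketch of one such aggregation over the accumulated records (plain Python; the record fields follow the sketch above and the questions "which products attract interest" and "which gender a product attracts" from the list):

```python
from collections import Counter, defaultdict

records = [   # (customer_id, gender, product_name, gaze_level) - assumed data
    ("C1", "F", "cola", 4), ("C2", "M", "cola", 5),
    ("C3", "F", "shampoo", 2), ("C4", "F", "cola", 3),
]

# Which products attract the most attention events?
popularity = Counter(product for _, _, product, _ in records)

# For a given product, which gender looks at it most?
by_gender = defaultdict(Counter)
for _, gender, product, _ in records:
    by_gender[product][gender] += 1

print(popularity.most_common(1))      # e.g. [('cola', 3)]
print(by_gender["cola"])              # e.g. Counter({'F': 2, 'M': 1})
```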

In step 765, based on the statistical analysis results of step 760, analysis reports and/or suggestions are produced for the client. The resulting analysis report may include image data with visualization effects added, useful information obtained from analyzing the data, post-analysis suggestions, and so on. The post-analysis suggestions may be, for example: the restocking quantity and estimated restocking time of a product; moving a product not favored by customers to a different display area; the promotion timing of a product and its price; or the best display position of a product. For the client, these are useful, sales-supporting suggestions based on big-data analysis.

The execution of "producing analysis reports and suggestions" in step 765 may be omitted, whether because of the particularities of a given industry or based on the client's actual needs. After step 765 is executed, the flow of FIG. 9 ends.

Please refer to FIG. 11A, a schematic diagram of the present invention in which the head direction is referenced to estimate, by simulating projected light, the region a customer may be looking at (be interested in). FIG. 11A includes schematic views of two heads 800A and 800B presenting different angles, in which Head Direction 1 of the head 800A is expressed as (Roll1, Pitch1, Yaw1), its three rotation angles in space; the three angle values of (Roll1, Pitch1, Yaw1) represent the deflection of Head Direction 1 in three-dimensional space. Likewise, Head Direction 2 of the head 800B is expressed as (Roll2, Pitch2, Yaw2), and its three angle values represent the deflection of Head Direction 2 in three-dimensional space. According to one embodiment of the present invention, the region a customer may be looking at (be interested in) is estimated by simulating projected light, as follows. Suppose that in three-dimensional space there is a light source directly behind each of the heads 800A and 800B, projecting light toward the front of the head along Head Direction 1 and Head Direction 2 respectively. This simulated light source projects light from directly behind the head toward the front of the head, and the simulated projected light forms a cone or a cylinder in three-dimensional space. When the simulated light is projected forward to where an item or product is located, a simulated screen 810 is assumed at that location, and the projection of the cone or cylinder of simulated light onto the simulated screen 810 is the region the customer may be looking at (be interested in). FIG. 11A shows that after light is projected along the different head directions (Head Direction 1 and Head Direction 2), different projections 815A and 815B are produced on the simulated screen 810 at the same location. The projections 815A and 815B are the estimated gaze areas of the head 800A and the head 800B respectively, and the items located within the projections 815A and 815B can be presumed to be the items (products) the customer is looking at or interested in.

In general, people rarely view objects out of the corner of their eye; in most cases they gaze forward along a line of sight roughly aligned with the head direction. When the eyes' field of view is set to be only as wide as the head, as in the field of view indicated by the dashed lines 805C and 805D in FIG. 11B, the estimated possible gaze region is very limited, as in the region 820 of FIG. 11B; conversely, when the eyes' field of view is expanded to the limit of peripheral vision, as indicated by the dashed lines 805A and 805B in FIG. 11B, the estimated possible gaze region becomes much larger, as in the region 840 of FIG. 11B. Also, the projections 815A and 815B shown in FIG. 11A may be any of the elliptical regions 820, 830, and 840 drawn in FIG. 11B. According to the technique of the present invention, the positions of the customers (heads 800A and 800B) in three-dimensional space can be computed, and the distance from the customer (head 800A or 800B) to the virtual screen 810 (that is, where the items are located) is also an estimable value; together with the head angles (Head Direction 1 and Head Direction 2) presented by the heads 800A and 800B in three-dimensional space, determined from the feature mappings obtained through deep learning, these data, once input to the image analysis application 460, allow the regions the customer may be looking at (be interested in) to be computed (the projections 815A and 815B).

Please refer to FIG. 11B, which presents, from another angle, the possible gaze region formed by the simulated projection of FIG. 11A together with the eyes' field of view. In FIG. 11B, the dashed lines 805A and 805B indicate the field reached by peripheral vision and define the projection range 840, while the dashed lines 805C and 805D indicate a field as wide as the head 800A and define the projection range 820. As stated above, the region 840 formed by the peripheral-vision field 805A, 805B is the largest, and the region 820 formed by projecting a field as wide as the head 800A is the smallest. In the general case, the eyes look forward in the same direction as the head, so the innermost region 820 has the highest probability of being the region the customer is looking at (interested in). For example, suppose the customer has a 95% probability of looking at the items (products) within the innermost region 820; the annulus of the next region 830 excluding the region 820 may have a 65% probability, and the annulus of the outermost region 840 excluding the region 830 may have only a 15% probability. According to another embodiment of the present invention, when the eyes of the face in the picture can be clearly recognized, the calculation method can be adjusted according to the recognized eye gaze direction to obtain a more precise region 820E, whose probability percentage should be higher than the 95% of the region 820; the probability percentage of the region 820E shown in FIG. 11B is 98%.

Compared with the prior art, the technique disclosed in the present invention differs as follows: the prior art must record, through two lenses simultaneously, a person's facial center line and the direction of the eyes (eyeball direction) before the object the person is looking at can be known, whereas the technique disclosed here captures a person's head image through only one lens, and the object the person is looking at can be known even without seeing the face. The disclosed method is therefore suitable for situations in which the image data contains many people and objects; it requires only a single image camera (lens), and the pixel count (resolution) of the image data need not be high enough to discern the eyeballs on a person's face. With the method of the present invention, the goal of identifying which people in a picture are looking at which objects can thus be achieved simply.

Please refer to FIG. 12A and FIG. 12B together. According to one embodiment of the present invention, FIG. 12A is an architecture diagram of an Intelligent Electric Signage System (IESS) 100M. The system 100M includes an image processing device 300M, a video/audio display device 850, an image capture device 400M, a content server (CS) 870, and a media player 880. The image processing device 300M and the image capture device 400M, the image processing device 300M and the media player 880, and the media player 880 and the video/audio display device 850 are each connected to one another through a transmission line 855; the image processing device 300M and the media player 880 are connected to the content server 870 through a network 860 or transmission lines to transfer data. The network 860 may be a local area network (LAN), a wide area network (WAN), the Internet, a wireless network, or the like. According to another embodiment of the present invention, the image processing device 300M and the image capture device 400M transfer data not through the transmission line 855 but through the network 860 or another network. According to another embodiment, the image processing device 300M and the media player 880 likewise transfer data not through the transmission line 855 but through the network 860 or another network. According to one embodiment of the present invention, the image capture device 400M may be installed independently near the video/audio display device 850; according to another embodiment, the image capture device 400M is integrated with the video/audio display device 850 into a single device.

According to one embodiment of the present invention, the image processing device 300M and/or the content server 870 may be a virtual machine (VM) in the cloud data storage unit 150 of FIG. 2, with the operating system running on that virtual machine also provided by the vendor of the cloud data storage unit 150. Please refer to FIG. 13B: the image analysis application 460M can execute in the operating system on the virtual machine provided by the cloud data storage unit 150; that is, the user uploads the image analysis application 460M to the cloud data storage unit 150 and then executes it through the virtual machine there. In this embodiment, the image data captured by the image capture device 400M is first transmitted over the network 860 to the cloud data storage unit 150, to be analyzed and processed by the virtual machine in the cloud (that is, the image processing device 300M).

In FIG. 12A, the image capture device 400M is used to capture the people and/or objects in front of it. The image capture device 400M of FIG. 12A and the image capture devices 400A-400N of FIG. 2 are similar devices with little functional difference; for details of the image capture device 400M, please refer to the earlier description of the image capture devices 400A-400N of FIG. 2. In one embodiment of the present invention, when the image capture device 400M is integrated with the video/audio display device 850 into a single device, the image capture device 400M may be placed above, beside, or below the video/audio display device 850 to record the people watching the content on the video/audio display device 850. Basically, a single image lens inside the image capture device 400M suffices to implement the technique disclosed in the present invention; in other embodiments, however, multiple lenses may be installed in the image capture device 400M to meet the needs of special applications. From the image data captured by the image capture device 400M, the image processing device 300M can detect the sight directions of the people standing in front of the image capture device 400M (that is, the video/audio display device 850), to determine whether anyone, how many people, and which people are watching the content displayed by the video/audio display device 850. The installation position of the image capture device 400M therefore only needs to achieve this purpose (that is, a position from which people's faces (sight directions) can be recorded) and is not limited to being above the video/audio display device 850 or any other specific position.

In FIG. 12A, the image processing device 300M of the present invention is used to process and analyze the image data from the image capture device 400M. The hardware and software architecture of the image processing device 300M is similar to that of the image analysis server 300 of FIG. 3A and FIG. 3B. Please refer to FIG. 13A and FIG. 13B: compared with the image analysis server 300 of FIG. 3A and FIG. 3B, the image processing device 300M needs only one physical storage device 385 and a single operating system to implement the intelligent electric signage system of the present invention; in another embodiment, the image analysis server 300 of FIG. 3A and FIG. 3B may also be used as the image processing device 300M. For the hardware and software components of the image processing device 300M in FIG. 13A and FIG. 13B, please refer to the earlier descriptions of the components of FIG. 3A and FIG. 3B. The image processing device 300M receives the image data from the image capture device 400M and performs detection and analysis through its image analysis application 460M, to determine the various attributes of the people watching the audiovisual content (for example, advertisements) played by the video/audio display device 850. It also synchronizes with the playback progress of the media player 880 and/or the content server 870's data on the attributes of contents of the audiovisual advertisements, as a reference for subsequently adjusting the content and order in which the advertisements are played. When the image processing device 300M needs to record statistics such as the attributes of the people watching the advertisements, the playback progress of the media player 880, and the content attributes of the advertisements, and these statistics are to be output through a handheld device, the basic architecture of the image processing device 300M in FIG. 13A further includes a wireless transmission device (not shown), for example a WiFi device or a Bluetooth device. The handheld device may be, for example, a smartphone, a tablet PC, or a notebook PC.

The media player 880 of FIG. 12A decodes the streaming video/audio data received from the content server 870 over the network 860, reassembles it into video data and audio data, and outputs them through the transmission line 855 to the video/audio display device 850; at the same time, the media player 880 also transmits the real-time playback progress to the image processing device 300M. According to another embodiment of the present invention, the media player 880 only outputs the video and audio data to the video/audio display device 850 through the transmission line 855 and does not transmit the real-time playback progress to the image processing device 300M. The transmission line 855 carrying the video and audio data to the video/audio display device 850 may be, for example, a line matching HDMI terminals, a line matching A/V terminals, or the like. The media player 880 of FIG. 12A is implemented in hardware. According to another embodiment of the present invention, the media player 880 may also be implemented in software, for example as the multimedia player program 465M of FIG. 13B, which performs streaming video/audio playback in software within the image processing device 300M.

The video/audio display device 850 of FIG. 12A is used to display the streaming video/audio data transmitted by the media player 880, and internally includes a display (panel) 890 and a speaker system (not numbered). The streaming video/audio data is transmitted by the content server 870 over the network 860; after decoding by the video/audio decoder in the media player 880, the video data is output to the display 890 to show the advertisement, and the decoded audio data is sent to the speaker system to play the sound. According to another embodiment of the present invention, the video/audio display device 850 further has a network interface and receives the video and audio data from the media player 880 over the network 860 rather than through the transmission line 855.

According to another embodiment of the present invention, the image processing device 300M and the media player 880 of FIG. 12A may be integrated into a single device; likewise, the image capture device 400M and the video/audio display device 850 may be integrated into a single device. According to a further embodiment, the image processing device 300M, the media player 880, the image capture device 400M, and the video/audio display device 850 may all be integrated into a single device.

The content server 870 is a server with a network interface. Basically, its hardware architecture differs little from that of a typical server; in terms of software content, however, the content server 870 provides many different types of audiovisual files (for example, advertisements). The contents of these various types of audiovisual files (for example, advertisements) are designed for people of different groups, where the different groups may be, for example: elderly men/women, young men/women, children, overweight men/women, men/women who wear glasses, men/women who keep pets, and so on. The contents of the audiovisual files (for example, advertisements) involve products or services in fields including food, clothing, housing, transportation, education, and entertainment. Moreover, the target group, play order, video length, and other information of each audiovisual file (for example, an advertisement) constitute that file's "Attribute of Contents". The content server 870 can cycle through these various types of audiovisual files (for example, advertisements) around the clock or at specific times, convert them into a streaming video/audio data format, and transmit them over the network 860 to the media player 880. The content server 870 of FIG. 12A transmits the content attributes and other related data of the audiovisual files (for example, advertisements) to the image processing device 300M over the network 860. In addition, the content server 870 also accepts instructions from the image processing device 300M over the network 860 to change the playback behavior of the audiovisual files (for example, advertisements), such as interrupting playback, changing the play order, fast-forwarding the video, adjusting the volume, or displaying specific text.
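A sketch of the kind of control message the image processing device might send to the content server (the message schema and transport are assumptions; only the command names mirror the behaviors listed above):

```python
import json

def make_playback_command(action, **params):
    """Build a control message for the content server. The example actions
    mirror the text: interrupt, reorder, fast_forward, set_volume, show_text."""
    allowed = {"interrupt", "reorder", "fast_forward", "set_volume", "show_text"}
    if action not in allowed:
        raise ValueError(f"unknown action: {action}")
    return json.dumps({"action": action, "params": params})

# e.g. move an ad matching the detected audience to the front of the queue;
# the message would be sent to content server 870 over network 860.
msg = make_playback_command("reorder", ad_id="AD-0042", position=0)
print(msg)
```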

Please refer to FIG. 12B, an application scenario diagram of the intelligent electric signage system 100M according to one embodiment of the present invention. FIG. 12B shows two customers in front of the video/audio display device 850 watching the advertisement content currently playing; the image capture device 400M above the video/audio display device 850 is capturing the two visitors' expressions and reactions, while at the same time sending the recorded image data to the image processing device 300M for analysis. Through its internal image analysis application 460M, the image processing device 300M analyzes and determines in real time the two visitors' gender, age, facial expression, clothing, height, and other attribute data. These visitor attributes are highly correlated with the content of the advertisement being played; such correlation allows the broadcast advertisement to attract the attention of the customer group that best matches it, so as to maximize the advertising benefit.

Please refer to FIG. 13A, a block diagram of the basic hardware architecture of the image processing device 300M of FIG. 12A according to one embodiment of the present invention. The hardware architecture of the image processing device 300M in FIG. 13A is almost identical to that of the image analysis server 300 in FIG. 3A. The difference is that the image processing device 300M may further include a wireless transmission device (not shown), which allows the image processing device 300M and a handheld device to transmit data wirelessly. For the hardware components of FIG. 13A, please refer to the related description of FIG. 3A in this specification.

Please refer to FIG. 13B, which shows the software/hardware architecture of the image processing device 300M of FIG. 12A according to one embodiment of the present invention. The hardware architecture of the image processing device 300M in FIG. 13A is almost identical to that of the image analysis server 300 in FIG. 3A, differing only in the number of physical storage devices and the hardware specifications. As for the software architecture of FIG. 13B, apart from lacking the virtual machine monitor 470, the operating system 475 and the image analysis application of the image processing device 300M are almost identical to the operating system 475 and the image analysis application 460 of FIG. 3B. In addition, in FIG. 13B the operating system 475 may further include a multimedia player program 465M, drawn with a dotted line to indicate that when the media player 880 is not used, the multimedia player program 465M (software) can take over the function of the media player 880 (hardware). Because the multimedia player program 465M does not necessarily execute in the operating system 475, it is drawn in FIG. 13B as a software module in dotted lines. Furthermore, according to one embodiment of the present invention, the application modules in the image analysis application of FIG. 13B include, in addition to the various application modules shown in FIG. 4, modules such as an expression detection module, a height detection module, a clothing detection module, a body-shape detection module, a carried-tool detection module, a carried-pet detection module, and an occupation prediction module. For the software and hardware architecture of the image processing device 300M in FIG. 13B, please refer to the foregoing descriptions of FIG. 3B and FIG. 4.

Please refer to FIG. 14, a flowchart of the overall operation of the intelligent electric signage system 100M according to one embodiment of the present invention. The overall operation flow includes an intelligent-electric-signage person detection training phase 900, an intelligent-electric-signage person survey/statistical analysis phase 910, and an intelligent-electric-signage application phase 920. The person detection training phase 900 is the training phase shown in FIG. 5 and FIG. 6 (steps 510 and 520 of FIG. 5 and steps 610 to 640 of FIG. 6). For the person detection training phase 900, please refer to the descriptions of steps 510 and 520 of FIG. 5 and steps 610 to 640 of FIG. 6.

In the present invention, the detection of people in image data is based, under the CNN architecture, on the feature mappings of people's various attributes obtained through deep learning with data sets during the training phase; in the application phase, these learned feature mappings are used to detect the parts of the image data captured by the image capture device 400M that match them, and thereby analyze the "various attributes of people". Notably, the person detection training phase 900 of the intelligent electric signage system in FIG. 14 concerns essentially only "person recognition", such as "face/head recognition" and "body recognition", and does not include "object/product recognition". During the training phase, therefore, the various image data provided consist mainly of data sets related to "people". Besides the file data of FIG. 7A, the person-related data sets include: data sets of male/female facial appearances across age groups (elderly/middle-aged/young/children, and so on), data sets of body shapes (fat/thin/medium, and so on), data sets of facial expressions (happiness/anger/sadness/joy/fright/disappointment, and so on), and the like. In addition, the person-related data sets may further include "baby" and "pet" data sets. If a person's occupation is to be estimated, further data sets must be provided, including "clothing (uniforms of various industries)", "tools (tools of various industries)", and so on. For the intelligent-electric-signage person detection training phase 900 of FIG. 14, please refer to the descriptions of steps 510 and 520 of FIG. 5 and steps 610 to 640 of FIG. 6. The detailed flow of the intelligent-electric-signage person survey/statistical analysis phase 910 of FIG. 14 is described in FIG. 15, and the detailed flow of the intelligent-electric-signage application phase 920 of FIG. 14 is described in FIG. 16. Also, step 910 of FIG. 14 is drawn with a dotted line because, if the system 100M does not need to output and analyze the statistics of step 910, the "intelligent-electric-signage person survey/statistical analysis phase" of step 910 may be omitted.

Please refer to FIG. 15, a flowchart of the "intelligent-electric-signage person survey/statistical analysis phase" 910 of FIG. 14. As noted for step 910 of FIG. 14, when the system 100M does not need to output and analyze the statistics of step 910, the flow of FIG. 15 may be omitted. The flow of FIG. 15 begins at step 1000.

In step 1000, the image processing device 300M obtains the image data to be analyzed. In FIG. 12A, while the audio-visual display device 850 is playing audio-visual advertisements of different "content attributes", the image capturing device 400M simultaneously records the various appearances of the people in front of the audio-visual display device 850. According to one embodiment of the present invention, a "person" here refers only to the portions of the image data related to "people", and does not include other objects such as cars, buildings, traffic signs, and so on; the term "person" is used in the same sense throughout the following description.

In step 1010, the parts of the image data that contain a subject are identified, where a "subject" is a target with specific features that the image analysis server 300M can recognize after deep learning. Each frame (picture) of the image data may contain one or more targets with specific features; these may be targets with the features of a person's "face/head" or targets with the features of a person's "body". According to one embodiment of the present invention, since each picture consists of a large number of pixels, step 1010 must determine which targets in the picture carry "face/head" features and which carry "body" features, and also where each target is located in the picture. In other words, the main purpose of step 1010 is to determine every target with "face/head" features and every target with "body" features in the picture, together with the area each occupies (in pixels) and its position in the picture. Although video image data typically consist of several to several tens of pictures (frames) per second, according to one embodiment of the present invention it is not necessary in practice to examine every picture for face/head and body targets. Taking the most common specification of 30 frames per second as an example, one may choose, without being limited thereto, to select one picture every 1/6 second and detect how many face/head targets and body targets it contains, along with their positions and extents in the picture. In other words, under this assumption, step 1010 must analyze 6 pictures per second for face/head and body targets. The 1/6-second interval serves only as one embodiment of the present invention; in practice the interval may be any other value. It must be noted that in step 1010 the detection of a person's "face/head" or "body" in a picture is not achieved by comparison against a database, but by the "feature mappings" learned through deep learning in the training process.
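As a rough illustration of the sampling logic described above, the following Python sketch selects one picture per interval and hands it to a detector. The detector `detect_subjects` and its output format are assumptions standing in for the CNN trained in stage 900, not the patent's actual implementation; only the 1/6-second interval and the 30 fps example come from the text.

```python
# Hypothetical sketch of the frame-sampling loop of step 1010; all names are
# illustrative assumptions.
import cv2  # OpenCV for video decoding

SAMPLE_INTERVAL_S = 1 / 6  # one analyzed picture every 1/6 second (example value)

def sample_and_detect(video_path, detect_subjects):
    """Yield (timestamp, detections) for one sampled picture per interval.

    detect_subjects(frame) is assumed to return a list of dicts such as
    {"kind": "face_head" | "body", "bbox": (x, y, w, h)}.
    """
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(fps * SAMPLE_INTERVAL_S))  # e.g. every 5th frame at 30 fps
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            yield index / fps, detect_subjects(frame)
        index += 1
    cap.release()
```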

After the image data are recognized in step 1010, the recognized "face/head" targets and "body" targets are handled by two parallel flows. Therefore, after step 1010, targets recognized as a person's "face/head" proceed to step 1020 and the steps that follow it, while targets recognized as a person's "body" proceed to step 1030 and the steps that follow it.

According to one embodiment of the present invention, a person's "face/head" as used above includes, in addition to the face, the rest of the head; the "body" here refers to the parts of the body other than the "face/head". According to another embodiment of the present invention, when analyzing the content of the image data, the image processing device 300M considers, besides the "face/head" and "body", all "objects" closely related to a "person", for example: worn items (clothes, hats, ...) and carried items (a baby, glasses, jewelry, pets, handbags, ...).

Continuing from the above, step 1020 is the step entered by targets recognized as a person's "face/head" in the picture; step 1020 detects the person's face/head in the picture. Since faces/heads appear in the picture at various sizes, a face/head that occupies too small an area (too few pixels) may not be detected effectively. In one embodiment of the present invention, the detectable pixel size is 40x40 pixels or larger. The 40x40-pixel threshold is only a reference value provided by this embodiment, and the present invention is not limited thereto; the minimum recognizable pixel threshold may be any other pixel size, depending on the capabilities of the software and hardware. After the targets recognized as a person's "face/head" are confirmed in step 1020, the flow proceeds to three steps: "detection of the person's head direction" in step 1040, "estimation of the person's head position" in step 1050, and "detection of the person's gender/age/expression" in step 1060. All three steps relate to the person's "face/head" and are explained below.

In step 1040, the head direction of the person in the picture is detected. Following step 1020, step 1040 ("detection of the person's head direction"), step 1050 ("estimation of the person's head position"), and step 1060 ("detection of the person's gender/age/expression") are executed in parallel. According to one embodiment of the present invention, before step 1040 can be performed, the image analysis application 460M must first, in the training stage of FIG. 14, perform deep learning on a data set such as that of FIG. 7A to obtain the "feature mapping" whose subject is the head direction; the most likely head direction of the person in the picture is then determined according to that feature mapping. Taking FIG. 7A as an example, the pictures numbered 1-91 show the same woman with various head directions, and each image has a corresponding label; the information recorded in each label includes at least the "subject" (for example, the woman's head) and the "three values of the subject's roll, pitch, and yaw". Pictures 1-91 thus present one woman's face/head under different "head directions". In practical application, the head direction of a customer in a picture (that is, the rotational degrees of freedom in space formed by the head's roll, pitch, and yaw) roughly corresponds to one of the learned cases 1-91, so the image analysis application 460M can use the feature mapping obtained by deep learning in the earlier training stage to determine the roll, pitch, and yaw values corresponding to the customer's head direction in the picture. If the person's head direction does not exactly match any of cases 1-91, step 1040 still makes the closest judgment according to the learned feature mapping and estimates the most likely roll, pitch, and yaw values. In addition, according to one embodiment of the present invention, step 1040 may further calculate the position and extent of the person's head in the picture. As long as they are recognizable, if a picture contains 10 men and women, the head directions of all 10 are detected. Moreover, the people in front of the image capturing device 400M may behave differently: they may be watching the audio-visual advertisement being played, talking with each other, looking at a distant building, and so on, and the "head directions" presented by these behaviors differ accordingly.
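A minimal sketch of the head-direction estimation of step 1040 follows, assuming a CNN regression model trained on labeled images such as those of FIG. 7A. The model object and its API are illustrative assumptions, not the patent's actual implementation; only the roll/pitch/yaw output and the 40x40-pixel threshold come from the text.

```python
# Hypothetical head-pose estimator wrapping a trained CNN; names are assumed.
import numpy as np

class HeadPoseEstimator:
    def __init__(self, model):
        self.model = model  # assumed: model(crop) -> np.array([roll, pitch, yaw]) in degrees

    def estimate(self, frame, bbox):
        x, y, w, h = bbox
        if w < 40 or h < 40:          # below the 40x40-pixel detection threshold
            return None               # head too small to be recognized reliably
        crop = frame[y:y + h, x:x + w]
        roll, pitch, yaw = self.model(crop)
        return {"roll": float(roll), "pitch": float(pitch), "yaw": float(yaw)}
```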

In step 1050, the position of the person's head in the picture is estimated. Before the application flow shown in FIG. 15 is executed, the image capturing device 400M must first be calibrated to obtain its parameters (for example, the intrinsic/extrinsic matrices). After the deep-learning training, step 1050 can use the parameters of the image capturing device 400M together with the image data of a specific object in the picture to compute the distance and relative position between that object and the image capturing device 400M. According to one embodiment of the present invention, the relative distance between the specific object and the image capturing device 400M may be regarded as the distance between two points in space. The image capturing device 400M is usually taken as the origin, so its coordinates in 3D space are (0, 0, 0). The coordinates of the specific object in 3D space can then be computed, for example (100, 552, 211), and the distance of the object from the origin (the image capturing device 400M) is a computable length. According to another embodiment of the present invention, the origin of the 3D space may instead be set at the corner point where two adjacent walls of the store in the picture meet the floor. In summary, the present invention first calibrates the image capturing device 400M so that, for a specific point in a 2D picture, its relative distance in 3D space to the image capturing device 400M or to a reference point (the set origin) can be obtained. Step 1050 can accordingly use the calibrated image capturing device 400M to estimate the position of the person's head in 3D space.
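The distance computation implied above is a plain Euclidean distance once the 3D coordinates are known; the sketch below uses the example point from the text. In practice the coordinates would come from the calibrated back-projection, which is left abstract here.

```python
# Distance of a 3D point from the origin (the camera, per the text).
import math

def distance_from_origin(point_3d):
    """Euclidean distance of a 3D point from the origin (0, 0, 0)."""
    x, y, z = point_3d
    return math.sqrt(x * x + y * y + z * z)

# Example: the object at (100, 552, 211) from the text (units as calibrated)
d = distance_from_origin((100, 552, 211))  # approx. 599.4
```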

In step 1070, the gaze area of the person in the picture is estimated. From the results (data) of steps 1040 and 1050, the gaze area of the person in the picture can be further obtained.

According to one embodiment of the present invention, after the head direction is detected in step 1040 and the head position is estimated in step 1050, the "head direction" (the three values roll, pitch, yaw) of a given person in the picture and the position of that person's head in 3D space are known. With these two pieces of information, the following simulation is performed: a light is projected from directly behind the person's head in a direction consistent with the head direction, and the projected ray is extended until it reaches the position of the audio-visual display device 850; a simulated screen is assumed at the position of the audio-visual display device 850, on which the projected light forms a simulated projection. The projection area formed by this simulated projection is the person's possible gaze area, such as the elliptical projection areas 815A and 815B shown in FIG. 11A and FIG. 11B. According to one embodiment of the present invention, the projection area is further divided into a plurality of sub-projection areas, among which the sub-projection area with the highest probability is the range on which the person's line of sight is most likely focused; from this it can also be inferred whether the object the person is gazing at is the screen 890 on the audio-visual display device 850, so as to determine whether the person is watching the audio-visual advertisement being played. Please refer to the descriptions of step 740 in FIG. 9 and of FIG. 11A and FIG. 11B. In step 1070, each person in the picture has their own projection area according to their head direction, and each person's projection area represents that person's gaze area.
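Geometrically, the center of the projected gaze area can be sketched as a ray-plane intersection: a ray cast from behind the head along the head direction is intersected with the plane of the display screen. The conversion from (pitch, yaw) to a direction vector, the coordinate conventions, and the plane parameters below are simplifying assumptions for illustration only.

```python
# A geometric sketch of the gaze projection of step 1070; conventions assumed.
import numpy as np

def head_direction_vector(pitch_deg, yaw_deg):
    """Unit vector for the head direction; roll does not change the ray axis."""
    p, y = np.radians([pitch_deg, yaw_deg])
    return np.array([np.cos(p) * np.sin(y), np.sin(p), np.cos(p) * np.cos(y)])

def ray_plane_intersection(origin, direction, plane_point, plane_normal):
    """Point where the gaze ray meets the screen plane, or None if parallel/behind."""
    denom = float(np.dot(plane_normal, direction))
    if abs(denom) < 1e-9:
        return None
    t = float(np.dot(plane_normal, plane_point - origin)) / denom
    return origin + t * direction if t > 0 else None

head_pos = np.array([0.5, 1.6, 2.0])                    # head position from step 1050
ray = head_direction_vector(pitch_deg=-5, yaw_deg=10)   # head direction from step 1040
hit = ray_plane_intersection(head_pos, ray,
                             plane_point=np.array([0.0, 0.0, 5.0]),   # screen plane at z = 5 (illustrative)
                             plane_normal=np.array([0.0, 0.0, 1.0]))  # screen facing the camera axis
```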

According to one embodiment of the present invention, in step 1070 it is inferred whether a person's gazing area is the screen 890 on the audio-visual display device 850 only when that person has watched some object for longer than a set time threshold; the set time threshold may be, for example, 5 seconds, but is not limited thereto and may be any set length of time. According to another embodiment of the present invention, if a person in the picture is moving but their gaze stays on the same object longer than a set time threshold (for example 5 seconds, but not limited thereto), it is likewise inferred whether their gaze area is the screen 890 on the audio-visual display device 850. In step 1070, except in cases where a person's head direction cannot be recognized, a gaze area (projection area) and the gaze probabilities of its sub-projection areas should be obtainable for every person in the picture. The head direction of a person in the picture may be unrecognizable because: (1) the person's head image is incomplete, for example a large part of it is blocked by other people or objects; or (2) the person's head image covers too few pixels for the head to be recognized. According to the technique disclosed in the present invention, even if the head image does not include the eyes of the face (for example, a rear view of the head), the head direction can still be recognized. According to another embodiment of the present invention, when the head image clearly shows the eye features of the face, the customer's line-of-sight direction can be computed and used as a reference to further narrow the projection area, increasing the accuracy of the estimated gaze area. After step 1070 is executed, the flow jumps to step 1080.
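The dwell-time condition above can be sketched as a small per-person timer: a person only counts as "watching the screen" once their gaze area has covered the screen continuously for longer than the threshold. The 5-second value follows the text; the tracking structure and identifiers are assumptions.

```python
# A small sketch of the dwell-time condition used in steps 1070/1080.
GAZE_THRESHOLD_S = 5.0  # example threshold from the text

class DwellTracker:
    def __init__(self):
        self.start = {}  # person_id -> timestamp when gaze on screen began

    def update(self, person_id, gazing_at_screen, now):
        """Return True once the person has gazed at the screen > threshold."""
        if not gazing_at_screen:
            self.start.pop(person_id, None)   # gaze broken: reset the timer
            return False
        self.start.setdefault(person_id, now)
        return (now - self.start[person_id]) > GAZE_THRESHOLD_S
```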

Returning to step 1060: in step 1060, the gender, age, and expression of the person in the picture are detected. According to one embodiment of the invention, besides the head direction, the "face" portion of the person must also be clearly recognizable; people in the picture who do not meet this standard are skipped and not detected. According to another embodiment of the present invention, when the area of a person's face covered by other people or objects exceeds the upper limit that the image analysis application 460M can handle, the occluded person is likewise skipped. Basically, the face region of the person in the picture must contain at least 40x40 pixels; the 40x40 value serves only as one embodiment of the present invention and is not a limitation thereof, so the face region may contain pixels of any size. The detection of "gender", "age", and "expression" in step 1060 relies on feature mappings, whose subjects are respectively "gender", "age", and "expression", obtained by deep learning on the data sets in the database 450 during the training stage of step 900 in FIG. 14. Hence, in that training stage, the database 450 must provide data sets of the various age groups, for example "young male/female", "middle-aged male/female", "elderly male/female", "boy/girl", and so on, in order to learn to detect the "gender" and "age" of the person in the picture. The more data sets provided during training, the more accurately step 1060 can detect the "age" of the person in the picture. As for detecting the person's "expression", the data sets in the database 450 during the training stage must contain various emotions, for example: smiling, gladness, anger, sadness, joy, crying, surprise, fear, doubt, and other emotion-exhibiting data sets. Although the detection of "gender", "age", and "expression" is listed in the single step 1060, in implementation it is processed in a flow parallel with steps 1040 and 1050. The "gender", "age", and "expression" of the person in the picture in step 1060 may each be regarded as one of the "attributes of the person".
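An illustrative sketch of the per-face attribute detection of step 1060 follows, assuming three separately trained CNN classifiers (gender, age, expression) and the size/occlusion filters described above. All model APIs, the occlusion measure, and the 0.5 occlusion limit are assumptions.

```python
# Hypothetical per-face attribute detection with the filters from the text.
def detect_attributes(face_crop, occluded_ratio, models, max_occlusion=0.5):
    """face_crop: numpy image of the face region; models: dict of classifiers."""
    h, w = face_crop.shape[:2]
    if w < 40 or h < 40:                 # face region below the pixel threshold
        return None                      # skipped, not detected
    if occluded_ratio > max_occlusion:   # face too heavily covered by others
        return None
    return {
        "gender": models["gender"](face_crop),          # e.g. "male" / "female"
        "age": models["age"](face_crop),                # e.g. an age-range label
        "expression": models["expression"](face_crop),  # e.g. "smile", "anger"
    }
```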

Returning to step 1010 above: in step 1010, when the feature target recognized for a person in the picture is a "body", the flow proceeds to step 1030.

In step 1030, the body parts of the person in the picture are detected, in particular the "person's height", the "person's body shape (fat/thin)", and the "person's clothing". For the height, the picture must clearly show the person's entire image "from head to shoes"; the height of anyone who does not meet this standard is skipped and not computed. According to one embodiment of the present invention, since the head position can be estimated in step 1050 as described above, the person's height in the picture can likewise be estimated by the same method in step 1030. Basically, height detection allows "adult male/female" and "child male/female" to be distinguished. For the detection of "body shape (fat/thin)" and "clothing" in step 1030, feature mappings whose subjects are "body shape" and "clothing" are likewise obtained by deep learning on the data sets in the database 450 during the training stage of step 900 in FIG. 14. Hence, in that training stage, the database 450 must provide data sets of the various body shapes of men and women (tall, short, fat, thin, and so on) and data sets of "uniforms worn in various occupations", in order to learn to detect the body shape of the person in the picture and to judge their occupation from the clothes (uniform) they wear. The more data sets provided during the training stage, the better the learning result for detecting "body shape" and "clothing" in step 1030; in other words, the stronger the ability to accurately judge these attributes in the picture. According to another embodiment of the present invention, inferring the occupation of the person in the picture further includes analyzing the person's "carried items". For example, when the carried item is a "briefcase", the person may be inferred to be an "office worker"; when it is a "school bag", the person may be inferred to be a "student"; and so on. Since a person's occupation is not easy to infer, in practice the inference considers several conditions together: besides the "clothing" and "carried items" above, conditions such as "accessories", "time", "location", "weather", and so on are added to the analysis to infer the "occupation" of the person in the picture; a toy sketch of this multi-cue inference follows this paragraph. Although the detection of "height", "fat/thin", and "clothing" is listed in the single step 1030, in implementation it is processed in parallel with steps 1040, 1050, and 1060.
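The sketch below combines clothing, carried items, and context (time, location) into a simple rule chain; the rules, labels, and thresholds are illustrative assumptions, not the patent's actual logic.

```python
# A toy sketch of the multi-cue occupation inference described above.
def infer_occupation(clothing, carried_items, hour, location):
    if "briefcase" in carried_items and 7 <= hour <= 19:
        return "office worker"
    if "school bag" in carried_items:
        return "student"
    if clothing == "nurse uniform" or location == "hospital":
        return "medical staff"
    return "unknown"   # not enough cues to infer an occupation
```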

In step 1070, the gaze area of the person in the picture is estimated. From the execution results (outputs) of steps 1040 and 1050, step 1070 can estimate where the area gazed at by a given person in the picture lies. After step 1070 is executed, it is known "whether a given person in the picture is gazing at the screen 890 on the audio-visual display device 850", that is, whether that person is watching the audio-visual advertisement being played. Collecting the related information (attributes) of "the people in the picture who are watching the audio-visual advertisement being played" is therefore the most important part of the flow. This flow must also include the condition "has kept watching the advertisement being played for longer than a period of time"; that period is 5 seconds. The 5-second value serves only as one embodiment of the present invention and is not a limitation thereof; the length of time may be any number of seconds. From the outputs of steps 1070, 1060, and 1030, the basic condition for a target person to be counted in the present invention is: "the person in the picture has kept watching the advertisement being played longer than a preset period of time". After steps 1070, 1060, and 1030 are executed, the judgment of step 1080 is performed.

In step 1080, it is judged whether the person in the picture is gazing at the screen 890 of the audio-visual display device 850 and has kept gazing at the screen 890 for longer than a time threshold, for example more than 5 seconds. According to one embodiment of the present invention, only the people in the picture who satisfy both of the above conditions are targets that need to be counted; the other people in the picture, who do not satisfy both conditions, are ignored. If the result of the judgment in step 1080 is true (Yes), step 1110 is executed; otherwise step 1120 is executed.

In step 1090, the media player 880 provides the image processing device 300M with the playback progress of the audio-visual advertisement currently being played, and the content server 870 provides the image processing device 300M with the content attributes of that advertisement. According to another embodiment of the present invention, both the playback progress and the content attributes of the advertisement currently being played may be provided by the content server 870, or both by the media player 880. In step 1090, the playback progress provided by the media player 880 basically refers to the current second within the advertisement being played. The content attributes of an audio-visual advertisement include: the number of the advertisement currently being played, its play order, its target product (for example: ice cream, cars, glasses, health food, and so on), the audience groups it suits (for example: women, the elderly, children, male office workers, and so on), its preset broadcast period, its total play length (time), and other such information.

In step 1110, the related information (attributes) of the person in the picture and the content attributes of the audio-visual advertisement are recorded synchronously. Step 1110 records the related information such as the outputs of steps 1060, 1030, and 1090. The "record" mentioned here is a statistical record (not shown), which accumulates record data in the database 450 by appending new entries. The statistical record includes fields such as: the current date/time; the gender, age, height, body shape, occupation, and expression of the person watching the advertisement; the content attributes of the advertisement currently being played (its number, target product, preset broadcast period, total play length (time)); and the current playback progress. The statistical record may be in a record format such as a database file, a table, or a text file, and may be stored in the hard disk 385 of the image processing device 300M or in the non-volatile memory 330. According to one embodiment of the present invention, when the statistical record reaches a predetermined number of entries or the advertisement reaches a predetermined test time, the statistical record stored in the image processing device 300M is transmitted through the network 860 to another server (not shown). After step 1110 is executed, the flow jumps back to step 1010 to continue analyzing the image data of the other people in the picture, executing repeatedly in a loop. According to one embodiment of the present invention, basically, each person in the picture has their attributes (gender, age, height, body shape, occupation, expression, and so on) detected in a parallel-processing manner.
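One entry of the statistical record appended in step 1110 might be sketched as follows; the field names mirror the text, while the dataclass shape and the in-memory store standing in for the database 450 are illustrative assumptions.

```python
# A sketch of one statistical-record entry; storage backend left abstract.
from dataclasses import dataclass, asdict
from datetime import datetime

@dataclass
class ViewingRecord:
    timestamp: datetime         # current date/time
    gender: str                 # attributes of the watching person
    age_range: str
    height_cm: float
    body_shape: str             # e.g. "fat" / "medium" / "thin"
    occupation: str
    expression: str
    ad_id: str                  # content attributes of the playing advertisement
    target_product: str
    broadcast_period: str
    total_length_s: float
    playback_progress_s: float  # current second within the advertisement

def append_record(store, record):
    """Append one entry to the accumulating statistical record."""
    store.append(asdict(record))

records = []  # illustrative in-memory stand-in for database 450
```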

Following the execution result of step 1080 above: when the result is false (No), step 1120 is executed.

In step 1120, it is judged whether the predetermined test time has been reached or the amount of data accumulated in the statistical record is already sufficient. Under normal circumstances, the person survey/statistical analysis stage of step 910 in FIG. 14 is executed over a predetermined period, which may be 1 day, 3 days, 7 days, or another time interval. When the time interval is longer, the amount of accumulated data is larger, and in actual use the analysis results of the statistical record come closer to the goal the present invention intends to reach. Moreover, when the accumulated data in the statistical record reach a certain quantity, the correlation between the statistical record and the audio-visual advertisements can be further analyzed. When the judgment result of step 1120 is true (Yes), step 1130 is executed; otherwise the flow jumps back to step 1010 and continues.

In step 1130, the statistical record (not shown) is output through the network 860 to other devices for analysis. Basically, the statistical record is output through the network 860, after which the correlation between the statistical record and the audio-visual advertisements continues to be analyzed. According to another embodiment of the present invention, the statistical record (not shown) is output to a handheld device through WiFi or a Bluetooth device.

In step 1140, the correlation between the statistical record and the content of each audio-visual advertisement is analyzed. The analysis is performed by the producers of the advertisement content or by the provider of the image processing device 300M. Further analysis of the statistical record (not shown) accumulated through step 1110 may yield results such as: "the group attracted by advertisement A is the elderly over 70", "the group attracted by advertisement B is female office workers aged 35-45", "the content of advertisement C is unattractive; most people's gaze time is only about 1.5 seconds", "the content of advertisement D from 1:30 to 2:05 most easily makes everyone laugh", and so on. These many analysis results serve as the information basis for the execution of the "application stage of the smart electronic signage" in step 920 of FIG. 14, and the conclusions also amount to an unmanned questionnaire survey for the producers of the advertisement content.

When step 1140 has been executed, the flow of FIG. 15 has ended, which also means that the "person survey/statistical analysis stage of the smart electronic signage" of step 910 in FIG. 14 has been completed.

The analysis results produced by step 1140 achieve the purpose of knowing "which advertisement content attracts which groups". These analysis results may therefore be added back as part of the "content attributes" of the audio-visual advertisements.

The execution of steps 1040, 1050, and 1070 above serves to judge whether the person in the picture is gazing at the screen 890 on the audio-visual display device 850; if the person is not, they are not included among the counted targets. According to another embodiment of the present invention, when it is unnecessary to judge whether the person in the picture is gazing at the screen 890 on the audio-visual display device 850, steps 1040, 1050, and 1070 may be omitted. In that case, step 1080, which judges whether the person has gazed at the screen 890 longer than a preset time threshold, is also unnecessary and is omitted as well.

Please refer to FIG. 16, which shows a flowchart of the "application stage of the smart electronic signage system" 920 in FIG. 14. The flow starts at step 1200.

In step 1200, the image data from the image capturing device 400M are obtained. Step 1200 in FIG. 16 is the same as step 1000 in FIG. 15; please refer to the description of step 1000 in FIG. 15.

In step 1205, the data related to the attributes of the audio-visual advertisement content are provided. The attributes of the advertisement content refer to data such as the content attributes of the audio-visual advertisements provided by the content server 870 or the media player 880. The content attributes of the audio-visual advertisements include: the number of each advertisement, the play order of each advertisement, the number of the advertisement currently being played, the target product of each advertisement (for example: ice cream, cars, glasses, health food, and so on), the audience groups each advertisement suits (for example: women, the elderly, children, male office workers, and so on), each advertisement's preset broadcast period, each advertisement's total play time, and other such information. Among these, the audience groups an advertisement suits are very strongly related to the various "attributes of the people" found in the subsequently analyzed pictures of the image data; these "attributes of a person" are, for example: gender, height, body shape, age, occupation, and so on.

In step 1210, the parts of the image data that contain a subject are identified. Step 1210 in FIG. 16 is the same as step 1010 in FIG. 15; please refer to the description of step 1010 in FIG. 15.

In step 1220, the face/head of the person in the picture is detected. Step 1220 in FIG. 16 is the same as step 1020 in FIG. 15; please refer to the description of step 1020 in FIG. 15.

In step 1240, the head direction of the person in the picture is detected. Step 1240 in FIG. 16 is the same as step 1040 in FIG. 15; please refer to the description of step 1040 in FIG. 15.

In step 1250, the position of the person's head in the picture is estimated. Step 1250 in FIG. 16 is the same as step 1050 in FIG. 15; please refer to the description of step 1050 in FIG. 15.

In step 1270, the gaze area of the person in the picture is estimated. From the results (outputs) of steps 1240 and 1250, the gaze area of a given person in the picture can be further obtained, in order to judge whether that person is gazing at the screen 890 on the audio-visual display device 850. Step 1270 in FIG. 16 is the same as step 1070 in FIG. 15; please refer to the description of step 1070 in FIG. 15.

In step 1260, the gender, age, and expression of the person in the picture are detected. According to another embodiment of the present invention, step 1260 detects only the gender and age of the person in the picture, without detecting the expression. Step 1260 in FIG. 16 is the same as step 1060 in FIG. 15; please refer to the description of step 1060 in FIG. 15.

Returning to step 1210 above: in step 1210, when the feature target recognized for a person in the picture is a "body", the flow proceeds to step 1230.

In step 1230, the body parts of the person in the picture are detected, in particular the "person's height", the "person's body shape (fat/thin)", and the "person's clothing". Although the detection of these three is listed in the single step 1230, in implementation it is processed in parallel with steps 1240, 1250, and 1260. Step 1230 in FIG. 16 is the same as step 1030 in FIG. 15; please refer to the description of step 1030 in FIG. 15.

In step 1270, the gaze area of the person in the picture is estimated. From the execution results (outputs) of steps 1240 and 1250, step 1270 can estimate where the area gazed at by a given person in the picture lies. After step 1270 is executed, it is known "whether a given person in the picture is gazing at the screen 890 on the audio-visual display device 850", that is, whether that person is watching the audio-visual advertisement being played. Collecting the related information (attributes) of "the people in the picture who are watching the audio-visual advertisement being played" is therefore the most important part of the flow. This flow must also include the condition "has kept watching the advertisement being played for longer than a period of time"; that period is 5 seconds. The 5-second value serves only as one embodiment of the present invention and is not a limitation thereof; the length of time may be any number of seconds. From the outputs of steps 1270, 1260, and 1230, the basic condition for a target person to be counted in the present invention is: "the person in the picture has kept watching the advertisement being played longer than a preset time threshold". After steps 1270, 1260, and 1230 are executed, the judgment of step 1280 is performed.

In step 1280, it is judged whether the person in the picture is gazing at the screen 890 of the audio-visual display device 850 and has kept gazing for longer than a time threshold, for example more than 5 seconds. According to one embodiment of the present invention, only the people in the picture who satisfy both of the above conditions are targets of the method disclosed in the present invention; the other people are ignored. If the result of the judgment in step 1280 is true (Yes), step 1290 is executed; otherwise step 1330 is executed.

In step 1290, the people in the picture who are gazing at the screen 890, together with their attributes, are counted in real time. The attributes of a person gazing at the screen 890 refer to features such as that person's "gender", "age", "height", "body shape (fat/thin)", "accessories", "clothing", "occupation", and so on; step 1290 compiles real-time statistics of the attributes of the people in the picture gazing at the screen 890 (real-time attribute statistics, not shown). The results of the real-time attribute statistics might be, for example: "women aged 20-26: 6", "seniors aged 70: 6; children aged 6-10: 2", "women wearing glasses: 10", "male office workers aged 45-60: 16; female office workers aged 45-55: 3", "women of heavier build: 9; women of medium build: 2; women of thin build: 4", and so on, amounting to attribute statistics of the various people in the picture. The results of these real-time attribute statistics serve as the basis for deciding whether to change the content of the audio-visual advertisement currently being played.
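The real-time tally of step 1290 can be sketched as a simple aggregation over the attribute dicts produced for each person currently gazing at the screen; the bucketing keys and field names below are illustrative assumptions.

```python
# A sketch of the real-time attribute statistics of step 1290.
from collections import Counter

def tally_attributes(viewers):
    """viewers: list of attribute dicts for people gazing at the screen."""
    tally = Counter()
    for v in viewers:
        tally[(v["gender"], v["age_range"])] += 1        # e.g. ("female", "20-26")
        if v.get("glasses"):
            tally[("accessory", "glasses")] += 1
        tally[("body_shape", v["body_shape"])] += 1      # e.g. "thin" / "medium"
    return tally
```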

In step 1300, it is judged whether all the people in the picture have been detected/counted. According to one embodiment of the present invention, the image analysis application 460M processes the detection of each person in the picture almost entirely in parallel. As described above, the image analysis application 460M covers the face/head- and body-related detections: detection of head direction, estimation of head position, judgment of gender, estimation of age, estimation of height, judgment of body shape, judgment of occupation (clothing), and so on. When the image analysis application 460M has completed this work, step 1310 is executed; otherwise the flow jumps back to step 1210.

In step 1310, the currently best-matching (most suitable) audio-visual advertisements are selected according to the results of the real-time attribute statistics and the content attributes of each advertisement. The real-time attribute statistics are the statistics, from step 1290, of all the people in the picture gazing at the screen 890 and their attributes; the advertisement attributes are the content-attribute data of each advertisement provided in step 1205. Through intelligent comparison and analysis, the image analysis application 460M selects the advertisements that best match (suit) "the people currently watching the advertisement in the picture". There may be only one selected advertisement or more than one; when more than one is selected, determining their play order is further included. The intelligent comparison and analysis may be, for example: when more than two-thirds of the people in the picture are men aged 30-65, choose to play advertisements about cars; when more than three-quarters of the people in the picture are women aged 12-50, choose to play advertisements about feminine hygiene products; when more than half of the people in the picture wear glasses, choose to play advertisements about corrective glasses; and so on for advertisements of different content attributes.
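A toy sketch of this comparison follows: simple majority rules over the viewer attributes choose which content category to play next. The thresholds and categories follow the examples in the text, while the rule engine itself is an illustrative assumption.

```python
# A toy sketch of the "intelligent comparison" of step 1310.
def pick_ad_category(viewers):
    n = len(viewers) or 1
    men_30_65 = sum(1 for v in viewers
                    if v["gender"] == "male" and 30 <= v["age"] <= 65)
    women_12_50 = sum(1 for v in viewers
                      if v["gender"] == "female" and 12 <= v["age"] <= 50)
    with_glasses = sum(1 for v in viewers if v.get("glasses"))
    if men_30_65 / n > 2 / 3:
        return "cars"
    if women_12_50 / n > 3 / 4:
        return "feminine hygiene products"
    if with_glasses / n > 1 / 2:
        return "corrective glasses"
    return None   # no rule fires: keep the current play order
```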

In step 1320, the content server 870 is notified to change the audio-visual advertisement currently being played and its order. Since the advertisement files are stored in the content server 870, and the playback of the advertisements and the output of the audio-visual stream are also performed by the content server 870, the content server 870 can determine the content (content attributes) of the advertisement to be played and its play order. The image processing device 300M therefore notifies the content server 870 to play the advertisements according to "the advertisements and their order selected in step 1310 through intelligent comparison and analysis". At this point, for an advertisement that has not yet finished playing, the content server 870 may immediately interrupt it and switch to the advertisements in the new order. The advertisements played after switching to the new order have advertisement attributes most relevant to the attributes of the people currently watching in the picture, so they are very likely the advertisements those people are most interested in; as far as such an advertisement is concerned, it arouses the interest of the largest number of people in the picture. The smart electronic signage system 100M of the present invention can therefore maximize the advertising benefit of the audio-visual advertisements.

In step 1330, the image processing device 300M judges whether a pause/shutdown signal sent by the content server 870 has been received. To reduce the cost of playing the audio-visual advertisements, they are not necessarily looped 24 hours a day, but may be played during a certain period of each day, for example from 07:00 to 24:00. When the image processing device 300M receives the pause/shutdown signal sent by the content server 870, the flow of the smart electronic signage application stage in FIG. 16 ends; otherwise the flow jumps back to step 1210 and continues.

Likewise, the execution of steps 1240, 1250, and 1270 above serves to judge whether the person in the picture is gazing at the screen 890 on the audio-visual display device 850; if the person is not, they are not included among the counted targets. According to another embodiment of the present invention, when it is unnecessary to judge whether the person in the picture is gazing at the screen 890 on the audio-visual display device 850, steps 1240, 1250, and 1270 may be omitted. In that case, step 1280, which judges whether the person has gazed at the screen 890 longer than a preset time threshold, is also unnecessary and is omitted as well.

Moreover, although FIG. 14 shows steps 910 and 920 being executed in sequence, with the two sub-flows of FIG. 15 and FIG. 16 respectively representing their execution details, according to another embodiment of the present invention steps 910 and 920 in FIG. 14 may also be executed independently in parallel; that is, the two sub-flows representing the execution of steps 910 and 920 (namely, FIG. 15 and FIG. 16) may be executed at the same time.

In the embodiments of the present invention, after the image processing device 300M obtains real-time image data, it first recognizes/analyzes the attributes of the people in the picture who are watching the audio-visual advertisement, and then performs an intelligent comparison and analysis against the content attributes of each advertisement to obtain the latest advertisement play order and change the advertisement playback in real time. The content of the newly played advertisement is therefore very likely to arouse the interest of the people watching in the picture, which also means that the advertisements being played are always watched by the currently most relevant audience. Using the smart electronic signage system and method disclosed in the present invention thus maximizes the advertising benefit of the audio-visual advertisements and saves advertising expenditure.
The above are only preferred embodiments of the present invention, and all equivalent changes and modifications made in accordance with the scope of the claims of the present invention shall fall within the scope of the present invention.

40: gazing line
50: target
60, 70: video cameras
100: image data acquisition and analysis system
150: cloud data storage unit
180, 190: network
300: image analysis server
300M: image processing device
310: central processing unit
320: dynamic random access memory
330: non-volatile memory
340: read-only memory
350: storage interface controller
360: network interface controller
370: graphics processing unit
380: physical storage device array
385: physical storage device
390: bus
400, 400A, 400B, 400C, 400N, 400M: image capture device
405: people-flow analysis module
410: head detection module
415: visitor identity recognition module
420: product detection module
425: gazing area analysis module
430: gazed-item analysis module
435: interest degree analysis module
440: movement path analysis module
445: visitor attribute analysis module
450: database
455: returning-customer analysis module
460: image analysis application
465: cloud channel service
465M: multimedia player program
470: hypervisor
475: operating system
480: hardware
800A, 800B: heads
805A, 805B, 805C, 805D: tangent lines
810: screen
815A, 815B: projections
820: projection area (probability 95%)
820E: projection area (probability 98%)
830: projection area (probability 65%)
840: projection area (probability 15%)
850: audio-visual display device
855: cable
860: network
870: content server
880: media player
890: screen
510, 520, 530, 550, 560, 570, 610, 620, 630, 640, 700, 705, 710, 715, 720, 725, 722A, 722B, 722C, 730, 735, 740, 745, 750, 755, 760, 765, 900, 910, 920, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1110, 1120, 1130, 1140, 1200, 1205, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320, 1330: steps

FIG. 1A is a schematic diagram of detecting the object a woman is looking at by using two cameras according to the prior art.
FIG. 1B is a schematic diagram of the woman's face and eyes, as recorded by Camera 1 and Camera 2 of FIG. 1A, while she looks at the object.
FIG. 2 is a block diagram of the system architecture of an image data acquisition and analysis system according to an embodiment of the present invention.
FIG. 3A is a block diagram of the basic hardware architecture of the image analysis server of FIG. 2.
FIG. 3B is a block diagram of the software/hardware architecture of the image analysis server of FIG. 2.
FIG. 4 is a block diagram of the functional modules for the analysis and application of the image data of FIG. 3B.
FIG. 5 is a flowchart of the overall application concept of recognizing people and objects in image data under the CNN architecture according to an embodiment of the present invention.
FIG. 6 is a flowchart of the training phase shown in FIG. 5.
FIG. 7A is a data set about people under the CNN architecture according to an embodiment of the present invention.
FIG. 7B is a data set about objects (1) under the CNN architecture according to an embodiment of the present invention.
FIG. 7C is a data set about objects (2) under the CNN architecture according to an embodiment of the present invention.
FIG. 8A is a schematic diagram of identifying a customer's face/head in an image picture by the analysis and application of the image data according to an embodiment of the present invention.
FIG. 8B is a schematic diagram of identifying a customer's identity and product names in an image picture by the analysis and application of the image data according to an embodiment of the present invention.
FIG. 8C is a schematic diagram of identifying product names in an image picture by the analysis and application of the image data according to an embodiment of the present invention.
FIG. 9 is a flowchart of analyzing the products that a shopping customer in a store is most interested in according to an embodiment of the present invention.
FIG. 10 is a schematic diagram of the information records disclosed in step 755 of FIG. 9.
FIG. 11A is a schematic diagram of the relationship between head direction (1), head direction (2), and the projected light according to an embodiment of the present invention.
FIG. 11B is a schematic diagram of the area formed between the head directions of FIG. 11A and the projected light.
FIG. 11C is a schematic diagram of head direction (3), the calculation of its rotation angle, and their relationship to the projected light according to an embodiment of the present invention.
FIG. 11D is a schematic diagram of head direction (4), the calculation of its rotation angle, and their relationship to the projected light according to an embodiment of the present invention.
FIG. 12A is a system architecture diagram of a smart electronic signage system according to an embodiment of the present invention.
FIG. 12B is a schematic diagram of a scenario of the smart electronic signage system according to an embodiment of the present invention.
FIG. 13A is a block diagram of the basic hardware architecture of the image processing device of FIG. 12A.
FIG. 13B is a diagram of the software/hardware architecture of the image processing device of FIG. 12A.
FIG. 14 is a flowchart of the overall operation of the smart electronic signage system according to an embodiment of the present invention.
FIG. 15 is a flowchart of the person detection/statistical analysis stage of the smart electronic signage system of FIG. 14.
FIG. 16 is a flowchart of the application stage of the smart electronic signage system of FIG. 14.


Claims (88)

A visitor interest degree analysis system for analyzing a visitor's degree of interest in at least one item, the system comprising: at least one image capture device, disposed at a location, for capturing image data of the location, a first head image of the visitor being recorded in the image data; and an image analysis server, connected to the at least one image capture device, for computing and analyzing the image data from the at least one image capture device, the image analysis server further comprising: a data processing center for executing an image analysis application, which determines, according to a first feature mapping obtained from the first head image, a first head direction corresponding to the first head image, and computes a first projection area following the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other related data required or produced by the data processing center during operation; wherein the first projection area is computed by means of a virtual light source, which is placed behind the position of the visitor's head and projects light in a direction consistent with the first head direction to form the simulated first projection area, and when the first projection area covers the position of the at least one item, it indicates that the visitor has a degree of interest in the at least one item.

The system of claim 1, wherein the image analysis server further has a storage unit composed of at least one physical storage device to provide the image analysis server with storage space.

The system of claim 1, wherein the data processing center is composed of one or both of a central processing unit and a graphics processing unit.

The system of claim 1, wherein the first feature mapping is a set of data describing the features of the first head image, obtained by the image analysis server through analyzing the relationships among the pixels of the first head image.

The system of claim 1, wherein the first head direction refers to a degree of freedom of the movement of the visitor's head in space in the first head image.

The system of claim 5, wherein the degree of freedom is expressed in the three rotational dimensions of roll, pitch, and yaw.

The system of claim 5, wherein the degree of freedom is expressed in a three-dimensional rectangular coordinate system representing front-back, left-right, and up-down.
The system of claim 1, wherein the at least one image capture device and the image analysis server are connected directly or through a network.

The system of claim 1, wherein the image data is captured by only a single lens of the at least one image capture device.

The system of claim 1, wherein the image analysis application is a software module, or a module combining software and hardware, for analyzing the image data.

The system of claim 1, further comprising a cloud data storage unit connected to the at least one image capture device and the image analysis server, the cloud data storage unit being used to store the image data captured by the at least one image capture device.

The system of claim 11, wherein the image analysis server further comprises a cloud channel service application serving as an intermediary service program for data transmission between the image analysis server and the cloud data storage unit.

The system of claim 11, wherein the image analysis server is a virtual machine located in the cloud data storage unit.

The system of claim 1, wherein the first projection area is the same as the cross-sectional area that the first head image represents in space.

The system of claim 1, wherein the first projection area is further divided into a plurality of sub-areas, each of which corresponds to a probability that the at least one item is being gazed at when the position of the at least one item is covered by that sub-area.

The system of claim 15, wherein the probability corresponding to each of the plurality of sub-areas represents the visitor's degree of interest in the at least one item.

The system of claim 1, wherein whether the length of time during which the first projection area covers the position of the at least one item exceeds a time threshold is further calculated, as a basis for determining whether the visitor has the degree of interest in the at least one item.

The system of claim 1, wherein the position of the at least one item is obtained by reference to inputted position data, or is calculated by analyzing all or part of the image of the at least one item in the image data.

The system of claim 1, wherein text, numbers, or symbols are further added to the image data to indicate the identity (ID) or code of the visitor.

The system of claim 1, wherein text, numbers, or symbols are added to the image data to indicate the name or code of the at least one item.
The system of claim 1, wherein the image data further contains all or part of the image of the at least one item, and a visual effect is added on or around that image to highlight the visitor's degree of interest in the at least one item.

A visitor interest degree analysis system, connected to at least one image capture module and receiving image data from the at least one image capture module, for analyzing a visitor's degree of interest in at least one target, the system comprising: a data processing center, executing an image analysis application, which determines, according to a first feature mapping obtained from a first head image in the image data, a first head direction corresponding to the first head image, and computes a first projection area following the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other related data required or produced by the data processing center during operation; wherein the first projection area is computed by means of a virtual light source, which is placed behind the position of the visitor's head and projects light in a direction consistent with the first head direction to form the simulated first projection area, and when the first projection area covers the position of the at least one target, it indicates that the visitor has a degree of interest in the at least one target.

The system of claim 22, further having a storage unit composed of at least one physical storage device to provide storage space.

The system of claim 22, wherein the data processing center is composed of one or both of a central processing unit and a graphics processing unit.

The system of claim 22, wherein the first feature mapping is a set of data describing the features of the first head image, obtained by the data processing center through analyzing the relationships among the pixels of the first head image.

The system of claim 22, wherein the first head direction refers to a degree of freedom of the movement of the visitor's head in space in the first head image.

The system of claim 26, wherein the degree of freedom is expressed in the three rotational dimensions of roll, pitch, and yaw.

The system of claim 26, wherein the degree of freedom is expressed in a three-dimensional rectangular coordinate system representing front-back, left-right, and up-down.

The system of claim 22, wherein the system is connected to the at least one image capture module directly or through a network.
The system of claim 22, wherein the image data is captured by only a single lens of the at least one image capture module.

The system of claim 22, wherein the image analysis application is a software module, or a module combining software and hardware, for analyzing the image data.

The system of claim 22, wherein the system is further connected to the at least one image capture device through a cloud data storage unit, the cloud data storage unit being used to store the image data captured by the at least one image capture device.

The system of claim 32, further comprising a cloud channel service application serving as an intermediary service program for data transmission with the cloud data storage unit.

The system of claim 32, wherein the system is a virtual machine located in the cloud data storage unit.

The system of claim 22, wherein the first projection area is the same as the cross-sectional area that the first head image represents in space.

The system of claim 22, wherein the first projection area is further divided into a plurality of sub-areas, each of which corresponds to a probability that the at least one target is being gazed at when the position of the at least one target is covered by that sub-area.

The system of claim 36, wherein the probability corresponding to each of the plurality of sub-areas represents the visitor's degree of interest in the at least one target.

The system of claim 22, wherein whether the length of time during which the simulated first projection area covers the position of the at least one target exceeds a time threshold is further calculated, as a basis for determining whether the visitor has the degree of interest in the at least one target.

The system of claim 22, wherein the position of the at least one target is obtained by reference to inputted position data, or is calculated by analyzing all or part of the image of the at least one target in the image data.

The system of claim 22, wherein text, numbers, or symbols are further added to the image data to indicate the identity (ID) or code of the visitor.

The system of claim 22, wherein text, numbers, or symbols are further added to the image data to indicate the name or code of the at least one target.

The system of claim 22, wherein the image data further contains all or part of the image of the at least one item, and a visual effect is added on or around that image to highlight the visitor's degree of interest in the at least one target.
The system of claim 22, further comprising: a content management module, connected to the data processing center, providing streaming audio-visual data according to the analysis results of the data processing center, wherein the streaming audio-visual data further includes a content attribute; and a playback module, connected to the content management module, decoding the streaming audio-visual data from the content management module and transmitting the decoded streaming audio-visual data to a display device for playback.

The system of claim 43, wherein the display device is used to receive the streaming audio-visual data decoded by the playback module.

The system of claim 43, wherein the display device includes a screen.

The system of claim 43, wherein the playback module and the data processing center are integrated together.

The system of claim 43, wherein the content management module, the playback module, and the data processing center are integrated together.

The system of claim 45, wherein the image capture module is integrated in the display device.

The system of claim 43, wherein the content management module, the playback module, the image capture module, and the data processing center are connected to one another through at least one network.

The system of claim 43, wherein the data processing center analyzes the number and attributes of the visitors in the image data, and adjusts the content of the streaming audio-visual data according to the analysis results.

The system of claim 43, wherein the data processing center determines, according to the first head direction, whether the visitor is looking at the display device, and adjusts the content of the streaming audio-visual data according to the analysis results.

The system of claim 43, wherein the first projection area is further divided into a plurality of sub-areas, each of which corresponds to a probability that the content played on the display device is being gazed at when the position of the display device is covered by that sub-area.

The system of claim 52, wherein the probability corresponding to each of the plurality of sub-areas represents the visitor's degree of interest in the content played on the display device.

The system of claim 43, wherein whether the length of time during which the simulated first projection area covers the position of the display device exceeds a time threshold is further calculated, as a basis for determining whether the visitor has the degree of interest in the content played on the display device.
The system of claim 22, wherein the image capture module captures the visitor's appearance to generate the image data.

The system of claim 22, wherein the virtual light source is located behind the position of the first head image in a three-dimensional space, at a distance of approximately 3 times the diameter of the simulated sphere of the first head image.

A method for analyzing a visitor's degree of interest, executed by an image analysis server to determine a visitor's degree of interest in at least one target, the method comprising: providing an image analysis application in the image analysis server; the image analysis server obtaining image data; the image analysis server detecting, in the image data, a first head image having a first head feature; the image analysis server analyzing the first head image and determining, by means of a first feature mapping obtained from the analysis, a first head direction corresponding to the first head image; the image analysis server calculating the position of the first head image in a three-dimensional space, and computing a simulated first projection area according to the position of the first head image, the first head direction, and a virtual light source; and the image analysis server determining, according to the coverage of the first projection area and the position of the at least one target, whether the visitor corresponding to the first head image has a degree of interest in the at least one target.

The method of claim 57, wherein the image analysis application is a software module, or a module combining software and hardware, for analyzing the image data.

The method of claim 57, wherein the image analysis server obtains the image data by receiving the image data transmitted by an image capture device, either through a network or directly.

The method of claim 59, wherein the image data is captured by only a single lens of the image capture device.

The method of claim 57, wherein the image analysis server obtains the image data by downloading the image data from a cloud data storage unit through a network.

The method of claim 61, wherein a cloud channel service application is further provided as an intermediary service program for data transmission between the image analysis server and the cloud data storage unit.
The method of claim 57, wherein the image analysis server further comprises: a data processing center for executing the image analysis application, which determines, according to the first feature mapping obtained from the first head image, the first head direction corresponding to the first head image, and computes the first projection area following the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other related data required or produced by the data processing center during operation.

The method of claim 63, wherein the data processing center is composed of one or both of a central processing unit and a graphics processing unit.

The method of claim 57, wherein the first feature mapping is a set of data describing the features of the first head image, obtained by the image analysis server through analyzing the relationships among the pixels of the first head image.

The method of claim 57, wherein the first head direction refers to a degree of freedom of the movement of the visitor's head in space in the first head image.

The method of claim 66, wherein the degree of freedom is expressed in the three rotational dimensions of roll, pitch, and yaw.

The method of claim 66, wherein the degree of freedom is expressed in a three-dimensional rectangular coordinate system representing front-back, left-right, and up-down.

The method of claim 57, wherein the first projection area is the same as the cross-sectional area that the first head image represents in space.

The method of claim 57, wherein the first projection area is further divided into a plurality of sub-areas, each of which corresponds to a probability that the at least one target is being gazed at when the position of the at least one target is covered by that sub-area.

The method of claim 70, wherein the probability corresponding to each of the plurality of sub-areas represents the visitor's degree of interest in the at least one target.

The method of claim 57, wherein whether the length of time during which the first projection area covers the position of the at least one target exceeds a time threshold is further calculated, as a basis for determining whether the visitor has the degree of interest in the at least one target.
The method of claim 57, wherein the position of the at least one target is obtained by reference to inputted position data, or is calculated by analyzing all or part of the image of the at least one target in the image data.

The method of claim 57, wherein text, numbers, or symbols are further added to the image data to indicate the identity (ID) or code of the visitor.

The method of claim 57, wherein text, numbers, or symbols are further added to the image data to indicate the name or code of the at least one target.

The method of claim 57, wherein the image data further contains all or part of the image of the at least one item, and a visual effect is added on or around that image to highlight the visitor's degree of interest in the at least one target.

The method of claim 57, wherein the image analysis server is a virtual machine located in a cloud data storage unit.

The method of claim 57, wherein the image analysis server is further connected to a storage unit composed of at least one physical storage device to provide the image analysis server with storage space.

The method of claim 57, wherein the virtual light source is located behind the position of the first head image in a three-dimensional space, at a distance of approximately 3 times the diameter of the simulated sphere of the first head image.

The method of claim 57, wherein the at least one target is a display device.

The method of claim 80, further comprising the following steps: when the image analysis server determines that the visitor has the degree of interest in the display device, counting the number of such visitors and their attributes; the image analysis server determining streaming audio-visual data according to the statistical results on the number and attributes of the visitors, wherein the streaming audio-visual data further includes a content attribute; and the image analysis server notifying a content server and requesting it to use the determined streaming audio-visual data as the playback content of the display device.

The method of claim 81, wherein the display device further has a multimedia player program for receiving and decoding the streaming audio-visual data.

The method of claim 81, wherein the image analysis server further refers to the content attribute of the streaming audio-visual data to determine that the visitor has the degree of interest in that content attribute.
The method of claim 81, further comprising the following step: the image analysis server determining, according to the first head direction, whether the visitor is looking at the display device, and using the statistically analyzed results as a basis for adjusting the streaming audio-visual data.

The method of claim 81, wherein the first projection area is further divided into a plurality of sub-areas, each of which corresponds to a probability that the streaming audio-visual data is being gazed at when the position of the display device is covered by that sub-area.

The method of claim 85, wherein the probability corresponding to each of the plurality of sub-areas represents the visitor's degree of interest in the content played on the display device.

The method of claim 81, further comprising the following step: the image analysis server calculating whether the length of time during which the first projection area covers the position of the display device exceeds a time threshold, as a basis for determining whether the visitor has the degree of interest in the content played on the display device.

The method of claim 80, wherein the attributes of the visitor are composed of some or all of the following: gender, age, height, body shape, accessories, clothing, and occupation.
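The claims above describe the gazing-area computation in prose: a head direction is derived from a feature mapping, a virtual light source is placed roughly 3 head diameters behind the head along that direction, and targets are scored by which probability sub-area of the projection they fall into. The Python sketch below illustrates one possible geometry under those claims; the head diameter, the angle-to-probability bands (loosely mirroring the 95%/65%/15% sub-areas labeled in the figures), and all function names are illustrative assumptions, not the patented implementation.

```python
import math

HEAD_DIAMETER = 0.2  # metres; assumed size of the simulated head sphere

def head_direction_vector(yaw_deg: float, pitch_deg: float):
    """Convert yaw/pitch rotations (two of claim 6's dimensions) to a
    unit direction vector in a right-handed x/y/z frame."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))

def virtual_light_source(head_pos, direction, k: float = 3.0):
    """Place the light source k head-diameters behind the head centre,
    opposite the head direction (claims 56 and 79 use k of about 3)."""
    return tuple(p - k * HEAD_DIAMETER * d
                 for p, d in zip(head_pos, direction))

def gaze_probability(light_pos, direction, item_pos):
    """Score a target by its angular offset from the projection axis.
    The angle bands below are assumed stand-ins for the probability
    sub-areas of claims 15 and 70."""
    v = tuple(i - l for i, l in zip(item_pos, light_pos))
    norm = math.sqrt(sum(c * c for c in v))
    cos_a = sum(a * b for a, b in zip(v, direction)) / norm
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    if angle <= 5:
        return 0.95
    if angle <= 15:
        return 0.65
    if angle <= 30:
        return 0.15
    return 0.0

# Example: a visitor at head height 1.6 m turns 10 degrees to the right;
# an item slightly right of centre, 2 m ahead, lands in the 95% band.
d = head_direction_vector(yaw_deg=10, pitch_deg=0)
light = virtual_light_source((0.0, 1.6, 0.0), d)
print(gaze_probability(light, d, item_pos=(0.3, 1.5, 2.0)))  # -> 0.95
```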
TW110116950A 2020-05-12 2021-05-11 System and method for visitor interest extent analysis TWI802881B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063023867P 2020-05-12 2020-05-12
US63/023,867 2020-05-12

Publications (2)

Publication Number Publication Date
TW202143167A true TW202143167A (en) 2021-11-16
TWI802881B TWI802881B (en) 2023-05-21

Family

ID=80783083

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110116950A TWI802881B (en) 2020-05-12 2021-05-11 System and method for visitor interest extent analysis

Country Status (1)

Country Link
TW (1) TWI802881B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI460683B (en) * 2011-06-24 2014-11-11 Reallusion Inc The way to track the immediate movement of the head
US10957069B2 (en) * 2017-09-29 2021-03-23 Tobii Ab Head pose estimation from local eye region
US11042994B2 (en) * 2017-11-15 2021-06-22 Toyota Research Institute, Inc. Systems and methods for gaze tracking from arbitrary viewpoints
CN110363555B (en) * 2018-04-10 2024-04-09 释空(上海)品牌策划有限公司 Recommendation method and device based on vision tracking visual algorithm
CN110795982A (en) * 2019-07-04 2020-02-14 哈尔滨工业大学(深圳) Apparent sight estimation method based on human body posture analysis

Also Published As

Publication number Publication date
TWI802881B (en) 2023-05-21

Similar Documents

Publication Publication Date Title
US11107368B1 (en) System for wireless devices and intelligent glasses with real-time connectivity
US11056225B2 (en) Analytics for livestreaming based on image analysis within a shared digital environment
US11010726B2 (en) Information processing apparatus, control method, and storage medium
US8943526B2 (en) Estimating engagement of consumers of presented content
US10665022B2 (en) Augmented reality display system for overlaying apparel and fitness information
US8577087B2 (en) Adjusting a consumer experience based on a 3D captured image stream of a consumer response
US10592929B2 (en) Systems and methods for delivering content
KR102054443B1 (en) Usage measurement techniques and systems for interactive advertising
US9282367B2 (en) Video system with viewer analysis and methods for use therewith
US20140130076A1 (en) System and Method of Media Content Selection Using Adaptive Recommendation Engine
US20120140069A1 (en) Systems and methods for gathering viewership statistics and providing viewer-driven mass media content
US20090083121A1 (en) Method and apparatus for determining profitability of customer groups identified from a continuous video stream
CN105339969A (en) Linked advertisements
TW201349147A (en) Advertisement presentation based on a current media reaction
TW201812521A (en) Interactive display system with eye tracking to display content according to subject's interest
WO2020093827A1 (en) Multimedia material pushing method and apparatus
JP2022544511A (en) Systems and methods for collecting data for effectiveness evaluation of displayed content
US20220156769A1 (en) Data processing methods, apparatuses and devices, and storage media
JP2017028402A (en) Video evaluation device and program
KR102478149B1 (en) System for artificial intelligence digital signage and operating method thereof
JP2020150519A (en) Attention degree calculating device, attention degree calculating method and attention degree calculating program
US11615430B1 (en) Method and system for measuring in-store location effectiveness based on shopper response and behavior analysis
CN116520982B (en) Virtual character switching method and system based on multi-mode data
CN114746882A (en) Systems and methods for interaction awareness and content presentation
TWI802881B (en) System and method for visitor interest extent analysis