TW201918934A - Intelligent image information and big data analysis system and method using deep learning technology by integrating video monitoring and image deep learning technology to provide facial expression recognition information and motion recognition information - Google Patents

Intelligent image information and big data analysis system and method using deep learning technology by integrating video monitoring and image deep learning technology to provide facial expression recognition information and motion recognition information Download PDF

Info

Publication number
TW201918934A
Authority
TW
Taiwan
Prior art keywords
image
deep learning
information
recognition
feature
Prior art date
Application number
TW106138811A
Other languages
Chinese (zh)
Other versions
TWI647626B (en)
Inventor
林耿呈
李柏漢
高義和
Original Assignee
慧穩科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 慧穩科技股份有限公司
Priority to TW106138811A
Application granted
Publication of TWI647626B
Publication of TW201918934A


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an intelligent image information and big data analysis system and method using deep learning technology, comprising an image capture device, an information processing device, and a cloud image recognition unit. The image capture device captures images of objects, and the object images include at least images of people. The information processing device is connected to the image capture device through a signal transmission module to collect the received person images. The cloud image recognition unit is connected to the information processing device through a network transmission module to receive the person images, and a deep learning computation module built into the cloud image recognition unit performs image recognition. The deep learning computation module can recognize facial expressions and body movements in the person images to obtain at least one kind of facial expression recognition information and body movement recognition information, so that a cloud recognition service for facial expression and body movement recognition is provided by integrating video surveillance with image deep learning technology.

Description

Smart image information and big data analysis system using deep learning technology

The present invention relates to a smart image information and big data analysis system and method using deep learning technology, and in particular to a smart image big data analysis technique that integrates video surveillance with image deep learning to provide a cloud recognition service for facial expressions and body movements.

In recent years, breakthroughs in artificial intelligence have raised the recognition success rates of biometric technologies, including face detection and recognition, voiceprint recognition, iris matching, and fingerprint recognition, to a level ready for commercial application. Within the field of biometrics, face recognition in particular has received ever-growing attention from practitioners because it is a contactless form of identification. A patent search identified the following representative patents with face recognition functionality:

1. Published invention patent No. 201317903, "Face recognition monitoring and management method", discloses the use of face recognition together with text labels to support monitoring and management, solving the problem in conventional techniques that human monitoring personnel are prone to tiredness and visual fatigue.

2. Utility model patent No. M432892, "Face detection and recognition device", discloses a face recognition device comprising a selection module, a filter module, a block module, and a calibration module, which simplifies the processing flow and reduces the amount of computation.

3. Published invention patent No. 201220211, "High-definition image-based face recognition and monitoring system", is composed of an image capture module, a face recognition module, a data processing module, and a search module, and provides a monitoring system that can recognize multiple people simultaneously in very large scenes and offers a person search mechanism and event reconstruction.

As can be seen from the above, although these patents implement face recognition, in practice face recognition is easily affected by lighting, brightness, and changes in facial expression, which increases the difficulty of image recognition. To solve this problem and substantially raise the success rate of image recognition, practitioners in the field currently adopt image deep learning. Representative patents that apply image deep learning include utility model No. M488698, "Smart image-based customer analysis system", and utility model No. M443899, "Smart point-of-sale (POS) system with active customer segment recognition". The M488698 patent computes bag-carrying rate statistics by customer segment, giving store owners a clear view of current business performance to inform subsequent operating models and marketing strategies; the M443899 patent can, based on the recognized age group and gender, search for suitable advertisements matching shelf products, play them immediately while logging timestamps, recognized customer segments, and advertisement attention levels in the background, and then upload the data for later analysis.

Although these patents achieve the above effects in ordinary retail stores, their technical architectures make them unsuitable for venues such as restaurants, parking lots, companies, homes, and access-controlled areas, so image recognition cannot be effectively exploited in those settings, which diminishes commercial competitiveness and value. These patents are therefore not yet complete, and further improvement remains necessary.

In view of this, no patent or paper has yet been proposed that integrates intelligent video surveillance with image deep learning to provide a smart-city image recognition cloud service. Driven by the urgent needs of the related industries, and through continuous research and development, the inventors have finally developed the present invention, which differs from the prior art and patents described above.

A first object of the present invention is to provide a smart image information and big data analysis system and method using deep learning technology that, by integrating video surveillance with image deep learning, delivers a smart-city image recognition cloud service for facial expression recognition and body movement recognition in venues such as restaurants, department stores, home security, parking lots, film and television advertising, and companies and schools. The technical means adopted to achieve this first object comprise an image capture device, an information processing device, and a cloud image recognition unit. The image capture device captures object images, which include at least images of people. The information processing device is connected to each image capture device through a signal transmission module to collect the received person images. The cloud image recognition unit is connected to the information processing device through a network transmission module to receive the person images; it has a built-in deep learning computation module that performs image recognition. The deep learning computation module recognizes facial expressions and body movements in the person images to obtain at least one kind of facial expression recognition information and body movement recognition information.
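The capture, collection, and cloud recognition flow described above can be sketched in simplified form. All names here (`Frame`, `InformationProcessor`, `cloud_recognize`) are illustrative stand-ins for the patent's components, not an actual implementation:

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Frame:
    """A captured object image; `kind` marks what was detected in it."""
    source_id: str
    kind: str  # e.g. "person", "vehicle", "scene"

@dataclass
class InformationProcessor:
    """Collects frames from several capture devices and forwards person images."""
    buffer: List[Frame] = field(default_factory=list)

    def ingest(self, frame: Frame) -> None:
        self.buffer.append(frame)

    def person_images(self) -> List[Frame]:
        # The cloud unit only needs the person images for expression/movement work.
        return [f for f in self.buffer if f.kind == "person"]

def cloud_recognize(frames: List[Frame],
                    classifier: Callable[[Frame], str]) -> List[str]:
    """Stand-in for the cloud unit's deep learning module: label each person image."""
    return [classifier(f) for f in frames]
```

In a real deployment the `classifier` callback would be the trained deep learning computation module and the buffer a networked stream rather than an in-memory list.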

A second object of the present invention is to provide a big data analysis system for smart image information that prevents live broadcasts from showing inappropriate content such as gore, violence, and pornography that could harm physical and mental health. The technical means adopted to achieve this second object comprise an image capture device, an information processing device, and a cloud image recognition unit. The image capture device captures object images, which include at least images of people. The information processing device is connected to each image capture device through a signal transmission module to collect the received person images. The cloud image recognition unit is connected to the information processing device through a network transmission module to receive the person images; it has a built-in deep learning computation module that performs image recognition, recognizing facial expressions and body movements in the person images to obtain at least one kind of facial expression recognition information and body movement recognition information. Here the object images include at least one live broadcast image. The cloud image recognition unit receives the live broadcast image through the network transmission module and the information processing device, and the deep learning computation module performs violence, gore, and pornography recognition on it. When the deep learning computation module detects violent, gory, or pornographic content in the live broadcast image, it interrupts playback of the broadcast or applies mosaic processing to the image.

A third object of the present invention is to provide a big data analysis system for smart image information with advertising logo recognition and statistics on appearance counts and playing time for commercial analysis. The technical means adopted to achieve this third object comprise an image capture device, an information processing device, and a cloud image recognition unit. The image capture device captures object images, which include at least images of people. The information processing device is connected to each image capture device through a signal transmission module to collect the received person images. The cloud image recognition unit is connected to the information processing device through a network transmission module to receive the person images; it has a built-in deep learning computation module that performs image recognition, recognizing facial expressions and body movements in the person images to obtain at least one kind of facial expression recognition information and body movement recognition information. Here the object images include at least one film or television image. The cloud image recognition unit further comprises a logo feature database holding a plurality of logo sample images, each assigned logo feature data and corresponding logo identification data. The cloud image recognition unit receives the film or television image through the network transmission module and the information processing device, extracts its features into at least one logo feature image, finds in the logo feature database the logo feature data matching the features of that logo feature image, reads the matching logo identification data, and outputs the corresponding logo identification information, so that the deep learning computation module can recognize advertising logos in the film or television image and tally the number of appearances and the playing time of each advertising logo, outputting those statistics.

1‧‧‧Venue

10‧‧‧Image capture device

20‧‧‧Information processing device

21‧‧‧Signal transmission module

30‧‧‧Cloud image recognition unit

31‧‧‧Network transmission module

32‧‧‧Deep learning computation module

320‧‧‧Deep learning model

33‧‧‧Statistical analysis unit

34‧‧‧Image feature database

40‧‧‧Electronic signage

41‧‧‧Access control equipment

50‧‧‧Person image

51‧‧‧Face

60‧‧‧Elbow

61‧‧‧Knee

Fig. 1 is a schematic diagram of the overall architecture of the invention.

Fig. 2 is a schematic diagram of the training phase of the deep learning model of the invention.

Fig. 3 is a schematic diagram of the operational prediction phase of the deep learning model of the invention.

Fig. 4 is a schematic diagram of extracting facial expression features from a person image according to the invention.

Fig. 5 is a schematic diagram of extracting body movement features from a person image according to the invention.

Fig. 6 is a schematic diagram of the invention applied to a restaurant.

Fig. 7 is a schematic diagram of the invention applied to a department store.

Fig. 8 is a schematic diagram of the invention applied to home security.

Fig. 9 is a schematic diagram of the invention applied to a parking lot.

To help the examiners further understand the overall technical features of the invention and the technical means by which its objects are achieved, specific embodiments are described in detail with reference to the drawings. Referring to Figs. 1 to 3, the embodiment achieving the first object of the invention comprises an image capture device 10, an information processing device 20, and a cloud image recognition unit 30. A plurality of image capture devices 10 are installed in at least one venue 1 to capture objects in the venue 1 and produce at least one kind of object image, which includes at least images of people. The information processing device 20 is connected to each image capture device 10 through a signal transmission module 21 (for example a Bluetooth, USB, or RS-232 transmission module) to collect the received person images and other object images. The cloud image recognition unit 30 (for example a cloud server) is connected to the information processing device 20 through a network transmission module 31 (for example a network combined with a modem or router) to receive the person images and other object images. The cloud image recognition unit 30 has a built-in deep learning computation module 32 with deep learning training capability for performing image recognition; the deep learning computation module 32 recognizes facial expressions and body movements in the person images to obtain at least one kind of facial expression recognition information and body movement recognition information for subsequent commercial use.

Referring also to Figs. 6 to 9, the venue 1 may be a restaurant, department store, shop, home, building, parking lot, school, company, or access-controlled facility.

As shown in Figs. 1 to 4, the cloud image recognition unit 30 further comprises an image feature database 34 holding a plurality of person sample images. The deep learning computation module 32 extracts the features of the person sample images into facial expression feature data, and each item of facial expression feature data is assigned corresponding expression identification data. When the deep learning computation module 32 receives at least one person image 50 captured in real time, it finds in the image feature database 34 the facial expression feature data matching the features of that image, reads the matching expression identification data, and outputs the corresponding facial expression recognition information.
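The database matching step just described can be sketched minimally by assuming face features are reduced to fixed-length vectors and matched by distance. The feature values and labels in `EXPRESSION_DB` are invented for illustration; a real system would use learned deep feature embeddings rather than hand-written numbers:

```python
import math
from typing import Dict, List, Tuple

# Hypothetical feature database: expression label -> stored feature vector.
EXPRESSION_DB: Dict[str, List[float]] = {
    "happy":   [0.9, 0.1, 0.8],
    "sad":     [0.1, 0.9, 0.2],
    "neutral": [0.5, 0.5, 0.5],
}

def euclidean(a: List[float], b: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognize_expression(feature: List[float]) -> Tuple[str, float]:
    """Return the database label whose stored feature vector is closest
    to the extracted `feature`, plus the matching distance."""
    return min(((lbl, euclidean(feature, ref))
                for lbl, ref in EXPRESSION_DB.items()),
               key=lambda t: t[1])
```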

Specifically, the facial expression recognition information may be at least two of the following expressions: slightly happy, happy, extremely happy, slightly sad, sad, extremely sad, slightly displeased, displeased, or extremely displeased. It may cover facial expressions of joy, anger, sorrow, and delight, and may also include nervous or frightened expressions.

Following on from the above, this embodiment may be applied in a venue 1 such as a restaurant or department store by installing the image capture device 10 near electronic signage 40 to detect whether customers are watching the signage 40 and thereby analyze which advertisements attract which consumer groups. Deep learning recognition of faces and facial expressions then reveals each customer's facial 51 expression: a smile, for example, indicates satisfaction with the goods or services offered, as shown in Fig. 4; conversely, a blank, disappointed, or displeased face indicates dissatisfaction or strong dissatisfaction. On this basis, the signage 40 can play advertisement images or videos suited to that customer group, achieving precisely targeted marketing. Body movement recognition information can also be used: a nod indicates satisfaction with the goods or services, whereas a head shake indicates dissatisfaction or strong dissatisfaction.

Specifically, the cloud image recognition unit 30 further comprises an image feature database 34 (for example a data server, though not limited thereto) holding a plurality of person sample images. The deep learning computation module 32 extracts the features of the person sample images into body movement feature data, and each item of body movement feature data is assigned corresponding body movement identification data. When the deep learning computation module 32 receives several consecutive person images captured in real time, it finds in the image feature database 34 the body movement feature data matching the features of the person sample images, reads the matching body movement identification data, and outputs the corresponding body movement recognition information.

Further, the body movement recognition information may include abnormal movement recognition information. Examples include: climbing (within a predetermined time, for example five seconds, the elbow rises above one half the height of its position with the arm extended straight down and the knee above one third the height of its position with the leg extended straight down, as shown in Fig. 5a); theft (reaching a hand into another person's pocket or handbag); snatching (within a predetermined time, for example one minute, approaching within fifty centimeters of at least one other person, quickly raising a leg or hand, and moving at least five meters away from that person within five seconds); fighting (at least two people within fifty centimeters of each other raising legs or hands at one another within a predetermined time, for example two seconds, or doing so while holding weapons such as knives or clubs); violence (one person approaching within fifty centimeters of another and, within a predetermined time, for example two seconds, making striking movements such as raising a leg or hand, or doing so while holding a weapon such as a knife or club); indecency (a person exposing their genitals, or forcibly embracing another person who resists); pornography (at least one person exposing their genitals, or at least two naked people embracing); verbal abuse (an angry expression with the mouth opening and closing rapidly within a predetermined time, for example one second, together with raised hands); and collapse (a body lying on the ground for longer than a predetermined time, for example two minutes).

Following on from the above, this embodiment may be applied in a venue 1 such as a restaurant, department store, or residential building by installing the image capture device 10 inside the restaurant or department store or near the walls of the building, to monitor whether anyone performs the movements described above, such as climbing, theft, snatching, fighting, violence, indecency, verbal abuse, or collapse, or whether an intruder climbs over the wall of a residential building. When deep learning recognition detects any of these abnormal movements, a warning signal is issued and the corresponding emergency alarm handling is carried out.

Figs. 5a to 5c show, in order, an intruder climbing over a wall, an assailant holding a knife with intent to harm, and an elderly person collapsed and unable to get up.

Referring to Figs. 1 to 3, the embodiment achieving the second object of the invention comprises an image capture device 10, an information processing device 20, and a cloud image recognition unit 30. A plurality of image capture devices 10 are installed in at least one venue 1 to capture objects in the venue 1 and produce at least one kind of object image, which includes at least one person image. The information processing device 20 is connected to each image capture device 10 through a signal transmission module 21 (for example Bluetooth, USB, RS-232, network, or analog transmission) to collect the received person images. The cloud image recognition unit 30 (for example a cloud server) is connected to the information processing device 20 through a network transmission module 31 (for example a network combined with a modem or router, using HLS or RTMP communication) to receive the person images. The cloud image recognition unit 30 has a built-in deep learning computation module 32 with deep learning training capability for image recognition; the module 32 can recognize facial expression, age, gender, emotion, clothing, and body movement in the person images to obtain at least one kind of facial expression recognition information and body movement recognition information for subsequent commercial use. Here the object images include at least one live broadcast image (for example a live video stream played on electronic signage, a mobile phone, or a computer). The cloud image recognition unit 30 receives the live broadcast image through the network transmission module 31 and the information processing device 20, and the deep learning computation module 32 performs violence, gore, and pornography recognition on it. When the module 32 detects violent, gory, or pornographic content in the live broadcast, it interrupts playback (for example by switching off the television, computer, or electronic signage through hardware control) or applies mosaic processing to the image.
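The interrupt-or-mosaic response to flagged live content can be sketched as a small dispatch loop. The label set and the `classify` callback stand in for the deep learning content classifier, which the patent does not specify at code level:

```python
BLOCKED = {"violence", "gore", "porn"}

def moderate_stream(frames, classify, policy="interrupt"):
    """Apply the two responses named in the description: stop playback at
    the first flagged frame, or pass every frame through with flagged
    frames marked for pixelation.  `classify` returns a label per frame."""
    out = []
    for frame in frames:
        if classify(frame) in BLOCKED:
            if policy == "interrupt":
                return out, "stopped"      # hardware shut-off in the patent
            out.append(("mosaic", frame))  # placeholder for pixelation
        else:
            out.append(("clear", frame))
    return out, "completed"
```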

Referring to FIG. 1 to FIG. 3, a specific embodiment for achieving the third object of the present invention includes technical features such as the image capturing device 10, the information processing device 20, and the cloud image recognition unit 30. A plurality of image capturing devices 10 are respectively disposed in at least one location 1 to capture objects at the location 1 and generate at least one object image, and the object image includes at least one human image. The information processing device 20 is connected to each image capturing device 10 through a signal transmission module 21 (such as a Bluetooth, USB, or RS232 transmission module) to collect the received human images. The cloud image recognition unit 30 (such as a cloud server) is connected to the information processing device 20 through a network transmission module 31 (such as a network combined with a modem or a router) to receive the human images. The cloud image recognition unit 30 has a built-in deep learning algorithm module 32 with a deep learning training function for performing image recognition; the deep learning algorithm module 32 can perform image recognition of facial expressions and body movements on the human images to obtain at least one item of facial expression recognition information and body motion recognition information for subsequent commercial use. The object image includes at least one video image (such as a video image played on an electronic billboard or a computer), and the cloud image recognition unit 30 further includes an image feature database 34 with a plurality of built-in trademark sample images; each trademark sample image is assigned a set of trademark feature data and trademark identification data corresponding to that trademark feature data. The cloud image recognition unit 30 receives the video image through the network transmission module 31 and the information processing device 20, extracts features from the video image to obtain at least one trademark feature image, identifies in the image feature database 34 the trademark feature data matching the features of the trademark feature image, reads the matching trademark identification data, and outputs the corresponding trademark identification information. The deep learning algorithm module 32 can thereby recognize advertising trademarks in the video image and compile statistics on the number of appearances and the playing time of each advertising trademark, and then output those appearance-count and playing-time statistics.
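As a non-limiting sketch of the statistics step only, per-frame logo detections can be aggregated into appearance counts and total on-screen time. The detection itself (matching a trademark feature image against the feature database 34) is assumed upstream and mocked here as a list of `(timestamp, logo_id)` tuples:

```python
# Hedged sketch: turn per-frame logo detections into appearance counts and
# total play time. Detection output format is an assumption for illustration.
from collections import defaultdict

def logo_statistics(detections, frame_interval=1.0):
    """detections: [(timestamp_sec, logo_id), ...] for frames where a logo was seen."""
    stats = defaultdict(lambda: {"appearances": 0, "seconds": 0.0})
    last_seen = {}
    for ts, logo in sorted(detections):
        # A new "appearance" starts when the logo was absent in the previous frame.
        if last_seen.get(logo) != ts - frame_interval:
            stats[logo]["appearances"] += 1
        stats[logo]["seconds"] += frame_interval
        last_seen[logo] = ts
    return dict(stats)
```

A logo seen in frames 0 and 1, then again at frame 5, counts as two appearances totalling three seconds of play time at a one-second frame interval.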

In addition, referring to the embodiment shown in FIG. 1, the present invention further includes a statistical analysis unit 33, which may be a data processing server or a software module built into the cloud image recognition unit 30, used to receive, count, and analyze the other image recognition information produced by the deep learning algorithm module 32. This image recognition information includes recognition information such as people flow, traffic flow, customer, face, gender, age, meal, license plate, vehicle model, body motion behavior, parking lot status, merchandise, live video, film and television advertisement, person, trademark, or commuting recognition. Specifically, the statistical analysis unit 33 can generate statistical/analytical information from the above image recognition information, such as people-flow statistics, gender statistics, age statistics, hotspot analysis, customer-segment analysis, visitor statistics, customer movement-line analysis, electronic billboard push analysis, traffic statistics, or parking space management analysis.

More specifically, the deep learning algorithm module 32 (i.e., the deep learning computation technique) adopted by the present invention may be one of many artificial intelligence algorithms, such as a convolutional neural network (CNN) algorithm, an expert system algorithm, or a random forest algorithm. When the deep learning algorithm module 32 executes, it includes the following steps:

(a) Training phase step: as shown in FIG. 2, at least one deep learning model 320 is first established, and a huge quantity of object sample images and image recognition parameters (such as feature data for recognition or comparison) are input into the deep learning model 320. The deep learning model 320 then tests the accuracy rate of image recognition and determines whether the accuracy rate is sufficient. If the determination is yes, the recognition result is output and stored; if no, the deep learning model 320 performs self-correcting learning by adjusting the image recognition parameters or by other means.
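The train/test/self-correct loop of step (a) can be sketched as follows. This is an illustrative outline only: `train_fn` and `eval_fn` stand in for the deep learning model's training and accuracy-testing routines, and the parameter adjustment shown (halving a learning rate) is one assumed form of self-correction, not the patent's prescribed one:

```python
# Minimal sketch of step (a): train, test recognition accuracy, then either
# store the result or adjust parameters and retrain (self-correcting learning).
def training_phase(train_fn, eval_fn, params, target_acc=0.95, max_rounds=10):
    for _ in range(max_rounds):
        model = train_fn(params)
        acc = eval_fn(model)
        if acc >= target_acc:          # accuracy sufficient: output and store
            return model, acc
        params = adjust(params)        # otherwise self-correct and retrain
    return model, acc

def adjust(params):
    # Placeholder self-correction: e.g. lower the learning rate and retry.
    return {**params, "lr": params["lr"] * 0.5}
```

The loop terminates either when the accuracy threshold is met or after a bounded number of self-correction rounds.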

(b) Prediction phase step: as shown in FIG. 3, the object image captured in real time is input into the deep learning model 320, and the deep learning model 320 performs predictive image recognition to obtain at least one item of the above facial expression recognition information and body motion recognition information.

Specifically, the deep learning model 320 includes a plurality of classifiers of different attributes (such as human contour, face contour, hairstyle contour, expression contour, meal contour, vehicle contour, license plate contour, vehicle model contour, and trademark contour). When the deep learning algorithm module 32 executes, it further includes a feature image capturing step, in which local features of a sample visual image are extracted as the above object sample images; the object sample images are then input into the deep learning model 320 and compared by the classifier of the same attribute, and the comparison result is output to the deep learning model 320 as the basis for judging the output recognition information.
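The attribute-based routing described above can be sketched, under stated assumptions, as a dispatch table from attribute names to classifier callables. The classifiers here are trivial stand-ins; real ones would be trained sub-networks of the model 320:

```python
# Sketch of the feature-capture step: each local feature crop is compared only
# by the classifier of its own attribute. Classifier callables are placeholders.
def classify_features(feature_crops, classifiers):
    """feature_crops: [(attribute, crop), ...]; classifiers: {attribute: fn}."""
    results = []
    for attribute, crop in feature_crops:
        clf = classifiers.get(attribute)
        if clf is not None:            # compare against same-attribute classifier only
            results.append((attribute, clf(crop)))
    return results
```

Crops whose attribute has no matching classifier are simply skipped, mirroring the same-attribute comparison rule.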

Referring to FIG. 1 and FIG. 6, in an application embodiment of the present invention in a restaurant location 1, the main purpose is to analyze customer groups, preferences, loyalty, and the like through actual image recognition, assisting operators to formulate appropriate business strategies and improve the quality and warmth of front- and back-of-house service. The specific services include the following:

1. Visitor statistics service: an image capturing device 10 is installed near the restaurant entrance to capture visual images of the flow of customers passing through, and human-contour image recognition is performed on the human images of the flow to obtain people-flow recognition information. The statistical analysis unit 33 then counts the number of customers entering and leaving to produce people-flow statistics, which provide restaurant operators with a basis for subsequent processing such as analyzing business trends and optimizing employee scheduling.
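As a hedged sketch of the counting step only, entrance/exit events can be tallied per hour. Detecting the event itself (a human contour crossing the doorway) is assumed to happen upstream in the recognition module:

```python
# Illustrative tally of entrance/exit events into hourly footfall statistics.
from collections import Counter

def footfall_by_hour(events):
    """events: [(hour, direction), ...] with direction 'in' or 'out'."""
    entries = Counter(h for h, d in events if d == "in")
    exits = Counter(h for h, d in events if d == "out")
    return entries, exits
```

The hourly entry counts are the kind of people-flow statistic the text says feeds scheduling optimization.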

2. Electronic billboard push service: a second image capturing device 10 is installed near the electronic billboard 40 to perform deep learning recognition of the face, gender, and age contours of customers viewing the electronic billboard 40, thereby obtaining customer-group recognition information. The statistical analysis unit 33 analyzes this into customer-group analysis information for the customers currently watching, so that the electronic billboard 40 can play menu images or videos suited to that customer group, improving the warmth of the restaurant's service.

3. Customer-segment analysis service: based on information such as gender statistics, age statistics, and people-flow statistics, customer segments of each demographic stratum are analyzed to achieve precision marketing.

4. Meal recognition: a third image capturing device 10 is installed on the restaurant ceiling to capture object images of the meals on the tables, and deep learning recognition of meal contours is performed on the meal object images to obtain meal recognition information. The statistical analysis unit 33 then counts how popular or unpopular each meal is, so as to suggest that the operator promote popular meals and improve or replace unpopular ones.
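The popularity tally can be sketched as a simple count-and-threshold split; the threshold value is an assumption for illustration, not part of the disclosure:

```python
# Sketch of meal-popularity statistics: count recognized meals and split them
# into "hot" (candidates for promotion) and "cold" (improve or replace).
from collections import Counter

def meal_report(recognized_meals, hot_threshold=10):
    counts = Counter(recognized_meals)
    hot = [m for m, n in counts.items() if n >= hot_threshold]
    cold = [m for m, n in counts.items() if n < hot_threshold]
    return hot, cold
```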

5. Customer identification: face recognition information is obtained through the above deep learning recognition, and the consumption behavior of the customer matching the locked face recognition information is identified to determine whether the customer is a first-time customer, a returning customer, a regular, a member, or blacklisted; service quality can thus be improved by proactively identifying the customer base.

Referring to FIG. 1 and FIG. 5, in an application embodiment of the present invention in a retail department store location 1, smart images are mainly used to produce hotspot analysis and movement-line statistics, letting merchants understand the thinking of consumers in different groups and arrange displays so that relatively popular merchandise is more easily exposed. The specific services include the following:

1. Customer identification: an image capturing device 10 is installed in the retail department store location 1 to capture human images of customers' faces, face recognition information is obtained from the human images through the above deep learning recognition, and the consumption behavior of the customer matching the locked face recognition information is identified to determine whether the customer is a first-time customer, a returning customer, a regular, a member, or blacklisted; service quality can thus be improved by proactively identifying the customer base.

2. Electronic billboard push service: a second image capturing device 10 is installed near the electronic billboard 40 to detect whether customers are watching the electronic billboard 40 and to analyze which advertisements attract which consumer groups, performing deep learning recognition that includes face, gender, expression (a smile, for example, indicates service satisfaction), and age contours to obtain customer-group recognition information. The statistical analysis unit 33 analyzes this into customer-group analysis information for the customers currently watching, so that the electronic billboard 40 can play advertisement images or videos suited to that customer group, achieving precision marketing of merchandise.

3. Customer-segment analysis service: based on information such as gender statistics, age statistics, and people-flow statistics, in addition to tallying customer age groups to adjust the styles of merchandise sold, customer segments of each demographic stratum can be analyzed, and visit rates and regular-customer rates can be counted, achieving precision marketing and a grasp of customer loyalty.

4. Hotspot analysis: a third image capturing device 10 is installed on the ceiling of the retail department store location 1 to capture object images of the merchandise on the shelves, and deep learning recognition of merchandise contours is performed on the merchandise object images to obtain merchandise recognition information. The statistical analysis unit 33 then identifies the cold and hot dwell areas for each item, so as to suggest that the retailer improve merchandise display and optimize aisle efficiency.
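One assumed way to rank hot versus cold areas, sketched for illustration only, is to total per-second presence observations per shelf zone and sort hot-to-cold:

```python
# Hedged sketch of hotspot ranking: total dwell per zone from sampled
# per-second presence observations, then order zones hot-to-cold.
from collections import Counter

def rank_zones(observations):
    """observations: one zone id per sampled second of customer presence."""
    dwell = Counter(observations)
    return [zone for zone, _ in dwell.most_common()]
```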

5. Visitor statistics service: a fourth image capturing device 10 is installed near the entrance of the retail department store location 1 to capture human images of the flow of customers passing through, and human-contour image recognition is performed on the human images to obtain people-flow recognition information. The statistical analysis unit 33 then counts the number of customers entering and leaving to produce people-flow statistics, from which the store's peak and off-peak hours can be grasped, providing a basis for subsequent processing such as optimizing staff scheduling.

6. Customer movement lines: movement-line information is obtained from the people-flow statistics, and movement lines are adjusted accordingly to optimize merchandise display.

Referring to FIG. 1 and FIG. 7, in an application embodiment of the present invention in a home security location 1, face and license plate recognition mainly assist security in grasping community conditions, and body motion behavior recognition prevents suspicious intruders from entering the community or provides nearby care for elderly residents with limited mobility, achieving the effect of building a smart community. The specific services include the following: 1. Face identification/detection: an image capturing device 10 is installed near the home access control equipment 41 to capture human images of people entering and leaving, extract face images through feature extraction techniques, and obtain face recognition information through the above deep learning recognition to determine whether the person is a member of the household. If so, the access control equipment 41 is opened, and the person's face image and entry/exit times are saved and recorded, which can effectively deter the criminal intent of unscrupulous outsiders.

2. License plate recognition: a second image capturing device 10 is installed near the access control equipment 41 of the home parking lot to capture object images of vehicles entering and leaving, extract license plate images through feature extraction techniques, and obtain license plate recognition information through the above deep learning recognition to determine whether the vehicle belongs to a household resident; if so, the access control equipment 41 is opened.

3. Body motion behavior detection: a third image capturing device 10 is installed inside the home to continuously capture several human images of a household member's body motion behavior, and body motion behavior recognition information is obtained from the consecutively captured images through the above deep learning recognition to determine whether an abnormal condition has occurred to the member; if so, the emergency contact is alerted to respond.
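One abnormal condition spelled out later in the claims is a person lying on the ground for more than two minutes. As a hedged sketch under assumed inputs, the rule can be checked over consecutive per-frame posture labels (which the deep learning model is assumed to supply):

```python
# Sketch of the lying-too-long alert rule: a run of "lying" frames exceeding
# the two-minute threshold triggers an alert to the emergency contact.
def should_alert(posture_frames, fps=1, threshold_sec=120):
    """posture_frames: consecutive per-frame labels, e.g. 'lying' / 'upright'."""
    run = 0
    for label in posture_frames:
        run = run + 1 if label == "lying" else 0
        if run / fps > threshold_sec:
            return True
    return False
```

Any upright frame resets the run, so only a continuous collapse exceeds the threshold.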

Referring to FIG. 1 and FIG. 8, in an application embodiment of the present invention in a public parking lot location 1, vehicle model recognition mainly allows the parking lot to intelligently guide vehicles of different sizes to suitable parking spaces, effectively planning parking space.

In addition to the application embodiments of the above locations 1, the present invention can also be applied to locations 1 such as companies and schools, mainly by using face and license plate recognition to determine whether a person is a member of the company or school, or whether a vehicle belongs to such a member; if so, the access control equipment 41 is opened. Moreover, an electronic fence mechanism can detect intrusion or loitering and immediately notify security patrols of any abnormal body motion behavior. Furthermore, face and license plate recognition can replace traditional commuter time clocks and proximity tags, avoiding the inconvenience and trouble caused by lost commuter cards and proximity tags.

Furthermore, in addition to the application embodiments of the above locations 1, the present invention can also be applied to filtering inappropriate scenes in film and television advertising, mainly by using intelligent recognition to automatically recommend merchandise and advertisements to the public or push programs featuring favorite personalities, and by automatically rating or processing scenes unsuitable for children. The specific services include the following:

1. Image recognition: the image capturing device 10 is installed at a position where it can capture the image frames of an electronic billboard, television, or computer screen, so that images containing violence, gore, or pornography can be automatically detected and, according to back-end settings, automatically blurred or given mosaic processing to prevent children from viewing them.

2. Merchandise recognition: for merchandise played on the electronic billboard 40, the image capturing device 10 on the electronic billboard 40 detects viewers' expressions and behavioral reactions to identify whether the item is one of interest, facilitating precision marketing; or it detects the viewer's gender and age and switches to playing advertisement videos suited to that person.
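The demographic-driven ad switch can be sketched as a lookup from recognized attributes to a clip. The catalog keys and clip names below are hypothetical placeholders, not part of the disclosure:

```python
# Illustrative sketch: pick an ad clip for the demographic the recognition
# step reports. Catalog contents are invented for demonstration only.
AD_CATALOG = {
    ("female", "young"): "clip_cosmetics",
    ("male", "young"): "clip_games",
}

def pick_ad(gender, age_band, default="clip_general"):
    return AD_CATALOG.get((gender, age_band), default)
```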

3. Person recognition: film and television stars are recognized in programs watched daily, and related entertainment news content is recommended and distributed according to viewing rates.

4. Trademark recognition: merchandise trademarks within advertisements are recognized, and advertisements are distributed according to click-through rates and data analysis, so that subsequently viewed content receives related merchandise advertisements.

Furthermore, after the convolutional neural network obtains visual images from the image capturing device, it proceeds through image preprocessing, feature extraction, feature selection, and feature data input, and then to inference and predictive recognition. The essence of deep learning in a convolutional neural network is to learn more useful features by constructing a machine learning model with multiple hidden layers and massive training data, thereby ultimately improving the accuracy of classification or prediction. A convolutional neural network uses massive training data to learn feature recognition, and only in this way can it characterize the rich intrinsic information of the data. Because a convolutional neural network is a weight-sharing network structure, it not only reduces the complexity of the network model but also reduces the number of weights. This advantage is more apparent when the network input is a multi-dimensional image: the image can serve directly as the network input, avoiding the complicated feature extraction and data reconstruction of traditional image recognition algorithms. Object classification methods are almost all based on statistical features, which means certain features must be extracted before classification. A convolutional neural network avoids explicit feature sampling and learns implicitly from the training data. This makes the convolutional neural network distinctly different from other neural-network-based classifiers, merging the feature extraction function into the multilayer perceptron through structural reorganization and weight reduction. It can process grayscale images directly and can be used directly for image-based classification. Compared with general neural networks, a convolutional network has the following advantages in image processing: the input image matches the network topology well; feature extraction and pattern classification proceed simultaneously and are both produced during training; and weight sharing reduces the network's training parameters, making the neural network structure simpler and more adaptable.
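The weight-sharing claim above can be made concrete with a small parameter count. A 3x3 convolution reuses the same nine weights (plus one bias) at every image position, whereas a fully connected layer needs one weight per input-output pair; the 28x28 image size below is an arbitrary example:

```python
# Numerical illustration of weight sharing: parameter counts for a single
# 3x3 convolution versus a dense layer mapping a 28x28 image to a 28x28 map.
def conv_params(kernel=3, channels_in=1, channels_out=1):
    return kernel * kernel * channels_in * channels_out + channels_out  # + bias

def dense_params(inputs, outputs):
    return inputs * outputs + outputs

conv = conv_params()                    # 10 shared weights
dense = dense_params(28 * 28, 28 * 28)  # 615,440 weights
```

The four-orders-of-magnitude gap is exactly the "fewer training parameters" advantage the paragraph describes.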

The above is merely a feasible embodiment of the present invention and is not intended to limit the patent scope of the present invention; all equivalent implementations and other variations made according to the content, features, and spirit of the following claims shall be included within the patent scope of the present invention. The structural features of the present invention as specifically defined in the claims are not found in similar articles and possess practicality and inventiveness, satisfying the requirements for an invention patent; an application is hereby filed in accordance with the law, and the Patent Office is respectfully requested to grant a patent according to law to protect the applicant's legal rights.

Claims (10)

一種運用深度學習技術之智慧影像資訊及大數據分析系統,其包括:至少一影像擷取裝置,其用以擷取至少一物體的物體影像,該物體影像至少包含有至少一人影像;至少一資訊處理裝置,其透過一訊號傳輸模組與各該影像擷取裝置訊號連結,用以彙集接收該人影像;及至少一雲端影像辨識單元,其透過一網路傳輸模組與該資訊處理裝置訊號連結,用以接收該人影像,該雲端影像辨識單元內建有至少一具備深度學習訓練功能以執行影像辨識的深度學習演算模組,該深度學習演算模組可對該人影像進行臉部表情及肢體動作的影像辨識,以得到至少一種臉部表情識別資訊及肢體動作識別資訊。  A smart image information and big data analysis system using deep learning technology, comprising: at least one image capturing device for capturing an image of an object of at least one object, the object image comprising at least one image; at least one information The processing device is connected to each of the image capturing device signals through a signal transmission module for collecting and receiving the human image; and at least one cloud image recognition unit transmits the signal through the network transmission module and the information processing device The connection is configured to receive the image of the person, and the cloud image recognition unit has at least one deep learning calculation module with a deep learning training function to perform image recognition, and the deep learning calculation module can perform a facial expression on the image of the person And image recognition of the limb movement to obtain at least one facial expression recognition information and limb motion recognition information.   如請求項1所述之運用深度學習技術之智慧影像資訊及大數據分析系統,其中,該雲端影像辨識單元更包含一內建有複數人樣本影像的影像特徵資料庫,該深度學習演算模組將該人樣本影像之特徵擷取為包含一臉部表情特徵資料,並於每一該臉部表情特徵資料設定有一對應的表情識別資料,當該深度學習演算模組接收到即時擷取的至少一張該人影像時,則於該影像特徵資料庫辨識出與該人樣本影像之特徵符合的該臉部表情特徵資料,並讀取特徵符合的該表情識別資料,再輸出一相應的該臉部表情識別資訊。  The smart image information and the big data analysis system using the deep learning technology described in claim 1, wherein the cloud image recognition unit further comprises an image feature database with a plurality of human sample images, the deep learning calculation module. 
Extracting the feature of the sample image to include a facial expression feature data, and setting a corresponding expression recognition data for each facial expression feature data, when the deep learning calculation module receives at least instant capture When the image of the person is imaged, the facial expression feature data corresponding to the feature of the sample image of the person is recognized in the image feature database, and the expression recognition data corresponding to the feature is read, and a corresponding face is output. Facial expression information.   如請求項2所述之運用深度學習技術之智慧影像資訊及大數據分析系統,其中,該臉部表情識別資訊係選自略為高興、高興、極為高興、略為悲傷、悲傷、極為悲傷、略為不悅、不悅以及極為不悅之其中至少二種表情。  The smart image information and big data analysis system using the deep learning technology described in claim 2, wherein the facial expression recognition information is selected from a slightly happy, happy, extremely happy, slightly sad, sad, extremely sad, slightly not At least two expressions of pleasing, unpleasant, and extremely unpleasant.   如請求項1所述之運用深度學習技術之智慧影像資訊及大數據分析系統,其中,該雲端影像辨識單元更包含一內建有複數人樣本影像的影像特徵資料庫,該深度學習演算模組將該人樣本影像之特徵擷取為包含有一肢體動作特徵資料,並於每一該肢體動作特徵資料設定有一對應的肢體動作識別資料,當該深度學習演算模組接收到即時擷取之連續數張的該人影像時,則於該影像特徵資料庫辨識出與該人樣本影像之特徵符合的該肢體動作特徵資料,並讀取特徵符合的該肢體動作識別資料,再輸出一相應的該肢體動作識別資訊。  The smart image information and the big data analysis system using the deep learning technology described in claim 1, wherein the cloud image recognition unit further comprises an image feature database with a plurality of human sample images, the deep learning calculation module. The feature of the human sample image is extracted to include a limb motion characteristic data, and a corresponding limb motion recognition data is set in each of the limb motion feature data, and the deep learning calculus module receives the continuous number of consecutive captures. 
When the image of the person is imaged, the body motion feature data corresponding to the feature of the sample image is identified in the image feature database, and the body motion recognition data corresponding to the feature is read, and a corresponding limb is output. Motion recognition information.   如請求項4所述之運用深度學習技術之智慧影像資訊及大數據分析系統,其中,該肢體動作識別資訊為一異常肢體動作識別資訊,該異常肢體動作識別資訊係選自攀爬、偷竊、搶奪、鬥毆、暴力、猥褻、色情、怒罵以及倒臥不起之至少其中一種異常肢體動作識別資訊;該攀爬之異常肢體動作識別資訊,係在五秒內,手肘部位高於其手臂向下伸直時原手肘部位的二分之一高度及膝蓋部位高於其大腿向下伸直時原膝蓋部位的三分之一高度,則判定為攀爬之異常肢體動作;該偷竊之異常肢體動作識別資訊,係一人將其手伸入一他人的口袋或是皮包內,即判定為偷竊之異常肢體動作;該搶奪之異常肢體動作識別資訊,係在一分鐘內,接近至少一他人五十公分之內的距離之後,快速做抬腿或舉手動作及在五秒之內遠離該他人至少五公尺的距離,即可判定為搶奪之異常肢體動作;該鬥毆之異常肢體動作識別資訊,係有至少二人身體靠近五十公分之內,並於二秒之內彼此做抬腿或舉手等互毆的肢體動作,即可判定為鬥毆之異常肢體動作;該暴力之異常肢體動作識別資訊,係有一人的身體靠近另一他人五十公分之內,並於二秒之內做抬腿或舉手等毆打的肢體動作,即可判定為暴力之異常肢體動作;該猥褻之異常肢體動作識別 資訊,係有一人露出其生殖器官或是該人對一他人強制擁抱而該他人做出反抗動作,即可判定為猥褻之異常肢體動作;該色情之異常肢體動作識別資訊,係有至少一人做露出該至少一人的生殖器官或至少二人裸體做擁抱的動作,即可判定為色情之異常肢體動作;該怒罵之異常肢體動作識別資訊,係一人臉部做出憤怒之表情及其嘴部於一秒之內快速張合及做舉手等肢體動作,即可判定為怒罵之異常肢體動作;該倒臥不起之異常肢體動作識別資訊,係一人於地上其身體倒臥過久超過二分鐘,即可判定為倒臥不起之異常肢體動作。  The smart image information and the big data analysis system using the deep learning technology described in claim 4, wherein the limb motion recognition information is an abnormal limb motion recognition information, and the abnormal limb motion recognition information is selected from the group consisting of climbing, stealing, and At least one of the abnormal limb movement identification information of snatching, fighting, violent, defamatory, erotic, roaring, and recumbent; the abnormal limb movement recognition information of the climbing is within five seconds, the elbow is higher than the arm When the height is one-half of the original elbow and the knee is higher than the one-third of the height of the original knee when the thigh is extended downward, it is determined that the abnormal limb movement of the climb; the theft Abnormal limb movement recognition information, one person puts his hand into a person's pocket or purse, that is, 
an abnormal physical movement determined as stealing; the abnormal physical movement recognition information of the snatch is within one minute, approaching at least one other person After a distance of fifty centimeters, you can quickly lift your leg or raise your hand and move away from the other person by at least five meters within five seconds. Abnormal limb movement; the abnormal limb movement recognition information of the fight is determined by the fact that at least two people are close to fifty centimeters in body and perform mutual limb movements such as raising their legs or raising their hands within two seconds. An abnormal physical movement of the fighting; the abnormal physical movement recognition information of the violence is determined by one person's body being within 50 centimeters of another person, and within two seconds of doing a leg movement or raising a hand, etc. An abnormal physical movement for violence; the abnormal physical movement recognition information of the sputum is that one person reveals his genital organs or the person is forced to hug a person and the other person makes a rebellious action, and then the abnormal physical movement can be determined; The abnormal physical movement recognition information of the pornography is that at least one person performs an action of revealing at least one of the genitals or at least two naked persons to make a hug, and can determine the abnormal physical movement of the erotic; the abnormal physical movement recognition information of the roar It is judged to be a roar when a person’s face makes an expression of anger and his mouth is quickly opened and raised in one second. The abnormal limb movement; the abnormal limb movement recognition information that can not be recuperated is one person who has been lying on the ground for more than two minutes, and can be judged as an abnormal limb movement that cannot be recumbent.   
The intelligent image information and big data analysis system using deep learning technology as described in claim 1, wherein the object image includes at least one live-stream image; the cloud image recognition unit receives the live-stream image from the information processing device through the network transmission module; the deep learning algorithm module performs image recognition of violent, bloody, and pornographic content on the live-stream image, and when the deep learning algorithm module detects that the live-stream image contains any of violent, bloody, or pornographic content, it interrupts playback of the live-stream image or applies mosaic image processing.   The intelligent image information and big data analysis system using deep learning technology as described in claim 1, wherein the object image includes at least one film or television image; the cloud image recognition unit further includes an image feature database with a plurality of built-in trademark sample images, each trademark sample image being provided with trademark feature data and trademark identification data corresponding to that trademark feature data; the cloud image recognition unit receives the film or television image from the information processing device through the network transmission module, extracts from the film or television image features comprising at least one trademark feature image, identifies in the image feature database the trademark feature data matching the trademark feature image, reads the matching trademark identification data, and outputs the corresponding trademark identification information, so that the deep learning algorithm module can recognize advertising trademarks in the film or television image, compile statistics on the number of appearances and the playing time of each advertising trademark appearing in the film or television image, and then output those appearance-count and playing-time statistics.   
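The trademark-statistics step described above can be sketched as a simple aggregation pass: an upstream recognizer labels each sampled frame with the logos it contains, and this pass turns those labels into per-trademark appearance counts and total on-screen time. The `logo_stats` function and its inputs are hypothetical, assuming one label set per sampled frame.

```python
from collections import defaultdict

def logo_stats(frames, frame_period=1.0):
    """frames: iterable of per-frame sets of recognised logo names.
    frame_period: seconds represented by each sampled frame.
    Returns {logo: (appearances, seconds_on_screen)}, where an
    'appearance' is one contiguous run of frames containing the logo."""
    count = defaultdict(int)      # number of contiguous appearances
    seconds = defaultdict(float)  # accumulated screen time
    prev = set()
    for logos in frames:
        for logo in logos:
            seconds[logo] += frame_period
            if logo not in prev:  # a new contiguous run starts here
                count[logo] += 1
        prev = set(logos)
    return {k: (count[k], seconds[k]) for k in count}
```

For example, `logo_stats([{"acme"}, {"acme"}, set(), {"acme", "beta"}])` reports two separate appearances of `acme` totalling three seconds, and one one-second appearance of `beta`.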
A big data analysis method for intelligent image information using deep learning, comprising the following steps: providing at least one image capturing device, at least one information processing device, and at least one cloud image recognition unit; capturing, with the at least one image capturing device, an object image of at least one object, wherein the object image includes at least one human image; connecting the information processing device to each image capturing device through a signal transmission module to collect the received human images; and connecting the cloud image recognition unit to the information processing device through a network transmission module to receive the human images, the cloud image recognition unit having built in at least one deep learning algorithm module with a deep learning training function for performing image recognition, the deep learning algorithm module performing image recognition of facial expressions and limb movements on the human images to obtain at least one kind of facial expression recognition information and limb movement recognition information.   The big data analysis method for intelligent image information using deep learning as described in claim 8, wherein execution of the deep learning algorithm module comprises the following steps: (a) a training phase step, in which at least one deep learning model is established, a huge number of sample object images are input to the deep learning model, the deep learning model is tested for image recognition accuracy, and it is determined whether the image recognition accuracy is sufficient; if the determination is yes, the recognition result is output and stored, and if the determination is no, the deep learning model performs self-corrective learning; and (b) a run-time prediction phase step, in which the object image captured in real time is input to the deep learning model, and the deep learning model performs predictive image recognition to obtain at least one kind of the facial expression recognition information and the limb movement recognition information.   The big data analysis method for intelligent image information using deep learning as described in claim 9, wherein the deep learning model includes a plurality of classifiers of different attributes, and execution of the deep learning algorithm module further comprises a feature image extraction step in which local features of the sample object image are extracted as sample feature images, the sample feature images are input into the deep learning model and compared by the classifiers of the same attribute, and the comparison results are output to the deep learning model as the basis on which the deep learning model outputs the facial expression recognition information and the limb movement recognition information.  
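The two-phase flow of the method claims above — train, test accuracy, repeat self-corrective learning until a threshold is met, then predict on frames captured in real time — can be sketched as a control loop. The `model.fit`, `model.accuracy`, and `model.predict` calls stand in for a real deep-learning framework and are assumptions for illustration, not the patent's implementation.

```python
def training_phase(model, samples, labels, threshold=0.95, max_rounds=100):
    """Step (a): loop until image-recognition accuracy is sufficient.
    Returns True when the accuracy threshold is reached (result can be
    output and stored), False when it never is within max_rounds."""
    for _ in range(max_rounds):
        model.fit(samples, labels)  # one round of (self-corrective) training
        if model.accuracy(samples, labels) >= threshold:
            return True
    return False

def prediction_phase(model, live_frames):
    """Step (b): predictive recognition on frames captured in real time,
    yielding facial-expression / limb-movement labels per frame."""
    return [model.predict(frame) for frame in live_frames]
```

In practice the accuracy test in step (a) would run on a held-out validation set rather than the training samples themselves; the loop structure is the same.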
TW106138811A 2017-11-09 2017-11-09 Intelligent image information and big data analysis system and method using deep learning technology TWI647626B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW106138811A TWI647626B (en) 2017-11-09 2017-11-09 Intelligent image information and big data analysis system and method using deep learning technology

Publications (2)

Publication Number Publication Date
TWI647626B TWI647626B (en) 2019-01-11
TW201918934A true TW201918934A (en) 2019-05-16

Family

ID=65804137

Family Applications (1)

Application Number Title Priority Date Filing Date
TW106138811A TWI647626B (en) 2017-11-09 2017-11-09 Intelligent image information and big data analysis system and method using deep learning technology

Country Status (1)

Country Link
TW (1) TWI647626B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7312560B2 (en) * 2019-01-31 2023-07-21 株式会社Screenホールディングス Information processing device, information processing method, information processing program, learning method, and trained model
CN110163154A (en) * 2019-05-23 2019-08-23 湖南机电职业技术学院 Video monitoring system based on artificial intelligence
CN110415159A (en) * 2019-06-11 2019-11-05 汇盈讯科智能科技(佛山市)有限责任公司 A kind of control system and its control method based on image recognition repeater
TWI743844B (en) * 2020-06-18 2021-10-21 遠東百貨股份有限公司 Information display system and operation method thereof

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6850252B1 (en) * 1999-10-05 2005-02-01 Steven M. Hoffberg Intelligent electronic appliance system and method
TW201025993A (en) * 2007-04-24 2010-07-01 Kuo-Ching Chiang Digital image capturing device
WO2009103156A1 (en) * 2008-02-20 2009-08-27 Mcmaster University Expert system for determining patient treatment response
TWI590098B (en) * 2012-05-09 2017-07-01 劉鴻達 Control system using facial expressions as inputs

Also Published As

Publication number Publication date
TWI647626B (en) 2019-01-11

Similar Documents

Publication Publication Date Title
TWM558943U (en) Intelligent image information and big data analysis system using deep-learning technology
TWI647626B (en) Intelligent image information and big data analysis system and method using deep learning technology
US20220245655A1 (en) Systems and methods for sensor data analysis through machine learning
US20140347479A1 (en) Methods, Systems, Apparatuses, Circuits and Associated Computer Executable Code for Video Based Subject Characterization, Categorization, Identification, Tracking, Monitoring and/or Presence Response
JP6267861B2 (en) Usage measurement techniques and systems for interactive advertising
Liciotti et al. Person re-identification dataset with rgb-d camera in a top-view configuration
CN108876504B (en) Unmanned selling system and control method thereof
Su et al. Does “lie to me” lie to you? An evaluation of facial clues to high-stakes deception
WO2018218286A1 (en) Method and system for abnormality detection
US20060190419A1 (en) Video surveillance data analysis algorithms, with local and network-shared communications for facial, physical condition, and intoxication recognition, fuzzy logic intelligent camera system
US10825031B2 (en) System for observing and analyzing customer opinion
WO2015025249A2 (en) Methods, systems, apparatuses, circuits and associated computer executable code for video based subject characterization, categorization, identification, tracking, monitoring and/or presence response
Smith Exploring relations between watchers and watched in control (led) systems: Strategies and tactics
CN107273106A (en) Object information is translated and derivation information acquisition methods and device
Arigbabu et al. Integration of multiple soft biometrics for human identification
CN109002780B (en) Shopping flow control method and device and user terminal
Zhang et al. ISEE Smart Home (ISH): Smart video analysis for home security
JP6498900B2 (en) Advertisement evaluation system, advertisement evaluation method
US20160048721A1 (en) System and method for accurately analyzing sensed data
US11430561B2 (en) Remote computing analysis for cognitive state data metrics
US11024043B1 (en) System and method for visually tracking persons and imputing demographic and sentiment data
TW201510850A (en) Method and apparatus for playing multimedia information
KR102191044B1 (en) Advertising systems that are provided through contents analytics and recommendation based on artificial intelligence facial recognition technology
CN108182098A (en) Receive speech selection method, system and reception robot
Farinella et al. Face re-identification for digital signage applications