TWI759534B - Activity recognition system based on machine vision for multiple objects - Google Patents
- Publication number
- TWI759534B TW107130040A
- Authority
- TW
- Taiwan
- Prior art keywords
- visual
- activity
- information
- objects
- unit
- Prior art date
Landscapes
- Image Analysis (AREA)
Description
The present invention relates to a system for activity recognition, and in particular to a vision-based activity recognition system.
Vision-based activity recognition systems use visual information to identify the activities it may contain. An activity is not the action of a single object in a single frame; it is constituted by one or more objects and the action relationships they exhibit, individually or among one another. For a single object such as a person, the action may be recognized as running; with multi-object detection, a whole group of people running can be recognized as the activity of a marathon; if detected background objects place the runner outside a prison wall, the activity can be recognized as a prison escape; and if the detected person holds a gun while another detected object, a police officer, runs in pursuit, the activity can be recognized as chasing a fugitive. The prior art mostly recognizes a single action of a specific tracked object rather than detecting activities defined by the relationships between the scene and its objects, and mere action recognition offers limited industrial benefit. Moreover, the prior art's focus on tracking one particular object within a region of the visual data prevents it from reaching activity-level discrimination. Finally, the prior art is largely based on time-ordered video: the video is cut into image sequences, successive images are compared, and the relative relationships of the tracked object's features are locked onto before action recognition is performed. This tracking computation, that is, feature tracking and matching across time-ordered image sequences, consumes substantial computation beyond the recognition operation itself, making it difficult to track large numbers of objects with limited computing resources and creating performance bottlenecks and high resource thresholds.
The present invention therefore proposes an activity recognition system, which can also be applied to action recognition. From the image information it generates a holistic semantic description of every object in the scene and of the relationships among them; with this description, recognition results can be produced not merely at the action level but at the activity level, substantially raising the value and applicability of machine vision to industry. Its fusion analysis is a statistical aggregation of recognition results from static images: it does not necessarily require video, and it requires no additional object-feature tracking between time-ordered image sequences, greatly improving recognition efficiency while retaining the capacity to recognize large numbers of objects in an image, achieving a more efficient, lower-cost, and more capable overall result. For future machine intelligence to recognize relatively complex human activities and take corresponding feedback and action, this is a key development of high industrial value.
After a visual sensing device (such as a camera) receives image information (which may be a single static image), the image information is sent to the object recognition unit. The object recognition unit applies suitable algorithms (such as machine learning or rule-based computation) to determine the type of each object in the frame and its position within the frame.
After processing, the correspondence between objects can be derived (information such as front/back, left/right, above/below, and relative distance ratios), yielding the scene information of the image. Meanwhile, the pose recognition unit may receive the raw image information from the visual sensing device and/or part of the object image information interpreted by the object recognition unit, in order to recognize the pose information of each object. For a human body, suitable algorithms (such as machine learning or rule-based computation) produce a pose interpretation such as standing, lying, sitting, or reclining; for a vehicle, the pose information may be a door being open, a window being rolled down, and so on. The pose recognition unit thus computes the pose state of every object in the visual information. The fusion analysis unit integrates the object type information and/or relative spatial information produced by the object recognition unit with the per-object state information produced by the pose recognition unit. The situation of each object and the relationships among them then become explicit; this is the scene information, which is semantic data. The fusion analysis unit then runs a recognition algorithm (such as machine learning or rule-based computation) on the scene information to produce an activity interpretation result, which can be put to a variety of valuable uses. The system is therefore able to discriminate the entire scene rather than a single action of a single object, yielding activity-level interpretations.
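The fusion step described above can be sketched in a few lines. This is an illustrative sketch only: the `Detection` record, the `relate` function, and the relation vocabulary are assumptions for the example, not structures specified by the patent, which describes only object types, poses, coarse directions, and relative distance ratios.

```python
# Sketch: fusing object types, poses, and pairwise spatial relations
# into semantic (non-image) scene information. All names are invented.
from dataclasses import dataclass
import math

@dataclass
class Detection:
    label: str    # object type from the object recognition unit
    pose: str     # pose state from the pose recognition unit
    cx: float     # box centre, normalised to [0, 1] image coordinates
    cy: float

def relate(a: Detection, b: Detection) -> dict:
    """Describe b relative to a: a coarse direction plus a distance
    expressed as a fraction of the image diagonal."""
    dx, dy = b.cx - a.cx, b.cy - a.cy
    direction = ("right" if dx > 0 else "left") if abs(dx) >= abs(dy) \
                else ("below" if dy > 0 else "above")
    distance = math.hypot(dx, dy) / math.sqrt(2)  # fraction of diagonal
    return {"subject": a.label, "object": b.label,
            "relation": direction, "distance": round(distance, 3)}

def scene_description(detections):
    """Semantic scene information: each object's pose plus every
    pairwise spatial relation between objects."""
    objects = [{"label": d.label, "pose": d.pose} for d in detections]
    relations = [relate(a, b) for i, a in enumerate(detections)
                 for b in detections[i + 1:]]
    return {"objects": objects, "relations": relations}

scene = scene_description([
    Detection("person", "running", 0.30, 0.60),
    Detection("police_officer", "running", 0.10, 0.60),
])
print(scene["relations"][0]["relation"])  # -> left
```

A downstream recognizer would consume only this semantic dictionary, never the pixels, which is what lets the later stages run without image-level tracking.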
After the activity interpretation result is produced, the system may also include an event judgment unit configured to issue a specific event notification when a given activity and/or action occurs. The notification is transmitted to the device or module responsible for the corresponding response, such as raising an alarm, sending a warning text message, or activating a relevant actuator or device. Furthermore, the system does not necessarily require a series of video frames: a single static image suffices for interpretation. If video is available, it can be used to increase the accuracy of activity interpretation, still without tracking-and-matching computation. The fusion analysis unit takes the object type information and/or relative spatial information produced from the time-ordered image sequence, together with each object's state information (that is, the scene information), feeds it into the relevant algorithm once or in batches per image sequence, and tallies the activity interpretation results across sequences. No object-feature position tracking is needed per sequence; simple statistics over the per-sequence results yield the most suitable overall interpretation. For example, if a video is cut into a number of static images whose scene information the fusion analysis unit interprets into the same or a different number of activity results, those results can be aggregated by any of various statistical methods, for instance a simple majority rule: if 100 activity interpretations are produced, 90 of them marathon and 10 parade, the outcome is marathon. The system thus reduces the computational cost of object tracking and reserves computing resources for the recognition computation itself.
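The per-frame tallying outlined above amounts to a majority vote. A minimal sketch, assuming each frame's scene information has already been interpreted into one activity label (the function name and label strings are invented for illustration):

```python
# Majority-rule fusion over per-frame activity interpretations:
# no cross-frame feature tracking, just counting labels.
from collections import Counter

def fuse_activity(frame_labels):
    """Return the most frequent per-frame activity interpretation,
    or None if no interpretations were produced."""
    if not frame_labels:
        return None
    return Counter(frame_labels).most_common(1)[0][0]

# The description's own example: 90 of 100 frames read as a marathon.
labels = ["marathon"] * 90 + ["parade"] * 10
print(fuse_activity(labels))  # -> marathon
```

Because the vote runs on labels rather than pixels, its cost is negligible next to the per-frame recognition itself, which is the efficiency argument the description makes.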
In addition, the system can incorporate other visual sensors (such as infrared or depth cameras) or non-visual sensors (such as gyroscopes mounted on objects), bringing the sensed data into the fusion analysis unit, which applies additional algorithms to it. The fusion analysis unit may also be an independent server and/or cloud system, with the relevant scene data exchanged over wired networks, wireless networks, or other radio-wave or light-wave transmission, for more efficient or more secure computation. It should be understood, however, that while the detailed description and specific examples indicate preferred embodiments of the invention, they are given by way of illustration only, since various changes and modifications within the scope of the invention will become apparent to those skilled in the art from this detailed description. It should therefore be understood that the invention is not limited to the particular components of the devices described or the steps of the methods described, as such devices and methods may vary. It should also be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting. It must be noted that, as used in this specification and the appended claims, the articles "a" and "the" are intended to mean that one or more of the elements are present unless the context clearly dictates otherwise. Thus, for example, reference to "a unit" or "the unit" can include several devices. Furthermore, the words "comprise", "include", "contain", and similar wording do not exclude other elements or steps.
In summary, the present invention is a visual activity recognition system based on multiple objects. It recognizes object types and relative spatial relationships, recognizes and aggregates the pose of each object, generates semantic scene information, and thereby performs activity-level recognition, enabling machine intelligence to recognize relatively complex human activities and to take corresponding feedback and action, which is a critical development. These characteristics not only add activity recognition to the industry's existing technology but also broaden the applications achievable with existing image recognition, such as reconnaissance, exploration, security, production monitoring, and home healthcare, giving the invention high industrial applicability. Its fusion analysis not only operates on static images alone but can also produce results from video using simple, non-tracking statistics, which the prior art has not achieved; the invention is therefore novel and inventive. In conclusion, this application is an invention possessing novelty, inventive step, and industrial applicability, meets the requirements for an invention patent, and is accordingly filed in accordance with the law. The foregoing, however, describes only preferred embodiments of the invention and does not limit the scope of the patent application; all equivalent changes or modifications made in accordance with the spirit of the invention shall remain within the scope covered by this patent.
The present invention will now be described more fully with reference to the accompanying drawings, in which presently preferred embodiments of the invention are shown. The invention may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; these embodiments are provided for thoroughness and completeness, and to convey the scope of the invention fully to those skilled in the art. Fig. 1 illustrates a visual activity recognition system 10 that can be based on multiple objects. The system includes a digital network camera 110, whose image data are passed over a transmission interface to the object recognition unit 120; the transmission interface is a wired network, a wireless network, a bus, or another light-wave or radio-wave connection. The object recognition unit 120 recognizes the objects in the image information and produces an object recognition data structure such as that of Fig. 2: a graph in the sense of graph theory, whose nodes contain the object types and spatial positions and are connected by edges. Each edge carries attributes expressing the relationship (such as front/back, left/right, above/below) and may carry distance information computed as a proportion of the static image (for example, one hundredth means the two objects are separated by one percent of the image). This data structure, together with the image information, is passed to the fusion analysis unit 140 over a transmission interface of the same kind as that linking the digital network camera 110 and the object recognition unit 120. The pose recognition unit 130 receives the image information from the digital network camera 110 and/or the object recognition data structure from the object recognition unit 120, computes the pose information of each object, and passes it over a transmission interface to the fusion analysis unit 140. After the fusion analysis unit 140 integrates the data structure of Fig. 2 with the pose information of each object, the scene information is produced, which is semantic rather than image data. The scene information can then be recognized with relevant algorithms (semantic algorithms such as seq2seq or RNNs) to obtain the activity recognition result. This result is transmitted over a transmission interface to the event judgment unit 20, which holds a rule database; if the recognition result satisfies a condition in the rule database for triggering an event, the event judgment unit 20 initiates the corresponding action.
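The Fig. 2 graph and the rule database of event judgment unit 20 can be sketched as plain data structures. The dictionary layout and rule format below are assumptions for illustration; the patent specifies only nodes holding object type and position, attributed edges carrying a direction and a distance expressed as a fraction of the image, and a rule database that triggers an action when an interpreted activity matches a condition.

```python
# Illustrative encoding of the object-recognition graph of Fig. 2.
# Nodes carry object types; edges carry a relationship attribute and a
# distance as a fraction of the image (0.01 = 1% of the image).
scene_graph = {
    "nodes": {0: {"type": "person"}, 1: {"type": "prison_wall"}},
    "edges": [{"from": 0, "to": 1, "relation": "in_front_of",
               "distance": 0.01}],
}

# Illustrative rule database for event judgment unit 20, mapping a
# recognized activity to the corresponding action to initiate.
rule_db = {
    "prison_escape": "raise_alarm",
    "fugitive_pursuit": "notify_police",
}

def judge_event(activity: str):
    """Return the configured action if the activity triggers a rule,
    otherwise None (no event notification is issued)."""
    return rule_db.get(activity)

print(judge_event("prison_escape"))  # -> raise_alarm
```

In a deployment, `judge_event` would sit between the fusion analysis unit's output and the actuator or notification module, so that only rule-matching activities produce event notifications.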
[Fig. 1] Architecture diagram of a visual activity recognition system according to the present invention.
[Fig. 2] Example of the object-type and spatial-relationship data structure produced by the object recognition unit of the present invention.
10‧‧‧Visual activity recognition system
110‧‧‧Visual sensing device (such as a digital network camera)
120‧‧‧Object recognition unit
130‧‧‧Pose recognition unit
140‧‧‧Fusion analysis unit
20‧‧‧Event judgment unit
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW107130040A TWI759534B (en) | 2018-08-28 | 2018-08-28 | Activity recognition system based on machine vision for multiple objects |
Publications (2)
Publication Number | Publication Date |
---|---|
TW202009872A TW202009872A (en) | 2020-03-01 |
TWI759534B true TWI759534B (en) | 2022-04-01 |
Family
ID=70766725
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW107130040A TWI759534B (en) | 2018-08-28 | 2018-08-28 | Activity recognition system based on machine vision for multiple objects |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI759534B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TW200841737A (en) * | 2006-09-22 | 2008-10-16 | Objectvideo Inc | Video analytics for banking business process monitoring |
CN107911653A (en) * | 2017-11-16 | 2018-04-13 | 王磊 | The module of intelligent video monitoring in institute, system, method and storage medium |
US9977572B2 (en) * | 2014-04-01 | 2018-05-22 | Hallmark Cards, Incorporated | Augmented reality appearance enhancement |
- 2018-08-28: Application TW107130040A filed in TW; granted as TWI759534B (active)
Also Published As
Publication number | Publication date |
---|---|
TW202009872A (en) | 2020-03-01 |