TWI734297B - Multi-task object recognition system sharing multi-range features - Google Patents

Multi-task object recognition system sharing multi-range features

Info

Publication number
TWI734297B
Authority
TW
Taiwan
Prior art keywords
module
image
shared
feature extraction
range
Prior art date
Application number
TW108145764A
Other languages
Chinese (zh)
Other versions
TW202123069A (en)
Inventor
侯力仁
蘇亞凡
游輝亮
柳恆崧
Original Assignee
中華電信股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中華電信股份有限公司
Priority to TW108145764A
Publication of TW202123069A
Application granted
Publication of TWI734297B

Abstract

The invention discloses a multi-task object recognition system sharing multi-range features. First, a multi-range generation module generates multi-range information of an image of an object. Then, a multi-channel image merging module samples multiple images of different regions from the object according to the multi-range information and merges them into one multi-channel image. Thereafter, a shared feature extraction module with shared network layers extracts shared features from the multi-channel image using a neural network. Finally, one or more different task-specific feature extraction modules of a task-specific feature extraction module group extract one or more different task-specific features from the shared features using a neural network, and one or more different task-specific models output recognition results of the image of the object according to the one or more different task-specific features.

Description

Multi-task object recognition system sharing multi-range features

The present invention relates to object recognition technology, and more particularly to a multi-task object recognition system that shares multi-range features.

Owing to the rapid development of deep learning, most image recognition tasks today can be accomplished by training neural network models. Practical applications, however, often require the integration of multiple recognition tasks, and if each task is handled by its own model, the computational complexity grows with the number of tasks. Take face recognition as an example: it is frequently deployed for strict access control, which requires a face recognition task and a fake-face (anti-spoofing) recognition task to work together. If the two tasks use two independent models, the computational complexity doubles.

Recent related work addresses the multiplied computational cost caused by additional tasks by integrating the different tasks into a single model. However, because each image recognition task requires a different region of the object, forcing the tasks together reduces the recognition accuracy of each task. For example, although face recognition and fake-face recognition both take a face image as input, face recognition focuses on fine facial details and needs a smaller image region, whereas fake-face recognition considers the background around the face and needs a larger image region, which makes it difficult to decide on a single input range that suits both tasks.

Therefore, how to provide a novel or innovative object (image) recognition technique that solves the problem of integrating multiple recognition tasks has become a major research topic for those skilled in the art.

The present invention provides a novel or innovative multi-task object recognition system sharing multi-range features. For example, it can solve the problem that multiple independent models are difficult to integrate when the same object serves multiple tasks, provide shared network layers that effectively reduce redundant or unnecessary network layers, or allow multiple recognition tasks to share features extracted from multi-range images so as to improve recognition accuracy.

The multi-task object recognition system sharing multi-range features of the present invention includes: a multi-range generation module that generates or provides multi-range information of an image of an object; a multi-channel image merging module that, according to the multi-range information generated or provided by the multi-range generation module, samples multiple images of different regions from the object and merges them into one multi-channel image; a shared feature extraction module with shared network layers that uses a neural network to extract shared features from the multi-channel image merged by the multi-channel image merging module; and a task-specific feature extraction module group with one or more different task-specific feature extraction modules that use a neural network to extract one or more different task-specific features from the shared features extracted by the shared feature extraction module, so that one or more task-specific models of the task-specific feature extraction module group output recognition results of the image of the object according to the one or more different task-specific features.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings. Additional features and advantages of the invention are set forth in part in the following description, will in part be apparent from the description, or may be learned by practicing the invention. The features and advantages of the invention are realized and attained by means of the elements and combinations particularly pointed out in the claims. It should be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not intended to limit the scope claimed.

1‧‧‧Multi-task object recognition system sharing multi-range features
10‧‧‧Image capture module
20‧‧‧Object detection module
30‧‧‧Multi-range generation module
40‧‧‧Multi-channel image merging module
50‧‧‧Shared feature extraction module
51‧‧‧Shared network layer
52‧‧‧Shared feature map
60‧‧‧Task-specific feature extraction module group
61‧‧‧Task-specific feature extraction module
62‧‧‧Task-specific model
70‧‧‧Cross-task application service module
A‧‧‧Image or image frame
B‧‧‧Position information
C‧‧‧Multi-channel image
C1 to Cn‧‧‧Images
F‧‧‧Feature map
L0‧‧‧Layer-0 shared network layer
L1‧‧‧Layer-1 shared network layer
Lk‧‧‧Layer-k shared network layer
m, n‧‧‧Positive integers

Figure 1 is a schematic diagram of the architecture of the multi-task object recognition system sharing multi-range features of the present invention;

Figure 2 is a schematic diagram of an embodiment of the multi-task object recognition system sharing multi-range features of the present invention;

Figure 3 is a schematic diagram of an embodiment of the inputs of the image capture module, the object detection module, the multi-range generation module and the multi-channel image merging module of Figures 1 and 2 of the present invention; and

Figure 4 is a schematic diagram of an embodiment of the shared network layers of the shared feature extraction module of Figure 2 of the present invention.

The following describes the implementation of the present invention by way of specific embodiments. Those skilled in the art can readily understand other advantages and effects of the present invention from the content disclosed in this specification, and the invention can also be implemented or applied through other different specific equivalent embodiments.

Figure 1 is a schematic diagram of the architecture of the multi-task object recognition system 1 sharing multi-range features of the present invention, and Figure 2 is a schematic diagram of an embodiment of the multi-task object recognition system 1 sharing multi-range features of the present invention.

As shown in Figures 1 and 2, the multi-task object recognition system 1 sharing multi-range features may include an image capture module 10, an object detection module 20, a multi-range generation module 30, a multi-channel image merging module 40, a shared feature extraction module 50, a task-specific feature extraction module group 60 and a cross-task application service module 70 that communicate with one another. The shared feature extraction module 50 may have shared network layers 51 (such as the layer-0 shared network layer L0 through the layer-k shared network layer Lk), and the task-specific feature extraction module group 60 may have one or more different task-specific feature extraction modules 61.

The image capture module 10 captures an image or image frame; the object detection module 20 detects position information of an object from the image or image frame captured by the image capture module 10; and the multi-range generation module 30 uses the position information detected by the object detection module 20 to generate or provide multi-range information of the image of the object. The multi-channel image merging module 40 samples multiple images of different regions from the object according to the multi-range information generated or provided by the multi-range generation module 30, and merges the multiple images into one multi-channel image. The shared feature extraction module 50 uses a neural network to extract shared features from the multi-channel image merged by the multi-channel image merging module 40. The one or more different task-specific feature extraction modules 61 of the task-specific feature extraction module group 60 use a neural network to extract one or more different task-specific features from the shared features extracted by the shared feature extraction module 50 with the shared network layers 51, so that one or more task-specific models 62 of the task-specific feature extraction module group 60 output recognition results of the image of the object according to the one or more different task-specific features.

For example, two different task-specific models 62 may be a face recognition model and a fake-face recognition model, respectively, for an anti-spoofing face recognition access control application, and the cross-task application service module 70 may use the one or more task-specific features extracted by the one or more different task-specific feature extraction modules 61 of the task-specific feature extraction module group 60 to provide, execute or achieve a cross-task application service. That is, each task in the task-specific feature extraction module group 60 corresponds to one task-specific model 62; each task-specific model 62 is composed of a neural network and extracts the features it needs from the shared feature map 52 in order to determine the characteristics of the input image. Taking faces as an example, suppose there are two task-specific models 62, a face recognition model and a gender recognition model; after these two models extract the required features from the shared feature map 52, they can output recognition results of the image of the object. For instance, after an image of the President of the United States has been processed by the image capture module 10 through the task-specific feature extraction module group 60, the two task-specific models 62 output two pieces of information, such as "Trump" and "male", to the cross-task application service module 70.
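
For readers who prefer code, the module chain described above can be summarized by the following minimal PyTorch-style sketch. The layer sizes, the pooling step, the two example heads (face identity and gender) and all class and variable names are illustrative assumptions, not the patent's actual implementation.

```python
import torch
import torch.nn as nn

class SharedBackbone(nn.Module):
    """Shared feature extraction module 50: shared network layers L0..Lk."""
    def __init__(self, in_channels: int, width: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=3, padding=1),  # L0
            nn.ReLU(),
            nn.Conv2d(width, width, kernel_size=3, padding=1),        # L1 .. Lk
            nn.ReLU(),
        )

    def forward(self, x):
        return self.layers(x)  # shared feature map 52

class MultiTaskRecognizer(nn.Module):
    """Shared backbone feeding two illustrative task-specific heads (62)."""
    def __init__(self, n_ranges: int = 3, n_identities: int = 1000):
        super().__init__()
        self.backbone = SharedBackbone(in_channels=3 * n_ranges)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.face_head = nn.Linear(64, n_identities)  # e.g., face recognition
        self.gender_head = nn.Linear(64, 2)           # e.g., gender recognition

    def forward(self, multi_channel_image):
        f = self.pool(self.backbone(multi_channel_image)).flatten(1)
        return {"identity": self.face_head(f), "gender": self.gender_head(f)}
```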

The image capture module 10 may be a hardware video camera, still camera, monitor or sensor, and the image it captures may be a two-dimensional image or a three-dimensional image, such as a depth image or an infrared image; a two-dimensional image may have a single channel or multiple channels.

The object detection module 20 may be a hardware object detector or a software object detection program, and may use different object detectors for different applications. For example, face analysis or face recognition applications may use an object detector or object detection program based on MTCNN (Multi-task Cascaded Convolutional Networks), while vehicle detection or vehicle recognition applications may use an object detector or object detection program based on YOLO (You Only Look Once).

The multi-range generation module 30 may be a software multi-range generation program, and the multi-channel image merging module 40 may be a software multi-channel image merging program. The manner and number of range expansions used by the multi-range generation module 30 or the multi-channel image merging module 40 can vary with the application. If the focus is on the details and texture of the object, the multi-range generation module 30 or the multi-channel image merging module 40 mainly uses multiple reduced-range images; conversely, if the focus is on the attachments or background around the object, they mainly use multiple expanded-range images (i.e., more expanded-range images are added). In addition, if the focus is on the size or performance of the model of the shared network layers 51, the multi-range generation module 30 or the multi-channel image merging module 40 can stack fewer images of different ranges (i.e., reduce the number of stacked images); conversely, if the focus is on improving the generality of the shared network layers 51, they can stack more images of different ranges (i.e., increase the number of stacked images) so that more features of the image can be captured.

The shared feature extraction module 50 may be a software shared feature extraction program, and the type and number of its shared network layers 51 (neural network layers) can be adjusted for different applications. For example, in some applications the shared network layers 51 of the shared feature extraction module 50 may perform better with three-dimensional (3D) convolution, or, when multiple task-specific feature extraction modules 61 are quite similar, the number of shared network layers 51 can be increased.

The task-specific feature extraction modules 61 of the task-specific feature extraction module group 60 may be software task-specific feature extraction programs, and the cross-task application service module 70 may be a software cross-task application service program. The task-specific feature extraction modules 61 need not have the same number of network layers, and there is no limit on that number; each task-specific feature extraction module 61 can use different types and numbers of network layers according to the characteristics of its task.

The aforementioned neural network may be, for example, a convolutional neural network (CNN), a recurrent neural network (RNN), a deep neural network (DNN) or a long short-term memory (LSTM) network, but is not limited thereto.

For example, the multi-range generation module 30 can capture multiple reduced or expanded image ranges, and the multi-channel image merging module 40 can stack the multiple images captured by the multi-range generation module 30 along the channel dimension to integrate them into a single input, so that the features contained in that single input span multiple recognition tasks. Taking face analysis or recognition as an example, an expanded-range face image can cover the hair and ears, which helps gender classification and face recognition; moreover, reduced-range and expanded-range face images help capture the texture and background characteristics of the face image, respectively, which is useful for liveness (e.g., live-body) recognition.

The shared feature extraction module 50 may have shared network layers 51, which effectively reduce redundant or unnecessary network layers, thereby reducing model size and increasing execution speed. In deployment, the task-specific models 62 of the task-specific feature extraction module group 60 can also be swapped according to different requirements to increase flexibility. To illustrate the concept of swapping with an example: suppose a product is designed with face recognition, gender recognition and age recognition, but in a particular application scenario the customer does not need age information. The age recognition task-specific model 62 can then be removed, and only the face recognition and gender recognition task-specific models 62 retained, to reduce the overall model size and computation. Therefore, the present invention lets multiple recognition tasks share the shared network layers 51, which reduces the overall computational complexity, and lets multiple recognition tasks share the features extracted from multi-range images, which improves recognition accuracy.
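
The swapping described above amounts to keeping or dropping task heads on top of a fixed shared backbone. The sketch below shows one way to express that, assuming PyTorch and the illustrative head names and layer sizes; none of these are from the patent itself.

```python
import torch.nn as nn

# Illustrative swappable task heads on top of a shared backbone.
heads = nn.ModuleDict({
    "face":   nn.Linear(64, 1000),  # face recognition head
    "gender": nn.Linear(64, 2),     # gender recognition head
    "age":    nn.Linear(64, 1),     # age regression head
})

# If a deployment does not need age information, simply drop that head;
# the shared backbone and the remaining heads are unaffected.
del heads["age"]
```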

The present invention provides techniques such as multi-range image stacking, cross-task shared features and swappable feature extraction groups. With these techniques, applications involving the same object but multiple recognition tasks can easily be integrated into a single model, which effectively reduces model size and prediction time and also increases application flexibility. That is, using the technique of multi-range shared features, applications of the same object but multiple recognition tasks can be integrated into a single model through the shared network layers 51 of the shared feature extraction module 50, effectively reducing model size and prediction time.

The present invention solves the problems of large models and low performance. With multiple independent models, the image range required by each recognition task may differ slightly, so even though the object to be recognized is the same, multiple images of different ranges (e.g., n images or pictures) must be captured and fed separately into multiple (e.g., n) independent models, which lowers the overall recognition speed. The present invention therefore uses multi-range (multi-channel) stacking and the shared network layers 51 to integrate the features required by the models into a single model, so that the model size is reduced.

The purpose of the shared feature extraction module 50 is to reduce computation and increase prediction speed; the idea is that sharing reduces computation, which in turn speeds up prediction. For example, suppose independent model X has a computational cost of 50 and independent model Y has a computational cost of 50. If they predict separately, the total cost of X and Y is 50 + 50 = 100. If the concept of sharing is introduced and the part that X and Y can share is split out and called Z, with a cost of 20, and the remaining non-shared parts of X and Y are called X' and Y' respectively, then after sharing the total cost becomes Z + X' + Y' = 20 + (50 - 20) + (50 - 20) = 80, saving 20 compared with the original cost of 100.
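
The arithmetic above generalizes directly to any number of tasks. The snippet below simply restates it; the cost figures are the illustrative numbers from the example, not measurements.

```python
def total_cost(task_costs, shared_cost):
    """Cost when a shared part Z is computed once: Z + sum of (cost_i - Z)."""
    return shared_cost + sum(c - shared_cost for c in task_costs)

print(total_cost([50, 50], shared_cost=0))   # 100: two fully independent models
print(total_cost([50, 50], shared_cost=20))  # 80: shared part Z computed only once
```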

The present invention offers a high degree of application flexibility. Although the features required by the models are integrated into a single model, the flexibility of independent models is retained: different task-specific feature extraction modules 61 can be swapped in or out for different applications, and different task combinations can be customized for different applications.

Figure 3 is a schematic diagram of an embodiment of the inputs of the image capture module 10, the object detection module 20, the multi-range generation module 30 and the multi-channel image merging module 40 in Figures 1 and 2 of the present invention.

As shown in Figure 3 and in Figures 1 and 2 above, the key of the present invention is to generate the inputs of the image capture module 10, the object detection module 20, the multi-range generation module 30 and the multi-channel image merging module 40, and to provide the shared network layers 51 of the shared feature extraction module 50 (such as the layer-0 shared network layer L0 through the layer-k shared network layer Lk), at least two task-specific models 62 of the task-specific feature extraction module group 60 and the cross-task application service module 70, which together constitute the overall architecture of the system. Note that the layer-0 shared network layer L0 means the shared network layers are counted from 0; if they were counted from 1 instead, the layer-0 shared network layer L0 would be renamed the layer-1 shared network layer, and so on.

When generating the inputs of the image capture module 10, the object detection module 20, the multi-range generation module 30 and the multi-channel image merging module 40, the present invention may include the following procedures P11 to P14, described here using a human face as an example.

Procedure P11: the image capture module 10 captures an image or image frame A. For example, the image capture module 10 (such as a camera or sensor) may capture an image or image frame A such as an RGB image (picture) or video frame, where RGB denotes the three primary colors red, green and blue.

Procedure P12: the object detection module 20 detects position information B of the object from the image or image frame A captured by the image capture module 10. For example, the object detection module 20 may use an object detection algorithm (such as a face detection algorithm) to detect the position information B (such as bounding-box coordinates) of an object (such as a face) from the image or image frame A captured by the image capture module 10.

Procedure P13: the multi-range generation module 30 uses the position information B of the object detected by the object detection module 20 to generate or provide multi-range information of the image of the object. For example, depending on the application, the multi-range generation module 30 may expand or shrink the boundary of the object (such as a face) around the center or another position of the object detected by the object detection module 20, and may decide the expansion or shrink ratios and the number of images of the object (such as a face) to capture. After capturing the multiple images C1 to Cn of the object, the multi-range generation module 30 then scales them to a fixed width and height; here a fixed size of 224x224 (units unrestricted) is taken as an example, and the images (such as RGB images) of the object at different ranges are assumed to be the multiple images C1 to Cn (e.g., n images), where n is a positive integer.
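
As a concrete illustration of procedure P13, the following sketch crops an object at several range scales around the detected box and resizes each crop to 224x224. It assumes OpenCV and NumPy; the function name, the scale factors and the clipping strategy are assumptions for illustration, not the patent's specification.

```python
import cv2
import numpy as np

def multi_range_crops(frame, box, scales=(0.8, 1.0, 1.3), size=224):
    """Crop the detected object at several range scales around the box center
    and resize each crop to size x size. `frame` is an H x W x 3 image and
    `box` is (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    w, h = x2 - x1, y2 - y1
    crops = []
    for s in scales:
        half_w, half_h = w * s / 2.0, h * s / 2.0
        xa = max(int(cx - half_w), 0)
        ya = max(int(cy - half_h), 0)
        xb = min(int(cx + half_w), frame.shape[1])
        yb = min(int(cy + half_h), frame.shape[0])
        crop = frame[ya:yb, xa:xb]
        crops.append(cv2.resize(crop, (size, size)))
    return crops  # C1 .. Cn, each size x size x 3
```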

Procedure P14: the multi-channel image merging module 40 samples the multiple images C1 to Cn (e.g., n images) of different regions from the object according to the multi-range information generated or provided by the multi-range generation module 30, and merges the multiple images C1 to Cn into one multi-channel image C. For example, the multi-channel image merging module 40 may merge the multiple images C1 to Cn along the channel dimension into one multi-channel image C. Assuming each image (picture) has dimensions 224x224x3, merging n images (pictures) yields dimensions of 224x224x3n.
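
Procedure P14 is a simple concatenation along the channel axis, as in the sketch below; it reuses the hypothetical multi_range_crops helper from the previous sketch only as an example input.

```python
import numpy as np

def merge_channels(crops):
    """Merge the n range crops along the channel axis.
    Each crop Ci is 224 x 224 x 3, so the merged image C is 224 x 224 x 3n."""
    return np.concatenate(crops, axis=-1)

# merged = merge_channels(multi_range_crops(frame, box))  # shape (224, 224, 3 * n)
```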

Figure 4 is a schematic diagram of an embodiment of the shared network layers 51 of the shared feature extraction module 50 in Figure 2 of the present invention. As shown in Figures 3 and 4, after the inputs of the image capture module 10, the object detection module 20, the multi-range generation module 30 and the multi-channel image merging module 40 have been generated in Figure 3, processing proceeds in order through the layer-0 shared network layer L0 and the layer-1 shared network layer L1 through the layer-k shared network layer Lk of the shared network layers 51 of the shared feature extraction module 50 in Figure 4.

As shown in Figure 4, the shared feature extraction module 50 with the shared network layers 51 uses a neural network to extract shared features from the multi-channel image C merged by the multi-channel image merging module 40. For example, the shared network layers 51 of the shared feature extraction module 50 may use a neural network (such as a convolutional neural network, CNN) to extract shared features from the multi-channel image C through the following procedures P21 and P22. The following takes the layer-0 shared network layer L0 and two-dimensional (2D) convolution as an example; the layer-1 shared network layer L1 through the layer-k shared network layer Lk follow by analogy, and the shared feature extraction module 50 may also choose two-dimensional (2D) or three-dimensional (3D) convolution depending on the application.

Procedure P21: the shared network layer 51 of the shared feature extraction module 50 (such as the layer-0 shared network layer L0) may use (e.g., in sequence) kernels (such as convolution kernels) of different widths and heights to convolve the multi-channel image C merged by the multi-channel image merging module 40. Note that when the input to the shared network layer 51 (such as the layer-0 shared network layer L0) has 3n channels, the depth of the two-dimensional (2D) convolution kernels of that layer is 3n.

Procedure P22: assuming there are in total multiple (e.g., m) kernels of different widths and heights, the shared network layer 51 of the shared feature extraction module 50 (such as the layer-0 shared network layer L0), after convolving the multi-channel image C, produces a feature map F of depth m, which serves as, or is provided to, the next shared network layer (such as the layer-1 shared network layer L1), where m is a positive integer.
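
Procedures P21 and P22 correspond to an ordinary 2D convolution whose kernel depth matches the 3n input channels and whose m kernels yield a depth-m feature map F. The sketch below only illustrates this shape arithmetic; the values of n, m and the kernel size are arbitrary assumptions.

```python
import torch
import torch.nn as nn

n, m = 3, 32                              # illustrative values
C = torch.randn(1, 3 * n, 224, 224)       # multi-channel image C (batch of 1)

layer_L0 = nn.Conv2d(in_channels=3 * n, out_channels=m, kernel_size=3, padding=1)
F = layer_L0(C)                           # each of the m kernels has depth 3n
print(F.shape)                            # torch.Size([1, 32, 224, 224]): feature map of depth m
```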

After producing the feature map F of the shared network layer 51 (such as the layer-0 shared network layer L0), the shared feature extraction module 50 can generate deeper shared network layers (such as the layer-1 shared network layer L1 through the layer-k shared network layer Lk) depending on the scenario. For example, if the one or more task-specific models 62 of the task-specific feature extraction module group 60 shown in Figure 2 all use ResNet or a ResNet variant, such as ResNet18, ResNet34 or ResNet50, the shared feature extraction module 50 can also include the subsequent batch normalization layer, ReLU activation function layer, max pooling layer and so on in the shared network layers 51 (such as the layer-0 shared network layer L0 through the layer-k shared network layer Lk). Thus, thanks to the shared network layers 51 of the shared feature extraction module 50, multiple independent models can be integrated into one model through the shared network layers 51, reducing the overall size; in other words, the size of the single model integrated through sharing is smaller than the combined size of the individual independent models.
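
A shared front end of this kind mirrors the standard ResNet stem. The sketch below assumes that arrangement (7x7 convolution, batch normalization, ReLU, 3x3 max pooling); the channel counts, the value of n and how deep the shared portion goes are assumptions rather than the patent's prescription.

```python
import torch.nn as nn

n = 3  # number of range images stacked along the channel axis (illustrative)

# ResNet-style shared stem serving as shared network layers L0..Lk.
shared_layers = nn.Sequential(
    nn.Conv2d(3 * n, 64, kernel_size=7, stride=2, padding=3, bias=False),
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
)
```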

After computing all the shared network layers 51 (such as the layer-0 shared network layer L0 through the layer-k shared network layer Lk), the shared feature extraction module 50 produces the shared feature map 52 and inputs it to each task-specific model 62 of the task-specific feature extraction module group 60 shown in Figure 2.

For example, since the range shrinking and expanding performed by the multi-range generation module 30 shown in Figure 3 can effectively cover the fine facial texture features, the features around the head and the background features of the object (such as face/head/background), the face recognition model, the fake-face recognition model, the gender recognition model and the age recognition model are all task-specific models 62 in the task-specific feature extraction module group 60 shown in Figure 2 that are suitable for sharing.

Also, as shown in Figures 1 and 2, after the different task-specific feature extraction modules 61 of the task-specific feature extraction module group 60 output (e.g., in parallel) the results of the different task-specific models 62 to the cross-task application service module 70, the cross-task application service module 70 can combine different task-specific models 62 for different applications. For example, the cross-task application service module 70 may use a first task-specific model 62 (such as a fake-face recognition model) to first screen out spoofed faces, and then use the recognition results of a second task-specific model 62 (such as a face recognition model), a third task-specific model 62 (such as a gender recognition model) and a fourth task-specific model 62 (such as an age recognition model), so as to prevent the system from being abused by malicious parties.
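
One possible way to express that screening order in code is sketched below; the dictionary keys, the 0.5 threshold and the returned fields are illustrative assumptions about how such a service might combine the task outputs, not the patent's defined interface.

```python
def cross_task_service(outputs):
    """Screen spoofed faces first, then use the identity / gender / age results."""
    if outputs["fake_face_score"] > 0.5:
        return {"access": "denied", "reason": "spoofed face"}
    return {
        "access": "granted",
        "identity": outputs["identity"],
        "gender": outputs["gender"],
        "age": outputs["age"],
    }
```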

In summary, the multi-task object recognition system sharing multi-range features of the present invention has at least the following features, advantages or technical effects.

1. The present invention solves the problem that multiple independent models are difficult to integrate when the same object is used in multi-task applications.

2. The shared feature extraction module of the present invention may have shared network layers, which effectively reduce redundant or unnecessary network layers, thereby reducing model size and increasing execution speed.

3. The present invention lets multiple recognition tasks share the shared network layers, which reduces the overall computational complexity, and lets multiple recognition tasks share the features extracted from multi-range images, which improves recognition accuracy.

4. The present invention provides techniques such as multi-range image stacking, cross-task shared features and swappable feature extraction groups; with these techniques, applications of the same object but multiple recognition tasks can easily be integrated into a single model, which effectively reduces model size and prediction time and also increases application flexibility.

5. The present invention can use the technique of multi-range shared features to integrate applications of the same object but multiple recognition tasks into a single model through the shared network layers of the shared feature extraction module, effectively reducing model size and prediction time.

6. The present invention uses multi-range (multi-channel) stacking and shared network layers to integrate the features required by the models into a single model, so that the model size is reduced and the computation and prediction time of the shared feature extraction module are decreased.

7. The present invention integrates the features required by the models into a single model while retaining the flexibility of independent models; different task-specific feature extraction modules can be swapped according to different applications, or different task combinations can be customized for different applications.

8. The present invention has a wide range of applications and can be used for various image-object recognition or monitoring tasks, such as face recognition, liveness recognition, gender and age recognition, human-figure recognition, license plate recognition, vehicle recognition, video surveillance and smart retail. Products to which the present invention may be applied include, for example, face-recognition attendance products, smart access control products, visitor analytics products and electronic fence products.

The above embodiments merely illustrate the principles, features and effects of the present invention and are not intended to limit its implementable scope. Anyone skilled in the art may modify and change the above embodiments without departing from the spirit and scope of the invention. Any equivalent changes and modifications made using the disclosure of the present invention shall still be covered by the claims. Therefore, the scope of protection of the present invention shall be as set forth in the claims.


Claims (9)

1. A multi-task object recognition system sharing multi-range features, comprising: a multi-range generation module that generates or provides multi-range information of an image of an object; a multi-channel image merging module that, according to the multi-range information of the image of the object generated or provided by the multi-range generation module, samples multiple images of different regions from the object so as to merge the multiple images into one multi-channel image; a shared feature extraction module having shared network layers, which uses a neural network to extract shared features from the multi-channel image formed by the multi-channel image merging module by merging the multiple images of different regions sampled according to the multi-range information of the image of the object from the multi-range generation module, wherein the shared network layers of the shared feature extraction module use kernels of different widths and heights to convolve that multi-channel image so as to produce feature maps of the shared network layers; and a task-specific feature extraction module group having one or more different task-specific feature extraction modules, wherein the task-specific feature extraction module group extracts one or more different task-specific features from the shared features that the shared feature extraction module having the shared network layers extracted from that multi-channel image, so that one or more task-specific models of the task-specific feature extraction module group output a recognition result of the image of the object according to the one or more different task-specific features.

2. The system of claim 1, which uses the technique of multi-range shared features to integrate applications of the same object but multiple recognition tasks into a single model through the shared network layers of the shared feature extraction module.

3. The system of claim 1, further comprising an image capture module and an object detection module, wherein the image capture module captures an image or image frame, and the object detection module detects position information of the object from the image or image frame captured by the image capture module, so that the multi-range generation module uses the position information of the object detected by the object detection module to generate the multi-range information of the image of the object.

4. The system of claim 1, wherein the manner of range expansion of the multi-range generation module or the multi-channel image merging module varies with the application: if the focus is on the details and texture of the object, the multi-range generation module or the multi-channel image merging module mainly uses multiple reduced-range images, and if the focus is on the attachments or background around the object, the multi-range generation module or the multi-channel image merging module mainly uses multiple expanded-range images.

5. The system of claim 1, wherein, if the focus is on the size or performance of the model of the shared network layers, the multi-range generation module or the multi-channel image merging module stacks fewer images of different ranges, and if the focus is on the generality of the shared network layers, the multi-range generation module or the multi-channel image merging module stacks more images of different ranges.

6. The system of claim 1, wherein the multi-range generation module captures multiple reduced or expanded image ranges, and the multi-channel image merging module stacks the multiple images captured by the multi-range generation module along the channel dimension to integrate them into a single input, so that the features contained in the single input span multiple recognition tasks.

7. The system of claim 1, wherein the shared feature extraction module further includes a batch normalization layer, a ReLU activation layer or a max pooling layer in the shared network layers.

8. The system of claim 1, wherein the shared feature extraction module generates a shared feature map after computing all the shared network layers, so as to input the shared feature map to each task-specific model of the task-specific feature extraction module group.

9. The system of claim 1, further comprising a cross-task application service module that uses the one or more task-specific features extracted by the one or more different task-specific feature extraction modules of the task-specific feature extraction module group to provide or execute a cross-task application service.
Priority Applications (1)

Application Number: TW108145764A
Priority Date / Filing Date: 2019-12-13
Title: Multi-task object recognition system sharing multi-range features

Publications (2)

Publication Number    Publication Date
TW202123069A (en)     2021-06-16
TWI734297B (en)       2021-07-21

Family ID: 77516602

Country Status (1)

Country: TW, Link: TWI734297B (en)
