TWI722383B - Pre feature extraction method applied on deep learning - Google Patents

Pre feature extraction method applied on deep learning

Info

Publication number: TWI722383B
Authority: TW (Taiwan)
Prior art keywords: data, feature extraction, deep learning, extraction method, feature
Application number: TW108104306A
Other languages: Chinese (zh)
Other versions: TW202030651A
Inventors: 許巍嚴, 李成軒
Original Assignee: 國立中正大學 (National Chung Cheng University)
Application filed 2019-02-01 by 國立中正大學; priority to TW108104306A
Published as TW202030651A (2020-08-16); application granted and published as TWI722383B (2021-03-21)

Landscapes

  • Image Analysis (AREA)

Abstract

A pre-feature extraction method applied to deep learning includes: obtaining first data; extracting a plurality of feature data from the first data by using at least two different feature extraction algorithms; assembling the feature data into second data; and extracting feature data from the second data by using a deep learning algorithm. The efficiency and accuracy of the deep learning algorithm can thereby be enhanced.

Description

Pre-feature extraction method applied to deep learning

The present invention relates to a pre-feature extraction method; more particularly, it relates to a pre-feature extraction method that performs feature extraction in advance, before deep learning is carried out.

Artificial intelligence (AI) is a field of computer science that has attracted intense attention; it addresses problems associated with human intelligence, such as learning, problem solving, and feature recognition. Machine learning is one route to artificial intelligence, that is, problems in artificial intelligence are solved by means of machine learning. Over the past thirty-odd years, machine learning has grown into a multidisciplinary field spanning probability theory, statistics, approximation theory, convex analysis, computational complexity theory, and other subjects. Machine learning theory is mainly concerned with designing and analyzing algorithms that let computers "learn" automatically: algorithms that discover regularities in data and use those regularities to make predictions on unseen data. A conventional machine learning pipeline roughly comprises the steps of input data, feature extraction, feature selection, and feature classification. Obtaining useful, effective features requires experts in each domain to invest a considerable amount of analysis and research into the data, understand its characteristics, and construct a suitable algorithm; for this reason, the process is also called feature engineering.

Deep learning is a branch of machine learning whose aim is to pass data through linear or non-linear transforms in multiple processing layers so as to automatically extract features that adequately represent the data's characteristics. Because deep learning has this capacity for automatic feature extraction, it is also regarded as a form of feature learning (representation learning); it currently has strong development potential as feature engineering that can take the place of experts.

In recent years, deep learning methods have made significant progress in building predictive models. This progress stems from breakthroughs in computing power and from big data. In traditional shallow learning methods, the user must select appropriate indicators to help the system learn features quickly, whereas deep learning exploits large amounts of data so that the system can learn the important features automatically. In general, when deep learning is applied to expression recognition, the input data is fed in without any processing, and the recognition results tend to be of low accuracy and unstable.

Furthermore, when deep learning is applied conventionally, digital data of all kinds (including text, sound, images, and animation) is input directly into the deep learning model for automatic feature extraction, so judgments cannot be made reliably across a broad range of situations. Moreover, special situations call for special feature extraction methods; using the original deep learning model directly easily produces unstable recognition results and low accuracy. It also often happens that the number of deep learning layers increases, and the cost with it, yet the results remain poor.

It is therefore still necessary to improve current feature extraction methods that apply deep learning.

The present invention performs pre-feature extraction with at least two different feature extraction algorithms before the deep learning algorithm is run. This increases the computational efficiency and accuracy of the deep learning algorithm in feature extraction, reduces the number of layers the deep learning algorithm requires, and lowers the cost of use.

In one embodiment of the present invention, a pre-feature extraction method applied to deep learning is disclosed, which comprises: obtaining first data; performing feature extraction on the first data simultaneously with at least two different feature extraction algorithms to obtain a plurality of feature data of the first data; assembling these feature data into second data; and performing feature extraction on the second data through a deep learning algorithm to obtain a plurality of feature data of the second data.

In the pre-feature extraction method applied to deep learning of the above embodiment, the feature extraction algorithms are any two of a Facial Action Coding System feature extraction method, a local binary pattern feature extraction method, and a histogram of oriented gradients feature extraction method.

In the pre-feature extraction method applied to deep learning of the above embodiment, the deep learning algorithm may be a deep neural network algorithm or a convolutional neural network algorithm.

In the pre-feature extraction method applied to deep learning of the above embodiment, the first data may be a static image, text, a graphic, or a dynamic image.

In the pre-feature extraction method applied to deep learning of the above embodiment, the second data is a static image, text, a graphic, or a dynamic image.

In the pre-feature extraction method applied to deep learning of the above embodiment, the number of feature extraction algorithms is three or more.

In the pre-feature extraction method applied to deep learning of the above embodiment, the second data may be multi-dimensional data.

201‧‧‧image
202‧‧‧image
211‧‧‧hidden layer
212‧‧‧hidden layer
213‧‧‧hidden layer
214‧‧‧input layer
215‧‧‧output layer
h1_0 to h1_29‧‧‧neurons
h2_0 to h2_19‧‧‧neurons
h3_0 to h3_9‧‧‧neurons
AU1 to AU22‧‧‧action units
S101, S102, S103, S104‧‧‧steps
301‧‧‧convolutional layer
302‧‧‧pooling layer
303‧‧‧dropout layer
304‧‧‧flatten layer
305‧‧‧activation function Dense ReLU
306‧‧‧activation function Dense SoftMax
307‧‧‧fully connected layer
Fig. 1 is a flow diagram of a pre-feature extraction method applied to deep learning according to an embodiment of the present invention;
Fig. 2 illustrates the application of a Facial Action Coding System feature extraction method according to another embodiment of the present invention;
Fig. 3 illustrates the application of a local binary pattern feature extraction method according to yet another embodiment of the present invention;
Fig. 4 illustrates the application of a histogram of oriented gradients feature extraction method according to still another embodiment of the present invention;
Fig. 5 is a schematic diagram of the architecture of a deep neural network algorithm according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of the architecture of a convolutional neural network algorithm according to an embodiment of the present invention;
Fig. 7 compares deep neural network results obtained without and with pre-feature extraction; and
Fig. 8 compares convolutional neural network results obtained without and with pre-feature extraction.

Please refer to Fig. 1, a flow diagram of a pre-feature extraction method applied to deep learning according to an embodiment of the present invention. The pre-feature extraction method disclosed by the present invention comprises steps S101, S102, S103, and S104. Step S101 obtains first data; step S102 performs feature extraction on the first data simultaneously with at least two different feature extraction algorithms, to obtain a plurality of feature data of the first data; step S103 assembles these feature data into second data; and step S104 performs feature extraction on the second data through a deep learning algorithm, to obtain a plurality of feature data of the second data.

In step S101, the first data may be a static image, text, a graphic, or a dynamic image. In step S102, at least two different feature extraction algorithms are used simultaneously to pre-extract features from the first data; more preferably, three or more different feature extraction algorithms are used. In one embodiment, these algorithms may be any two, or all three, of a Facial Action Coding System feature extraction method, a local binary pattern feature extraction method, and a histogram of oriented gradients feature extraction method. In step S103, the second data is composed of the plurality of feature data of the first data, and may correspondingly be a static image, text, a graphic, or a dynamic image. In step S104, the deep learning algorithm may be a deep neural network algorithm or a convolutional neural network algorithm.
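
As a minimal sketch of steps S101 to S104, assuming each pre-extractor returns a one-dimensional NumPy feature vector (the names extract_facs, extract_lbp, and extract_hog are hypothetical stand-ins for the three methods described below, not functions named in the patent):

```python
import numpy as np

def pre_feature_extraction(first_data, extractors):
    """Steps S102-S103: run every extractor on the first data and
    assemble the resulting feature vectors into the second data."""
    features = [extract(first_data) for extract in extractors]  # S102
    second_data = np.concatenate(features)                      # S103
    return second_data

# Usage sketch: the deep learning model of step S104 then consumes second_data.
# second_data = pre_feature_extraction(
#     face_image, [extract_facs, extract_lbp, extract_hog])
# prediction = deep_model.predict(second_data[np.newaxis, :])   # S104
```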

Please refer to Fig. 2, which illustrates the application of a Facial Action Coding System feature extraction method according to another embodiment of the present invention. The Facial Action Coding System (FACS) was proposed by the renowned psychologists Professors P. Ekman and W. Friesen. Based mainly on the distribution of the muscles of the human face, this system seeks out changes distinguishable by the naked eye and decomposes human expressions into multiple basic Action Units (AUs) corresponding to regions of the face; it analyzes the motion characteristics of these units, the main regions they control, and the expressions associated with them, and provides pictures corresponding to each expression. The embodiment of Fig. 2 lists 22 action units, AU1 through AU22. For example, raised eyebrows are AU1, a frown is AU4, and a pout is AU10. Combinations of these facial action units can be used to express any human facial expression.
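
To make the idea of AU combinations concrete, a small illustrative mapping is sketched below; it follows FACS-based combinations commonly cited in the literature rather than anything specified in the patent:

```python
# Hypothetical prototype expressions as sets of active action units.
EXPRESSION_AUS = {
    "happiness": {6, 12},        # cheek raiser + lip corner puller
    "sadness":   {1, 4, 15},     # inner brow raiser + brow lowerer + lip corner depressor
    "surprise":  {1, 2, 5, 26},  # brow raisers + upper lid raiser + jaw drop
}

def matches_expression(active_aus, expression):
    """True when every AU of the prototype appears in the detected set."""
    return EXPRESSION_AUS[expression] <= set(active_aus)
```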

One embodiment of the present invention is FACS-oriented: it attempts to recognize the rich variation in human expressions, such as the raising of the eyebrows or changes at the corners of the mouth, improving on previously over-simplified classification schemes so that computer recognition comes closer to human perception and better matches real-world conditions. Since changes in expression mostly depend on changes in the size or position of the eyes, eyebrows, and mouth, 12 features are defined as the feature values.
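
The patent does not enumerate the 12 feature values. As a sketch only, assuming a standard 68-point facial landmark array is already available (the index layout below follows the common 68-point convention and is an assumption, not part of the patent), geometric measurements around the eyes, eyebrows, and mouth might look like this:

```python
import numpy as np

def facs_style_features(pts):
    """pts: (68, 2) landmark array. Returns a few illustrative geometric
    features; the embodiment's actual 12 features are not specified."""
    def dist(a, b):
        return float(np.linalg.norm(pts[a] - pts[b]))

    face_width = dist(0, 16)            # used to normalize for scale
    return np.array([
        dist(37, 41) / face_width,      # left eye opening
        dist(43, 47) / face_width,      # right eye opening
        dist(19, 37) / face_width,      # left eyebrow-to-eye height
        dist(24, 43) / face_width,      # right eyebrow-to-eye height
        dist(48, 54) / face_width,      # mouth width
        dist(51, 57) / face_width,      # mouth opening
    ])
```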

Please refer to Fig. 3, which illustrates the application of a local binary pattern feature extraction method according to yet another embodiment of the present invention. The local binary pattern (LBP) is an operator used to describe the local texture features of an image; it has notable advantages such as rotation invariance and grayscale invariance. It was first proposed by T. Ojala, M. Pietikäinen, and D. Harwood in 1994 for local texture feature extraction. In the embodiment of Fig. 3, the image 201 is first divided into 3×3-pixel regions and the pixel value at each position is taken. The value of the center pixel is used as a threshold and compared against the values of its 8 neighboring pixels: if a neighbor's value is greater than or equal to the threshold, that neighbor's position is coded as 1, and otherwise as 0. The thresholded values are then each multiplied by the weight of their corresponding position and summed; the result is the LBP value, computed as in equation (1):

$$\mathrm{LBP}(x_c, y_c) = \sum_{p=0}^{7} s(i_p - i_c)\,2^p, \qquad s(x) = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{1}$$

where (x_c, y_c) is the center pixel with value i_c, and i_p (p = 0, ..., 7) are the values of its 8 neighbors.
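
A direct NumPy sketch of equation (1), assuming an 8-bit grayscale image and a clockwise-from-top-left weighting of the 8 neighbors (the weight ordering is a common convention; the patent does not fix one):

```python
import numpy as np

# Offsets of the 8 neighbors, clockwise from the top-left corner;
# neighbor p carries weight 2**p as in equation (1).
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                    (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_value(image, y, x):
    """LBP code of the pixel at (y, x) over its 3x3 neighborhood."""
    center = image[y, x]
    code = 0
    for p, (dy, dx) in enumerate(NEIGHBOR_OFFSETS):
        if image[y + dy, x + dx] >= center:   # s(i_p - i_c) = 1
            code += 2 ** p
    return code

def lbp_map(image):
    """LBP code for every interior pixel of a 2-D grayscale array."""
    h, w = image.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            out[y - 1, x - 1] = lbp_value(image, y, x)
    return out
```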

Please refer to Fig. 4, which illustrates the application of a histogram of oriented gradients feature extraction method according to still another embodiment of the present invention. The histogram of oriented gradients (HOG) is an image descriptor for human detection proposed by the French researcher Dalal at CVPR 2005; it builds features by computing and accumulating histograms of gradient orientations over local regions of an image. As shown in Fig. 4, the image 202 is first divided into multiple connected regions called cells. Histograms of the gradient or edge orientations of the pixels within each cell are then obtained, and these histograms are finally combined to form a feature descriptor. Its advantage is that it maintains quite good invariance to both geometric and photometric deformations of the image.
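
One possible realization uses the HOG implementation shipped with scikit-image; the cell and block sizes below are illustrative defaults, not values taken from the patent:

```python
from skimage.feature import hog

def extract_hog(image):
    """Return a 1-D HOG descriptor for a 2-D grayscale image."""
    return hog(
        image,
        orientations=9,          # bins of the orientation histogram
        pixels_per_cell=(8, 8),  # the connected "cell" regions
        cells_per_block=(2, 2),  # cells grouped for block normalization
        block_norm='L2-Hys',
    )
```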

Please refer to Fig. 5, a schematic diagram of the architecture of a deep neural network algorithm according to an embodiment of the present invention. A deep neural network (DNN) usually denotes a neural network containing multiple hidden layers 211, 212, 213, as shown in Fig. 5, and adopts a layered architecture similar to that of neural networks generally. A deep neural network system comprises a multi-layer network composed of an input layer 214, multiple hidden layers 211, 212, 213, and an output layer 215. Only nodes in adjacent layers are connected; nodes within the same layer or across non-adjacent layers are not connected to one another, so each layer can be regarded as a logistic regression model. This layered architecture is closer to the structure of the human brain. In addition, a DNN can be trained with the backpropagation algorithm. The present invention performs pre-feature extraction with three different feature extraction methods and combines the feature data each one extracts into 142-dimensional data that is fed to the deep neural network as its feature-value input. Hidden layer 211 contains 30 neurons (h1_0 to h1_29), hidden layer 212 contains 20 neurons (h2_0 to h2_19), and hidden layer 213 contains 10 neurons (h3_0 to h3_9); the network finally predicts which expression the image shows, that is, it classifies it into one of six classes, classification value 1 to classification value 6. The weights can be updated by stochastic gradient descent as in equation (2):

$$w_{i,j}(t+1) = w_{i,j}(t) - \eta\,\frac{\partial c}{\partial w_{i,j}} \tag{2}$$

where w_{i,j} are the parameters to be learned, η is the learning rate, c is the cost function, and t indexes the t-th parameter update.

The choice of the function in equation (2) depends on the type of learning (for example supervised learning, unsupervised learning, or reinforcement learning) and on the activation function.
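
A minimal Keras sketch of the Fig. 5 architecture, assuming the 142-dimensional pre-extracted feature vector and six expression classes described above (the hidden-layer activations and the learning rate are assumptions; the patent only fixes the layer sizes and the use of stochastic gradient descent):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(142,)),             # input layer 214: 142-dim features
    layers.Dense(30, activation='relu'),    # hidden layer 211: h1_0..h1_29
    layers.Dense(20, activation='relu'),    # hidden layer 212: h2_0..h2_19
    layers.Dense(10, activation='relu'),    # hidden layer 213: h3_0..h3_9
    layers.Dense(6, activation='softmax'),  # output layer 215: classes 1..6
])

# Stochastic gradient descent realizes the update of equation (2).
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01),
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```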

Please refer next to Fig. 6, a schematic diagram of the architecture of a convolutional neural network algorithm according to an embodiment of the present invention. The convolutional neural network (CNN) is one of the most common deep learning network architectures, because the convolutional layers and pooling layers in the architecture strengthen pattern recognition and the relationships among neighboring data, which lets convolutional neural networks achieve very good results on signal-type data such as images and sound. The architecture of the convolutional neural network algorithm of the present invention mainly comprises three kinds of layers: a convolutional layer 301, a pooling layer 302, and a fully connected layer 307.

As shown in Fig. 6, three different feature extraction methods are first used for pre-feature extraction, and the feature data each one obtains are combined into 14 × 10-dimensional data that is fed to the convolutional neural network as its feature-value input. In the convolutional layer 301, the corresponding feature maps are then extracted by the operation of a 3 × 3 convolution kernel. Next, the pooling layer 302 compresses the feature maps: on the one hand this shrinks the feature maps and simplifies the computational complexity of the network, and on the other it compresses the features so as to extract the main ones. After convolution and pooling, a dropout layer 303 is added, chiefly to avoid overfitting: during deep learning training, the dropout layer 303 randomly discards a given proportion of neurons from the computation in every batch run, so that each training pass behaves as though a different neural network were being trained. A flatten layer 304 then converts the feature values into one-dimensional data for use by the subsequent fully connected layer 307. Next, the hidden layer within the fully connected layer 307 is built (see the embodiment of Fig. 5) with 128 neurons, activated with the activation function Dense ReLU 305. Last comes the output layer (see the embodiment of Fig. 5), which outputs the six classification values 1 to 6; the output layer is activated with the activation function Dense SoftMax 306.
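
A corresponding Keras sketch of the Fig. 6 architecture, treating the 14 × 10 assembled features as a single-channel input (the filter count, pooling size, dropout rate, and optimizer are illustrative assumptions; the patent specifies the 3 × 3 kernel, the 128-neuron Dense ReLU layer, and the 6-class Dense SoftMax output):

```python
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(14, 10, 1)),              # 14 x 10 assembled features
    layers.Conv2D(32, (3, 3), padding='same',
                  activation='relu'),             # convolutional layer 301
    layers.MaxPooling2D(pool_size=(2, 2)),        # pooling layer 302
    layers.Dropout(0.25),                         # dropout layer 303
    layers.Flatten(),                             # flatten layer 304
    layers.Dense(128, activation='relu'),         # 305: Dense ReLU, 128 neurons
    layers.Dense(6, activation='softmax'),        # 306: Dense SoftMax, classes 1..6
])

model.compile(optimizer='adam',                   # optimizer choice is assumed
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```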

Please refer to Fig. 7, which compares deep neural network results without pre-feature extraction and with pre-feature extraction performed first. It should first be explained that performing pre-feature extraction, as the term is used in the present invention, corresponds to step S102 of the embodiment of Fig. 1: a plurality of feature data of the first data is first obtained with at least two different feature extraction algorithms, the feature data is assembled into second data, and the deep neural network algorithm then performs feature extraction on that second data. It should also be mentioned that, in one embodiment of the present invention, the plural feature data obtained by the at least two different feature extraction algorithms are combined into multi-dimensional second data that serves as the feature-value input of the deep learning algorithm; this is of considerable help when deep learning is used to extract features from data of two or more dimensions (for example, images). "Without pre-feature extraction" means that the deep neural network algorithm extracts features from the first data directly. Fig. 7 uses expression recognition on an image as a demonstration example; note that the method disclosed by the present invention is not limited to expression recognition on images. In Fig. 7, when the data corresponding to the image is input directly to the DNN for expression recognition without pre-feature extraction, the average accuracy obtained through validation is 84.6%, with a 13.76% gap between the highest and lowest accuracies: a recognition rate that is low, widely varying, and unstable. When the three different feature extraction algorithms described above are used to pre-extract features from the data and the resulting feature data is then input to the DNN for expression recognition, the average accuracy obtained is 93.91%, an improvement of 9.31%, clearly alleviating the low-accuracy problem; the gap between the highest and lowest accuracies is 6.45%, narrowing the error spread by nearly half and likewise resolving the unstable recognition rate.

Fig. 8 compares convolutional neural network results without pre-feature extraction and with pre-feature extraction performed first. Similar results occur with the convolutional neural network algorithm. When the data corresponding to the image is input directly to the CNN for expression recognition without pre-feature extraction, the average accuracy obtained through validation is 85.86%, with a 9.93% gap between the highest and lowest accuracies, again showing a recognition rate that is low, widely varying, and unstable. When the three different feature extraction algorithms described above are used to pre-extract features from the data and the resulting feature data is then input to the CNN for expression recognition, the average accuracy obtained is 97.91%, an improvement of 12.05%, clearly alleviating the low-accuracy problem and outperforming the DNN; the gap between the highest and lowest accuracies is 2.48%, narrowing the error spread by nearly three quarters and resolving the unstable recognition rate, again with better results than the DNN.

From the above, the present invention proposes a pre-feature extraction method applied to deep learning: for input data of every kind, a variety of pre-feature extractions is performed first, the obtained feature information is then concatenated into another piece of data (for example, an image), and classification and recognition are finally carried out through deep learning. Because feature extraction has already been performed before the data is input to deep learning, specific and useful feature information can be filtered out regardless of the particular environment, subject, or object under test; the deep learning algorithm then re-extracts and selects features and performs classification and recognition. Compared with using a deep learning algorithm alone, this ultimately yields higher accuracy and stability. It also lessens the burden on the deep learning model, achieving the advantages of fewer layers, lower cost, and better results.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone skilled in the art may make various changes and refinements without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall therefore be determined by the appended claims.


Claims (5)

1. A pre-feature extraction method applied to deep learning, comprising: obtaining first data from a face image; performing feature extraction on the first data simultaneously with at least two different feature extraction algorithms, to obtain a plurality of feature data of the first data; assembling the feature data into second data; and performing feature extraction on the second data through a deep learning algorithm, to obtain a plurality of feature data of the second data; wherein the feature extraction algorithms are any two of a Facial Action Coding System feature extraction method, a local binary pattern feature extraction method, and a histogram of oriented gradients feature extraction method; and wherein the first data is a static image, text, a graphic, or a dynamic image.

2. The pre-feature extraction method applied to deep learning of claim 1, wherein the deep learning algorithm is a deep neural network algorithm or a convolutional neural network algorithm.

3. The pre-feature extraction method applied to deep learning of claim 1, wherein the second data is a static image, text, a graphic, or a dynamic image.

4. The pre-feature extraction method applied to deep learning of claim 1, wherein the number of the feature extraction algorithms is three or more.

5. The pre-feature extraction method applied to deep learning of claim 1, wherein the second data is multi-dimensional data.
TW108104306A 2019-02-01 2019-02-01 Pre feature extraction method applied on deep learning TWI722383B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW108104306A TWI722383B (en) 2019-02-01 2019-02-01 Pre feature extraction method applied on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW108104306A TWI722383B (en) 2019-02-01 2019-02-01 Pre feature extraction method applied on deep learning

Publications (2)

Publication Number Publication Date
TW202030651A TW202030651A (en) 2020-08-16
TWI722383B true TWI722383B (en) 2021-03-21

Family

ID=73002789

Family Applications (1)

Application Number Title Priority Date Filing Date
TW108104306A TWI722383B (en) 2019-02-01 2019-02-01 Pre feature extraction method applied on deep learning

Country Status (1)

Country Link
TW (1) TWI722383B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI828942B (en) * 2020-09-23 2024-01-11 中強光電股份有限公司 Electronic device and method for training or applying neural network model

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI335544B (en) * 2007-07-31 2011-01-01 Univ Nat Taiwan Science Tech Iris recognition system
TW201833819A (en) * 2016-12-07 2018-09-16 美商克萊譚克公司 Data augmentation for convolutional neural network-based defect inspection

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI335544B (en) * 2007-07-31 2011-01-01 Univ Nat Taiwan Science Tech Iris recognition system
TW201833819A (en) * 2016-12-07 2018-09-16 美商克萊譚克公司 Data augmentation for convolutional neural network-based defect inspection

Also Published As

Publication number Publication date
TW202030651A (en) 2020-08-16
