TWI592897B

TWI592897B - Image Recognition Accelerator System

Info

Publication number: TWI592897B
Application number: TW106105547A
Authority: TW
Inventors: Chen-Jian Xu; Wei-Yan Wang; Shi-An Li; Wei-Zheng Pan; Yi-Xing Jian
Priority date: 2017-02-20
Filing date: 2017-02-20
Publication date: 2017-07-21
Also published as: TW201832178A

Description

Image recognition accelerator system

The present invention relates to an image recognition accelerator system, and in particular to an image recognition system that implements the SIFT image recognition algorithm on an FPGA.

In recent years, advances in vision sensors and the maturing of imaging technology have made image recognition an indispensable part of computer vision. It is widely applied in military, industrial, and medical fields, in tasks such as image stitching, object recognition, robotic mapping and navigation, 3D modeling, gesture recognition, and video tracking and match moving.

Image recognition mainly performs feature detection on captured images. Many image feature recognition algorithms have been proposed over the past decade, the best known being the scale-invariant feature transform (SIFT), presented by David G. Lowe at a computer vision conference in 1999. The SIFT algorithm detects feature points in an image and assigns each one a high-dimensional descriptor vector, so that images can be matched against each other by pairing feature points whose vectors are similar. Notably, SIFT takes the orientation of each feature point into account, which also resolves the problem that Harris corner detection is not rotation-invariant. Although SIFT gives very good matching results under changes of scale and viewpoint rotation, its drawback is a very heavy computational load, which makes the overall computation time-consuming and prevents real-time operation.

Prior patents exist, for example Republic of China (Taiwan) patent TW201142718, "Scale space normalization technique for improved feature detection in uniform and non-uniform illumination changes", which concerns methods and techniques for improving the performance efficiency of an image recognition system. Its method comprises: generating a scale-space image difference by taking the difference between two different smoothed versions of an image; generating a normalized scale-space image difference by dividing the scale-space image difference by a third smoothed version of the image, where the third smoothed version is as smooth as, or smoother than, the smoothest of the two different smoothed versions; and using the normalized scale-space image difference to detect one or more features of the image. However, this prior patent does not take the orientation of each feature point into account, so it cannot obtain good matching results under viewpoint rotation.

In recent years, several studies have implemented the SIFT algorithm on FPGA platforms, mainly using parallel processing to shorten computation time. In 2008, Vanderlei Bonato proposed a hardware/software co-design in which part of the SIFT algorithm is accelerated by hardware circuits on an FPGA. In 2014, Jianhui Wang proposed an embedded-system architecture for feature point detection and matching, with results showing a throughput of 60 images per second. Jie Jiang also proposed an all-hardware FPGA architecture implementing SIFT detection and matching.

However, an all-hardware FPGA implementation of SIFT still has to evaluate exponential functions and floating-point numbers and makes heavy use of divider logic, so image recognition consumes a large amount of computation time and real-time recognition cannot be achieved.

One object of the present invention is to provide an image recognition accelerator system in which the image pyramid construction module, coupled to the image input module, uses Gaussian template mask parameters for a plurality of different scales that are determined in advance by software, and then performs a plurality of convolution operations in parallel through a plurality of Gaussian filter modules, each convolution operation being performed on the image data with one of the mask parameters, to obtain a plurality of Gaussian images. This overcomes the problems of hardware floating-point computation and high computational cost that arise in the prior art from evaluating the exponential function during Gaussian template computation, and thereby effectively improves system performance.

To achieve the above object, the present invention provides an image recognition accelerator system comprising: an image input module for inputting image data; an image pyramid construction module, coupled to the image input module, which uses Gaussian template mask parameters for a plurality of different scales determined in advance by software and performs a plurality of convolution operations in parallel through a plurality of Gaussian filter modules, each convolution operation being performed on the image data with one of the mask parameters, to obtain a plurality of Gaussian images, after which the Gaussian images are input in pairs to a differential image module for Gaussian image subtraction; a SIFT detection module, coupled to the image pyramid construction module, which performs extreme value detection and unstable feature point detection on the image data output by the differential image module, through an extreme value detection module and an unstable feature point detection module, to determine whether a point is a stable feature point, performs an AND operation on the results of the extreme value detection and the unstable feature point detection, and stores the result in a first-in first-out (FIFO) register; and a SIFT description module, coupled to the image pyramid construction module, which processes the Gaussian images output by the Gaussian filter modules through a first-order partial derivative matrix module and a CORDIC module to obtain gradient data for all image points, then processes the gradient data with an image gradient histogram statistics module and a normalization operation module to obtain the descriptor data of the feature point, and combines the descriptor data with the position data of the feature point, thereby providing real-time image recognition.

Another object of the present invention is to provide an image recognition accelerator system in which the extreme value detection module includes a maximum detection circuit and a minimum detection circuit that perform maximum detection and minimum detection simultaneously, and their output signals are combined by an OR gate to identify a feature point. The unstable feature point detection module further includes a first-order partial derivative matrix module, a Hessian matrix module, an inverse Hessian matrix module, a low-contrast feature detection module, and an edge feature detection module, and the output signals of the low-contrast feature detection module and the edge feature detection module are combined by an AND gate.

The inverse Hessian matrix module uses the adjugate-matrix method: it computes the adjugate matrix and the determinant and outputs them to the low-contrast feature detection module, where a numerically derived formula replaces the use of multiple dividers, effectively improving system performance.

Yet another object of the present invention is to provide an image recognition accelerator system in which the normalization operation module, when computing the normalized values of a feature point vector, multiplies by a gain value and then applies a right shift, greatly reducing the use of dividers and effectively improving system performance.

A further object of the present invention is to provide an image recognition accelerator system in which the image input module, the image pyramid construction module, the SIFT detection module, and the SIFT description module are all designed with a pipelined architecture and use pipeline hold circuits to make signals wait so that timing stays synchronized.

To help the examiner further understand the structure, features, and objects of the present invention, drawings and a detailed description of preferred embodiments are provided below.

Please refer to FIG. 1, which is a schematic block diagram of an image recognition accelerator system according to a preferred embodiment of the present invention.

As shown in FIG. 1, the image recognition accelerator system of the present invention comprises an image input module 100, an image pyramid construction module 200, a SIFT detection module 300, and a SIFT description module 400.

The image input module 100 has an image input unit 110 and a first shift register 120. The image input unit 110 is used to input image data; the first shift register 120 is coupled to the image input unit 110 and shifts and buffers the image data.

The image pyramid construction module 200 is coupled to the image input module 100. Gaussian template mask parameters for a plurality of different scales are determined in advance by software, and a plurality of Gaussian filter modules 210 then perform a plurality of convolution operations in parallel, each convolution operation being performed on the image data with one of the mask parameters, to obtain a plurality of Gaussian images. The Gaussian images are then input in pairs to a differential image module 220 for Gaussian image subtraction. In addition, a plurality of second shift registers 230 are coupled between the image pyramid construction module 200 and the SIFT detection module 300 to shift and buffer the output of the differential image module 220 of the image pyramid construction module 200.

Each of the Gaussian filter modules 210 further has a Gaussian mask value selection module 211, a multiply-accumulator module 212, a parallel adder module 213, and a multiplexer module 214.

The SIFT detection module 300 is coupled to the image pyramid construction module 200. It performs extreme value detection and unstable feature point detection on the differential images output by the differential image module 220, through an extreme value detection module 310 and an unstable feature point detection module 320, to determine whether a point is a stable feature point. The outputs of the extreme value detection module 310 and the unstable feature point detection module 320 are combined by an AND operation in a second AND gate 329, and the result is stored in a first-in first-out (FIFO) register 330.

A third shift register 240 is further coupled between the image pyramid construction module 200 and the SIFT description module 400 to shift and buffer the Gaussian images output by the Gaussian filter modules 210.

The SIFT description module 400 then processes the received Gaussian images through a first-order partial derivative matrix module 321 and a CORDIC module 410 to obtain the gradient magnitude and direction of every image point. The SIFT description module 400 also has a fourth shift register 420, which shifts and buffers the gradient values of the image points output by the CORDIC module 410. An image gradient histogram statistics module 430 and a normalization operation module 440 then compute the descriptor data of the feature point, which is combined with the position data of the feature point. This reveals which pixels are feature points and gives the descriptor data for each feature point position, thereby providing real-time image recognition. The FIFO register 330 delays the output of the SIFT detection module 300 until the SIFT description module 400 outputs its operation-complete signal, as indicated by the arrow in FIG. 1, so that the timing of the SIFT detection module 300 and the SIFT description module 400 stays synchronized.
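The gradient magnitude and direction that the CORDIC module 410 is described as producing can be illustrated with a minimal CORDIC vectoring-mode sketch. This is not the patented circuit: the function name `cordic_vectoring`, the iteration count, and the use of floating point are assumptions made for readability. A hardware version would normally use fixed-point values, bit shifts in place of the multiplications by `2**-i`, and a small arctangent lookup table.

```python
import math

def cordic_vectoring(dx, dy, iterations=16):
    """Return (magnitude, angle in radians) of the vector (dx, dy).

    Hypothetical illustration of CORDIC vectoring mode, not the patent's design.
    """
    # Vectors in the left half-plane are rotated by 180 degrees first,
    # because the basic iteration only converges for angles near +/-90 degrees.
    if dx < 0:
        x, y = -dx, -dy
        angle = math.pi if dy >= 0 else -math.pi
    else:
        x, y, angle = dx, dy, 0.0

    gain = 1.0
    for i in range(iterations):
        step = 2.0 ** -i  # in hardware: a right shift by i bits
        if y >= 0:
            # Rotate clockwise by atan(2^-i), driving y toward zero.
            x, y = x + y * step, y - x * step
            angle += math.atan(step)
        else:
            # Rotate counter-clockwise by atan(2^-i).
            x, y = x - y * step, y + x * step
            angle -= math.atan(step)
        gain *= math.sqrt(1.0 + step * step)

    # After the iterations x holds gain * |(dx, dy)|.
    return x / gain, angle
```

Only additions, shifts, and a table of constants are needed per iteration, which is why CORDIC is a common choice for computing magnitude and angle in FPGA logic.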

The image input module 100, the image pyramid construction module 200, the SIFT detection module 300, and the SIFT description module 400 are all implemented on a field-programmable gate array (FPGA).

Please refer to FIG. 2, which is a schematic diagram of the hardware architecture of the Gaussian filter module of an image recognition accelerator system according to a preferred embodiment of the present invention.

Conventional techniques repeatedly apply Gaussian filters to build a continuous scale-space image. To avoid the exponential-function and floating-point computation this entails, the present invention uses software to precompute Gaussian template mask parameters for a plurality of different scales and inputs them to the hardware. The scale-variable Gaussian function is given by equation (1):

[Equation (1): image 02_image001.jpg not reproduced]

Shifting it left by n bits gives equation (2):

[Equation (2): image 02_image003.jpg not reproduced]

where (x, y) are the coordinates of a pixel in the image and σ is the scale-space factor.

The selected parameters are convolved with the original image I(x, y), as in equation (3):

[Equation (3): image 02_image005.jpg not reproduced]

Shifting the output right by n bits yields the Gaussian image. Balancing precision against resource use, in the preferred embodiment the Gaussian mask value selection module 211 uses a 7×7 mask and n = 10 bits.

As shown in FIG. 2, each Gaussian filter module 210 first examines the input iGaussian_num signal and loads all the mask parameter values it requires into the Gaussian mask value selection module 211. The parameters are then convolved with the image values from 7 lines using 7 multiply-accumulator modules 212 and a parallel adder module 213, and a multiplexer module 214 checks the iRead_en signal to determine whether the output is valid. Because the software precomputation scales the computed Gaussian template by 2 to the 10th power and feeds it to the Gaussian filter modules 210 as integers, the result must be shifted right by 10 bits before being output.
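The precompute, scale, and shift scheme described above can be sketched in software as follows. This is only an illustration under stated assumptions: the Gaussian is taken to be the standard form exp(-(x² + y²) / (2σ²)) / (2πσ²), the conversion to integers uses rounding (the text only says integers are taken), and the names `gaussian_mask_int` and `convolve_pixel` as well as the value σ = 1.6 in the usage note are hypothetical.

```python
import math

def gaussian_mask_int(sigma, size=7, n=10):
    """Precompute a size x size Gaussian template scaled by 2**n, as integers.

    Assumes the standard 2-D Gaussian; rounding to nearest is an assumption.
    """
    half = size // 2
    mask = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            g = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            g /= 2.0 * math.pi * sigma * sigma
            row.append(int(round(g * (1 << n))))  # "left shift by n bits"
        mask.append(row)
    return mask

def convolve_pixel(window, mask, n=10):
    """Integer multiply-accumulate of one window with the mask, then >> n."""
    acc = 0
    for window_row, mask_row in zip(window, mask):
        for pixel, weight in zip(window_row, mask_row):
            acc += pixel * weight
    return acc >> n  # undo the 2**n scaling without a divider
```

For example, `gaussian_mask_int(1.6)` returns a symmetric 7×7 integer table that could be stored as constants, so the hardware never evaluates an exponential or handles floating-point values.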

Please refer to FIG. 3, which is a schematic diagram of the differential pyramid of an image recognition accelerator system according to a preferred embodiment of the present invention.

As shown in FIG. 3, each layer here has 6 Gaussian images (6 scale values). Once the corresponding Gaussian templates have been obtained, the original image is convolved with the Gaussian templates of the different scales to produce a sequence of Gaussian images. After the first layer has been computed, the third Gaussian image of that layer is halved in both length and width, which reduces the image area to one quarter, and the same 6 Gaussian templates are applied again through the Gaussian filter modules 210. Repeating this for the required number of layers builds a continuous-scale image set that reflects the idea that the farther away an image is viewed, the blurrier and smaller it appears. The present invention uses, for example but not limited to, 4 layers for the image pyramid. Once the progressively blurred Gaussian pyramid has been built, consecutive Gaussian images are input in pairs to the differential image module 220 for Gaussian image subtraction. With 6 Gaussian images per layer, each layer yields 5 differential images; when all differential images have been computed, the differential pyramid is complete.
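The pyramid construction described above can be outlined with a short sketch. The structure (6 Gaussian images per layer, 5 differences per layer, the third Gaussian image halved to seed the next layer, 4 layers) follows the text; the function names, the subtraction order, and downsampling by keeping every second pixel are assumptions, and the blur functions are passed in as callables rather than implemented here.

```python
def subtract_images(a, b):
    """Pixel-wise difference a - b of two equally sized images (lists of rows)."""
    return [[pa - pb for pa, pb in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def downsample_half(img):
    """Halve width and height by keeping every second row and column (assumed method)."""
    return [row[::2] for row in img[::2]]

def build_pyramids(image, blurs, layers=4):
    """Build the Gaussian and difference pyramids.

    blurs: list of callables, one per scale (6 in the described embodiment).
    Returns (gaussian_pyramid, difference_pyramid), each a list of layers.
    """
    gaussian_pyramid, difference_pyramid = [], []
    base = image
    for _ in range(layers):
        gaussians = [blur(base) for blur in blurs]
        # Adjacent Gaussian images are subtracted pairwise: 6 images give 5 differences.
        differences = [
            subtract_images(gaussians[i + 1], gaussians[i])
            for i in range(len(gaussians) - 1)
        ]
        gaussian_pyramid.append(gaussians)
        difference_pyramid.append(differences)
        # The third Gaussian image of this layer seeds the next, smaller layer.
        base = downsample_half(gaussians[2])
    return gaussian_pyramid, difference_pyramid
```

In the described hardware the blurs of one layer run in parallel in separate filter modules; the list comprehension here only mirrors the data flow, not the timing.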

Please refer to FIG. 4, which is a schematic diagram of the hardware architecture of the SIFT detection module of an image recognition accelerator system according to a preferred embodiment of the present invention.

As shown in FIG. 4, the SIFT detection module 300 is coupled to the image pyramid construction module 200. It performs extreme value detection and unstable feature point detection on the differential image data output by the differential image module 220, through the extreme value detection module 310 and the unstable feature point detection module 320, to determine whether a point is a stable feature point. The outputs of the extreme value detection module 310 and the unstable feature point detection module 320 are then combined by an AND operation in a second AND gate 329.

The extreme value detection module 310 further includes a maximum detection circuit 312 and a minimum detection circuit 311, which perform maximum detection and minimum detection simultaneously. Their output signals are combined by an OR operation in an OR gate 313, which indicates whether the pixel value is the maximum or the minimum among its neighbouring points, and the result then passes through a first pipeline hold circuit 314 to wait.

The unstable feature point detection module 320 further includes a first-order partial derivative matrix module 321, a Hessian matrix module 322, an inverse Hessian matrix module 324, a low-contrast feature detection module 326, an edge feature detection module 325, a second pipeline hold circuit 323, and a third pipeline hold circuit 327. The output signals of the low-contrast feature detection module 326 and the third pipeline hold circuit 327 are combined by an AND operation in a first AND gate 328, which indicates whether the feature point is a stable feature point.

Because the extreme value detection module 310 and the unstable feature point detection module 320 are both designed as pipelines, pipeline hold circuits are used to make signals wait. As shown in FIG. 4, the extreme value detection module 310 takes 4 clock (clk) cycles, whereas the unstable feature point detection module 320 takes 12 clock cycles. The output of the extreme value detection module 310 must therefore be held for 8 clock cycles. Once the unstable feature point detection module 320 has finished its decision, the two signals are combined by an AND operation in the second AND gate 329, which indicates whether the point is both a feature point and a stable one.
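The latency matching described above (4 clocks versus 12 clocks, hence an 8-clock hold before the AND gate) can be modelled with a simple shift-register delay. This is a behavioural sketch only: the class `PipelineHold` and the function `simulate` are hypothetical names, and each detection module is reduced to a pure delay on a 1-bit stream.

```python
from collections import deque

class PipelineHold:
    """Delay a 1-bit signal by a fixed number of clock cycles (shift-register model)."""

    def __init__(self, delay):
        # One register per clock of delay, all initialised to 0.
        self.regs = deque([0] * delay, maxlen=delay)

    def clock(self, bit):
        out = self.regs[0]     # value that entered `delay` clocks ago
        self.regs.append(bit)  # maxlen drops the oldest entry automatically
        return out

def simulate(extrema_bits, stable_bits, total_cycles):
    """AND the two detector outputs after aligning their latencies."""
    extrema_pipe = PipelineHold(4)   # extreme value detection: 4 clocks
    stable_pipe = PipelineHold(12)   # unstable feature point detection: 12 clocks
    hold = PipelineHold(8)           # 12 - 4 = 8 clocks of hold
    out = []
    for t in range(total_cycles):
        e = extrema_bits[t] if t < len(extrema_bits) else 0
        s = stable_bits[t] if t < len(stable_bits) else 0
        held = hold.clock(extrema_pipe.clock(e))
        out.append(held & stable_pipe.clock(s))
    return out
```

Without the 8-clock hold, the AND gate would combine decisions that belong to different pixels.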

Please refer to FIG. 5, which is a schematic diagram of the hardware architecture of the extreme value detection module of the SIFT detection module of an image recognition accelerator system according to a preferred embodiment of the present invention.

As shown in FIG. 5, the extreme value detection module 310 further includes a maximum detection circuit 312 and a minimum detection circuit 311, which perform maximum detection and minimum detection simultaneously. When the computation ends they output the one-bit signals obig_en and osmall_en respectively, and the OR gate 313 combines obig_en and osmall_en to produce the oextrema_en signal. If the result is 1 the point is a feature point; if it is 0 it is not. Valid data is tracked with the idval and odval signals: an input idval of 1 indicates that the data is valid, and an odval signal is output when the computation ends.
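The maximum/minimum comparison and the OR gate can be sketched as follows. The signal names obig_en, osmall_en, and oextrema_en come from the text; the choice of the 26 neighbours in a 3×3×3 block across three adjacent differential images and the use of strict comparisons are assumptions based on the usual SIFT formulation, and the function name `detect_extremum` is hypothetical.

```python
def detect_extremum(dog_prev, dog_cur, dog_next, x, y):
    """Return (obig_en, osmall_en, oextrema_en) for pixel (x, y) of dog_cur.

    Each argument is an image stored as a list of rows; (x, y) must not lie
    on the border. The 26-neighbour window is an assumption.
    """
    center = dog_cur[y][x]
    neighbours = []
    for scale, img in enumerate((dog_prev, dog_cur, dog_next)):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if scale == 1 and dx == 0 and dy == 0:
                    continue  # skip the centre pixel itself
                neighbours.append(img[y + dy][x + dx])
    # In hardware the two comparisons run in parallel circuits.
    obig_en = int(all(center > v for v in neighbours))
    osmall_en = int(all(center < v for v in neighbours))
    oextrema_en = obig_en | osmall_en  # the OR gate
    return obig_en, osmall_en, oextrema_en
```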

Please refer to FIG. 6, which is a schematic diagram of the hardware architecture of the first-order partial derivative matrix module of the SIFT detection module of an image recognition accelerator system according to a preferred embodiment of the present invention.

As shown in FIG. 6, the hardware architecture of the first-order partial derivative matrix module 321 implements equation (4):

[Equation (4): image 02_image007.jpg not reproduced]

The first-order partial derivative matrix module 321 further has a plurality of first registers 321a. It takes from the second shift registers 230 the pixel values to the left and right of, above and below, and before and after a feature point, subtracts the two values along each direction, and stores the differences in the first registers 321a. On the next clock the results are shifted right by 1 bit, which divides the value in each first register 321a by 2 and completes the first-order partial derivative matrix for the feature point.
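The subtract-then-shift computation described above corresponds to central differences, sketched below. It assumes that "before and after" refers to the adjacent differential images in scale, and the function name `first_derivatives` is hypothetical. Python's `>>` on negative integers is an arithmetic shift that rounds toward negative infinity; whether the hardware rounds the same way is not stated in the text.

```python
def first_derivatives(dog_prev, dog_cur, dog_next, x, y):
    """Central differences at (x, y): (dx, dy, ds), each computed as (a - b) >> 1."""
    # Left/right neighbours in the current differential image.
    dx = (dog_cur[y][x + 1] - dog_cur[y][x - 1]) >> 1
    # Upper/lower neighbours in the current differential image.
    dy = (dog_cur[y + 1][x] - dog_cur[y - 1][x]) >> 1
    # Same pixel in the next and previous differential images (scale direction).
    ds = (dog_next[y][x] - dog_prev[y][x]) >> 1
    return dx, dy, ds
```

The right shift by one bit replaces the division by 2, so no divider is needed.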

Please refer to FIG. 7, which is a schematic diagram of the hardware architecture of the Hessian matrix module of the SIFT detection module of an image recognition accelerator system according to a preferred embodiment of the present invention.

Equation (5) is the Hessian matrix:

[Equation (5): image 02_image009.jpg not reproduced]

As shown in FIG. 7, the hardware architecture of the Hessian matrix module 322 implements equations (6) to (11):

[Equations (6) to (11): images 02_image011.jpg, 02_image013.jpg, 02_image015.jpg, 02_image017.jpg, 02_image019.jpg, 02_image021.jpg not reproduced]

The Hessian matrix module 322 further has a plurality of registers 322a. It takes from the second shift registers 230 the pixel values in the directions adjacent to a feature point and computes the second-order partial derivatives in the x, y, s, xy, xs, and ys directions. The 6 computed values are substituted into the Hessian matrix of equation (5) and output to an inverse Hessian matrix module 324. The results of equations (9) to (11) must be divided by 4, which is likewise implemented by shifting the values in the registers 322a right by 2 bits.
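The six second-order derivatives and the shift-based division by 4 can be sketched with standard finite differences. Since the images of equations (6) to (11) are not available here, the exact stencils below are an assumption (the usual central-difference forms); only the set of six directions and the right shift by 2 bits for the mixed terms are taken from the text. The function name `hessian_3x3` is hypothetical.

```python
def hessian_3x3(prev, cur, nxt, x, y):
    """Return the symmetric 3x3 Hessian [[Dxx, Dxy, Dxs], [Dxy, Dyy, Dys], [Dxs, Dys, Dss]].

    prev, cur, nxt are adjacent differential images (lists of rows).
    """
    c2 = cur[y][x] << 1  # 2 * centre value, via a left shift
    dxx = cur[y][x + 1] + cur[y][x - 1] - c2
    dyy = cur[y + 1][x] + cur[y - 1][x] - c2
    dss = nxt[y][x] + prev[y][x] - c2
    # Mixed derivatives: four-corner differences divided by 4 (right shift by 2).
    dxy = (cur[y + 1][x + 1] - cur[y + 1][x - 1]
           - cur[y - 1][x + 1] + cur[y - 1][x - 1]) >> 2
    dxs = (nxt[y][x + 1] - nxt[y][x - 1]
           - prev[y][x + 1] + prev[y][x - 1]) >> 2
    dys = (nxt[y + 1][x] - nxt[y - 1][x]
           - prev[y + 1][x] + prev[y - 1][x]) >> 2
    return [[dxx, dxy, dxs],
            [dxy, dyy, dys],
            [dxs, dys, dss]]
```

Because the matrix is symmetric, only six distinct values have to be computed, which matches the six directions listed in the text.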

The inverse Hessian matrix module 324 receives the results of the Hessian matrix module 322 and takes them as inputs at the corresponding positions. To make effective use of hardware parallelism, the present invention performs the matrix inversion with the adjugate matrix, using equations (12) and (13):

[Equation (12): image 02_image023.jpg not reproduced]

[Equation (13): images 02_image025.jpg and 02_image027.jpg not reproduced]

In equation (12), the nine expressions in the matrix are evaluated in parallel, each using two multipliers and one adder or subtractor. In equation (13), the values d₁ to d₆ are first computed in parallel and then subtracted and summed.
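A minimal Python sketch of this step, assuming that equation (12) is the usual 3×3 adjoint (adjugate) matrix and that d₁ to d₆ in equation (13) are the six triple products of the rule of Sarrus; both are assumptions, since the equations are images in the source. The function name `adjugate_and_det` is hypothetical.

```python
def adjugate_and_det(m):
    """Adjugate matrix and determinant of a 3x3 integer matrix m."""
    (a, b, c), (d, e, f), (g, h, i) = m
    # Each adjugate entry needs two multiplications and one subtraction,
    # so all nine entries are independent and can be computed in parallel.
    adj = [
        [e * i - f * h, c * h - b * i, b * f - c * e],
        [f * g - d * i, a * i - c * g, c * d - a * f],
        [d * h - e * g, b * g - a * h, a * e - b * d],
    ]
    # Six triple products (assumed to correspond to d1..d6), then one
    # round of additions and subtractions.
    d1, d2, d3 = a * e * i, b * f * g, c * d * h
    d4, d5, d6 = c * e * g, b * d * i, a * f * h
    det = d1 + d2 + d3 - d4 - d5 - d6
    return adj, det
```

Because `m` times `adj` equals `det` times the identity, the true inverse is `adj / det`; keeping the two parts separate is what lets the division be deferred.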

In the conventional technique, computing the inverse matrix requires dividing every element of the adjoint matrix by the determinant of the matrix. To avoid the loss of system performance caused by dividers in hardware, the present invention outputs the computed adjoint matrix and determinant to the low-contrast feature detection module 326 and performs the calculation with an algebraically derived formula, which replaces the dividers.

Equations (14) and (15) are the formulas used in the conventional technique to identify low-contrast features.

[Equations (14) and (15) appear as images in the original publication and are not reproduced in this text.]

The present invention squares both sides of equation (14) to obtain equation (16).

[Equation (16) appears as an image in the original publication and is not reproduced in this text.]

The inverse matrix in equation (15) is then replaced by its adjoint-matrix form, as in equation (17).

[Equation (17) appears as an image in the original publication and is not reproduced in this text.]

Substituting equation (17) into equation (16) and rearranging yields equations (18), (19) and (20).

[Equations (18) to (20) appear as images in the original publication and are not reproduced in this text.]

Multiplying both sides of equations (18), (19) and (20) by the same factor (the factor itself is missing from this text) then yields equations (21), (22) and (23), so the low-contrast test can be carried out without a divider.

[Equations (21) to (23) appear as images in the original publication and are not reproduced in this text.]
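The idea behind this derivation can be illustrated with a short Python sketch. It does not reproduce equations (14) to (23), which are images in the source. Instead it assumes Lowe's standard formulation: the offset is x̂ = -H⁻¹g, the refined contrast is D(x̂) = D + ½ gᵀx̂, and a point is rejected when |D(x̂)| is below a threshold. Writing H⁻¹ as adj(H)/det(H), squaring, and cross-multiplying removes every division. The function name, the parameter names and the threshold 3/100 are hypothetical.

```python
def is_low_contrast(d, grad, adj, det, thresh_num=3, thresh_den=100):
    """Division-free test of |D(x_hat)| < thresh_num / thresh_den.

    d    : integer DoG value at the sample point
    grad : (Dx, Dy, Ds) first-order derivatives
    adj  : 3x3 adjugate of the Hessian
    det  : determinant of the Hessian
    """
    if det == 0:
        # Singular Hessian: no valid offset; rejecting is a choice made here.
        return True
    # q = g^T adj(H) g, so D(x_hat) = d - q / (2 * det)
    q = sum(grad[r] * adj[r][c] * grad[c] for r in range(3) for c in range(3))
    lhs = 2 * det * d - q  # equals 2 * det * D(x_hat)
    # Square both sides and cross-multiply: no division and no abs() needed.
    return (lhs * thresh_den) ** 2 < (2 * thresh_num * det) ** 2
```

Squaring also removes the sign of the determinant from the comparison, which is why the inequality direction never needs to be flipped.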

The low-contrast feature detection module 326 of the present invention is signal-connected to the first-order partial differential matrix module 321 and to the adjoint matrix and determinant values output by the Hessian inverse matrix module 324. Both sides of equation (21) are multiplied by 1024 (a left shift of 10 bits), and the decision is then made by comparison through equation (24).

[Equation (24) and three accompanying unnumbered expressions appear as images in the original publication and are not reproduced in this text.]

The edge feature detection module 325 of the present invention applies equations (25) and (26) to the result of a Hessian matrix to obtain the trace and the determinant of that Hessian matrix.

[Equations (25) and (26) appear as images in the original publication and are not reproduced in this text.]

The two values are then tested with equation (27).

[Equation (27) appears as an image in the original publication and is not reproduced in this text.]

If the determinant is greater than 0 and the value of the right-hand side is greater than that of the left-hand side, the module outputs a signal indicating that the point is not an edge feature; otherwise, the point is an edge feature.
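The decision rule can be sketched in Python as below. Equations (25) to (27) are images in the source, so this sketch assumes the standard SIFT forms: trace Dxx + Dyy, determinant Dxx·Dyy - Dxy², and the ratio test Tr²/Det < (r+1)²/r rewritten without division as r·Tr² < (r+1)²·Det. The function name `is_edge` and the default r = 10 are assumptions.

```python
def is_edge(dxx, dyy, dxy, r=10):
    """Return True when the point is classified as an edge feature."""
    trace = dxx + dyy
    det = dxx * dyy - dxy * dxy
    left = r * trace * trace          # left-hand side of the inequality
    right = (r + 1) * (r + 1) * det   # right-hand side of the inequality
    # Not an edge only if the determinant is positive AND right > left.
    not_edge = det > 0 and right > left
    return not not_edge
```

Cross-multiplying by the determinant is valid only when the determinant is positive, which is why the sign check is part of the condition rather than a separate step.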

The SIFT description module 400 of the present invention applies the first-order partial differential matrix operation to the received Gaussian images via the first-order partial differential matrix module 321 to compute the changes in the x direction and the y direction. It then uses equations (28) and (29) to calculate the magnitude and orientation, and outputs the result to a CORDIC module 410 for computation.

[Equations (28) and (29) appear as images in the original publication and are not reproduced in this text.]

The CORDIC module 410 computes the square root of a sum of squares and the tan⁻¹ function. It takes two input variables, x₀ and y₀, and its iteration is given by equation (30).

[Equation (30) appears as an image in the original publication and is not reproduced in this text.]

First, the first-order partial derivative values D_x and D_y are input as x₀ and y₀, and the function outputs are obtained through repeated iteration; the present invention uses 10 iterations. To deal with the floating-point issue, the present invention scales the input values and the registers up by 6 bits, so the outputs of both the square root of the sum of squares and the tan⁻¹ function must be shifted right by 6 bits. The square root of the sum of squares must additionally be multiplied by a value, which is implemented here with fixed-point arithmetic.
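A minimal Python sketch of a vectoring-mode CORDIC with the parameters given above (10 iterations, 6 fractional bits). Equation (30) is an image in the source, so the standard shift-and-add iteration is assumed. The angle table format, the 10-bit gain constant `INV_GAIN`, and the 180-degree pre-rotation for negative x are assumptions made for this illustration, not details taken from the patent.

```python
import math

FRAC_BITS = 6     # inputs and registers scaled by 2**6, as in the text
ITERATIONS = 10   # iteration count stated in the text
GAIN_BITS = 10    # fractional bits of the gain constant (assumed)

# atan(2**-i) in degrees as fixed-point integers (assumed table format)
ATAN_TABLE = [round(math.degrees(math.atan(2.0 ** -i)) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]
# Reciprocal of the CORDIC gain (about 0.6073) as a fixed-point constant
INV_GAIN = round((1 << GAIN_BITS) /
                 math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(ITERATIONS)))


def cordic_vectoring(dx, dy):
    """Return (magnitude, angle in whole degrees) of the gradient (dx, dy)."""
    x, y = dx << FRAC_BITS, dy << FRAC_BITS
    angle = 0
    if x < 0:
        # Pre-rotate by 180 degrees so the vector lies in the right half-plane.
        x, y = -x, -y
        angle = 180 << FRAC_BITS
    for i in range(ITERATIONS):
        if y > 0:
            # Rotate clockwise toward the x axis.
            x, y = x + (y >> i), y - (x >> i)
            angle += ATAN_TABLE[i]
        else:
            # Rotate counter-clockwise toward the x axis.
            x, y = x - (y >> i), y + (x >> i)
            angle -= ATAN_TABLE[i]
    # Undo the CORDIC gain and the 6-bit input scaling with one multiply
    # and one right shift.
    magnitude = (x * INV_GAIN) >> (GAIN_BITS + FRAC_BITS)
    return magnitude, (angle >> FRAC_BITS) % 360
```

For the gradient (3, 4) this sketch returns a magnitude close to 5 and an angle close to 53 degrees; the results are approximate because every step uses truncating integer shifts.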

Please refer to FIGS. 8 and 9 together. FIG. 8 is a schematic diagram of the hardware architecture of the image gradient histogram statistics module of the SIFT description module of the image recognition accelerator system according to a preferred embodiment of the present invention; FIG. 9 is a schematic diagram of the hardware architecture of the eight-orientation statistical container module of the SIFT description module of the image recognition accelerator system according to a preferred embodiment of the present invention.

As shown in FIG. 8, the image gradient histogram statistics module 430 uses a fourth shift register 420 provided in the SIFT description module 400 to shift and temporarily store the gradient data of the image points computed by the CORDIC module 410. The feature point description region of each image point is then divided into 16 sub-regions, and the image gradient histogram within each sub-region is accumulated. The present invention uses a parallel architecture in which 16 eight-orientation statistical container modules 431 accumulate the image gradient histograms of the 16 sub-regions simultaneously.

As shown in FIG. 9, each 45 degrees is one orientation, so 360 degrees gives 8 orientations. Each of the 16 sub-regions must accumulate the gradient magnitudes for 8 orientations, and the eight-orientation statistical container module 431 of the present invention performs this computation with eight orientation statistical containers 432 to 439 operating simultaneously.
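The accumulation described for FIGS. 8 and 9 can be sketched in Python as follows. The text gives 16 sub-regions and 8 orientations of 45 degrees each; the 16×16-sample description region, the 4×4 sub-region grid, and the function name `descriptor_histograms` are assumptions, and the sketch omits any weighting or interpolation.

```python
def descriptor_histograms(magnitudes, angles):
    """Accumulate 16 sub-region histograms with 8 orientation bins each.

    magnitudes and angles are 16x16 nested lists covering the description
    region; angles are integer degrees. Returns a flat list of
    16 * 8 = 128 accumulated gradient magnitudes.
    """
    hist = [[0] * 8 for _ in range(16)]
    for row in range(16):
        for col in range(16):
            # 4x4 grid of sub-regions, each 4x4 samples (assumed layout)
            region = (row // 4) * 4 + (col // 4)
            # One bin per 45 degrees
            bin_index = (angles[row][col] % 360) // 45
            hist[region][bin_index] += magnitudes[row][col]
    return [value for region in hist for value in region]
```

In software the 16 sub-regions are visited in sequence; in the parallel architecture described above each sub-region would have its own set of eight accumulators.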

In the conventional technique, the normalization operation sums the 128-dimensional descriptor data and must use 128 multipliers and dividers simultaneously for parallel processing, which greatly reduces system efficiency and consumes a large number of logic elements. The normalization operation module 440 of the present invention multiplies by a gain value during the calculation, for example but not limited to 1023, so that the normalized vector can be represented in integer form, and then uses a right-shift operation to obtain the normalized value. This greatly reduces the use of dividers and thereby effectively improves system performance.
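One way to realize a gain-and-shift normalization of this kind is sketched below in Python. The text does not spell out the exact arithmetic, so this is an assumption: a single division produces a fixed-point scale factor, after which every element needs only a multiplication and a right shift. The gain 1023 is the example from the text; `SHIFT`, the use of the sum as the normalizer, and the function name are hypothetical.

```python
GAIN = 1023   # example gain value given in the text
SHIFT = 16    # fractional bits of the scale factor (assumed)


def normalize_descriptor(desc):
    """Scale a descriptor of non-negative integers to the range 0..GAIN."""
    total = sum(desc)
    if total == 0:
        return [0] * len(desc)
    # One division per descriptor instead of one per element.
    scale = (GAIN << SHIFT) // total
    # Per-element work is a multiplication followed by a right shift.
    return [(value * scale) >> SHIFT for value in desc]
```

With this arrangement a 128-dimensional descriptor needs one divider rather than 128, at the cost of a small truncation error in each element.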

Through the implementation of the image recognition accelerator system of the present invention, the image pyramid construction module uses software to find, in advance, a plurality of Gaussian template mask parameters of different scales, and then performs a plurality of convolution operations in parallel through a plurality of Gaussian filter modules, each convolution operation being performed on the image data and one of the mask parameters, to obtain a plurality of Gaussian images. This overcomes the hardware floating-point problem and the high computational cost that arise in the conventional technique from using an exponential function in the Gaussian template operation. The Hessian inverse matrix module uses the adjoint matrix method: it outputs the computed adjoint matrix and determinant to the low-contrast feature detection module, where a calculation by numerical derivation replaces the use of a plurality of dividers. The normalization operation module, when calculating the normalized values of the feature point vector, multiplies by a gain value and then uses a right-shift operation, which greatly reduces the use of dividers. By reducing the amount of computation and improving the accuracy of feature point matching, the system's computing performance is raised so as to achieve real-time image recognition. The invention is therefore indeed inventive over conventional image recognition systems.

What is disclosed herein is a preferred embodiment. Any partial change or modification that derives from the technical idea of this application and can be readily inferred by a person skilled in the art does not depart from the scope of the patent rights of this application.

In summary, in its purpose, means and effect, this application shows technical features that differ markedly from the conventional art; it is a first invention suited to practical use and also satisfies the patent requirements. The examiner is respectfully requested to review it and to grant a patent at an early date for the benefit of society.

100‧‧‧Image input module

110‧‧‧Image input unit

120‧‧‧First shift register

200‧‧‧Image pyramid construction module

210‧‧‧Gaussian filter module

211‧‧‧Gaussian mask value selection module

212‧‧‧Multiply-accumulator module

213‧‧‧Parallel adder module

214‧‧‧Multiplexer module

220‧‧‧Differential image module

230‧‧‧Second shift register

240‧‧‧Third shift register

300‧‧‧SIFT detection module

310‧‧‧Extreme value detection module

311‧‧‧Minimum value detection circuit

312‧‧‧Maximum value detection circuit

313‧‧‧OR gate

314‧‧‧First pipeline hold circuit

320‧‧‧Unstable feature point detection module

321‧‧‧First-order partial differential matrix module

321a‧‧‧First register

322‧‧‧Hessian matrix module

322a‧‧‧Second register

323‧‧‧Second pipeline hold circuit

324‧‧‧Hessian inverse matrix module

325‧‧‧Edge feature detection module

326‧‧‧Low-contrast feature detection module

327‧‧‧Third pipeline hold circuit

328‧‧‧First AND gate

329‧‧‧Second AND gate

330‧‧‧First-in first-out register

400‧‧‧SIFT description module

410‧‧‧CORDIC module

420‧‧‧Fourth shift register

430‧‧‧Image gradient histogram statistics module

431‧‧‧Eight-orientation statistical container module

432‧‧‧First orientation statistical container

433‧‧‧Second orientation statistical container

434‧‧‧Third orientation statistical container

435‧‧‧Fourth orientation statistical container

436‧‧‧Fifth orientation statistical container

437‧‧‧Sixth orientation statistical container

438‧‧‧Seventh orientation statistical container

439‧‧‧Eighth orientation statistical container

440‧‧‧Normalization operation module

FIG. 1 is a schematic diagram showing the overall combination of an image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 2 is a schematic diagram showing the hardware architecture of the Gaussian filter module of the image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 3 is a schematic diagram showing the difference pyramid of the image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 4 is a schematic diagram showing the hardware architecture of the SIFT detection module of the image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 5 is a schematic diagram showing the hardware architecture of the extreme value detection module of the SIFT detection module of the image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 6 is a schematic diagram showing the hardware architecture of the first-order partial differential matrix module of the SIFT detection module of the image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 7 is a schematic diagram showing the hardware architecture of the Hessian matrix module of the SIFT detection module of the image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 8 is a schematic diagram showing the architecture of the image gradient histogram statistics module of the SIFT description module of the image recognition accelerator system according to a preferred embodiment of the present invention. FIG. 9 is a schematic diagram showing the hardware architecture of the eight-orientation statistical container module of the SIFT description module of the image recognition accelerator system according to a preferred embodiment of the present invention.

Claims

An image recognition accelerator system, comprising: an image input module for inputting image data; an image pyramid construction module, coupled to the image input module and comprising a plurality of Gaussian filter modules and a plurality of differential image modules, wherein a plurality of Gaussian template mask parameters of different scales are calculated in advance by software, a plurality of convolution operations are then performed in parallel by the plurality of Gaussian filter modules, each convolution operation being performed on the image data and one of the mask parameters, to obtain a plurality of Gaussian images, and the Gaussian images are then input to the differential image modules for Gaussian image subtraction; a SIFT detection module, coupled to the image pyramid construction module and comprising an extreme value detection module and an unstable feature point detection module, wherein the image data output by the differential image modules undergo extreme value detection and unstable feature point detection by the extreme value detection module and the unstable feature point detection module to determine whether a point is a stable feature point, and the results of the extreme value detection and the unstable feature point detection are subjected to an AND operation and stored in a first-in first-out register; and a SIFT description module, coupled to the image pyramid construction module and comprising a first-order partial differential matrix module and a CORDIC module, wherein the Gaussian images output by the Gaussian filter modules are processed by the first-order partial differential matrix module and the CORDIC module to obtain gradient data of all image points, an image gradient histogram statistics module and a normalization operation module then process the gradient data to obtain descriptor data of the feature points, and the descriptor data are combined with the position data of the feature points to provide a real-time image recognition function.

The image recognition accelerator system of claim 1, wherein the image input module has an image input unit and a first shift register, the first shift register being configured to shift and temporarily store the output image of the image input module; a plurality of second shift registers are coupled between the image pyramid construction module and the SIFT detection module to shift and temporarily store the image data output by the differential image modules; a third shift register is coupled between the image pyramid construction module and the SIFT description module to shift and temporarily store the Gaussian images output by the Gaussian filter modules; and the SIFT description module is provided with a fourth shift register to shift and temporarily store the gradient data of the image points computed by the CORDIC module.

The image recognition accelerator system of claim 1, wherein, in the software calculation, the computed Gaussian template is scaled up by 10 to the power of 10 and its integer part is taken as the input, to overcome the floating-point operation problem.

The image recognition accelerator system of claim 1, wherein the extreme value detection module performs maximum value detection and minimum value detection simultaneously and then performs an OR operation on the output signals to obtain the extreme value detection result; and the unstable feature point detection module further includes a first-order partial differential matrix module, a Hessian matrix module, a Hessian inverse matrix module, a low-contrast feature detection module and an edge feature detection module, and performs an AND operation on the output signals of the low-contrast feature detection module and the edge feature detection module.

The image recognition accelerator system of claim 4, wherein the Hessian inverse matrix module uses the adjoint matrix method, outputs the computed adjoint matrix and determinant values to the low-contrast feature detection module, and replaces the use of a plurality of dividers with a calculation by numerical derivation.

The image recognition accelerator system of claim 1, wherein the first-in first-out register is used to delay the SIFT detection module so that the timing of the SIFT detection module and the SIFT description module stays synchronized.

The image recognition accelerator system of claim 1, wherein the normalization operation module, when calculating the normalized values of the feature point vector, multiplies by a gain value and then uses a right-shift operation, thereby substantially reducing the use of dividers.

The image recognition accelerator system of claim 1, wherein the image input module, the image pyramid construction module, the SIFT detection module and the SIFT description module are all designed with a pipeline architecture and use pipeline hold circuits to wait for signals so that their timing stays synchronized.

The image recognition accelerator system of claim 1, wherein the image input module, the image pyramid construction module, the SIFT detection module and the SIFT description module are all implemented in a field-programmable gate array (FPGA).