TW202001692A - Framebuffer-less system and method of convolutional neural network

Info

Publication number: TW202001692A
Application number: TW107122430A
Authority: TW
Inventors: 楊得煒
Original assignee: 奇景光電股份有限公司
Priority date: 2018-06-29
Filing date: 2018-06-29
Publication date: 2020-01-01
Also published as: TWI696127B

Abstract

A framebuffer-less system of convolutional neural network (CNN) includes a region of interest (ROI) unit that extracts features, according to which a region of interest in an input image frame is generated; a convolutional neural network (CNN) unit that processes the region of interest of the input image frame to detect an object; and a tracking unit that compares the features extracted at different times, according to which the CNN unit selectively processes the input image frame.

Description

Convolutional neural network system and method without frame buffer

The present invention relates to a convolutional neural network (CNN), and in particular to a framebuffer-less convolutional neural network system.

A convolutional neural network (CNN) is a type of artificial neural network that can be used for machine learning. Convolutional neural networks can be applied to signal processing, such as image processing and computer vision.

Figure 1 shows a block diagram of a conventional convolutional neural network 900, as disclosed in Li Du et al., "A Reconfigurable Streaming Deep Convolutional Neural Network Accelerator for Internet of Things," IEEE Transactions on Circuits and Systems I: Regular Papers, August 2017, the disclosure of which is incorporated herein by reference. The convolutional neural network 900 includes a buffer bank 91 composed of single-port static random access memory (SRAM), which stores intermediate data and exchanges data with a frame buffer 92. The frame buffer 92 is composed of dynamic random access memory (DRAM), such as double data rate synchronous dynamic random access memory (DDR DRAM), and stores the entire image frame for the convolutional neural network operation. The buffer bank 91 is divided into two parts: an input layer and an output layer. The convolutional neural network 900 includes a column buffer 93 that remaps the output of the buffer bank 91 to a convolution unit (CU) engine array 94. The CU engine array 94 includes a plurality of convolution units that perform highly parallel convolution operations. The CU engine array 94 includes a pre-fetch controller 941 that periodically fetches parameters from a direct memory access (DMA) controller (not shown) and updates the weights and bias values of the CU engine array 94. The convolutional neural network 900 also includes an accumulation buffer 95 with scratchpad memory for storing partial convolution results of the CU engine array 94. The accumulation buffer 95 includes a max pool 951 that pools the output-layer data. The convolutional neural network 900 further includes an instruction decoder 96 for storing commands that are pre-stored in the frame buffer 92.

In the conventional convolutional neural network system shown in Figure 1, the frame buffer is composed of dynamic random access memory (DRAM), such as double data rate synchronous dynamic random access memory (DDR DRAM), and stores the entire image frame for the convolutional neural network operation. For example, an image frame with a resolution of 320x240 requires a frame buffer of 320x240x8 bits. DDR DRAM, however, is not suitable for low-power applications such as wearable or Internet of Things (IoT) devices. A need has therefore arisen for a novel convolutional neural network system suited to low-power applications.
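By way of illustration only, the frame buffer size quoted above can be checked with a few lines of Python. The function name `frame_buffer_bits` is hypothetical and is not part of the disclosure; the sketch assumes one 8-bit sample per pixel, as the 320x240x8 figure implies.

```python
def frame_buffer_bits(width: int, height: int, bits_per_pixel: int = 8) -> int:
    """Bits needed to hold one whole image frame (assumes a single channel)."""
    return width * height * bits_per_pixel


bits = frame_buffer_bits(320, 240)
print(bits, "bits =", bits // 8, "bytes")  # 614400 bits = 76800 bytes
```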

In view of the foregoing, one object of the embodiments of the present invention is to provide a framebuffer-less convolutional neural network system. The embodiment can perform convolutional neural network operations on high-resolution image frames using a simple system architecture.

According to an embodiment of the present invention, a framebuffer-less convolutional neural network system includes a region of interest unit, a convolutional neural network unit, and a tracking unit. The region of interest unit extracts features, according to which a region of interest in an input image frame is generated. The convolutional neural network unit processes the region of interest of the input image frame to detect an object. The tracking unit compares the features extracted at different times, according to which the convolutional neural network unit selectively processes the input image frame.

Figure 2A shows a block diagram of a framebuffer-less convolutional neural network (CNN) system 100 according to an embodiment of the present invention, and Figure 2B shows a flowchart of a framebuffer-less convolutional neural network (CNN) method 200 according to an embodiment of the present invention.

In this embodiment, the framebuffer-less convolutional neural network system (hereinafter, the system) 100 may include a region of interest (ROI) unit 11 that generates a region of interest in an input image frame (step 21). Because the system 100 of this embodiment has no frame buffer, the ROI unit 11 may adopt a scan-line based technique and a block-based scheme to find the region of interest in the input image frame. The input image frame is divided into a plurality of image blocks arranged in a matrix, for example 4x6 image blocks.
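The following is a minimal sketch of how a scan-line based, block-based scheme might avoid whole-frame storage. It is not taken from the disclosure: the function name `block_sums`, the 40-pixel block edge, the 240x160 test frame, and the use of a per-block pixel sum as the accumulated quantity are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List

BLOCK = 40  # assumed block edge in pixels


def block_sums(scan_lines: Iterable[List[int]], width: int) -> Iterator[List[int]]:
    """Consume scan lines one at a time and yield, for each completed row of
    blocks, the per-block pixel sums. Only one row of accumulators is held at
    any time, so the whole frame is never stored."""
    cols = width // BLOCK
    acc = [0] * cols
    lines_in_row = 0
    for line in scan_lines:
        for c in range(cols):
            acc[c] += sum(line[c * BLOCK:(c + 1) * BLOCK])
        lines_in_row += 1
        if lines_in_row == BLOCK:
            yield acc
            acc = [0] * cols
            lines_in_row = 0


# Synthetic 240x160 frame of all-ones pixels, streamed line by line.
# With 40x40 blocks this gives 4 rows by 6 columns of blocks.
frame = ([1] * 240 for _ in range(160))
rows = list(block_sums(frame, width=240))
print(len(rows), len(rows[0]))  # 4 6
```

In this sketch the memory held between scan lines is one accumulator per block column, which is the property a framebuffer-less design depends on.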

In this embodiment, the ROI unit 11 generates block-based features, according to which it is determined whether a convolutional neural network (CNN) operation is performed on each image block. Figure 3 shows a detailed block diagram of the ROI unit 11 of Figure 2A. In this embodiment, the ROI unit 11 may include a feature extractor 111 that, for example, extracts shallow features from the input image frame. In one example, the feature extractor 111 generates the (shallow) features of a block according to a block-based histogram. In another example, the feature extractor 111 generates the (shallow) features of a block according to frequency analysis.
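As one possible reading of the block-based histogram example, a shallow feature could be a coarse intensity histogram computed per block. The sketch below is an assumption, not the disclosed implementation; the function name `block_histogram`, the choice of 8 bins, and the synthetic test blocks are hypothetical.

```python
from typing import List, Sequence


def block_histogram(pixels: Sequence[int], bins: int = 8) -> List[int]:
    """Coarse intensity histogram of one block of 8-bit pixels."""
    hist = [0] * bins
    bin_width = 256 // bins
    for p in pixels:
        # Clamp so that bin counts not dividing 256 evenly stay in range.
        hist[min(p // bin_width, bins - 1)] += 1
    return hist


flat_block = [10] * (40 * 40)                               # uniform block
textured_block = [(i * 37) % 256 for i in range(40 * 40)]   # varied block

print(block_histogram(flat_block))      # [1600, 0, 0, 0, 0, 0, 0, 0]
print(block_histogram(textured_block))
```

A uniform block concentrates all counts in one bin, whereas a textured block spreads them out, which is the kind of distinction a downstream classifier could use.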

The ROI unit 11 may further include a classifier 112, such as a support vector machine (SVM), that determines whether a CNN operation is performed on each block of the input image frame. A decision map 12 is thereby generated, which contains a plurality of blocks (which may be arranged in a matrix) representing the input image frame. Figure 4A illustrates a decision map 12 containing 4x6 blocks, where X indicates that the corresponding block does not require a CNN operation, C indicates that the corresponding block requires a CNN operation, and D indicates that an object (for example, a dog) has been detected in the corresponding block. The region of interest can thus be determined and the CNN operation performed on it.
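The decision map can be sketched with a linear decision function of the kind a linear SVM evaluates at inference time. Everything specific here is assumed for illustration: the function names, the two-dimensional feature vectors, and the weights and bias are hypothetical values, not trained parameters from the disclosure. The mark D is not produced here, since in the text it records the result of the CNN operation itself.

```python
from typing import List, Sequence


def linear_decision(feature: Sequence[float], weights: Sequence[float], bias: float) -> str:
    """Return 'C' (perform the CNN operation on this block) or 'X' (skip it)."""
    score = sum(w * f for w, f in zip(weights, feature)) + bias
    return "C" if score > 0 else "X"


def build_decision_map(features: List[List[Sequence[float]]],
                       weights: Sequence[float], bias: float) -> List[List[str]]:
    """Apply the per-block decision to a matrix of block features."""
    return [[linear_decision(f, weights, bias) for f in row] for row in features]


ROWS, COLS = 4, 6
features = [[[0.0, 0.0] for _ in range(COLS)] for _ in range(ROWS)]
features[2][4] = [1.0, 1.0]  # row 3, column 5 (1-based)
features[2][5] = [1.0, 1.0]  # row 3, column 6 (1-based)

dmap = build_decision_map(features, weights=[1.0, 1.0], bias=-1.0)
for row in dmap:
    print(" ".join(row))
```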

Referring to Figure 2B, the system 100 may include a temporary storage 13, such as static random access memory (SRAM), for storing the (shallow) features generated by the feature extractor 111 (of the ROI unit 11) (step 22). Figure 5 shows a detailed block diagram of the temporary storage 13 of Figure 2A. In this embodiment, the temporary storage 13 may include two feature maps: a first feature map 131A for storing the features of the previous image frame (at the previous time t-1), and a second feature map 131B for storing the features of the current image frame (at the current time t). The temporary storage 13 may further include a sliding window 132, which may be 40x40x8 bits in size, for storing one block of the input image frame.
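The layout of the temporary storage can be modelled as a small data structure. This is only a sketch under stated assumptions: the class name `TemporaryStorage`, the method `advance_frame`, the 4x6 grid, and the use of a single integer feature per block are hypothetical; only the two feature maps and the 40x40x8-bit sliding window come from the text.

```python
from dataclasses import dataclass, field
from typing import List

ROWS, COLS = 4, 6   # assumed block grid
WINDOW = 40         # sliding-window edge in pixels, from the text


def empty_map() -> List[List[int]]:
    return [[0] * COLS for _ in range(ROWS)]


@dataclass
class TemporaryStorage:
    # First feature map (131A): features of the previous frame, time t-1.
    prev_features: List[List[int]] = field(default_factory=empty_map)
    # Second feature map (131B): features of the current frame, time t.
    curr_features: List[List[int]] = field(default_factory=empty_map)
    # Sliding window (132): one block of 8-bit pixels.
    window: bytearray = field(default_factory=lambda: bytearray(WINDOW * WINDOW))

    def advance_frame(self) -> None:
        """Current features become the previous ones; start a fresh current map."""
        self.prev_features = self.curr_features
        self.curr_features = empty_map()


store = TemporaryStorage()
store.curr_features[2][4] = 7
store.advance_frame()
print(len(store.window) * 8)  # 12800 bits for one 40x40x8-bit window
```

Under these assumptions the sliding window occupies 12,800 bits, compared with 614,400 bits for a full 320x240x8-bit frame.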

Referring to Figure 2A, the system 100 of this embodiment may include a convolutional neural network (CNN) unit 14 that receives and processes the region of interest of the input image frame generated by the ROI unit 11 in order to detect an object (step 23). The CNN unit 14 of this embodiment operates only on the region of interest, rather than on the entire input image frame as in a conventional system with a frame buffer.

Figure 6 shows a detailed block diagram of the CNN unit 14 of Figure 2A. The CNN unit 14 may include a convolution unit 141, which includes a plurality of convolution engines for performing convolution operations. The CNN unit 14 may include an activation unit 142 that performs an activation function when a predefined feature is detected. The CNN unit 14 may further include a pooling unit 143 that performs down-sampling (or pooling) on the input image frame.
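The three stages of the CNN unit can be sketched in plain Python. The text does not specify the activation function or the pooling type, so the use of ReLU and 2x2 max pooling below is an assumption, as are the function names, the kernel, and the test image.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """'Valid' 2-D convolution as commonly used in CNNs (no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def relu(m: Matrix) -> Matrix:
    """Assumed activation: pass positive responses, zero the rest."""
    return [[max(0.0, v) for v in row] for row in m]


def max_pool2(m: Matrix) -> Matrix:
    """Assumed pooling: 2x2 max with stride 2 (an odd trailing row or column is dropped)."""
    return [[max(m[r][c], m[r][c + 1], m[r + 1][c], m[r + 1][c + 1])
             for c in range(0, len(m[0]) - 1, 2)]
            for r in range(0, len(m) - 1, 2)]


# 5x5 test image with a vertical edge between columns 2 and 3 (1-based).
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
edge_kernel = [[-1.0, 1.0], [-1.0, 1.0]]

pooled = max_pool2(relu(conv2d(image, edge_kernel)))
print(pooled)
```

In the system described, these stages would be applied only to the blocks that the decision map marks for CNN operation, one sliding window at a time.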

The system 100 of this embodiment may include a tracking unit 15 that compares the first feature map 131A (of the previous image frame) with the second feature map 131B (of the current image frame) and then updates the decision map 12 (step 24). The tracking unit 15 analyzes the change in content between the first feature map 131A and the second feature map 131B. Figure 4B illustrates another decision map 12, updated after that of Figure 4A. In this example, an object was detected at the previous time in the blocks at columns 5 to 6 of row 3 (marked D in Figure 4A), but at the current time the object has disappeared (marked X in Figure 4B). Accordingly, the CNN unit 14 need not perform CNN operations on blocks whose features have not changed. In other words, the CNN unit 14 selectively performs CNN operations on the blocks whose features have changed. The system 100 can therefore substantially speed up operation.
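The comparison performed by the tracking unit can be sketched as follows. The exact update rule is not spelled out in the text, so this sketch assumes that a block whose feature changes by more than a threshold is flagged C for re-evaluation by the CNN unit, while every other block keeps its earlier mark. The function name `update_decision_map`, the scalar per-block feature, and the threshold are hypothetical.

```python
from typing import List


def update_decision_map(prev: List[List[int]], curr: List[List[int]],
                        decisions: List[List[str]], threshold: int = 0) -> List[List[str]]:
    """Return a copy of the decision map in which blocks whose feature changed
    by more than `threshold` are marked 'C'; other entries are left as they were."""
    updated = [row[:] for row in decisions]
    for r, (prow, crow) in enumerate(zip(prev, curr)):
        for c, (p, q) in enumerate(zip(prow, crow)):
            if abs(p - q) > threshold:
                updated[r][c] = "C"
    return updated


prev = [[0] * 6 for _ in range(4)]
curr = [row[:] for row in prev]
curr[2][4] = 5                      # feature change in row 3, column 5 (1-based)

decisions = [["X"] * 6 for _ in range(4)]
decisions[0][0] = "D"               # earlier detection in an unchanged block

new_map = update_decision_map(prev, curr, decisions)
print(new_map[2][4], new_map[0][0])  # C D
```

Only the single changed block is scheduled for a CNN operation in this example; the other 23 blocks are skipped.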

Compared with a conventional convolutional neural network system, the amount of CNN operation in the embodiment described above is substantially reduced (and the operation correspondingly accelerated). In addition, because the embodiment of the present invention requires no frame buffer, it is well suited to low-power applications such as wearable or Internet of Things (IoT) devices. For an image frame with a resolution of 320x240 and a (non-overlapping) sliding window of size 40x40, a conventional system with a frame buffer requires 8x6 (that is, 48) sliding windows to perform the CNN operation. By contrast, the system 100 of this embodiment requires only a few (fewer than 10) sliding windows to perform the CNN operation.
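The window counts above follow from simple arithmetic, sketched here for reference. The function name `window_count` is hypothetical, and the figure of 9 windows is used only as the largest value consistent with "fewer than 10"; the text does not give an exact count.

```python
from typing import Tuple


def window_count(width: int, height: int, window: int = 40) -> Tuple[int, int, int]:
    """Columns, rows, and total number of non-overlapping windows covering a frame."""
    cols, rows = width // window, height // window
    return cols, rows, cols * rows


cols, rows, total = window_count(320, 240)
print(cols, rows, total)  # 8 6 48

upper_bound = 9  # largest count consistent with "fewer than 10" (assumed for illustration)
print(upper_bound / total)  # 0.1875
```

Under this assumption the embodiment would process at most about 19% of the windows that a full-frame system processes.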

The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the scope of the claims of the present invention; all equivalent changes or modifications made without departing from the spirit disclosed by the invention shall fall within the scope of the following claims.

100‧‧‧Framebuffer-less convolutional neural network system
11‧‧‧Region of interest unit
111‧‧‧Feature extractor
112‧‧‧Classifier
12‧‧‧Decision map
13‧‧‧Temporary storage
131A‧‧‧First feature map
131B‧‧‧Second feature map
132‧‧‧Sliding window
14‧‧‧Convolutional neural network unit
141‧‧‧Convolution unit
142‧‧‧Activation unit
143‧‧‧Pooling unit
15‧‧‧Tracking unit
200‧‧‧Framebuffer-less convolutional neural network method
21‧‧‧Generate a region of interest in the input image frame
22‧‧‧Store features in feature maps
23‧‧‧Process the region of interest to detect an object
24‧‧‧Compare features and perform convolutional neural network operations on blocks with feature changes
900‧‧‧Convolutional neural network
91‧‧‧Buffer bank
92‧‧‧Frame buffer
93‧‧‧Column buffer
94‧‧‧Convolution unit engine array
941‧‧‧Pre-fetch controller
95‧‧‧Accumulation buffer
951‧‧‧Max pool
96‧‧‧Instruction decoder

Figure 1 shows a block diagram of a conventional convolutional neural network.
Figure 2A shows a block diagram of a framebuffer-less convolutional neural network system according to an embodiment of the present invention.
Figure 2B shows a flowchart of a framebuffer-less convolutional neural network method according to an embodiment of the present invention.
Figure 3 shows a detailed block diagram of the region of interest unit of Figure 2A.
Figure 4A illustrates a decision map containing 4x6 blocks.
Figure 4B illustrates another decision map, updated after that of Figure 4A.
Figure 5 shows a detailed block diagram of the temporary storage of Figure 2A.
Figure 6 shows a detailed block diagram of the convolutional neural network unit of Figure 2A.

100‧‧‧Framebuffer-less convolutional neural network system

11‧‧‧Region of interest unit

12‧‧‧Decision map

13‧‧‧Temporary storage

14‧‧‧Convolutional neural network unit

15‧‧‧Tracking unit

Claims

1. A framebuffer-less convolutional neural network system, comprising: a region of interest unit that extracts features, according to which a region of interest in an input image frame is generated; a convolutional neural network unit that processes the region of interest of the input image frame to detect an object; and a tracking unit that compares the features extracted at different times, according to which the convolutional neural network unit selectively processes the input image frame.

2. The framebuffer-less convolutional neural network system of claim 1, wherein the region of interest unit adopts a scan-line based technique and a block-based scheme to find the region of interest in the input image frame, the input image frame being divided into a plurality of image blocks.

3. The framebuffer-less convolutional neural network system of claim 2, wherein the region of interest unit generates block-based features, according to which it is determined whether a convolutional neural network operation is performed on each image block.

4. The framebuffer-less convolutional neural network system of claim 2, wherein the region of interest unit comprises: a feature extractor that extracts the features from the input image frame; and a classifier that determines whether a convolutional neural network operation is performed on each image block, thereby generating a decision map according to which the region of interest is determined.

5. The framebuffer-less convolutional neural network system of claim 4, wherein the feature extractor generates shallow features of the image blocks according to a block-based histogram or frequency analysis.

6. The framebuffer-less convolutional neural network system of claim 4, further comprising a temporary storage for storing the features.

7. The framebuffer-less convolutional neural network system of claim 6, wherein the temporary storage comprises: a first feature map for storing the features of a previous image frame; and a second feature map for storing the features of a current image frame.

8. The framebuffer-less convolutional neural network system of claim 6, wherein the temporary storage comprises a sliding window for storing a block of the input image frame.

9. The framebuffer-less convolutional neural network system of claim 7, wherein the tracking unit compares the first feature map with the second feature map and updates the decision map accordingly.

10. The framebuffer-less convolutional neural network system of claim 1, wherein the convolutional neural network unit comprises: a convolution unit including a plurality of convolution engines for performing convolution operations on the region of interest; an activation unit that performs an activation function when a predefined feature is detected; and a pooling unit that performs down-sampling on the input image frame.

11. A framebuffer-less convolutional neural network method, comprising: extracting features, according to which a region of interest in an input image frame is generated; performing a convolutional neural network operation on the region of interest of the input image frame to detect an object; and comparing the features extracted at different times, according to which the input image frame is selectively processed.

12. The framebuffer-less convolutional neural network method of claim 11, wherein the region of interest is generated using a scan-line based technique and a block-based scheme, the input image frame being divided into a plurality of image blocks.

13. The framebuffer-less convolutional neural network method of claim 12, wherein the step of generating the region of interest comprises: generating block-based features, according to which it is determined whether a convolutional neural network operation is performed on each image block.

14. The framebuffer-less convolutional neural network method of claim 12, wherein the step of generating the region of interest comprises: extracting the features from the input image frame; and determining by classification whether a convolutional neural network operation is performed on each image block, thereby generating a decision map according to which the region of interest is determined.

15. The framebuffer-less convolutional neural network method of claim 14, wherein the step of extracting the features comprises: generating shallow features of the image blocks according to a block-based histogram or frequency analysis.

16. The framebuffer-less convolutional neural network method of claim 14, further comprising a step of temporarily storing the features.

17. The framebuffer-less convolutional neural network method of claim 16, wherein the step of temporarily storing the features comprises: generating a first feature map for storing the features of a previous image frame; and generating a second feature map for storing the features of a current image frame.

18. The framebuffer-less convolutional neural network method of claim 16, wherein the step of temporarily storing the features comprises: generating a sliding window for storing a block of the input image frame.

19. The framebuffer-less convolutional neural network method of claim 17, wherein the step of comparing the features comprises: comparing the first feature map with the second feature map and updating the decision map accordingly.

20. The framebuffer-less convolutional neural network method of claim 11, wherein the step of performing the convolutional neural network operation comprises: using a plurality of convolution engines to perform convolution operations on the region of interest; performing an activation function when a predefined feature is detected; and performing down-sampling on the input image frame.