KR101864000B1

KR101864000B1 - Multi-purpose image processing core

Info

Publication number: KR101864000B1
Application number: KR1020157033283A
Authority: KR
Inventors: Ismail Ozsarac; Ozgur Yilmaz; Omer Guney
Original assignee: Aselsan Elektronik Sanayi ve Ticaret Anonim Sirketi
Priority date: 2013-09-17
Filing date: 2013-09-17
Publication date: 2018-07-05
Also published as: WO2015040450A1; KR20160003020A

Abstract

Object detection, recognition, and tracking algorithms are used in many vision applications. The output of these algorithms is essential for situational awareness and decision making. The accuracy and processing latency of these algorithms are key parameters for the success of the system. This invention enables neural-network-based techniques while meeting latency constraints.

Description

Multi-Purpose Image Processing Core {MULTI-PURPOSE IMAGE PROCESSING CORE}

The present invention relates to an FPGA-based image processing method that uses neural-network-based techniques to analyze video frames in real time on an embedded platform.

Neural networks are increasingly used in vision because of their performance on complex tasks such as large-scale classification [ref 1] and multimodal learning [ref 2]. This success stems from several advantages: unsupervised feature learning from unlabeled data [ref 3], [ref 4]; hierarchical processing through deep architectures [ref 5]-[ref 7]; and exploitation of long-range statistical dependencies through recurrent processing [ref 3], [ref 8]. The neural network approach is orthogonal to kernel methods: the input is projected into the nonlinear, high-dimensional space of the hidden units, where a linear hyperplane can partition the data [ref 9]. This nonlinear projection is a powerful representation of visual data and can serve many different tasks, such as classification, projection, tracking, clustering, and interest point detection. Thus, once an image or video block has been "analyzed" by the neural network through multiple layers of processing, the hidden-layer activity representing the visual input can be multiplexed to many different tasks as needed, as happens in cortical processing [ref 10].

Demand for real-time embedded visual processing is growing, particularly on intelligent robotic platforms such as UAVs (Unmanned Aerial Vehicles). Such systems are expected to operate and navigate autonomously, which requires successful implementation of image and video understanding functions. Scene recognition, detection of specific objects in an image, classification of moving objects, and object tracking are among the essential visual functions required in an autonomous robotic system. The weight and energy budgets of these systems limit both the complexity and the number of visual processing functions, which reduces operational capability. A visual processing core shared by at least a subset of these functions can relax these constraints.

In the present invention, a sparse and complete image representation is formed in the hidden layer of a neural network, providing versatile representational power [ref 4], [ref 11]. In particular, the core can be embedded in a UAV platform for surveillance and recognition purposes.

[ref 1] A. Krizhevsky, I. Sutskever, and G. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, 2012, pp. 1106-1114.

[ref 2] N. Srivastava and R. Salakhutdinov, "Multimodal learning with deep boltzmann machines," in Advances in Neural Information Processing Systems 25, 2012, pp. 2231-2239.

[ref 3] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural computation, vol. 18, no. 7, pp. 1527-1554, 2006.

[ref 4] B. A. Olshausen et al., "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, no. 6583, pp. 607-609, 1996.

[ref 5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.

[ref 6] M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007, pp. 1-8.

[ref 7] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layerwise training of deep networks," Advances in neural information processing systems, vol. 19, p. 153, 2007.

[ref 8] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Cognitive modeling, vol. 1, p. 213, 2002.

[ref 9] A. Coates, A. Y. Ng, and H. Lee, "An analysis of single-layer networks in unsupervised feature learning," in International Conference on Artificial Intelligence and Statistics, 2011, pp. 215-223.

[ref 10] T. S. Lee, D. Mumford, R. Romero, and V. A. Lamme, "The role of the primary visual cortex in higher level vision," Vision research, vol. 38, no. 15, pp. 2429-2454, 1998.

[ref 11] Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce, "Learning mid-level features for recognition," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 2559-2566.

The object of the present invention is to provide an FPGA implementation of a neural-network-based image processing core.

According to one aspect, the inventive multi-purpose image processing core (101) comprises at least one image analyzer (102) block and at least one memory interface (103) block.

According to one embodiment, the image analyzer (102) block comprises at least one feature extraction (104) block, at least one feature summing (105) block, and at least one classification (106) block.

According to one embodiment, the inputs of the image analyzer (102) block comprise a video frame (107), a feature dictionary (108), a classification matrix (110), a feature calculation request (109), and a sparsity multiplier (114).

According to one embodiment, the outputs of the image analyzer (102) block comprise a feature vector (111) and a class label (112).

According to one embodiment, the feature extraction (104) block comprises: a patch acquisition (301) process; a P vector construction (302) process; a P vector mean (303) calculation process; a binary PB vector construction (304) process; a process for calculating the bit-flipping distance vector (DV) (305) between PB and a dictionary (D); a DV mean (306) calculation process; a DV standard deviation (307) calculation process; a DV activation threshold (AT) (308) calculation process; and a pixel feature vector (PFV) (309) calculation process.

According to one embodiment, the patch acquisition process comprises: writing each newly incoming video line into the bottom line FIFO (401); when the next video line arrives, reading the previous video line from the bottom line FIFO (401) and writing it into the line FIFO (401) above; repeating the first two steps until all line FIFOs (401) are filled with the lines required to build a patch (204); once all lines are available, reading K times from the line FIFOs (401) to obtain the patch (204); obtaining the next pixel patch with the (K+1)-th read from the line FIFOs (401); continuing the read operations until all patches along the video line have been obtained; and shifting the video lines into the upper line FIFOs (401) so that the patch (204) moves downward.

According to one embodiment, the inventive multi-purpose image processing core (101) uses the binary PB (603) vector instead of the scalar P (501) vector to calculate the distance (305) to the feature dictionary D (701).

According to one embodiment, the inventive multi-purpose image processing core (101) uses a binary feature dictionary D (701) instead of a scalar feature dictionary for the distance calculation (305).

According to one embodiment, the inventive multi-purpose image processing core (101) can load a different feature dictionary D (701) during operation, depending on the scenario.

According to one embodiment, the bit-flipping distance calculation (305) process comprises: comparing the PB (603) vector with every column (702) of the feature dictionary D (701) using an XOR (801) operation; counting (802) the number of entries equal to "1" after the XOR operation (801); and constructing (803) the distance vector DV (804).

According to one embodiment, in the inventive multi-purpose image processing core (101), the distance vector DV (804) holds only the number of entries equal to "1" rather than the full result of each XOR operation (801).

According to one embodiment, the pixel feature vector (PFV) (309) calculation process comprises: comparing each entry of DV (805) with AT (901); assigning "0" if the DV entry (805) is greater than AT (901); assigning "1" if the DV entry (805) is smaller than AT (901); and constructing (904) the pixel feature vector (PFV) (309).

According to one embodiment, the pixel feature vector (PFV) (904) holds the result of each comparison between AT and a DV entry as a binary value.

According to one embodiment, the feature summing (105) block comprises the following sub-blocks: at least one integral vector calculator (1001) that calculates the integral vector IV (1201); at least one address calculator (1002) that calculates internal RAM (1004) addresses according to the feature calculation request (109); at least one feature calculation request FIFO (1003) that stores the feature calculation requests (109); and at least one internal RAM (1004) that stores the PFVs (309).

According to one embodiment, the feature summing (105) block receives the boundary coordinates (1101) of a feature calculation request (109) as pixel positions and calculates the remaining coordinates (1102) so as to divide the region into four equal sub-regions, the quadrants (1103, 1104, 1105, 1106).

According to one embodiment, the feature summing (105) block reads the PFVs (309) from the external memory (113) and writes them into the internal RAM (1004) to speed up the calculation.

According to one embodiment, the integral vector calculator (1001) sub-block performs: reading the PFVs (309) from the internal RAM (1004); and calculating the quadrant integral vector QIV (1103-1) by adding all entries of the preceding PFVs (904) in both the horizontal and vertical dimensions.

According to one embodiment, the inventive multi-purpose image processing core (101) takes all values of the first PFV (309) as "0", so that the subtraction between the quadrant integral vector and the first PFV can be omitted.

According to one embodiment, the classification (106) block comprises at least one C matrix row arbiter (1303) that controls the multiplication of the C matrix rows with the FV (111), and at least one multiply-and-add (1304) operator that implements the matrix-vector multiplication.

A multi-purpose image processing (IP) core that achieves the objects of the present invention is illustrated in the accompanying drawings, in which:
Figure 1 shows the IP core in an FPGA together with external elements.
Figure 2 shows the video and patch structure.
Figure 3 shows the flow of the feature extractor.
Figure 4 shows the structure of the patch acquisition process.
Figure 5 shows the structure of the P vector.
Figure 6 shows the structure of the binary PB vector.
Figure 7 shows the dictionary D.
Figure 8 shows the structure of the distance vector DV.
Figure 9 shows the calculation of the pixel feature vector PFV.
Figure 10 shows the structure of the feature summer.
Figure 11 shows the structure of the quadrants.
Figure 12 shows the calculation of the feature vector FV.
Figure 13 shows the calculation of the class label CL.

In a preferred embodiment of the present invention, the inventive multi-purpose image processing core (101) is implemented in an FPGA (100). The core consists of two main sub-blocks: the image analyzer (102) and the memory interface (103).

The memory interface (103) performs data transfer between the image analyzer (102) and the external memory (113). The image analyzer (102) consists of three sub-blocks: the feature extraction (104) block, the feature summing (105) block, and the classification (106) block. The image analyzer (102) block receives five types of input from outside the FPGA (100): the video frame (107), the feature dictionary (108), the classification matrix (110), the feature calculation request (109), and the sparsity multiplier (114). The video frame (107) is defined by two parameters, resolution and frame rate. The resolution is M (rows) (201) x N (columns) (202), and the frame rate is the number of frames (203) captured per second. The other inputs (the feature dictionary (108), the classification matrix (110), the feature calculation request (109), and the sparsity multiplier (114)) are described in the following sections.

The feature extraction (104) block begins with the patch acquisition process, which captures the patches (204) at the selected coordinates of the video frame (107). To do so, each incoming video line (a row (201) of the video frame (107)) is written into a line FIFO (401). Depending on the patch (204) dimension K, the patch acquisition (301) process uses K line FIFOs (401). Each incoming video line is first written into the bottom line FIFO (401); when the next video line arrives, the previous line is read from the bottom line FIFO (401) and written into the line FIFO (401) above it. This continues until all line FIFOs (401) are filled with the lines needed to build a patch (204). Once all lines are available, pixel values are read from the line FIFOs (401) as the next line arrives. After K read operations, a patch is ready for further processing, and the (K+1)-th read from the line FIFOs (401) yields the next pixel patch. This continues until all patches (204) along the line have been captured. While patches are being read from the line FIFOs (401), the new line keeps shifting into the upper line FIFOs (401). This shifting moves the patch (204) downward through the video frame (107).
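The line-FIFO scheme above can be modelled in software to check which patches it produces. The sketch below is a behavioural model, not RTL: a `collections.deque` with `maxlen=K` stands in for the K cascaded line FIFOs (appending a line shifts every line up and drops the oldest), and a second deque holds the last K column reads. The function name `acquire_patches` and the (row, column) output convention are assumptions for illustration.

```python
from collections import deque
from typing import Iterable, Iterator, List, Tuple

Patch = List[List[int]]


def acquire_patches(lines: Iterable[List[int]], k: int) -> Iterator[Tuple[int, int, Patch]]:
    """Behavioural model of K cascaded line FIFOs feeding a K x K window.

    Yields (top_row, left_col, patch) for every K x K patch of the frame.
    """
    # fifos[0] is the top (oldest) line, fifos[-1] the bottom (newest) line.
    # With maxlen=k, appending a new line shifts every line up one FIFO and
    # drops the oldest line out of the top.
    fifos: deque = deque(maxlen=k)
    for row, line in enumerate(lines):
        fifos.append(list(line))
        if len(fifos) < k:
            continue  # still filling the line FIFOs
        # Each read pulls one column of K pixels, one pixel from each FIFO.
        columns: deque = deque(maxlen=k)
        for col in range(len(fifos[-1])):
            columns.append([fifo[col] for fifo in fifos])
            if len(columns) == k:
                # After K reads the window holds a full patch (stored as
                # columns); every further read yields the next patch.
                patch = [[columns[c][r] for c in range(k)] for r in range(k)]
                yield row - k + 1, col - k + 1, patch
```

For a 4x4 frame and K = 3, the model yields four patches, starting with the top-left one `[[0, 1, 2], [4, 5, 6], [8, 9, 10]]` when pixel values are `row * 4 + col`.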

The P vector (501) is built (302) from the captured pixel values of the patch (204). In practice, this construction is a simple register assignment: there are KxK registers, from L1P1 (402) to LKPK (403), and each register holds the corresponding pixel value. The bit width of the registers is determined by the maximum possible pixel value.

To calculate the mean of the P vector (Pμ (602)), all pixel values of the patch (204) are added and the sum is divided by the total number of pixels. The addition can be implemented with adders whose number of inputs may vary with the capability of the FPGA. The number of adder inputs affects both the pipeline latency in clock cycles and the number of adders used. After all pixel values have been added, the total is divided by K*K.
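The trade-off between adder fan-in, pipeline depth, and adder count can be made concrete with a small model of an adder tree. This is a sketch under two assumptions not stated in the text: the sum is built as a balanced tree of `fan_in`-input adders, and each tree level is registered (one clock cycle per level). The names `adder_tree_sum` and `patch_mean` are hypothetical.

```python
from typing import List, Tuple


def adder_tree_sum(values: List[int], fan_in: int) -> Tuple[int, int, int]:
    """Sum `values` with a tree of `fan_in`-input adders.

    Returns (total, levels, adders). If every tree level is registered,
    `levels` is the pipeline latency in clock cycles; `adders` counts the
    adders instantiated (a group holding a single value just passes through).
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = list(values)
    levels = adders = 0
    while len(level) > 1:
        groups = [level[i:i + fan_in] for i in range(0, len(level), fan_in)]
        adders += sum(1 for group in groups if len(group) > 1)
        level = [sum(group) for group in groups]
        levels += 1
    return (level[0] if level else 0), levels, adders


def patch_mean(p_vector: List[int], fan_in: int = 2) -> float:
    """P-vector mean: adder-tree sum divided by the pixel count K*K."""
    total, _, _ = adder_tree_sum(p_vector, fan_in)
    return total / len(p_vector)
```

For a 5x5 patch (25 pixels), 2-input adders give 5 levels and 24 adders, while 4-input adders give 3 levels and 9 adders, which illustrates why the fan-in choice depends on the FPGA's resources and timing.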

After Pμ (602) has been calculated, each entry of the P vector (501) is compared with Pμ (602) and binarized (304) to build the PB vector (603). The binarization step is essential for implementing this image processing algorithm on currently available FPGAs. Entries smaller than Pμ (602) are assigned "0"; entries equal to or greater than Pμ are assigned "1". After all values have been compared (601) with the mean, the binary version of P (501), the vector PB (603), is obtained. PB (603) is a T (604) x 1 bit vector, where T (604) equals K*K.
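The P vector construction and binarization can be sketched as follows. The line-by-line flattening order (L1P1 first, LKPK last) follows the register naming above; comparing `n * pixel` with the pixel sum instead of comparing `pixel` with `sum / n` is a design choice of this sketch (it avoids a divider and gives the same result), not something the text prescribes. The function names are hypothetical.

```python
from typing import List


def build_p_vector(patch: List[List[int]]) -> List[int]:
    """Flatten a K x K patch line by line (L1P1 ... LKPK) into the P vector."""
    return [pixel for line in patch for pixel in line]


def binarize(p_vector: List[int]) -> List[int]:
    """Build PB: 1 where a pixel >= the patch mean, 0 where it is smaller.

    Comparing n * pixel with the pixel sum is equivalent to comparing the
    pixel with sum / n, but avoids the division.
    """
    n = len(p_vector)  # T = K * K
    total = sum(p_vector)
    return [1 if n * pixel >= total else 0 for pixel in p_vector]
```

A flat patch binarizes to all ones, because every pixel equals the mean and ties are assigned "1".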

All binary vectors PB (603) built from the patches (204) of the image are converted into feature vectors using a precomputed dictionary of Z (703) visual words. The dictionary D (701) is a T (604) x Z (703) bit matrix whose entries are binary values, "1" or "0". The columns of the D (701) matrix (DC1-DCZ (702)) are stored in internal registers of the FPGA (100). The dictionary is loaded into the FPGA through a communication interface such as PCI or VME. Dictionary entries can be updated at any time after they have been stored in the internal registers.

The bit-flipping (or Hamming) distance calculation (305) measures the similarity between two vectors: PB (603) and each column of D (701) (DC1-DCZ (702)). Where an entry of PB (603) equals the corresponding entry of DCX (702), "0" is assigned; otherwise "1" is assigned. This operation is implemented with XOR (801). The total number of "1"s after the XOR (801) operation measures the difference between the two binary vectors. DV (804) contains the Hamming distances from a single PB (603) vector to every visual word (column (702)) of the dictionary. Each entry (805) of DV (804) holds a count of "1"s; it is therefore an integer that can be represented with fewer bits than PB (603) or DCX (702). DV (804) is an H (806) x Z (703) bit vector, where H (806) is the minimum number of bits that can represent the scalar value T (604).
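The XOR-and-count step maps directly onto integer bit operations. In this sketch each T-bit vector (PB and every dictionary column) is packed into a Python `int`, XOR marks the differing positions, and `bin(x).count("1")` counts them (on Python 3.10+ `int.bit_count()` does the same). The packing order and the function names are assumptions for illustration.

```python
from typing import List


def pack_bits(bits: List[int]) -> int:
    """Pack a list of 0/1 values into an int, first entry as the MSB."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def distance_vector(pb: int, dictionary_columns: List[int]) -> List[int]:
    """DV: Hamming distance from PB to every dictionary column DC1..DCZ.

    XOR marks the positions where the two vectors differ; only the count of
    1s is kept, not the XOR result itself.
    """
    return [bin(pb ^ column).count("1") for column in dictionary_columns]


def dv_entry_width(t: int) -> int:
    """H: minimum number of bits that can hold any distance 0..T."""
    return t.bit_length()
```

For a 5x5 patch (T = 25), each DV entry needs only H = 5 bits instead of the 25 bits of PB, which is the storage saving the text describes.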

The mean of DV (804), DVμ, is calculated (306) in the same way as Pμ (602). To calculate the standard deviation of DV (804), DVσ (307), DVμ is subtracted from each entry (805) of DV. These differences are squared and all the squares are summed. The total is then divided by Z (703). Finally, the square root is taken, giving DVσ. The activation threshold AT (901) is calculated (308) by Equation 1. This threshold is used to build a sparse representation by nullifying distance values greater than a certain value.

(1)

To build the pixel feature vector (309), each entry of DV (804) is compared (902) with AT (901). If an entry (805) is greater than AT (901), "0" is assigned to the corresponding entry of the PFV (904); if it is smaller, "1" is assigned. The result is a 1 x Z pixel feature vector PFV (904).
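The DV statistics and the thresholding into a PFV can be sketched as below. `dv_statistics` follows the steps listed above, dividing by Z, so it matches Python's `statistics.pstdev` (population standard deviation). Equation 1 is not reproduced in this text, so the sketch does not compute AT; `pixel_feature_vector` takes AT as a parameter. How a distance exactly equal to AT is treated is not stated either; assigning "1" to it is an assumption of this sketch. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def dv_statistics(dv: List[int]) -> Tuple[float, float]:
    """Return (DVmu, DVsigma) following the steps in the text.

    The sum of squared deviations is divided by Z (population standard
    deviation), not by Z - 1.
    """
    z = len(dv)
    mu = sum(dv) / z
    squared = [(entry - mu) ** 2 for entry in dv]
    sigma = math.sqrt(sum(squared) / z)
    return mu, sigma


def pixel_feature_vector(dv: List[int], at: float) -> List[int]:
    """PFV: 0 for distances above AT (nullified), 1 otherwise.

    A distance exactly equal to AT is assigned 1 here (an assumption).
    """
    return [0 if distance > at else 1 for distance in dv]
```

Because small Hamming distances mean a close match to a visual word, the surviving "1"s mark the dictionary words the patch resembles, and everything farther than AT is nullified, which is what makes the representation sparse.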

As a result, a Z (703)-bit vector, the pixel feature vector PFV (904), is obtained for each pixel of the video frame (107). These PFVs (904) are sent to the memory interface (103) and written into the external memory (113).
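Since one Z-bit PFV is written per pixel, the external-memory load grows as M x N x Z bits per frame, multiplied by the frame rate for bandwidth. The helper below does only that arithmetic; it counts every pixel, ignoring border pixels where no full patch fits, so it is an upper bound. The function name and the example figures are illustrative, not values from the text.

```python
from typing import Tuple


def pfv_memory_load(m: int, n: int, z: int, fps: float) -> Tuple[int, float]:
    """Bits written to external memory: one Z-bit PFV per pixel of an M x N frame.

    Returns (bits_per_frame, bits_per_second).
    """
    bits_per_frame = m * n * z
    return bits_per_frame, bits_per_frame * fps
```

For example, a hypothetical 480x640 frame with Z = 128 gives 39,321,600 bits per frame, or about 1.18 Gbit/s at 30 frames per second.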

Each feature calculation request (109) is written into the feature calculation request FIFO (1003) as pixel coordinates. The CPU sends the coordinates of two boundary pixels (top-left and bottom-right, black dots (1101)), and the FPGA calculates the remaining coordinates (white dots (1102)) of the sub-regions. The main idea is to divide the region into four equal sub-regions, the quadrants (1103, 1104, 1105, 1106), pool the pixel feature vectors (PFVs (904)) within each quadrant, and concatenate the resulting quadrant vectors to obtain the feature vector (FV (111)).
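Deriving the quadrant corners from the two boundary pixels is a midpoint calculation. In this sketch coordinates are (row, column) pairs with inclusive bounds, and the quadrant order Q1 to Q4 (top-left, top-right, bottom-left, bottom-right) is an assumption; the text does not fix either convention. The quadrants are exactly equal only when the region's height and width are even.

```python
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col), inclusive bounds


def split_quadrants(top_left: Point, bottom_right: Point) -> List[Tuple[Point, Point]]:
    """Derive the four quadrant rectangles from the two boundary pixels.

    Returns [Q1, Q2, Q3, Q4] as (top_left, bottom_right) pairs, ordered
    top-left, top-right, bottom-left, bottom-right (an assumed ordering).
    """
    (r0, c0), (r1, c1) = top_left, bottom_right
    rm = (r0 + r1) // 2  # last row of the upper half
    cm = (c0 + c1) // 2  # last column of the left half
    return [
        ((r0, c0), (rm, cm)),
        ((r0, cm + 1), (rm, c1)),
        ((rm + 1, c0), (r1, cm)),
        ((rm + 1, cm + 1), (r1, c1)),
    ]
```

Only the two corners travel from the CPU; the five extra points that bound the four quadrants are recomputed in the FPGA, which keeps each request small.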

The address calculator (1002) block calculates internal RAM (1004) addresses from the pixel coordinates. This block knows the contents of the RAM, that is, which line coordinates are stored. To speed up the calculation, PFV (904) values are read from the external memory (113) and written into the internal RAM. The RAM can store R x N (202) x Z (703) bits of data, where R is the maximum number of lines that can be processed simultaneously.
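The text does not specify how the address calculator maps pixel coordinates to RAM locations. One plausible mapping, assumed here purely for illustration, treats the internal RAM as a circular buffer of R lines with N PFV slots per line, so a new line overwrites the oldest one. Both function names are hypothetical.

```python
def ram_address(row: int, col: int, r_lines: int, n_cols: int) -> int:
    """Hypothetical mapping: PFV slot index of pixel (row, col) when the
    internal RAM is used as a circular buffer of R lines of N PFVs each."""
    if not 0 <= col < n_cols:
        raise ValueError("column outside the frame")
    return (row % r_lines) * n_cols + col


def ram_capacity_bits(r_lines: int, n_cols: int, z: int) -> int:
    """Internal RAM size R x N x Z bits, as stated in the text."""
    return r_lines * n_cols * z
```

With R = 4, rows 1 and 5 land on the same slots, which is the point at which the text's "update the internal RAM with new lines" step would overwrite old data. For a hypothetical N = 640 and Z = 128, four lines need 327,680 bits.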

The integral vector calculator (1001) reads the required PFVs (904) from the internal RAM (1004) and calculates the integral vectors (1201). Each integral vector IV (1201) entry is the sum of all entries of the preceding PFVs (904) in both the horizontal and vertical dimensions. For example, IV11 (1201-11) equals PFV11 (904-11); IV12 (1201-12) equals the sum of IV11 (1201-11) and PFV12 (904-12); and IV21 (1201-21) equals the sum of PFV11 (904-11) and PFV21 (904-21). The final result is the quadrant integral vector IV22 (1201-22). The pooling operation requires the difference between IV22 (1201-22) and IV11 (1201-11); taking PFV11 (904-11) as all "0" therefore makes this subtraction unnecessary. The integral vector IV22 (1201-22) is then itself that difference and equals the QIV (1103-1).
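The integral-vector recurrence can be sketched element-wise over the Z entries of each PFV. This sketch uses the standard two-dimensional integral-image recurrence, which reproduces the IV11, IV12, and IV21 examples above; it sums every PFV in the quadrant and does not model the first-PFV-as-zero shortcut. The function name and data layout (`pfvs[r][c]` is a list of Z bits) are assumptions.

```python
from typing import List

Vector = List[int]


def quadrant_integral_vector(pfvs: List[List[Vector]]) -> Vector:
    """Integral vector at the bottom-right PFV of a quadrant (the QIV).

    pfvs[r][c] is the Z-entry PFV at row r, column c of the quadrant. Each IV
    accumulates all PFVs above and to the left, element by element:
    IV[r][c] = PFV[r][c] + IV[r-1][c] + IV[r][c-1] - IV[r-1][c-1].
    """
    rows, cols = len(pfvs), len(pfvs[0])
    zero = [0] * len(pfvs[0][0])
    iv = [[zero] * cols for _ in range(rows)]  # entries are replaced, never mutated
    for r in range(rows):
        for c in range(cols):
            up = iv[r - 1][c] if r > 0 else zero
            left = iv[r][c - 1] if c > 0 else zero
            diag = iv[r - 1][c - 1] if r > 0 and c > 0 else zero
            iv[r][c] = [p + u + l - d for p, u, l, d in zip(pfvs[r][c], up, left, diag)]
    return iv[-1][-1]
```

Each entry of the result counts how many pixels in the quadrant activated the corresponding visual word, which is the pooled quadrant feature.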

Since there are four quadrants (Q1 (1103), Q2 (1104), Q3 (1105), and Q4 (1106)), the results of all quadrants (1103-1, 1104-1, 1105-1, and 1106-1) are concatenated to obtain the final feature vector FV (1202). FV (1202) is a G x S bit vector, where S is the minimum number of bits needed to store the count of "1"s in a quadrant and G equals 4*Z (703). The vector is stored in the internal RAM of the FPGA. This feature vector (FV) represents the image patch defined by the boundary coordinates and can be used for classification and clustering, carried out either in the FPGA or, via memory transfer, in the CPU.
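The concatenation and the entry width S can be sketched as follows. Because each FV entry counts "1"s over one quadrant, it can never exceed the number of pixels in that quadrant, so S is taken here as the bit length of that pixel count. The function name is hypothetical.

```python
from typing import List, Tuple


def concatenate_quadrants(qivs: List[List[int]], quadrant_pixels: int) -> Tuple[List[int], int]:
    """Concatenate the Q1..Q4 integral vectors into FV and report its entry width S.

    Each FV entry counts 1s over one quadrant, so it never exceeds the number
    of pixels in the quadrant; S is the bit width of that maximum.
    """
    if len(qivs) != 4:
        raise ValueError("expected four quadrant vectors")
    fv = [count for qiv in qivs for count in qiv]  # G = 4 * Z entries
    s = quadrant_pixels.bit_length()
    return fv, s
```

For a 4x4-pixel quadrant, S is 5 bits, since a count of 16 needs five bits.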

Once an image region is finished, that is, once pooling over the requested coordinates within the region is complete, the internal RAM (1004) is updated with new lines and a new pooling calculation begins. This process is controlled by the integral vector calculator (1001) with the help of the address calculator (1002).

The classifier block (106) generates a class-label-like vector using a linear classification method. It performs a matrix-vector multiplication of the classification matrix C (1301) with the FV (111). Like the feature dictionary D (701), the classification matrix C (1301) is loaded into the FPGA. The row arbiter (1303) manages the rows of the C (1301) matrix for multiplication with the FV (111). The C (1301) matrix is a J (1302) x G x S bit matrix. The result is the class label CL (112) vector, each entry of which is the product of a row of C (1301) with the FV (111). CL (112) is sent to the CPU for further processing, such as classification and detection.
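The linear classifier reduces to one multiply-and-add pass per row of C. In this sketch the outer loop plays the role of the row arbiter, handing one row at a time to the multiply-accumulate; the function name `classify` is hypothetical, and integer weights are assumed for simplicity.

```python
from typing import List


def classify(c_matrix: List[List[int]], fv: List[int]) -> List[int]:
    """CL = C x FV, one multiply-and-add pass per row of C (J rows in total).

    The loop over rows stands in for the row arbiter: it hands one row at a
    time to the multiply-accumulate.
    """
    cl = []
    for row in c_matrix:
        if len(row) != len(fv):
            raise ValueError("C row length must equal the FV length (G)")
        acc = 0
        for weight, feature in zip(row, fv):
            acc += weight * feature  # multiply & add
        cl.append(acc)
    return cl
```

Each CL entry is a score for one class; how the CPU turns these scores into a final decision (for example, by picking the largest) is left open by the text.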

Claims

A multi-purpose image processing core, comprising:
At least one image analyzer block; and
At least one memory interface block,
Wherein the image analyzer block comprises:
At least one feature extraction block;
At least one feature summing block; and
At least one classification block,
Wherein the feature extraction block comprises:
A patch acquisition process;
A P vector construction process for assigning each pixel value of the patch to an entry of the P vector;
A process of calculating the average value of the P vector;
A binary PB vector construction process for binarizing the P vector based on the average value of the P vector;
A process of calculating a bit-flipping distance vector (DV) of PB using a dictionary (D);
A process of calculating the average value of DV;
A process of calculating the standard deviation of DV;
A process of calculating an activation threshold (AT) of DV; and
A process of calculating the pixel feature vector (PFV) from DV.
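The first steps of the feature extraction process in claim 1 (P vector construction, mean calculation and binarization into PB) might look like this Python sketch. The row-major pixel order and the rule that a pixel equal to the mean maps to "0" are assumptions, since the claim specifies neither, and `binarize_patch` is a hypothetical name.

```python
import numpy as np


def binarize_patch(patch: np.ndarray) -> np.ndarray:
    """Build P from a K x K patch, then binarize it against its own mean to get PB."""
    p = patch.astype(np.float64).reshape(-1)  # P vector: one entry per pixel (row-major, assumed)
    mean_p = p.mean()                         # average value of P
    pb = (p > mean_p).astype(np.uint8)        # PB: "1" above the mean, else "0" (tie rule assumed)
    return pb
```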

delete

The multi-purpose image processing core according to claim 1,
Wherein the image analyzer block has as inputs:
Video frames;
A feature dictionary;
A classification matrix;
Feature calculation requests; and
A sparsity multiplier.

The multi-purpose image processing core according to claim 1,
Wherein the image analyzer block has as outputs:
A feature vector; and
Classification labels.

delete

The multi-purpose image processing core according to claim 1,
Wherein the patch acquisition process comprises:
Writing each new incoming video line to a bottom line FIFO;
When the next video line arrives, reading the previous video line from the bottom line FIFO and writing it to the upper line FIFO;
Continuing to write to the bottom line FIFO and to the upper line FIFOs until all line FIFOs are filled with the lines required to build a patch;
When all the lines are available, reading from the line FIFOs K times to obtain the patch;
Obtaining the next pixel patch with the (K+1)-th read from the line FIFOs;
Continuing the read operation until all patches along the video line have been acquired; and
Moving the video lines into the upper line FIFOs to move the patch downward.
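A software model of the line-FIFO scheme in the claim above, under assumed details: K line buffers are held in a `collections.deque`, each column read takes one pixel from every buffer, and the first K reads along a line yield the first K x K patch while each further read yields the patch shifted by one pixel. The generator name `acquire_patches` is hypothetical; a hardware implementation would stream pixels rather than hold whole NumPy rows.

```python
from collections import deque
from typing import Iterable, Iterator

import numpy as np


def acquire_patches(lines: Iterable[np.ndarray], k: int) -> Iterator[tuple[int, int, np.ndarray]]:
    """Yield (top_row, left_col, K x K patch) using K buffers that emulate the line FIFOs.

    New lines enter at the bottom and older lines move up; once K lines are held,
    each new line pushes the oldest out, moving the patch window down by one row.
    """
    fifos: deque = deque(maxlen=k)          # fifos[0] = top line, fifos[-1] = bottom line
    for row, line in enumerate(lines):
        fifos.append(np.asarray(line))      # write the new line to the bottom FIFO
        if len(fifos) < k:
            continue                        # not enough lines yet to build a patch
        width = fifos[-1].shape[0]
        window: deque = deque(maxlen=k)     # the last K columns read from all K FIFOs
        for col in range(width):
            window.append([fifo[col] for fifo in fifos])  # one read from every FIFO
            if len(window) == k:            # K reads give the first patch, each later read the next
                patch = np.array(window).T  # rows = lines (top..bottom), cols = pixels
                yield row - k + 1, col - k + 1, patch
```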

The multi-purpose image processing core according to claim 1,
Wherein the binary PB vector is used instead of the P vector for the distance vector (DV) calculation.

The multi-purpose image processing core according to claim 1,
Wherein a binary feature dictionary D is used instead of a scalar feature dictionary for the distance vector (DV) calculation.

The multi-purpose image processing core according to claim 1,
Wherein a different feature dictionary can be loaded during operation depending on the scenario.

The multi-purpose image processing core according to claim 1,
Wherein the distance vector (DV) calculation process comprises:
Comparing the PB vector with every column of the feature dictionary D using an XOR operation;
Counting the number of entries equal to "1" after the XOR operation; and
Constructing the distance vector (DV).
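The XOR-and-count distance of claims 10 and 11 is a Hamming distance between PB and each dictionary column. A minimal NumPy sketch, assuming D is stored as an N x Z binary matrix with one atom per column (N pixels per patch); `distance_vector` is a hypothetical name.

```python
import numpy as np


def distance_vector(pb: np.ndarray, dictionary: np.ndarray) -> np.ndarray:
    """Bit-flipping (Hamming) distance from PB to every column of a binary dictionary D.

    pb:         length-N binary vector
    dictionary: N x Z binary matrix, one dictionary atom per column (assumed layout)
    returns:    length-Z vector; entry z counts the "1"s in PB xor D[:, z]
    """
    pb = pb.astype(np.uint8)
    d = dictionary.astype(np.uint8)
    if d.shape[0] != pb.shape[0]:
        raise ValueError("dictionary columns must have the same length as PB")
    xor = np.bitwise_xor(d, pb[:, None])  # compare PB with all columns at once
    return xor.sum(axis=0)                # keep only the count of "1"s per column
```

On an FPGA the same count would more likely come from packed bits and a popcount circuit; the NumPy form trades that for readability.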

The multi-purpose image processing core according to claim 10,
Wherein the distance vector (DV) holds the number of entries equal to "1" instead of the full results of the XOR operations.

The multi-purpose image processing core according to claim 1,
Wherein the pixel feature vector (PFV) calculation process comprises:
Comparing each entry of DV with AT;
Assigning "0" if the DV entry is larger than AT;
Assigning "1" if the DV entry is smaller than AT; and
Constructing the pixel feature vector (PFV).
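The thresholding of claim 12 can be sketched as below. This section does not give the AT formula, so the sketch assumes AT = mean(DV) - m * std(DV), with m the sparsity multiplier listed among the image analyzer inputs and std the population standard deviation. It also assumes that a DV entry exactly equal to AT maps to "0", since the claim covers only the larger and smaller cases. `pixel_feature_vector` is a hypothetical name.

```python
import numpy as np


def pixel_feature_vector(dv: np.ndarray, sparsity_multiplier: float) -> np.ndarray:
    """Binary PFV: "1" where a DV entry is below the activation threshold AT, else "0"."""
    dv = dv.astype(np.float64)
    at = dv.mean() - sparsity_multiplier * dv.std()  # assumed AT formula (population std)
    return (dv < at).astype(np.uint8)                # ties with AT map to "0" (assumed)
```

Under this assumption a larger m lowers AT, so fewer dictionary atoms are marked "1" and the PFV becomes sparser.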

The multi-purpose image processing core according to claim 12,
Wherein the pixel feature vector (PFV) holds the results of the comparison between AT and the DV entries as binary values.

The multi-purpose image processing core according to claim 1,
Wherein the feature summing block comprises:
At least one integral vector calculator for calculating an integral vector (IV);
At least one address calculator for calculating internal RAM addresses according to feature calculation requests;
At least one feature calculation request FIFO for storing feature calculation requests; and
At least one internal RAM for storing pixel feature vectors (PFV).

The multi-purpose image processing core according to claim 14,
Wherein the feature summing block receives the boundary coordinates of a feature calculation request as pixel values and calculates the remaining coordinates needed to divide the region into quadrants, which are four identical sub-regions.
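Given the boundary coordinates, the remaining quadrant corners follow from the region's midpoints. This sketch assumes inclusive pixel coordinates (x0, y0)-(x1, y1), an even region width and height (so that the four sub-regions are identical), and raster ordering Q1 top-left, Q2 top-right, Q3 bottom-left, Q4 bottom-right; none of these conventions is stated in the patent, and `split_into_quadrants` is a hypothetical name.

```python
def split_into_quadrants(x0: int, y0: int, x1: int, y1: int) -> list[tuple[int, int, int, int]]:
    """Split an inclusive pixel rectangle into four identical quadrants (raster order assumed)."""
    w, h = x1 - x0 + 1, y1 - y0 + 1
    if w % 2 or h % 2:
        raise ValueError("region width and height must be even for four identical quadrants")
    xm, ym = x0 + w // 2, y0 + h // 2  # first column / row of the right / lower halves
    return [
        (x0, y0, xm - 1, ym - 1),      # Q1: top-left
        (xm, y0, x1, ym - 1),          # Q2: top-right
        (x0, ym, xm - 1, y1),          # Q3: bottom-left
        (xm, ym, x1, y1),              # Q4: bottom-right
    ]
```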

The multi-purpose image processing core according to claim 14,
Wherein the feature summing block reads the pixel feature vectors (PFV) from an external memory and writes them to the internal RAM to speed up the calculation.

The multi-purpose image processing core according to claim 14,
Wherein the integral vector calculator sub-block performs:
Reading the pixel feature vectors (PFV) from the internal RAM; and
Calculating the quadrant integral vector (QIV) by adding all entries of the preceding PFVs in both the horizontal and vertical dimensions.

The multi-purpose image processing core according to claim 17,
Wherein all values of the first pixel feature vector are set to "0" so that the subtraction between the quadrant integral vector and the first pixel feature vector can be omitted.
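Claims 17 and 18 describe an integral-image style accumulation of PFVs. The sketch below uses the standard integral-image technique: cumulative sums along both dimensions plus a leading row and column of zeros, which plays a role comparable to zeroing the first PFV by letting regions that touch the border be summed without special-case subtraction. The padding layout and the names `integral_vectors` and `region_sum` are assumptions, not the patent's design.

```python
import numpy as np


def integral_vectors(pfv_map: np.ndarray) -> np.ndarray:
    """pfv_map: H x W x Z binary PFVs. Returns (H+1) x (W+1) x Z integral vectors.

    Entry [y+1, x+1] holds the per-entry sum of all PFVs in rows 0..y and columns 0..x;
    row 0 and column 0 are all zeros.
    """
    h, w, z = pfv_map.shape
    iv = np.zeros((h + 1, w + 1, z), dtype=np.int64)
    iv[1:, 1:] = pfv_map.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return iv


def region_sum(iv: np.ndarray, x0: int, y0: int, x1: int, y1: int) -> np.ndarray:
    """Per-entry count of "1"s over the inclusive rectangle (x0, y0)-(x1, y1): four lookups."""
    return iv[y1 + 1, x1 + 1] - iv[y0, x1 + 1] - iv[y1 + 1, x0] + iv[y0, x0]
```

Each quadrant result in FV would then cost one `region_sum` call, that is, four lookups regardless of quadrant size.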

The multi-purpose image processing core according to claim 1,
Wherein the classification block comprises:
At least one C matrix row arbiter for controlling the multiplication of C matrix rows with FV; and
At least one multiply-and-add operator for implementing the matrix-vector multiplication.