WO2015040450A1 - Multi-purpose image processing core - Google Patents

Multi-purpose image processing core

Info

Publication number
WO2015040450A1
Authority
WO
WIPO (PCT)
Application number
PCT/IB2013/058604
Other languages
French (fr)
Inventor
Ismail OZSARAC
Ozgür YILMAZ
Omer GUNAY
Original Assignee
Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi
Application filed by Aselsan Elektronik Sanayi Ve Ticaret Anonim Sirketi
Priority to KR1020157033283A (granted as KR101864000B1)
Priority to PCT/IB2013/058604
Publication of WO2015040450A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/94 Hardware or software architectures specially adapted for image or video understanding
    • G06V 10/955 Hardware or software architectures specially adapted for image or video understanding using specific electronic processors


Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Object detection, recognition and tracking algorithms are used in many applications in vision. The outputs of these algorithms are essential for situational awareness and decision making. The accuracy and the processing latency of these algorithms are important parameters for the success of the system. This invention enhances accuracy by enabling neural-network-based techniques while fulfilling the latency constraints.

Description

DESCRIPTION
MULTI-PURPOSE IMAGE PROCESSING CORE
Field of the invention
This invention relates to an image processing method implemented in an FPGA to analyze video frames using neural-network-based techniques, operating in real time on embedded platforms.

Background of the invention
Neural network approaches in vision are becoming increasingly popular due to their performance in complex tasks such as large-scale classification [REF.1] or multi-modal fusion [REF.2]. The success is attributed to multiple advantages, such as unsupervised feature learning from unlabeled data [REF.3], [REF.4], hierarchical processing via deep architectures [REF.5]-[REF.7] and exploitation of long-range statistical dependencies using recurrent processing [REF.3], [REF.8]. The neural network approach is orthogonal to kernel methods: the input is projected onto a nonlinear, high-dimensional space of hidden units, after which even a linear hyperplane is able to partition the data [REF.9]. Since this nonlinear projection is a powerful representation of the visual data, it is possible to utilize it for multiple different tasks, such as classification, detection, tracking, clustering, interest point detection etc. Thus, after an image or a video block is "analyzed" by a neural network via multi-layer processing, the hidden layer activities that represent the visual input can be multiplexed to many different tasks according to the needs, as is done in cortical processing [REF.10].
Real-time embedded visual processing needs are growing, with increased demands in intelligent robotic platforms such as Unmanned Aerial Vehicles (UAVs). These systems are expected to navigate and operate in an autonomous fashion, which entails successful implementations of image and video understanding functions. Scene recognition, detection of specific objects in an image, classification of moving objects and object tracking are some of the essential visual functions required in an autonomous robotic system. Weight and energy specifications of such systems restrict both the number and the complexity of visual processing functions, diminishing the operational capacity. A visual processing core that is common to at least a subset of these functions can loosen these restrictions.
In this invention, we show that a sparse and overcomplete image representation formed in the neural network hidden layers provides versatility and discriminative power [REF.4], [REF.11]. Specifically, we present an FPGA implementation of a neural network based image processing core that computes this representation and can be embedded in a UAV platform for surveillance and reconnaissance missions.

Objects of the invention
The object of the invention is to provide an FPGA implementation of a neural network based image processing core.

Detailed description of the invention
The multi-purpose image processing (IP) core that fulfills the objects of the present invention is illustrated in the attached figures, where:
Figure 1 is the schematic of the IP core in the FPGA with the external components.
Figure 2 is the video frame and patch structure.
Figure 3 is the flow of the feature extractor.
Figure 4 is the structure of the take patch process.
Figure 5 is the construction of P vector.
Figure 6 is the construction of binary PB vector.
Figure 7 is the dictionary D.
Figure 8 is the construction of distance vector DV.
Figure 9 is the computation of pixel feature vector PFV.
Figure 10 is the structure of the feature summer.
Figure 11 is the structure of quadrants.
Figure 12 is the computation of feature vector FV.
Figure 13 is the computation of class label CL.
In the preferred embodiment of the invention, the multi-purpose image processing core (101) is implemented in an FPGA (100). The core consists of two main sub-blocks: the image analyzer (102) and the memory interface (103).
The memory interface (103) is responsible for data transfer between the image analyzer (102) and the external memories (113). The image analyzer (102) block consists of three sub-blocks: the feature extractor (104), the feature summer (105) and the classifier (106). The image analyzer (102) block receives five types of inputs from outside the FPGA (100): video frames (107), the feature dictionary (108), the class matrix (110), feature calculation requests (109) and the sparsity multiplier (114). The video frames (107) can be defined by two parameters: resolution and frame rate. The resolution is M (rows) (201) by N (columns) (202), and the frame rate is the number of frames (203) captured per second. The other inputs, namely the feature dictionary (108), class matrix (110), feature calculation requests (109) and sparsity multiplier (114), are detailed in the following sections.
The feature extractor (104) block starts with the take patch (301) process. This process captures the pixels lying at the selected coordinates of the patch (204) from the video frames (107). To capture these pixels, the incoming video line (a row (201) of the video frame (107)) is written to a line FIFO (401). According to the patch (204) dimension K, the take patch (301) process uses K line FIFOs (401). Each incoming video line is first written to the bottom line FIFO (401); when the next video line arrives, the previous one is read from the bottom line FIFO (401) and written to the one above it. These steps continue until all line FIFOs (401) are filled with the lines needed to construct the patch (204). When all lines are available, pixel values are read from the line FIFOs (401) as the next line arrives. After K read operations, the patch is ready for further processing. The (K+1)-th read from the line FIFOs (401) gives the next pixel patch. These steps continue until all patches (204) along a line are captured. While patches are read from the line FIFOs (401), new lines continue to move into the upper line FIFOs (401). This movement produces the downward movement of the patch (204) through the video frames (107). The P vector (501) is constructed (302) from the captured patch (204) pixel values. This construction is a simple register assignment: there are KxK registers, from L1P1 (402) to LKPK (403), and each register keeps the related pixel value. The bit size of the registers is determined by the maximum possible pixel value.
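For illustration only, and not as part of the disclosure, the take patch (301) and construct P vector (302) steps can be modeled in software as a sliding window over a frame already held in memory; the Python sketch below (the frame size, K, and the function names are illustrative assumptions) visits the same patches, in the same raster order, that the K line FIFOs deliver:

```python
import numpy as np

def take_patches(frame: np.ndarray, K: int):
    """Yield every K x K patch of the frame as a flattened P vector."""
    M, N = frame.shape
    # The hardware holds the last K video lines in K line FIFOs; with the
    # whole frame in memory, a sliding window reproduces the same patch
    # sequence: along a video line first, then downward through the frame.
    for row in range(M - K + 1):
        for col in range(N - K + 1):
            patch = frame[row:row + K, col:col + K]
            yield patch.reshape(-1)       # P vector with T = K*K entries

frame = np.random.randint(0, 256, size=(8, 8), dtype=np.uint16)
P = next(take_patches(frame, K=4))
assert P.size == 16                        # T = K*K = 16
```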
To calculate the mean value Pμ (602) of the P vector (501) (303), every pixel value in the patch (204) is added and the total is divided by the number of pixels. The addition can be realized by adders; the number of adder inputs can differ according to the FPGA capability. The adder input count affects the pipeline clock latency and the number of adders used. After all pixel values are accumulated, the total is divided by K*K.
After Pμ (602) is calculated, each entry of the P vector (501) is compared (601) with Pμ (602) and binarized to construct the vector PB (603) (304). The binarization step is essential for realizing this image processing algorithm in currently available FPGAs. Values less than Pμ (602) are assigned "0"; values equal to or greater are assigned "1". After all values are compared (601) with the mean value, the binary version PB (603) of the P vector (501) is obtained. PB (603) is a T (604) by 1 bit vector, where T (604) equals K*K. Every binary vector PB (603) constructed from the patches (204) in an image is transformed into a feature vector using a pre-computed dictionary that has Z (703) visual words. The dictionary D (701) is a T (604) by Z (703) bit matrix. Entries of D (701) are binary values, "1" or "0". The columns of the D (701) matrix (DC1 - DCZ (702)) are stored in internal registers of the FPGA (100). The dictionary is loaded into the FPGA by means of communication interfaces such as PCI, VME etc. The entries of the dictionary can be updated at any time, since they are stored in internal registers.
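A behavioral sketch of the mean computation (303) and binarization (304) follows; the random matrix standing in for the pre-computed dictionary D (701), and the sizes T and Z, are assumptions for illustration:

```python
import numpy as np

def binarize_patch(P: np.ndarray) -> np.ndarray:
    """Construct PB (603) by comparing each entry of P with the mean Pμ."""
    P_mu = int(P.sum()) // P.size         # adder-tree total divided by K*K
    return (P >= P_mu).astype(np.uint8)   # equal or greater -> "1", less -> "0"

# D (701) is a T x Z binary matrix whose Z columns are visual words; in the
# core it is pre-computed and loaded over PCI, VME etc., not random.
T, Z = 16, 32
D = np.random.randint(0, 2, size=(T, Z), dtype=np.uint8)
PB = binarize_patch(np.arange(T))         # toy P vector with T = K*K entries
```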
The bit flipping (or Hamming) distance calculation (305) computes the similarity between two vectors: PB (603) and every column (DC1-DCZ (702)) of D (701). If the entries of PB (603) and DCX (702) are the same, "0" is assigned; otherwise "1" is assigned. This operation is realized by xor (801) blocks. The total number of "1" values after the xor (801) operation is a measure of the dissimilarity between the two binary vectors. DV (804) contains the Hamming distance of a single PB (603) vector to all the visual words (columns (702)) in the dictionary. The entries (805) of DV (804) keep the counts of "1"s, so they are integer values and can be represented with fewer bits than PB (603) or DCX (702). DV (804) is an H (806) by Z (703) bit vector, where H (806) is the minimum number of bits that can represent the scalar value T (604).
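In software terms, the distance calculation (305) is an XOR followed by a per-column popcount; a minimal sketch (the function name is an assumption):

```python
import numpy as np

def hamming_distances(PB: np.ndarray, D: np.ndarray) -> np.ndarray:
    """DV (804): Hamming distance of PB to every column of D.

    XOR marks the differing bit positions, and counting the "1"s in each
    column mirrors the xor (801) blocks followed by the count stage (802).
    """
    diff = PB[:, None] ^ D                # T x Z matrix of differing bits
    return diff.sum(axis=0)               # Z integers, each fits in H bits
```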
The mean value DVμ of DV (804) is computed (306) similarly to Pμ (602). To calculate the standard deviation DVσ of DV (804) (307), DVμ is subtracted from each entry (805) of DV. The squares of the differences are summed, the total is divided by Z (703), and finally the square root is taken to obtain DVσ. The activation threshold AT (901) is calculated (308) by EQ.1. This threshold is used to construct a sparse representation by nullifying the distance values larger than a specified value.
AT = DVμ - (sparsity multiplier x DVσ) (EQ.1)

To construct the pixel feature vector (309), each entry (805) of DV (804) is compared (902) with AT (901). If the entry (805) is greater than AT (901), "0" is assigned to the related entry (905) of PFV (904); if it is less, "1" is assigned. The result is a 1 by Z (703) pixel feature vector PFV (904).
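EQ.1 and the thresholding step (309) reduce to a few lines; since the text does not specify how an entry exactly equal to AT (901) is treated, this sketch assigns it "0":

```python
import numpy as np

def pixel_feature_vector(DV: np.ndarray, sparsity_multiplier: float) -> np.ndarray:
    """PFV (904) via EQ.1: AT = DVμ - (sparsity multiplier x DVσ).

    Distances above the activation threshold are nullified to "0"; smaller
    distances (more similar visual words) become "1", giving a sparse
    1 by Z binary representation of the pixel.
    """
    AT = DV.mean() - sparsity_multiplier * DV.std()   # population std: /Z, sqrt
    return (DV < AT).astype(np.uint8)
```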
As a result, for each pixel of a video frame (107), a 1 by Z (703) bit vector (the pixel feature vector PFV (904)) is obtained. These PFVs (904) are sent to the memory interface (103) to be written to the external memories (113). The feature calculation requests (109) are written to the feature calculation request FIFO (1003) as pixel coordinates. The CPU sends the coordinates of two border pixels (upper-left and lower-right, black dots (1101)), and the FPGA calculates the remaining coordinates (white dots (1102)) of the sub-regions. The main idea is to divide a region into four equal sub-regions, the quadrants (1103, 1104, 1105, 1106), pool the pixel feature vectors (PFVs (904)) inside the quadrants for dimensionality reduction, and concatenate the integral feature vectors to obtain a feature vector (FV (111)).
According to the pixel coordinates, the internal RAM (1004) addresses are calculated by the address calculator (1002) block. This block knows the content of the RAM, namely which line coordinates are stored. To make the calculations faster, the PFV (904) values are read from the external memory (113) and written to the internal RAM. The RAM can store R x N (202) x Z (703) bits of data, where R is the maximum number of lines that can be processed at a time.
The integral vector calculator (1001) reads the necessary PFVs (904) from the internal RAM (1004) to calculate the integral vector (1201). Each integral vector IV (1201) entry is the summation of all entries of the previous PFVs (904) along both the horizontal and vertical dimensions. For example, IV11 (1201-11) is equal to PFV11 (904-11), IV12 (1201-12) is equal to IV11 (1201-11) plus PFV12 (904-12), and IV21 (1201-21) is equal to PFV11 (904-11) plus PFV21 (904-21). The final result is the quadrant integral vector IV22 (1201-22). The pool feature operation requires the difference between IV22 (1201-22) and IV11 (1201-11); this is equivalent to taking PFV11 (904-11) as all "0", in which case the final integral vector IV22 (1201-22) directly equals the difference, i.e. QIV (1103-1).
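A software model of the quadrant pooling via integral vectors is sketched below, assuming the quadrant's PFVs are available as a (rows, cols, Z) array; the concatenation into FV described next is included for completeness:

```python
import numpy as np

def quadrant_pool(pfvs: np.ndarray) -> np.ndarray:
    """Pool one quadrant's PFVs into its quadrant integral vector QIV (1103-1).

    Taking PFV11 as all "0" before the cumulative sums makes the final
    integral vector IV22 directly equal IV22 - IV11 of the untouched sums,
    which is the pooled result the feature summer needs.
    """
    iv = pfvs.astype(np.int32)                # copy, so the input is untouched
    iv[0, 0, :] = 0                           # first PFV treated as all "0"
    iv = iv.cumsum(axis=0).cumsum(axis=1)     # integral vectors IV (1201)
    return iv[-1, -1]                         # a 1 x Z integer vector

def feature_vector(q1, q2, q3, q4) -> np.ndarray:
    """FV (111): concatenation of the four quadrant results, G = 4*Z."""
    return np.concatenate([quadrant_pool(q) for q in (q1, q2, q3, q4)])
```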
Since there are four quadrants (Q1 (1103), Q2 (1104), Q3 (1105) and Q4 (1106)), the four quadrant results (1103-1, 1104-1, 1105-1 and 1106-1) are concatenated and the final feature vector FV (1202) is obtained. The FV (1202) is a G x S bit vector, where S is the minimum number of bits that can store all the "1"s in a quadrant and G is equal to 4*Z (703). The vector is stored in the internal RAM of the FPGA. This feature vector (FV) represents the image patch defined by the border coordinates, and it can be used for classification and clustering purposes, executed either in the FPGA or in the CPU via memory transfer. After the completion of an image region, namely when the pooling on the requested coordinates in that region is finished, the internal RAM (1004) is updated with new lines and new pooling calculations are started. These processes are controlled by the integral vector calculator (1001) with the aid of the address calculator (1002). The classifier block (106) generates a class label likelihood vector using a linear classification method. It performs a matrix-vector multiplication of the class matrix C (1301) with FV (111). The class matrix C (1301) is loaded into the FPGA like the feature dictionary D (701). The row arbiter (1303) controls the C (1301) matrix row management for the FV (111) multiplication. The C (1301) matrix is a J (1302) x G x S bit matrix. The result is the class label vector CL (112). The entries of CL (112) are the sums of the products (1304) of FV (111) with the rows of C (1301). CL (112) is sent to the CPU for further processing, classification, detection etc.
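Finally, the classifier (106) amounts to a matrix-vector product; a sketch with illustrative sizes (J, Z and the random contents are assumptions, not values from the disclosure):

```python
import numpy as np

def classify(FV: np.ndarray, C: np.ndarray) -> np.ndarray:
    """CL (112): each entry is the sum of the elementwise products of FV
    with one row of the J x G class matrix C, computed row by row in
    hardware by the mult & add (1304) operators under the row arbiter (1303).
    """
    return C @ FV                             # J class label likelihoods

J, G = 10, 4 * 32                             # G = 4*Z
CL = classify(np.random.randint(0, 8, G), np.random.randint(0, 2, (J, G)))
```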

Claims

Claims
1. The inventive multi-purpose image processing core (101) essentially comprises:
at least one image analyzer (102) block,
at least one memory interface (103) block.
2. In the preferred embodiment of the invention, the image analyzer (102) block essentially comprises:
at least one feature extractor (104) block,
at least one feature summer (105) block,
at least one classifier (106) block.
3. The image analyzer (102) block according to claim 1, characterized by receiving the following inputs:
video frames (107),
feature dictionary (108),
class matrix (110),
feature calculation requests (109),
sparsity multiplier (114).
4. The image analyzer (102) block according to claim 1, characterized by generating the following outputs:
feature vectors (111),
class labels (112).
5. In the preferred embodiment of the invention, the feature extractor (104) block essentially comprises the sub-steps of:
take patch (301) process,
construct P vector (302) process,
compute mean value of P vector (303) process,
construct binary PB vector (304) process,
calculate bit flipping distance vector (DV) of PB with dictionary D (305) process,
compute mean value of DV (306) process,
compute standard deviation value of DV (307) process,
compute activation threshold AT of DV (308) process,
compute pixel feature vector (PFV) of DV (309) process.
6. In the preferred embodiment of the invention, the take patch (301) process essentially comprises the sub-steps of:
writing each new incoming video line to the bottom line FIFO (401),
with the next video line coming, reading the previous one from the bottom line FIFO (401) and writing it to the upper line FIFO (401),
continuing the first two steps until all line FIFOs (401) are filled with the lines necessary to construct the patch (204),
after all lines are available, reading K times from the line FIFOs (401) and obtaining the patch (204),
obtaining the next pixel patch with the (K+1)-th read from the line FIFOs (401),
continuing the read operations until all patches are obtained along a video line,
moving the video lines to the upper line FIFOs (401) to generate the downward movement of the patch (204).
7. In the preferred embodiment of the invention, the feature extractor (104) block according to claim 5 is characterized by using the binary PB (603) vector, instead of the scalar P (501) vector, for the distance calculations (305) with the feature dictionary D (701).
8. In the preferred embodiment of the invention, the feature extractor (104) block according to claim 5 is characterized by using a binary feature dictionary D (701) for the distance calculations (305) instead of a scalar feature dictionary.
9. In the preferred embodiment of the invention, the feature extractor (104) block according to claim 5 is characterized by the possibility of loading different feature dictionaries D (701) during operation according to the scenario.
10. In the preferred embodiment of the invention, the bit flipping distance calculation (305) process essentially comprises the sub-steps of:
comparing the PB (603) vector with every column (702) of the feature dictionary D (701) by using xor (801) operations,
computing (802) the number of entries that are equal to "1" after the xor operations (801),
constructing the distance vector DV (804) (803).
11. In the preferred embodiment of the invention, the distance vector DV (804) according to claim 10 is characterized by keeping only the number of entries equal to "1" instead of all the xor (801) results.
12. In the preferred embodiment of the invention, the compute pixel feature vector PFV (309) process essentially comprises the sub-steps of:
comparing (902) the entries of DV (805) with AT (901),
assigning "0" if the DV entry (805) is greater than AT (901),
assigning "1" if the DV entry (805) is less than AT (901),
constructing the pixel feature vector PFV (904) (309).
13. In the preferred embodiment of the invention, the pixel feature vector PFV (904) according to claim 12 is characterized by keeping the results of the comparison between the DV entries and AT as binary values.
14. In the preferred embodiment of the invention, the feature summer (105) block essentially comprises the sub-blocks of:
at least one integral vector calculator (1001), which calculates the integral vector IV (1201),
at least one address calculator (1002), which calculates the internal RAM (1004) addresses according to the feature calculation requests (109),
at least one feature calculation request FIFO (1003) to store the feature calculation requests (109),
at least one internal RAM (1004), which stores the PFVs (309).
15. In the preferred embodiment of the invention, the feature summer (105) block according to claim 14 is characterized by receiving the border coordinates (1101) of the feature calculation requests (109) as pixel values and calculating the other coordinates (1102) to divide a region into four equal sub-regions, the quadrants (1103, 1104, 1105, 1106).
16. In the preferred embodiment of the invention, the feature summer (105) block according to claim 14 is characterized by reading the PFVs (309) from the external memories (113) and writing them to the internal RAM (1004) to make the calculations faster.
17. In the preferred embodiment of the invention, the integral vector calculator (1001) sub-block essentially comprises the sub-steps of:
reading the PFVs (309) from the internal RAM (1004),
computing the quadrant integral vector QIV (1103-1) by adding all entries of the previous PFVs (904) along both the horizontal and vertical dimensions.
18. In the preferred embodiment of the invention, the integral vector calculator (1001) sub-block according to claim 17 is characterized by taking the first PFV (309) value as all "0" to omit the subtraction between the quadrant integral vector and the first PFV.
19. In the preferred embodiment of the invention, the classifier (106) block essentially comprises the sub-blocks of:
at least one C matrix row arbiter (1303), which controls the multiplication of the C matrix rows with FV (111),
at least one mult & add (1304) operator, which realizes the matrix-vector multiplication operation.
References
[1] A. Krizhevsky, I. Sutskever, and G. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems 25, 2012, pp. 1106-1114.
[2] N. Srivastava and R. Salakhutdinov, "Multimodal learning with deep boltzmann machines," in Advances in Neural Information Processing Systems 25, 2012, pp. 2231-2239.
[3] G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[4] B. A. Olshausen et al., "Emergence of simple-cell receptive field properties by learning a sparse code for natural images," Nature, vol. 381, no. 6583, pp. 607-609, 1996.
[5] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[6] M. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. Lecun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on. IEEE, 2007, pp. 1-8.
[7] Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle, "Greedy layer-wise training of deep networks," Advances in Neural Information Processing Systems, vol. 19, p. 153, 2007.
[8] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Cognitive Modeling, vol. 1, p. 213, 2002.
[9] A. Coates, A. Y. Ng, and H. Lee, "An analysis of single-layer networks in unsupervised feature learning," in International Conference on Artificial Intelligence and Statistics, 2011, pp. 215-223.
[10] T. S. Lee, D. Mumford, R. Romero, and V. A. Lamme, "The role of the primary visual cortex in higher level vision," Vision Research, vol. 38, no. 15, pp. 2429-2454, 1998.
[11] Y.-L. Boureau, F. Bach, Y. LeCun, and J. Ponce, "Learning mid-level features for recognition," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 2559-2566.
PCT/IB2013/058604 2013-09-17 2013-09-17 Multi-purpose image processing core WO2015040450A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
KR1020157033283A KR101864000B1 (en) 2013-09-17 2013-09-17 Multi-purpose image processing core
PCT/IB2013/058604 WO2015040450A1 (en) 2013-09-17 2013-09-17 Multi-purpose image processing core

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2013/058604 WO2015040450A1 (en) 2013-09-17 2013-09-17 Multi-purpose image processing core

Publications (1)

Publication Number Publication Date
WO2015040450A1 (en)

Family

ID=49641805

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2013/058604 WO2015040450A1 (en) 2013-09-17 2013-09-17 Multi-purpose image processing core

Country Status (2)

Country Link
KR (1) KR101864000B1 (en)
WO (1) WO2015040450A1 (en)


Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109525803B (en) * 2017-09-18 2023-09-15 赛灵思电子科技(北京)有限公司 Video structuring processing device and method based on FPGA and artificial intelligence
KR20200015095A (en) 2018-08-02 2020-02-12 삼성전자주식회사 Image processing apparatus and operating method for the same

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101162605B1 (en) * 2011-03-21 2012-07-05 인하대학교 산학협력단 Texture feature extraction method in ct images

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
ADAM COATES ET AL: "An Analysis of Single-Layer Networks in Unsupervised Feature Learning", PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), JMLR W&CP 15, 2011, pages 1 - 9, XP055106213, Retrieved from the Internet <URL:http://ai.stanford.edu/~ang/papers/nipsdlufl10-AnalysisSingleLayerUnsupervisedFeatureLearning.pdf> [retrieved on 20140307] *
BENJAMIN W MARTIN ET AL: "Exploring improvements for simple image classification", SOUTHEASTCON, 2013 PROCEEDINGS OF IEEE, IEEE, 4 April 2013 (2013-04-04), pages 1 - 6, XP032440660, ISBN: 978-1-4799-0052-7, DOI: 10.1109/SECON.2013.6567508 *
DEBOLE M ET AL: "A framework for accelerating neuromorphic-vision algorithms on FPGAs", COMPUTER-AIDED DESIGN (ICCAD), 2011 IEEE/ACM INTERNATIONAL CONFERENCE ON, IEEE, 7 November 2011 (2011-11-07), pages 810 - 813, XP032074042, ISBN: 978-1-4577-1399-6, DOI: 10.1109/ICCAD.2011.6105351 *
FROBA B ET AL: "Face detection with the modified census transform", AUTOMATIC FACE AND GESTURE RECOGNITION, 2004. PROCEEDINGS. SIXTH IEEE INTERNATIONAL CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 17 May 2004 (2004-05-17), pages 91 - 96, XP010949416, ISBN: 978-0-7695-2122-0 *
LIEFU AI ET AL: "Spherical Soft Assignment: Improving Image Representation in Content-Based Image Retrieval", 4 December 2012, ADVANCES IN MULTIMEDIA INFORMATION PROCESSING PCM 2012, SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 801 - 810, ISBN: 978-3-642-34777-1, XP047014269 *
MI SUN PARK ET AL: "An FPGA-based accelerator for cortical object classification", DESIGN, AUTOMATION&TEST IN EUROPE CONFERENCE&EXHIBITION (DATE), 2012, IEEE, 12 March 2012 (2012-03-12), pages 691 - 696, XP032320808, ISBN: 978-1-4577-2145-8, DOI: 10.1109/DATE.2012.6176559 *
PHILBIN J ET AL: "Lost in quantization: Improving particular object retrieval in large scale image databases", COMPUTER VISION AND PATTERN RECOGNITION, 2008. CVPR 2008. IEEE CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 23 June 2008 (2008-06-23), pages 1 - 8, XP031297193, ISBN: 978-1-4244-2242-5 *
YUVAL NETZER ET AL: "Reading Digits in Natural Images with Unsupervised Feature Learning", NIPS WORKSHOP ON DEEP LEARNING AND UNSUPERVISED FEATURE LEARNING, 2011, XP055117729 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9934043B2 (en) 2013-08-08 2018-04-03 Linear Algebra Technologies Limited Apparatus, systems, and methods for providing computational imaging pipeline
US10360040B2 (en) 2013-08-08 2019-07-23 Movidius, LTD. Apparatus, systems, and methods for providing computational imaging pipeline
US11042382B2 (en) 2013-08-08 2021-06-22 Movidius Limited Apparatus, systems, and methods for providing computational imaging pipeline
US11567780B2 (en) 2013-08-08 2023-01-31 Movidius Limited Apparatus, systems, and methods for providing computational imaging pipeline
US11768689B2 (en) 2013-08-08 2023-09-26 Movidius Limited Apparatus, systems, and methods for low power computational imaging
CN107992100A (en) * 2017-12-13 2018-05-04 中国科学院长春光学精密机械与物理研究所 High frame frequency image tracking method based on programmable logic array
CN107992100B (en) * 2017-12-13 2021-01-15 中国科学院长春光学精密机械与物理研究所 High frame rate image tracking method and system based on programmable logic array

Also Published As

Publication number Publication date
KR20160003020A (en) 2016-01-08
KR101864000B1 (en) 2018-07-05

Similar Documents

Publication Publication Date Title
Lu et al. Monocular semantic occupancy grid mapping with convolutional variational encoder–decoder networks
Milioto et al. Rangenet++: Fast and accurate lidar semantic segmentation
Ye et al. 3d recurrent neural networks with context fusion for point cloud semantic segmentation
Deng et al. RFBNet: deep multimodal networks with residual fusion blocks for RGB-D semantic segmentation
Zhang et al. Latentgnn: Learning efficient non-local relations for visual recognition
Yue-Hei Ng et al. Beyond short snippets: Deep networks for video classification
Li et al. Traffic scene segmentation based on RGB-D image and deep learning
Koyun et al. Focus-and-Detect: A small object detection framework for aerial images
CN108108751B (en) Scene recognition method based on convolution multi-feature and deep random forest
CN108734210B (en) Object detection method based on cross-modal multi-scale feature fusion
Mumuni et al. CNN architectures for geometric transformation-invariant feature representation in computer vision: a review
CN112085123B (en) Point cloud data classification and segmentation method based on salient point sampling
WO2015040450A1 (en) Multi-purpose image processing core
Chen et al. Flexible hardware architecture of hierarchical K-means clustering for large cluster number
Gong et al. Vehicle detection in thermal images with an improved yolov3-tiny
Chen et al. StereoEngine: An FPGA-based accelerator for real-time high-quality stereo estimation with binary neural network
CN115240121B (en) Joint modeling method and device for enhancing local features of pedestrians
Cao et al. Learning spatial-temporal representation for smoke vehicle detection
Wang et al. Multi‐scale pedestrian detection based on self‐attention and adaptively spatial feature fusion
Pavlov et al. Detection and recognition of objects on aerial photographs using convolutional neural networks
Gong et al. FastRoadSeg: Fast monocular road segmentation network
Shen et al. Infrared object detection method based on DBD-YOLOv8
Li et al. HoloSeg: An efficient holographic segmentation network for real-time scene parsing
Zhang et al. Robust object detection in aerial imagery based on multi-scale detector and soft densely connected
Kim et al. Image recognition accelerator design using in-memory processing

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 13795578
    Country of ref document: EP
    Kind code of ref document: A1

WWE Wipo information: entry into national phase
    Ref document number: 2015/1200.1
    Country of ref document: KZ

ENP Entry into the national phase
    Ref document number: 20157033283
    Country of ref document: KR
    Kind code of ref document: A

NENP Non-entry into the national phase
    Ref country code: DE

122 Ep: pct application non-entry in european phase
    Ref document number: 13795578
    Country of ref document: EP
    Kind code of ref document: A1