WO2016157246A1 - Data transfer apparatus and microcomputer

Info

Publication number: WO2016157246A1
Application number: PCT/JP2015/001808
Authority: WO
Inventors: Hanno Lieske
Original assignee: Renesas Electronics Corporation
Priority date: 2015-03-30
Filing date: 2015-03-30
Publication date: 2016-10-06

Abstract

A two-row buffer (3) stores first and second rows. An input buffer (2) stores a third row. A gradient calculator (4) calculates first and second gradient values. A vote calculator (5) calculates a vote amount value. A direction calculator (6) calculates a vote direction value. An output buffer (8) stores accumulated vote amount values. An adder (7) adds the vote amount value to the received accumulated vote amount value and replaces the accumulated vote amount value in the output buffer (8) with the added value. The first gradient value is a difference between values of two pixels in the first and third row. The second gradient value is a difference between values of two pixels in the second row. The four pixels are immediately adjacent to a target pixel in the second row. The output buffer (8) outputs all of the accumulated vote amount values to an outside processor.

Description

DATA TRANSFER APPARATUS AND MICROCOMPUTER

The present invention relates to a data transfer apparatus and a microcomputer.

In the area of object detection, Histogram of Oriented Gradients (HOG) features are widely used as descriptors. In this algorithm, occurrences of image gradient orientations are counted in defined portions of an image.

To this end, cell histograms are generated in which each pixel of a cell adds a weighted vote to an orientation-based histogram channel. The contribution of a pixel can be determined by applying a 1-D centered filter in the horizontal and vertical directions. The arctangent of the ratio of the filter outputs, which are the gradients in the horizontal and vertical directions, then specifies the orientation, while some function of the gradient magnitude is used as the weighted vote.

Integral Histograms are used for fast construction of HOG feature vectors. The advantage is that, once the Integral Histogram has been generated, the bin values for an ROI (region of interest) of any size can be accessed in constant time, so the HOG feature vector can also be calculated in constant time, independent of the ROI size.
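The constant-time access mentioned above can be illustrated with a minimal sketch. It assumes that every pixel has already been assigned a bin index and a vote weight; the function names build_integral_histogram and roi_histogram are hypothetical and are not taken from PTL1, NPTL1 or NPTL2.

```python
def build_integral_histogram(bins, votes, num_bins):
    """Build an integral histogram with a one-pixel zero border.

    bins[y][x] is the bin index of a pixel and votes[y][x] its vote weight.
    ih[y][x] holds the histogram of all pixels in rows < y and columns < x.
    """
    height, width = len(bins), len(bins[0])
    ih = [[[0] * num_bins for _ in range(width + 1)] for _ in range(height + 1)]
    for y in range(height):
        for x in range(width):
            cell = ih[y + 1][x + 1]
            for b in range(num_bins):
                # Propagate from the upper, left and upper-left neighbours.
                cell[b] = ih[y][x + 1][b] + ih[y + 1][x][b] - ih[y][x][b]
            # Add the vote of the current pixel to its own bin.
            cell[bins[y][x]] += votes[y][x]
    return ih


def roi_histogram(ih, top, left, bottom, right):
    """Histogram of rows top..bottom-1 and columns left..right-1.

    Only four corner look-ups per bin are needed, whatever the ROI size.
    """
    num_bins = len(ih[0][0])
    return [
        ih[bottom][right][b] - ih[top][right][b]
        - ih[bottom][left][b] + ih[top][left][b]
        for b in range(num_bins)
    ]


ih = build_integral_histogram([[0, 1], [1, 1]], [[1, 2], [3, 4]], 2)
whole_image = roi_histogram(ih, 0, 0, 2, 2)
bottom_row = roi_histogram(ih, 1, 0, 2, 2)
```

In this sketch the cost of roi_histogram depends only on the number of bins, not on the ROI size, which is the property the background section refers to.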

PTL1 and NPTL1 show two methods for detecting humans in images. Both use the HOG algorithm as one part of the processing chain. The two HOG implementations differ in, e.g., the use of fixed-size blocks (NPTL1) versus variable-size blocks (PTL1) and the use of L2 normalization (NPTL1) versus L1 normalization (PTL1).

The HOG algorithm can be implemented by utilizing the Integral Histogram as shown in NPTL2, which can significantly speed up the HOG vector calculation because the processing time becomes constant and independent of the ROI size. NPTL2 shows, for each pixel, the required steps of the Integral Histogram algorithm. However, assuming that the data is loaded from memory pixel by pixel and the result is stored to memory pixel by pixel, the generation of the Integral Histogram takes a significant amount of time.

PTL 1: US Patent Publication No. 2007/0237387

NPTL 1: P. N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection", Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
NPTL 2: F. Porikli, "Integral histogram: A fast way to extract histograms in Cartesian spaces", IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages I: 829-836, 2005.

PTL1 and NPTL1 describe a whole chain of processing steps for human detection in images. The HOG algorithm is only one part of this chain and is described from an algorithmic point of view, without hardware details. NPTL2 describes the individual tasks of the Integral Histogram algorithm pixel by pixel. Here, too, hardware details of an implementation are not given.

Therefore, a data transfer apparatus that specifically generates the Integral Histogram using data transferred from outside is required.

One aspect of the embodiments is a data transfer apparatus, including: a two-row buffer that stores a first row consisting of consecutive pixels of an image and a second row consisting of consecutive pixels of the image, the first row being immediately adjacent to the second row; an input buffer that stores a third row which consists of consecutive pixels of the image and is transferred from outside, the third row being immediately adjacent to the second row; a gradient calculator that calculates first and second gradient values; a vote calculator that calculates a vote amount value based on a sum of a square of the first gradient value and a square of the second gradient value; a direction calculator that calculates a vote direction value based on a ratio of the first and second gradient values; an output buffer that stores accumulated vote amount values, and outputs the accumulated vote amount values; and an adder that adds the vote amount value to the accumulated vote amount value, which is received from the output buffer according to the vote direction value, and replaces the accumulated vote amount value with the newly calculated value. The first gradient value is a difference between values of a first pixel belonging to the third row and a second pixel belonging to the first row. The second gradient value is a difference between values of third and fourth pixels belonging to the second row. The first to fourth pixels are immediately adjacent to a target pixel that is a pixel in the second row. The output buffer outputs all of the accumulated vote amount values to an outside processor that performs processing using the received accumulated vote amount values.

Another aspect of the embodiments is a microcomputer, including: a data transfer apparatus including first and second processors; and an external memory that is configured to transfer data to the data transfer apparatus. The first processor includes: a two-row buffer that stores a first row consisting of consecutive pixels of an image and a second row consisting of consecutive pixels of the image, the first row being immediately adjacent to the second row; an input buffer that stores a third row which consists of consecutive pixels of the image and is transferred from the external memory, the third row being immediately adjacent to the second row; a gradient calculator that calculates first and second gradient values; a vote calculator that calculates a vote amount value based on a sum of a square of the first gradient value and a square of the second gradient value; a direction calculator that calculates a vote direction value based on a ratio of the first and second gradient values; an output buffer that stores accumulated vote amount values, and outputs the accumulated vote amount values; and an adder that adds the vote amount value to the accumulated vote amount value, which is received from the output buffer according to the vote direction value, and replaces the accumulated vote amount value with the newly calculated value. The first gradient value is a difference between values of a first pixel belonging to the third row and a second pixel belonging to the first row. The second gradient value is a difference between values of third and fourth pixels belonging to the second row. The first to fourth pixels are immediately adjacent to a target pixel that is a pixel in the second row. The output buffer outputs all of the accumulated vote amount values to the second processor that performs processing using the received accumulated vote amount values.

According to an aspect of the embodiments, it is possible to achieve a data transfer apparatus that specifically generates the Integral Histogram using data transferred from outside.

Fig. 1 is a block diagram schematically showing a configuration of a data transfer apparatus according to a first embodiment.
Fig. 2 is a block diagram schematically showing data transfer for a gradient calculation in a processor according to the first embodiment.
Fig. 3 is a block diagram schematically showing the data transfer for the gradient calculation in the processor according to the first embodiment.
Fig. 4 is a block diagram schematically showing a row-update operation of an input buffer and a two-row buffer according to the first embodiment.
Fig. 5 is a block diagram schematically showing the row-update operation of the input buffer and the two-row buffer according to the first embodiment.
Fig. 6 is a block diagram showing an example of a configuration of an adder according to the first embodiment.
Fig. 7 is a block diagram showing an example of a configuration of an SIMD array according to the first embodiment.
Fig. 8 is a timing chart showing an operation of the data transfer apparatus according to the first embodiment.
Fig. 9 is a block diagram showing an initial state before the operation of the data transfer apparatus starts.
Fig. 10 is a block diagram showing a state of the data transfer apparatus after a first clock cycle.
Fig. 11 is a block diagram showing a state of the data transfer apparatus after a second clock cycle.
Fig. 12 is a block diagram showing a state of the data transfer apparatus after a third clock cycle.
Fig. 13 is a block diagram showing a state of the data transfer apparatus after a fourth clock cycle.
Fig. 14 is a block diagram schematically showing a configuration of a microcomputer according to a second embodiment.

Hereinafter, embodiments shall be explained with reference to the drawings. The same components are denoted by the same reference numerals throughout the drawings, and repeated explanations shall be omitted as necessary.

First Embodiment
A data transfer apparatus according to a first embodiment shall be described. The data transfer apparatus is configured to be able to generate an Integral Histogram during Direct Memory Access (DMA) data transfer from an external memory to a Single Instruction Multiple Data (SIMD) array.

Fig. 1 is a block diagram schematically showing a configuration of a data transfer apparatus 100 according to the first embodiment. As shown in Fig. 1, the data transfer apparatus 100 includes a processor 10 and a SIMD array 20. The processor 10 is a DMA processor that functions as an Integral Histogram Generator (IHG). The processor 10 can access an external memory 30 and receive data from the external memory 30. The external memory 30 stores image data consisting of a plurality of rows. Each row consists of a plurality of pixels.

Here, a configuration of the processor 10 shall be described in detail. The processor 10 includes a controller 1, an input buffer 2, a two-row buffer 3, a gradient calculator 4, a vote calculator 5, a direction calculator 6, an adder 7, and an output buffer 8.

The controller 1 controls data transfer operations and calculation operations of the input buffer 2, the two-row buffer 3, the gradient calculator 4, the vote calculator 5, the direction calculator 6, the adder 7, and the output buffer 8.

The gradient calculator 4 calculates gradient values based on the pixel values in the rows stored in the input buffer 2 and the two-row buffer 3. Hereinafter, the gradient calculation will be described. The pixel for which the gradient values are calculated is referred to as a target pixel. To calculate the gradient value of the target pixel in the horizontal direction, the values of the pixels immediately to the left and right of the target pixel are needed. To calculate the gradient value of the target pixel in the vertical direction, the values of the pixels immediately above and below the target pixel are needed. In sum, the gradient calculation of each pixel requires four pixel values, namely those of the immediate left, right, upper and lower pixels. In other words, data from three consecutive rows are required to calculate the gradient values in the horizontal and vertical directions of each pixel.

In this embodiment, the input buffer 2 stores one row and the two-row buffer 3 stores two rows. Fig. 2 is a block diagram schematically showing data transfer for the gradient calculation in the processor according to the first embodiment. Rows R1 and R2 are stored in the row-slots RS1 and RS2 of the two-row buffer 3 before the gradient calculation starts. Further, a row R3 is stored in the input buffer 2 before the gradient calculation starts. In this case, the row R1 is an upper row, the row R2 is a middle row, the row R3 is a lower row, and the rows R1 to R3 are consecutive rows. The row R1 includes values of six pixels V11 to V16, the row R2 includes values of six pixels V21 to V26, and the row R3 includes values of six pixels V31 to V36. Thus, the immediate upper pixel belongs to the row R1, the immediate left and right pixels belong to the row R2, and the immediate lower pixel belongs to the row R3. In Fig. 2, a value of the target pixel is V22, a value of the upper pixel is V12, a value of the left pixel is V21, a value of the right pixel is V23, and a value of the lower pixel is V32. Therefore, for example, a 1-D centered point discrete derivative mask can be used in the horizontal and vertical directions.

The gradient calculator 4 calculates the gradient values in the horizontal and vertical directions (corresponding to the second and first gradient values, respectively) of each target pixel in the middle row R2 except for the end pixels (i.e., pixel values V21 and V26). Thus, the gradient calculator 4 calculates the gradient values in the horizontal and vertical directions of each of the pixels having the pixel values V22 to V25. For example, after the gradient values in the horizontal and vertical directions are calculated, the target pixel is moved to the adjacent pixel (V22->V23) as shown in Fig. 3. Fig. 3 is a block diagram schematically showing data transfer for the gradient calculation in the processor according to the first embodiment. In Fig. 3, a value of the target pixel is V23, a value of the upper pixel is V13, a value of the left pixel is V22, a value of the right pixel is V24, and a value of the lower pixel is V33.

Then, when the gradient calculator 4 has finished calculating the gradient values for the pixels V22 to V25, the rows stored in the input buffer 2 and the two-row buffer 3 are updated. Figs. 4 and 5 are block diagrams schematically showing a row-update operation of the input buffer 2 and the two-row buffer 3 according to the first embodiment. As shown in Fig. 4, the row R3 is first transferred from the input buffer 2 to the row-slot RS1 and replaces the row R1. After that, as shown in Fig. 5, the row R2 in the row-slot RS2 and the row R3 in the row-slot RS1 are exchanged with each other. Further, a row R4 (V41 to V46), which is the row immediately below the row R3, is transferred from the external memory 30 to the input buffer 2 and replaces the row R3. In sum, the middle row R2 before the update becomes the upper row after the update, the lower row R3 before the update becomes the middle row after the update, and the row R4 is newly loaded as the lower row. As a result, as compared with the consecutive rows (R1, R2, and R3) before the update, the consecutive rows (R2, R3, and R4) after the update are shifted in the vertical direction by one row.
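The row-update operation of Figs. 4 and 5 can be sketched in software as follows. This is only an illustration under the assumption that each row is modeled as a Python list; the function name update_rows and the list-based buffers are hypothetical and do not describe the hardware.

```python
def update_rows(two_row_buffer, input_buffer, next_row):
    """Shift the three-row window down by one row, in place.

    two_row_buffer[0] models row-slot RS1 and two_row_buffer[1] models RS2.
    """
    # Fig. 4: the lower row (R3) is copied into RS1 and replaces R1.
    two_row_buffer[0] = list(input_buffer)
    # Fig. 5: the contents of RS1 and RS2 are exchanged, so that RS1 holds
    # the old middle row (R2) and RS2 holds the old lower row (R3).
    two_row_buffer[0], two_row_buffer[1] = two_row_buffer[1], two_row_buffer[0]
    # The next row (R4) from the external memory replaces R3.
    input_buffer[:] = next_row


rows = [[11, 12, 13], [21, 22, 23]]   # R1 in RS1, R2 in RS2
lower = [31, 32, 33]                  # R3 in the input buffer
update_rows(rows, lower, [41, 42, 43])
```

After the call, the sketch holds R2 and R3 in the two-row buffer and R4 in the input buffer, which corresponds to the one-row vertical shift described above.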

The gradient calculator 4 calculates the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy according to the following expressions (1) and (2), respectively, for example.
dx = right_pixel_value - left_pixel_value --- (1)
dy = lower_pixel_value - upper_pixel_value --- (2)
The gradient value in the horizontal direction dx and the gradient value in the vertical direction dy are given to the vote calculator 5 and the direction calculator 6.
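Expressions (1) and (2) can be written as a short sketch. The function name gradients and the list-based rows are assumptions made for illustration; the sample rows are the ones used in the worked example of Fig. 9.

```python
def gradients(upper_row, middle_row, lower_row, x):
    """Return (dx, dy) for the target pixel at column x of the middle row."""
    if not 1 <= x <= len(middle_row) - 2:
        # End pixels have no left or right neighbour and are skipped.
        raise ValueError("end pixels are not target pixels")
    dx = middle_row[x + 1] - middle_row[x - 1]   # expression (1)
    dy = lower_row[x] - upper_row[x]             # expression (2)
    return dx, dy


upper = [1, 2, 3, 4, 5, 6]
middle = [3, 4, 5, 7, 8, 9]
lower = [1, 3, 5, 7, 9, 2]
first_target = gradients(upper, middle, lower, 1)
```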

The vote calculator 5 calculates a vote amount value VA in each clock cycle based on the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy calculated in the previous clock cycle. The vote amount value VA is expressed by the following expression (3).
VA = dx² + dy² --- (3)

The direction calculator 6 calculates a vote direction value VD in each clock cycle based on the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy calculated in the previous clock cycle. The vote direction value VD is expressed by the following expression (4), for example.
VD = arctan(dy/dx) --- (4)
Because an exact value is not needed but only the partition into one of the histogram channels, an approximation of this equation is possible by, e.g., using table look-ups. The vote amount value VA and vote direction value VD are then given to the adder 7.

The output buffer 8 includes slots, each of which stores the accumulated vote amount value for one direction. In this case, the output buffer 8 includes eight slots corresponding to eight directions. The output buffer 8 sends out all accumulated vote amount values, and a multiplexer selects one of them based on the vote direction value VD.

The adder 7 adds the vote amount value VA from the vote calculator 5 to the selected accumulated vote amount value received from the output buffer 8. After that, the adder 7 sends out the added value to the slot of the output buffer 8 which is selected based on the vote direction value VD. The added value is then stored in the slot corresponding to the vote direction value VD as an updated accumulated vote amount value. Further, the accumulated vote amount values in all slots of the output buffer 8 are output to the SIMD array 20.

Here, an example of the configuration of the adder 7 shall be described. Fig. 6 is a block diagram showing the example of the configuration of the adder 7 according to the first embodiment. As shown in Fig. 6, the adder 7 includes a multiplexer 71, an adder unit 72, and a demultiplexer 73. The multiplexer 71 receives all of the accumulated vote amount values A0 to A7 in the slots SL0 to SL7 of the output buffer 8 and selects one based on the vote direction value VD. In this case, the multiplexer 71 selects the accumulated vote amount value A1 in the slot SL1 of the output buffer 8 based on the vote direction value VD. The adder unit 72 adds the vote amount value VA to the received accumulated vote amount value (A1) and outputs the added value (VA+A1) to the demultiplexer 73. Then, the demultiplexer 73 outputs the added value (VA+A1) to the same slot (SL1) that was selected by the multiplexer 71 based on the vote direction value VD. As a result, the value in this slot is updated (A1 -> VA+A1).
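The read-modify-write path of Fig. 6 can be sketched as follows. The function name accumulate_vote and the list used for the slots SL0 to SL7 are hypothetical; the sketch models the behaviour only, not the circuit.

```python
def accumulate_vote(slots, vote_amount, vote_direction):
    """Add one vote to the slot selected by the vote direction value.

    Returns a copy of all slots, modeling the values that the output
    buffer sends on to the SIMD array.
    """
    selected = slots[vote_direction]    # multiplexer 71 selects one slot
    added = selected + vote_amount      # adder unit 72
    slots[vote_direction] = added       # demultiplexer 73 writes it back
    return list(slots)


slots = [0] * 8                               # SL0 to SL7, initially empty
after_first = accumulate_vote(slots, 5, 1)    # VA = 5, VD = 1
after_second = accumulate_vote(slots, 13, 1)  # VA = 13, VD = 1
```

The two calls use the vote amount and direction values of the first two target pixels of the worked example, so slot SL1 goes from 0 to 5 and then to 18.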

The SIMD array 20 adds the accumulated vote values received from the output buffer 8 to the accumulated vote amount values which were generated by the operations for the previous row, for example. Fig. 7 is a block diagram showing an example of a configuration of the SIMD array 20 according to the first embodiment.

The SIMD array 20 includes a pipelined ring bus 21, a buffer array 22, an adder array 23, and a memory element array 24. The pipelined ring bus 21 receives the accumulated vote amount values AVV from the output buffer 8. The buffer array 22 holds the accumulated vote amount values AVV_PRE which were generated by the operations for the previous row. The adder array 23 adds each of the accumulated vote amount values AVV to the corresponding accumulated vote amount value AVV_PRE. The added accumulated vote amount values are stored in the memory element array 24. The buffer array 22 is refilled in time with data from the memory element array 24; the transferred data is the data of the previous row at the same horizontal positions.
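The element-wise addition performed by the adder array 23 can be sketched as below. The function name simd_accumulate is hypothetical, and the sample values are those of the fourth clock cycle of the worked example.

```python
def simd_accumulate(avv, avv_pre):
    """Add each accumulated vote amount value to its previous-row counterpart."""
    if len(avv) != len(avv_pre):
        raise ValueError("both inputs need one value per direction")
    # One addition per direction; in hardware these run in parallel.
    return [current + previous for current, previous in zip(avv, avv_pre)]


avv = [0, 5, 0, 0, 0, 0, 0, 0]        # from the pipelined ring bus 21
avv_pre = [1, 1, 1, 1, 1, 1, 1, 1]    # from the buffer array 22
stored = simd_accumulate(avv, avv_pre)
```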

Hereinafter, operations of the data transfer apparatus 100 at various clock cycles shall be described. Fig. 8 is a timing chart showing the operation of the data transfer apparatus 100 according to the first embodiment. In Fig. 8, the reference signs T0 to T4 indicate the ends of clock cycles T0 to T4. In this case, by the end of the clock cycle T0, the two-row buffer 3 is already filled with the upper two rows, and the input buffer 2 is already filled with the elements of the current row R3.

In the clock cycle T1, which follows the clock cycle T0, the gradient calculator 4 outputs the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy of the target pixel. In the clock cycle T2, which follows the clock cycle T1, the vote calculator 5 calculates the vote amount value VA and the direction calculator 6 calculates the vote direction value VD. In the clock cycle T3, which follows the clock cycle T2, the operations in the adder 7 and the output buffer 8 are executed based on the vote amount value VA and the vote direction value VD calculated in the clock cycle T2. In the clock cycle T4, which is one or more clock cycles after the clock cycle T3, the SIMD array 20 executes the addition of the accumulated vote amount values as described above.

Fig. 9 is a block diagram showing an initial state before the operation of the data transfer apparatus 100 starts. Here, the two-row buffer 3 holds the upper and middle rows, while the input buffer 2 holds the lower row which is received from the external memory 30. Specifically, the row-slot RS1 holds a first row consisting of "1 2 3 4 5 6", the row-slot RS2 holds a second row consisting of "3 4 5 7 8 9", and the input buffer 2 holds a third row consisting of "1 3 5 7 9 2".

First clock cycle
< Operation for a first target pixel >
Fig. 10 is a block diagram showing a state of the data transfer apparatus 100 after the first clock cycle. In the first clock cycle, the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy for the first target pixel "4" (surrounded by a square in Fig. 10) are calculated as expressed by the following expressions (5) and (6). In this case, the right value, left value, lower value, and upper value are "5", "3", "3", and "2", respectively.
dx = right_pixel_value - left_pixel_value = 5-3 = 2 --- (5)
dy = lower_pixel_value - upper_pixel_value = 3-2 = 1 --- (6)

Second clock cycle
< Operation for the first target pixel >
Fig. 11 is a block diagram showing a state of the data transfer apparatus 100 after the second clock cycle. In the second clock cycle, the vote amount value VA for the first target pixel is calculated in the vote calculator 5 as expressed by the following expression (7) based on the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy calculated in the first clock cycle.
VA = dx² + dy² = 2² + 1² = 5 --- (7)
Further, the vote direction value VD for the first target pixel is calculated in the direction calculator 6 as expressed by the following expression (8) based on the dx and dy calculated in the first clock cycle, for example. Here, the arctangent is expressed in degrees.
VD = floor(arctan(dy/dx) / (180 / sup_dir)) for dy/dx >= 0 and
VD = floor((180 + arctan(dy/dx)) / (180 / sup_dir)) for dy/dx < 0 --- (8)
In this case, the direction calculator 6 decides the vote direction value VD from the eight directions, so that sup_dir = 8. As dx=2 and dy=1, the vote direction value VD is expressed by the following expression (9).
VD = floor(arctan(1/2) / (180 / 8)) = 1 --- (9)
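Expressions (7) and (8) can be checked with a short sketch. The function names vote_amount and vote_direction are hypothetical. The handling of dx = 0 is an assumption, since expression (8) does not define that case; the sketch treats it as an orientation of 90 degrees.

```python
import math


def vote_amount(dx, dy):
    """Vote amount value VA as the sum of the squared gradients."""
    return dx * dx + dy * dy


def vote_direction(dx, dy, sup_dir=8):
    """Vote direction value VD according to expression (8), angles in degrees."""
    if dx == 0:
        # Assumption: a purely vertical gradient is treated as 90 degrees.
        angle = 90.0
    else:
        ratio = dy / dx
        angle = math.degrees(math.atan(ratio))
        if ratio < 0:
            # Map negative angles into the range 90..180 degrees.
            angle += 180.0
    return math.floor(angle / (180.0 / sup_dir))


va = vote_amount(2, 1)
vd = vote_direction(2, 1)
```

A hardware implementation would more likely use the table look-up approximation mentioned for expression (4) than a floating-point arctangent; the sketch only reproduces the arithmetic.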

< Operation for a second target pixel >
Furthermore, in the second clock cycle, the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy for the second pixel "5" (surrounded by a square in Fig. 11) are calculated as expressed by the following expressions (10) and (11). In this case, the right value, left value, lower value, and upper value are "7", "4", "5", and "3", respectively.
dx = right_pixel_value - left_pixel_value = 7-4 = 3 --- (10)
dy = lower_pixel_value - upper_pixel_value = 5-3 = 2 --- (11)

Third clock cycle
< Operation for the first target pixel >
Fig. 12 is a block diagram showing a state of the data transfer apparatus 100 after the third clock cycle. In the third clock cycle, the adder 7 receives the value from the one of the slots SL0 to SL7 of the output buffer 8 that corresponds to the vote direction value VD. Then, the adder 7 adds the newest vote amount value VA to the value received from the output buffer 8. After that, the adder 7 sends out the added value to the slot of the output buffer 8 corresponding to the vote direction value VD. Further, all values stored in the slots SL0 to SL7 are transferred to the pipelined ring bus 21 in the SIMD array 20. In this case, as VD=1, the adder 7 receives "0" from the slot SL1 of the output buffer 8. Then, as the newest vote amount value VA is "5", the adder 7 adds "5" to "0" and sends out the added value "5" to the slot SL1 of the output buffer 8.

< Operation for the second target pixel >
Further, in the third clock cycle, the vote amount value VA is calculated in the vote calculator 5 as expressed by the following expression (12). The vote direction value VD is expressed by the following expression (13).
VA = dx² + dy² = 3² + 2² = 13 --- (12)
VD = floor(arctan(2/3) / (180/8)) = 1 --- (13)

< Operation for a third target pixel >
Furthermore, in the third clock cycle, the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy for the third pixel "7" (surrounded by a square in Fig. 12) are calculated as expressed by the following expressions (14) and (15) as shown in Fig. 12. In this case, the right value, left value, lower value, and upper value are "8", "5", "7", and "4", respectively.
dx = right_pixel_value - left_pixel_value = 8-5 = 3 --- (14)
dy = lower_pixel_value - upper_pixel_value = 7-4 = 3 --- (15)

Fourth clock cycle
< Operation for the first target pixel >
In the SIMD array 20, the adder array 23 adds the accumulated vote amount values AVV "0 5 0 0 0 0 0 0" stored in the pipelined ring bus 21 to the accumulated vote amount values AVV_PRE "1 1 1 1 1 1 1 1" stored in the buffer array 22, respectively. Note that, the accumulated vote amount values AVV_PRE "1 1 1 1 1 1 1 1" are loaded in advance. Then added values "1 6 1 1 1 1 1 1" are stored in the memory element array 24.

< Operation for the second target pixel >
Fig. 13 is a block diagram showing a state of the data transfer apparatus 100 after the fourth clock cycle. In the fourth clock cycle, as VD=1, the adder 7 receives "5" from the slot SL1 of the output buffer 8. Then, as the newest vote amount value VA is "13", the adder 7 adds "13" to "5" and sends out the added value "18" to the slot SL1 of the output buffer 8. Further, in the pipelined ring bus 21, the accumulated vote values "0 5 0 0 0 0 0 0" stored in the third clock cycle are replaced with the accumulated vote values "0 18 0 0 0 0 0 0" transferred in the fourth clock cycle.

< Operation for the third target pixel >
Further, in the fourth clock cycle, the vote amount value VA is calculated in the vote calculator 5 as expressed by the following expression (16). The vote direction value VD is expressed by the following expression (17).
VA = dx² + dy² = 3² + 3² = 18 --- (16)
VD = floor(arctan(3/3) / (180/8)) = 2 --- (17)

< Operation for a fourth target pixel >
Furthermore, in the fourth clock cycle, the gradient value in the horizontal direction dx and the gradient value in the vertical direction dy for the fourth pixel "8" (surrounded by a square in Fig. 13) are calculated as expressed by the following expressions (18) and (19) as shown in Fig. 13. In this case, the right value, left value, lower value, and upper value are "9", "7", "9", and "5", respectively.
dx = right_pixel_value - left_pixel_value = 9-7 = 2 --- (18)
dy = lower_pixel_value - upper_pixel_value = 9-5 = 4 --- (19)

As described above, four clock cycles are needed for one target pixel. Because the operations are pipelined, a new target pixel can enter the pipeline in every clock cycle. Thus, when there are M (M is an integer larger than one) target pixels in a row, M+3 clock cycles are needed to perform the calculations for all target pixels of the row.
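The four-stage operation and the M+3 cycle count can be reproduced with a small software model. This is a sketch under stated assumptions: the names run_row_pipeline and vote_direction, the pipeline registers, and the treatment of dx = 0 as 90 degrees are all hypothetical, and the previous-row addition in the SIMD array is reduced to logging the values arriving on the ring bus.

```python
import math


def vote_direction(dx, dy, sup_dir=8):
    """Expression (8); dx == 0 is treated as 90 degrees (assumption)."""
    if dx == 0:
        angle = 90.0
    else:
        angle = math.degrees(math.atan(dy / dx))
        if dy / dx < 0:
            angle += 180.0
    return math.floor(angle / (180.0 / sup_dir))


def run_row_pipeline(upper, middle, lower, sup_dir=8):
    """Process one row; return (clock cycles, output buffer slots, bus log)."""
    pending = list(range(1, len(middle) - 1))    # end pixels are skipped
    slots = [0] * sup_dir                        # output buffer 8
    grad_reg = vote_reg = bus_reg = None         # registers between stages
    bus_log = []
    cycles = 0
    while (pending or grad_reg is not None
           or vote_reg is not None or bus_reg is not None):
        cycles += 1
        # Stages are evaluated last to first, so each stage consumes the
        # value that the preceding stage produced in the previous cycle.
        if bus_reg is not None:                  # stage 4: SIMD array 20
            bus_log.append(bus_reg)
        if vote_reg is not None:                 # stage 3: adder 7
            va, vd = vote_reg
            slots[vd] += va
            bus_reg = list(slots)
        else:
            bus_reg = None
        if grad_reg is not None:                 # stage 2: VA and VD
            dx, dy = grad_reg
            vote_reg = (dx * dx + dy * dy, vote_direction(dx, dy, sup_dir))
        else:
            vote_reg = None
        if pending:                              # stage 1: gradients
            x = pending.pop(0)
            grad_reg = (middle[x + 1] - middle[x - 1], lower[x] - upper[x])
        else:
            grad_reg = None
    return cycles, slots, bus_log


cycles, slots, bus_log = run_row_pipeline(
    [1, 2, 3, 4, 5, 6], [3, 4, 5, 7, 8, 9], [1, 3, 5, 7, 9, 2])
```

With the rows of Fig. 9 there are M = 4 target pixels, so the model finishes after 7 clock cycles, and the first two bus entries match the values "0 5 0 0 0 0 0 0" and "0 18 0 0 0 0 0 0" given for the third and fourth clock cycles.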

According to the configuration described above, it is possible to efficiently generate the Integral Histogram while using the DMA data transfer for a data read transfer from external to internal memory. The data transfer apparatus receives the rows one by one and stores only three rows, so that the sizes of the input buffer and the two-row buffer can be kept small as compared with the case in which the whole image or a partial area of the image is transferred from the external memory to the data transfer apparatus. Further, the output buffer stores only one accumulated vote amount value per bin, so that the size of the output buffer can also be kept small.

Further, in each clock cycle, the data transfer apparatus according to the first embodiment updates only one of the accumulated vote amount values for one bin of a pixel in the output buffer and transfers the accumulated vote amount values from the output buffer to the SIMD array. Thus, the data transfer apparatus according to the first embodiment can achieve the rapid histogram generation.

Second Embodiment
Next, a microcomputer 200 according to a second embodiment shall be explained. The microcomputer 200 is an example of a microcomputer in which the data transfer apparatus 100 is incorporated. Fig. 14 is a block diagram schematically showing a configuration of the microcomputer 200 according to the second embodiment. As shown in Fig. 14, the microcomputer 200 includes the data transfer apparatus 100, the external memory 30, and a bus 40. Data can be transferred between the data transfer apparatus 100 and the external memory 30 via the bus 40.

As described above, the data transfer apparatus can be applied to a microcomputer. Therefore, according to this configuration, it is possible to suppress the sizes of the input buffer, the two-row buffer and the output buffer in the data transfer apparatus and to achieve rapid histogram generation, as in the first embodiment described above.

Other embodiments
The present invention is not limited to the embodiments described above, and can be modified as appropriate without departing from the scope of the invention. For example, in the above embodiments, it is described that the rows stored in the input buffer and the two-row buffer include the values of six pixels. However, this is merely an example. The number of pixels in a row may be three to five, or seven or more.

In the above embodiments, the input buffer stores the lower row. However, this is merely an example. The input buffer may store the upper row, and one slot of the two-row buffer may store the lower row. In this case, the input buffer may replace the row stored therein with the next upper row received from the external memory.

In the above embodiments, the input buffer has the size of a whole row. However, this is merely an example. The input buffer can also hold only a few pixels or even only one pixel. In such a case, instead of replacing a whole row at the end of the row processing, each pixel of the input buffer replaces the oldest pixel inside the two-row buffer, which is the leftmost remaining pixel of the first row, during the processing of the pixel row. At the end of the processing of one pixel row, the two rows of the two-row buffer are then swapped.

In the above embodiments, the vote amount value is the sum of the square of the first gradient value and the square of the second gradient value as expressed by the expression (3). However, this is merely an example. For example, the vote amount value may be the square root of the sum of the square of the first gradient value and the square of the second gradient value.

In the above embodiments, it is described that the number of directions is eight. However, this is merely an example. The number of directions may be any number of two or more.

1 CONTROLLER
2 INPUT BUFFER
3 TWO-ROW BUFFER
4 GRADIENT CALCULATOR
5 VOTE CALCULATOR
6 DIRECTION CALCULATOR
7 ADDER
8 OUTPUT BUFFER
10 PROCESSOR
20 SIMD ARRAY
21 PIPELINED RING BUS
22 BUFFER ARRAY
23 ADDER ARRAY
24 MEMORY ELEMENT ARRAY
30 EXTERNAL MEMORY
71 MULTIPLEXER
72 ADDER UNIT
73 DEMULTIPLEXER
100 DATA TRANSFER APPARATUS
200 MICROCOMPUTER
A0 to A7 ACCUMULATED VOTE VALUES
SL0 to SL7 SLOTS
VA VOTE AMOUNT VALUE
VD VOTE DIRECTION VALUE

Claims

A data transfer apparatus, comprising:
a two-row buffer that stores a first row consisting of consecutive pixels of an image and a second row consisting of consecutive pixels of the image, the first row being immediately adjacent to the second row;
an input buffer that stores a third row that consists of consecutive pixels of the image and is transferred from outside, the third row being immediately adjacent to the second row;
a gradient calculator that calculates first and second gradient values;
a vote calculator that calculates a vote amount value based on a sum of a square of the first gradient value and a square of the second gradient value;
a direction calculator that calculates a vote direction value based on a ratio of the first and second gradient values;
an output buffer that stores accumulated vote amount values, and outputs the accumulated vote amount values; and
an adder that adds the vote amount value to the accumulated vote amount value, which is received from the output buffer according to the vote direction value, and replaces the accumulated vote amount value with the newly calculated value, wherein
the first gradient value is a difference between values of a first pixel belonging to the third row and a second pixel belonging to the first row,
the second gradient value is a difference between values of third and fourth pixels belonging to the second row,
the first to fourth pixels are immediately adjacent to a target pixel that is a pixel in the second row, and
the output buffer outputs all of the accumulated vote amount values to an outside processor that performs a processing using the received accumulated vote amount values.
The data transfer apparatus according to Claim 1, wherein
when the gradient calculator finishes calculating the first gradient values and the second gradient values for a defined number of processed pixels, the defined number being larger than zero, the processed pixels of the input buffer replace the oldest pixels of the two-row buffer, which are the pixels on the left side of the first row in the two-row buffer, and
when all pixels in one row have been processed, both rows of the two-row buffer are swapped.
The data transfer apparatus according to Claim 1, wherein the direction calculator calculates the vote direction value based on an arctangent of the first and second gradient values.
The data transfer apparatus according to Claim 1, wherein the vote amount value is the sum of the square of the first gradient value and the square of the second gradient value.
The data transfer apparatus according to Claim 1, wherein the vote amount value is a square root of the sum of the square of the first gradient value and the square of the second gradient value.
The data transfer apparatus according to Claim 1, wherein the vote direction value indicates any one of N directions (N being an integer larger than two) based on the arctangent of the first and second gradient values.
The data transfer apparatus according to Claim 6, wherein
the output buffer includes N slots corresponding to the N directions, respectively, and
each of the N slots stores the accumulated vote amount value corresponding to the vote direction value.
The data transfer apparatus according to Claim 1, further comprising the outside processor, wherein
the outside processor comprises:
a first buffer that stores the accumulated vote amount values received from the output buffer;
a second buffer that temporarily stores accumulated vote amount values received from a local memory for pixels at the same horizontal position in a row, the row being immediately adjacent to the currently processed row; and
an adder array that adds each of the accumulated vote amount values in the first buffer to each of the accumulated vote amount values in the second buffer, and
the outside processor stores the output of the adder array to a local memory.
A microcomputer, comprising:
a data transfer apparatus including first and second processors; and
an external memory that is configured to transfer data to the data transfer apparatus, wherein
the first processor comprises:
a two-row buffer that stores a first row consisting of consecutive pixels of an image and a second row consisting of consecutive pixels of the image, the first row being immediately adjacent to the second row;
an input buffer that stores a third row that consists of consecutive pixels of the image and is transferred from the external memory, the third row being immediately adjacent to the second row;
a gradient calculator that calculates first and second gradient values;
a vote calculator that calculates a vote amount value based on a sum of a square of the first gradient value and a square of the second gradient value;
a direction calculator that calculates a vote direction value based on a ratio of the first and second gradient values;
an output buffer that stores accumulated vote amount values, and outputs the accumulated vote amount values; and
an adder that adds the vote amount value to the accumulated vote amount value, which is received from the output buffer according to the vote direction value, and replaces the accumulated vote amount value with the newly calculated value, wherein
the first gradient value is a difference between values of a first pixel belonging to the third row and a second pixel belonging to the first row,
the second gradient value is a difference between values of third and fourth pixels belonging to the second row,
the first to fourth pixels are immediately adjacent to a target pixel that is a pixel in the second row, and
the output buffer outputs all of the accumulated vote amount values to the second processor that performs a processing using the received accumulated vote amount values.