WO2017006512A1 - Arithmetic processing device - Google Patents

Publication number
WO2017006512A1
Authority
WO
WIPO (PCT)
Prior art keywords
input
systolic array
output
row
data
Prior art date
Application number
PCT/JP2016/002680
Other languages
English (en)
Japanese (ja)
Inventor
智義 船▲崎▼
智章 尾崎
Original Assignee
株式会社デンソー
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 株式会社デンソー
Publication of WO2017006512A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Definitions

  • The present disclosure relates to an arithmetic processing device that executes the arithmetic operations of a convolutional neural network.
  • The processing performed in a fully connected layer differs from the processing performed in an intermediate layer. To increase processing speed, dedicated circuits are therefore needed for the intermediate layers and for the fully connected layers, which increases the area of the circuit board mounted in the arithmetic processing device. Conversely, if a circuit is shared to limit the increase in circuit board area, at least one of the circuit for the fully connected layers and the circuit for the intermediate layers must be configured to match the other, which reduces the processing speed.
  • One object of the present disclosure is to provide an arithmetic processing device that suppresses both an increase in circuit board area and a decrease in arithmetic processing speed.
  • The arithmetic processing device of the present disclosure performs the operations of a convolutional neural network having an intermediate layer and a fully connected layer, and includes a plurality of systolic array cells, a plurality of input switches, a plurality of output switches, a convolution operation control unit, and a fully connected operation control unit.
  • The plurality of input switches are provided in correspondence with the respective systolic array cells. Each has a first input terminal, a second input terminal, and a third input terminal, and switches between a first input connection state, in which the first input terminal and the third input terminal are connected, and a second input connection state, in which the second input terminal and the third input terminal are connected.
  • The plurality of output switches are provided in correspondence with the respective systolic array cells. Each has a first output terminal, a second output terminal, and a third output terminal, and switches between a first output connection state, in which the first output terminal and the second output terminal are connected, and a second output connection state, in which the first output terminal and the third output terminal are connected.
  • When the convolution operation of the intermediate layer is executed, the convolution operation control unit switches the input switches to the second input connection state and the output switches to the second output connection state, and controls the data input to the plurality of systolic array cells so that the convolution operation is executed.
  • When the fully connected operation of the fully connected layer is executed, the fully connected operation control unit switches the input switches to the first input connection state and the output switches to the first output connection state, and controls the data input to the plurality of systolic array cells so that the fully connected operation is executed.
  • Each systolic array cell includes a timing adjustment unit and a calculation unit.
  • The timing adjustment unit adjusts the output timing of data input from the third input terminal of the corresponding input switch (the input switch provided for that systolic array cell) and outputs the data to the first output terminal of the corresponding output switch (the output switch provided for that systolic array cell).
  • The calculation unit multiplies the data input from the third input terminal of the corresponding input switch by a preset weighting coefficient, adds the product to data input without passing through the corresponding input switch, and outputs the sum as cell output data without passing through the corresponding output switch.
  • The arithmetic processing device configured in this way can execute the convolution operation of the intermediate layer on the systolic array cells by switching the plurality of input switches to the second input connection state and the plurality of output switches to the second output connection state. By switching the plurality of input switches to the first input connection state and the plurality of output switches to the first output connection state, it can execute the fully connected operation of the fully connected layer on the same systolic array cells.
  • In other words, merely by adding the input switches and the output switches, and without changing the configuration of the systolic array cells themselves, the arithmetic processing device can make the plurality of systolic array cells perform both the convolution operation and the fully connected operation.
  • The changes in circuit configuration needed to match one of the circuit for the convolution operation and the circuit for the fully connected operation to the other, so that the two can be shared, are thereby kept to a minimum.
  • As a result, the arithmetic processing device can share the circuit that performs the convolution operation of the intermediate layer and the circuit that performs the fully connected operation of the fully connected layer while suppressing both an increase in circuit board area and a decrease in arithmetic processing speed.
  • FIG. 1 is a block diagram illustrating a configuration of the driving support device 1.
  • FIG. 2 is a diagram showing a configuration of the convolutional neural network CNN.
  • FIG. 3 is a diagram for explaining a convolution operation method.
  • FIG. 4 is a diagram for explaining the processing of the fully connected layer group G2.
  • FIG. 5 is a diagram illustrating an operation executed by the fully connected layer Lj1.
  • FIG. 6 is a block diagram showing a configuration of the arithmetic processing device 4.
  • FIG. 7 is a circuit diagram showing a configuration of the systolic array 11 of the first embodiment.
  • FIG. 8 is a circuit diagram showing the configuration of the systolic array cell 21.
  • FIG. 9 is a diagram for explaining a data output method by the fully connected operation control unit 16 of the first embodiment.
  • FIG. 10 is a circuit diagram showing a configuration of the systolic array 11 of the second embodiment.
  • FIG. 11 is a diagram for explaining a data output method by the fully connected operation control unit 16 of the second embodiment.
  • FIG. 12 is a circuit diagram showing a configuration of the systolic array 11 of the third embodiment.
  • FIG. 13 is a diagram for explaining a data output method by the fully connected operation control unit 16 of the third embodiment.
  • the driving support device 1 of this embodiment is mounted on a vehicle and includes a camera 2, a storage device 3, an arithmetic processing device 4, an image processing device 5, and a display device 6, as shown in FIG.
  • the driving support device 1 notifies the driver of the presence of a pedestrian by causing the display device 6 to display an image indicating the position of the pedestrian when a pedestrian is present in front of the vehicle.
  • the camera 2 continuously shoots the scenery in front of the host vehicle (hereinafter also referred to as foreground) that the driver can visually recognize through the windshield.
  • the storage device 3 temporarily stores image data captured by the camera 2.
  • the arithmetic processing device 4 acquires image data from the storage device 3, and executes arithmetic processing for detecting whether or not there is a pedestrian in the foreground indicated by the image data.
  • the image processing device 5 generates display data to be displayed on the display device 6 based on the image data from the storage device 3 and the detection result by the arithmetic processing device 4.
  • the display device 6 is a color display device having a display screen such as a liquid crystal display, and displays various images on the display screen in accordance with display data input from the image processing device 5.
  • The arithmetic processing device 4 detects a pedestrian in the image captured by the camera 2 using a convolutional neural network (CNN).
  • The convolutional neural network CNN includes an intermediate layer group G1 and a fully connected layer group G2.
  • The intermediate layer group G1 includes a plurality of intermediate layers (Lm1, Lm2, ...).
  • The fully connected layer group G2 includes one or more fully connected layers (Lj1, Lj2, ...). Each of the intermediate layers Lm1, Lm2, ... includes a convolution layer Lc and a pooling layer Lp.
  • the process of the intermediate layer group G1 will be described by taking as an example the case where the intermediate layer group G1 is composed of three intermediate layers Lm1, Lm2, and Lm3.
  • the intermediate layer Lm1 of the intermediate layer group G1 performs a known convolution operation by scanning the input image D0 with a preset feature extraction filter Fc1 (for example, raster scan).
  • the feature extraction filter Fc1 is configured by arranging weighting coefficients in a two-dimensional matrix in order to extract pedestrian features.
  • One or a plurality of feature extraction filters Fc1 are provided depending on the number of features to be extracted.
  • The convolution operation is performed using, for example, the function shown in equation (1).
  • In equation (1), $W_{p,q}$ is the weighting coefficient located in the q-th column of the p-th row of the feature extraction filter Fc1 of N × N pixels (p, q, and N are positive integers).
  • In equation (1), $X_{i+p-1,\,j+q-1}$ is the value of the pixel located in the (j + q − 1)-th column of the (i + p − 1)-th row of the input image (i and j are positive integers).
  • For each feature extraction filter Fc1, a feature map Mc1 is generated in which the elements $Y_{ij}$ of equation (1), located in the i-th row and the j-th column, are arranged in a two-dimensional matrix.
  • Since four feature extraction filters Fc1 are used, four feature maps Mc1 are generated.
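  • The definitions of W, X, and Y given above can be illustrated with a minimal Python sketch. Equation (1) itself is not reproduced in this text, so the sketch assumes the usual sum of products over the N × N filter, no bias term, and a scan restricted to positions where the filter fits entirely inside the image; the function name `convolve2d` and the sample values are hypothetical, and indices are 0-based rather than 1-based.

```python
def convolve2d(image, kernel):
    """Scan `image` with an N x N `kernel` and return the feature map.

    Assumption (not stated in the text): Y[i][j] is the plain sum over p, q of
    W[p][q] * X[i + p][j + q], with 0-based indices and no bias term.
    """
    n = len(kernel)
    rows = len(image) - n + 1
    cols = len(image[0]) - n + 1
    feature_map = []
    for i in range(rows):
        out_row = []
        for j in range(cols):
            acc = 0
            for p in range(n):
                for q in range(n):
                    acc += kernel[p][q] * image[i + p][j + q]
            out_row.append(acc)
        feature_map.append(out_row)
    return feature_map


# Hypothetical 3 x 3 image and 2 x 2 filter, for illustration only.
image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 1]]
feature_map = convolve2d(image, kernel)
```

  • With these sample values each output element is the sum of one pixel and its lower-right neighbour, giving a 2 × 2 feature map.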
  • the intermediate layer Lm1 of the intermediate layer group G1 performs a well-known activation process on each element Y ij of the feature map Mc1 after the convolution operation.
  • the activation process is performed using, for example, a ReLU (Rectified Linear Unit) function represented by the following equation (2). Note that the above convolution calculation and activation processing are performed in the convolution layer Lc of the intermediate layer Lm1.
  • the intermediate layer Lm1 reduces the size of the feature map Mc1 by performing a well-known pooling process on the feature map Mc1 after the activation process.
  • The pooling process divides the feature map Mc1 into areas of, for example, 2 × 2 pixels and calculates one value for each area using the max pooling function shown in equation (3). The pooling process is performed in the pooling layer Lp of the intermediate layer Lm1.
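  • The activation and pooling steps can be sketched in Python as follows. Equations (2) and (3) are not reproduced in this text; the sketch assumes the standard ReLU, max(0, x), and non-overlapping 2 × 2 max pooling. The function names and sample values are hypothetical.

```python
def relu(feature_map):
    """Apply ReLU, max(0, x), to every element of the feature map."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Split the map into non-overlapping size x size areas and keep each area's maximum.

    Assumption: rows and columns that do not fill a complete area are dropped.
    """
    rows = len(feature_map) // size
    cols = len(feature_map[0]) // size
    pooled = []
    for bi in range(rows):
        pooled_row = []
        for bj in range(cols):
            block = [feature_map[bi * size + di][bj * size + dj]
                     for di in range(size) for dj in range(size)]
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4 x 4 feature map after the convolution operation.
fm = [[-1, 2, 0, -3],
      [4, -5, 6, 1],
      [0, 0, -2, -1],
      [3, -4, -6, -7]]
activated = relu(fm)
pooled = max_pool(activated)
```

  • Pooling halves each dimension here, which is the size reduction the text attributes to the pooling layer Lp.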
  • the intermediate layer Lm2 of the intermediate layer group G1 performs a well-known convolution operation by scanning the feature map Mc1 after the pooling process using a preset feature extraction filter Fc2.
  • the feature extraction filter Fc2 is configured by arranging weighting coefficients in a two-dimensional matrix in order to extract pedestrian features that are more complicated than the feature extraction filter Fc1.
  • One or a plurality of feature extraction filters Fc2 (three in FIG. 3) are provided depending on the number of features to be extracted.
  • The intermediate layer Lm1 generates four feature maps Mc1. Each of the four feature maps Mc1 is therefore scanned with one feature extraction filter Fc2, and the four results obtained from the convolution operation of equation (1) are cumulatively added to generate one feature map Mc2.
  • Since three feature extraction filters Fc2 are used, three feature maps Mc2 are generated.
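  • The cumulative addition across input feature maps can be sketched as below. The sketch assumes, as the text states, that the same filter Fc2 scans each of the four maps and that the four results are added element by element; the convolution itself uses the same assumed form as equation (1) (sum of products, no bias). All names and values are hypothetical.

```python
def convolve2d(image, kernel):
    """Valid-region sum-of-products scan (assumed form of equation (1))."""
    n = len(kernel)
    return [[sum(kernel[p][q] * image[i + p][j + q]
                 for p in range(n) for q in range(n))
             for j in range(len(image[0]) - n + 1)]
            for i in range(len(image) - n + 1)]


def convolve_and_accumulate(feature_maps, kernel):
    """Scan every input map with one filter and add the results element-wise."""
    partials = [convolve2d(m, kernel) for m in feature_maps]
    rows, cols = len(partials[0]), len(partials[0][0])
    return [[sum(p[i][j] for p in partials) for j in range(cols)]
            for i in range(rows)]


# Four hypothetical 2 x 2 maps standing in for the four feature maps Mc1.
maps = [[[1, 2], [3, 4]],
        [[0, 1], [1, 0]],
        [[2, 2], [2, 2]],
        [[1, 0], [0, 1]]]
kernel = [[1, 1], [1, 1]]
mc2 = convolve_and_accumulate(maps, kernel)
```

  • Calling the function once per filter Fc2 would yield one feature map Mc2 per filter, matching the three maps described above.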
  • the intermediate layer Lm2 of the intermediate layer group G1 performs a well-known activation process on each element of the feature map Mc2 after the convolution operation. Note that the above convolution calculation and activation processing are performed in the convolution layer Lc of the intermediate layer Lm2.
  • the intermediate layer Lm2 reduces the size of the feature map Mc2 by performing a well-known pooling process on the feature map Mc2 after the activation process. Note that the pooling process is performed in the pooling layer Lp of the intermediate layer Lm2.
  • the intermediate layer Lm3 of the intermediate layer group G1 performs a known convolution operation by scanning the feature map Mc2 after the pooling process with a preset feature extraction filter Fc3.
  • the feature extraction filter Fc3 is configured by arranging weight coefficients in a two-dimensional matrix in order to extract pedestrian features that are more complicated than the feature extraction filter Fc2.
  • One or a plurality of feature extraction filters Fc3 are provided depending on the number of features to be extracted.
  • The intermediate layer Lm2 generates three feature maps Mc2.
  • Each of the three feature maps Mc2 is therefore scanned with one feature extraction filter Fc3, and the three results obtained from the convolution operation of equation (1) are cumulatively added to generate one feature map Mc3.
  • Since one feature extraction filter Fc3 is used, one feature map Mc3 is generated.
  • the intermediate layer Lm3 of the intermediate layer group G1 performs a well-known activation process on each element of the feature map Mc3 after the convolution operation. Note that the above convolution calculation and activation processing are performed in the convolution layer Lc of the intermediate layer Lm3.
  • the intermediate layer Lm3 reduces the size of the feature map Mc3 by performing a well-known pooling process on the feature map Mc3 after the activation process.
  • the above pooling process is performed in the pooling layer Lp of the intermediate layer Lm3.
  • The intermediate layer Lm3 outputs the feature map Mc3 after the pooling process to the fully connected layer group G2.
  • Detection windows Wd1, Wd2, Wd3, Wd4, ..., Wds are set.
  • The detection windows Wd1, Wd2, Wd3, Wd4, ..., Wds are rectangles of the same shape and are arranged in mutually different areas so that the s detection windows together cover the entire imaging region Rs.
  • The fully connected layer Lj2 performs the calculation shown in equation (6) and outputs the result as the final calculation result.
  • The arithmetic processing device 4 includes a systolic array 11, an activation processing unit 12, a pooling processing unit 13, a storage unit 14, a convolution operation control unit 15, a fully connected operation control unit 16, and a detection unit 17.
  • The systolic array 11 performs the convolution operations of the intermediate layers Lm1, Lm2, ... and the fully connected operations of the fully connected layers Lj1, Lj2, .... It outputs data indicating the result of a convolution operation (hereinafter referred to as convolution operation result data) and data indicating the result of a fully connected operation (hereinafter referred to as fully connected operation result data).
  • the activation processing unit 12 performs the above-described activation processing on the convolution operation result data output from the systolic array 11.
  • the pooling processing unit 13 performs the pooling process on the convolution operation result data output from the activation processing unit 12.
  • the storage unit 14 stores the convolution calculation result data output from the pooling processing unit 13.
  • The convolution operation control unit 15 acquires image data from the storage device 3 and convolution operation result data from the storage unit 14. To make the systolic array 11 execute the convolution operation, it controls the timing at which data is output to the systolic array 11, sets the weighting coefficients (described later) of the systolic array 11, and switches the switches 22 and 23 (described later) of the systolic array 11.
  • The fully connected operation control unit 16 acquires convolution operation result data from the storage unit 14. To make the systolic array 11 execute the fully connected operation, it controls the timing at which data is output to the systolic array 11, sets the weighting coefficients (described later) of the systolic array 11, and switches the switches 22 and 23 (described later) of the systolic array 11.
  • While the convolution operation control unit 15 is operating, the fully connected operation control unit 16 does not operate. Likewise, the convolution operation control unit 15 does not operate while the fully connected operation control unit 16 is operating.
  • The detection unit 17 detects a pedestrian in the image captured by the camera 2 based on the fully connected operation result data output from the systolic array 11, and outputs detection data indicating the detection result to the image processing device 5.
  • The systolic array 11 includes a plurality of systolic array cells 21 arranged in a two-dimensional matrix of (k + 1) rows × (l + 1) columns, and a plurality of input switches 22 and output switches 23 provided in correspondence with the respective systolic array cells 21 (k and l are integers of 1 or more).
  • the input switch 22 includes two input terminals 22a and 22b and one output terminal 22c.
  • the output terminal 22c is connected to the corresponding systolic array cell 21.
  • the output switch 23 includes one input terminal 23a and two output terminals 23b and 23c.
  • the input terminal 23a is connected to the corresponding systolic array cell 21.
  • For the systolic array cells 21 located in the j-th column (j = 1, 2, ..., l + 1) of the first row (see systolic array cells $a_{0,0}$, $a_{0,1}$, $a_{0,2}$, ..., $a_{0,l}$ in FIG. 7), the input terminal 22a of the corresponding input switch 22 is connected to the fully connected operation control unit 16 and receives convolution operation result data from the fully connected operation control unit 16.
  • For the systolic array cell 21 located in the j-th column of the i-th row, where i is 2 or more, the input terminal 22a of the corresponding input switch 22 is connected to the output terminal 23b of the output switch 23 of the systolic array cell 21 located in the j-th column of the (i − 1)-th row.
  • For the systolic array cells 21 located in the first column (..., $a_{k,0}$), the input terminal 22b of the corresponding input switch 22 is connected to the convolution operation control unit 15 and receives image data and convolution operation result data from the convolution operation control unit 15.
  • For the systolic array cell 21 located in the j-th column of the i-th row, where j is 2 or more (..., $a_{k,l}$), the input terminal 22b of the corresponding input switch 22 is connected to the output terminal 23c of the output switch 23 of the systolic array cell 21 located in the (j − 1)-th column of the i-th row.
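  • The wiring of the input switches described above can be summarised with a small Python sketch. It uses 0-based indices to match the cell labels a(i, j) of FIG. 7; the function name `input_source`, the mode strings, and the returned labels are hypothetical names introduced only for illustration.

```python
def input_source(i, j, mode):
    """Return where the input switch of cell a[i][j] takes its data from.

    mode == "fc"   : terminal 22a is selected (fully connected operation)
    mode == "conv" : terminal 22b is selected (convolution operation)
    Indices are 0-based, so i == 0 is the first row and j == 0 the first column.
    """
    if mode == "fc":
        if i == 0:
            # First row: data comes from the fully connected operation control unit 16.
            return ("fully_connected_control_unit_16", j)
        # Other rows: data comes from terminal 23b of the cell one row up, same column.
        return ("output_terminal_23b_of_cell", i - 1, j)
    if mode == "conv":
        if j == 0:
            # First column: data comes from the convolution operation control unit 15.
            return ("convolution_control_unit_15", i)
        # Other columns: data comes from terminal 23c of the cell one column to the left.
        return ("output_terminal_23c_of_cell", i, j - 1)
    raise ValueError(f"unknown mode: {mode}")


fc_first_row = input_source(0, 2, "fc")
fc_lower_row = input_source(1, 2, "fc")
conv_first_col = input_source(1, 0, "conv")
conv_inner = input_source(1, 2, "conv")
```

  • In this view, data moves down the columns during the fully connected operation and along the rows during the convolution operation, while the cells themselves stay unchanged.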
  • The systolic array 11 further includes (k + 1) adders 24 and (k + 1) flip-flop circuits 25.
  • The (k + 1) adders 24 are provided in correspondence with the (k + 1) systolic array cells 21 located in the (l + 1)-th column, and each receives data from its corresponding systolic array cell 21.
  • The (k + 1) flip-flop circuits 25 are provided in correspondence with the (k + 1) adders 24, and each adjusts the timing at which data indicating the addition result of its corresponding adder 24 is output.
  • the systolic array cell 21 includes a timing adjustment unit 30 and a calculation unit 40 as shown in FIG.
  • the timing adjustment unit 30 is for adjusting the timing of data input from the input switch 22 and outputting the data to the output switch 23, and includes flip-flop circuits 31 and 32. When data is input to the data input terminal, the flip-flop circuits 31 and 32 output the input data from the data output terminal at a preset output timing.
  • the data input terminal of the flip-flop circuit 31 is connected to the output terminal 22 c of the input switch 22, and the data output terminal of the flip-flop circuit 31 is connected to the data input terminal of the flip-flop circuit 32.
  • the data output terminal of the flip-flop circuit 32 is connected to the input terminal 23 a of the output switch 23.
  • The calculation unit 40 includes a register 41, a multiplier 42, an adder 43, and a flip-flop circuit 44.
  • The register 41 holds a weighting coefficient. The weighting coefficient for the convolution operation is set by the convolution operation control unit 15, and the weighting coefficient for the fully connected operation is set by the fully connected operation control unit 16.
  • the multiplier 42 calculates a multiplication value of the data output from the timing adjustment unit 30 and the data set in the register 41, and outputs data indicating the multiplication value.
  • the adder 43 calculates an added value of the data output from the multiplier 42 and the data output from the preceding systolic array cell 21 and outputs data indicating the added value.
  • For the systolic array cell 21 located in the j-th column of the i-th row, the preceding systolic array cell 21 is the one located in the (j − 1)-th column of the i-th row.
  • When data is input to its data input terminal, the flip-flop circuit 44 outputs the input data from its data output terminal at a preset output timing.
  • The data input terminal of the flip-flop circuit 44 is connected to the adder 43.
  • The data output terminal of the flip-flop circuit 44 is connected to the adder 43 of the subsequent systolic array cell 21. For the systolic array cell 21 located in the j-th column of the i-th row, the subsequent systolic array cell 21 is the one located in the (j + 1)-th column of the i-th row.
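  • The cell structure described above can be approximated with a clock-by-clock Python sketch. Several points are assumptions rather than statements from the text: every flip-flop is modelled as a one-clock delay, the multiplier is assumed to be fed by the output of the flip-flop circuit 32 (reading "data output from the timing adjustment unit 30" literally), and all registers start at 0. The class name `SystolicCell` and its method names are hypothetical.

```python
class SystolicCell:
    """Rough behavioural model of one systolic array cell 21 (assumptions in the lead-in)."""

    def __init__(self, weight):
        self.weight = weight  # value held in register 41
        self.ff31 = 0         # first flip-flop of the timing adjustment unit 30
        self.ff32 = 0         # second flip-flop of the timing adjustment unit 30
        self.ff44 = 0         # flip-flop holding the adder 43 result

    def step(self, data_in, partial_in):
        """Advance one clock edge.

        data_in    : data arriving from the input switch 22
        partial_in : partial sum arriving from the preceding cell (0 if none)
        Returns (data_out, partial_out) as seen after the edge: data_out goes to
        the output switch 23, partial_out goes to the subsequent cell's adder.
        """
        product = self.ff32 * self.weight      # multiplier 42 (assumed tap point)
        new_ff44 = product + partial_in        # adder 43
        new_ff32 = self.ff31
        new_ff31 = data_in
        self.ff31, self.ff32, self.ff44 = new_ff31, new_ff32, new_ff44
        return self.ff32, self.ff44


cell = SystolicCell(weight=3)
trace = [cell.step(d, 0) for d in [2, 5, 0, 0]]
```

  • Under these assumptions the data reappears at the output switch two clocks after it enters, and its weighted value appears on the partial-sum path one clock later; the actual circuit timing of FIG. 8 may differ.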
  • When executing the convolution operation, the convolution operation control unit 15 switches the switches 22 and 23 of the systolic array 11. Specifically, for each input switch 22, it selects the input terminal 22b of the two input terminals 22a and 22b, switching the data path so that data input from the input terminal 22b is output from the output terminal 22c. For each output switch 23, it selects the output terminal 23c of the two output terminals 23b and 23c, switching the data path so that data input from the input terminal 23a is output from the output terminal 23c.
  • When executing the convolution operation, the convolution operation control unit 15 also sets the weighting coefficient for the convolution operation in the register 41 of each systolic array cell 21.
  • When executing the fully connected operation, the fully connected operation control unit 16 switches the switches 22 and 23 of the systolic array 11. Specifically, for each input switch 22, it selects the input terminal 22a of the two input terminals 22a and 22b, switching the data path so that data input from the input terminal 22a is output from the output terminal 22c. For each output switch 23, it selects the output terminal 23b of the two output terminals 23b and 23c, switching the data path so that data input from the input terminal 23a is output from the output terminal 23b.
  • When executing the fully connected operation, the fully connected operation control unit 16 also sets the weighting coefficient for the fully connected operation in the register 41 of each systolic array cell 21.
  • The fully connected layer group G2 includes fully connected layers Lj1, Lj2, ..., Ljv (v is an integer of 1 or more).
  • The fully connected layers Lj1, Lj2, ..., Ljv execute the fully connected operation using the matrices $W_1$, $W_2$, ..., $W_v$, respectively.
  • The matrices $W_1$, $W_2$, ..., $W_v$ are an ($m_0 \times m_1$) matrix, an ($m_1 \times m_2$) matrix, ..., and an ($m_{v-1} \times m_v$) matrix, respectively.
  • $W^T$ denotes the transpose of the matrix $W$.
  • The fully connected operation control unit 16 outputs convolution operation result data to the systolic array 11 each time a preset output period $\Delta t$ elapses.
  • When the elapsed time t from the start of output of the convolution operation result data is $n \times \Delta t$ (n is an integer of 0 or more), the convolution operation result data output to the systolic array cell 21 in the j-th column (j = 1, 2, ..., l + 1) of the first row is $x_{n+1-j,\,j-1}$.
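  • The skewed input schedule described above can be sketched in Python. The index rule (row n + 1 − j, column j − 1) follows the text; treating an index that falls outside the data matrix as the value 0 is stated in the text only for negative row indices, and is an assumption here for rows past the end of the matrix. The function name `skewed_input` and the sample matrix are hypothetical.

```python
def skewed_input(x, n, j):
    """Value fed to column j (1-based) of the first row at step n (elapsed time n * dt).

    x is the convolution operation result data as a two-dimensional list.
    Out-of-range indices yield 0 (partly an assumption, see the lead-in).
    """
    row = n + 1 - j
    col = j - 1
    if 0 <= row < len(x) and 0 <= col < len(x[0]):
        return x[row][col]
    return 0


# Hypothetical 2 x 3 data matrix fed to a three-column array.
x = [[1, 2, 3],
     [4, 5, 6]]
schedule = [[skewed_input(x, n, j) for j in (1, 2, 3)] for n in range(4)]
```

  • Each row of `schedule` lists what columns 1 to 3 receive at one step; every column starts one step later than the column to its left, which is the diagonal skew typical of systolic arrays.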
  • The arithmetic processing device 4 configured in this manner includes the plurality of systolic array cells 21, the plurality of input switches 22, the plurality of output switches 23, the convolution operation control unit 15, and the fully connected operation control unit 16.
  • The input switches 22 are provided in correspondence with the respective systolic array cells 21, and each has input terminals 22a and 22b and an output terminal 22c.
  • Each input switch 22 switches between a state in which the input terminal 22a and the output terminal 22c are connected (hereinafter referred to as the first input connection state) and a state in which the input terminal 22b and the output terminal 22c are connected (hereinafter referred to as the second input connection state).
  • The output switches 23 are provided in correspondence with the respective systolic array cells 21, and each has an input terminal 23a and output terminals 23b and 23c.
  • Each output switch 23 switches between a state in which the input terminal 23a and the output terminal 23b are connected (hereinafter referred to as the first output connection state) and a state in which the input terminal 23a and the output terminal 23c are connected (hereinafter referred to as the second output connection state).
  • When the convolution operation of the intermediate layers Lm1, Lm2, ... is executed, the convolution operation control unit 15 switches the input switches 22 to the second input connection state and the output switches 23 to the second output connection state, and controls the data input to the plurality of systolic array cells 21 so that the convolution operation of the intermediate layer is executed.
  • When the fully connected operation of the fully connected layers Lj1, Lj2, ... is executed, the fully connected operation control unit 16 switches the input switches 22 to the first input connection state and the output switches 23 to the first output connection state, and controls the data input to the plurality of systolic array cells 21 so that the fully connected operation is executed.
  • Each systolic array cell 21 includes the timing adjustment unit 30 and the calculation unit 40.
  • The timing adjustment unit 30 adjusts the output timing of data input from the output terminal 22c of the input switch 22 and outputs the data to the input terminal 23a of the output switch 23.
  • The calculation unit 40 multiplies the data input from the output terminal 22c of the input switch 22 by a preset weighting coefficient, adds the product to data input without passing through the input switch 22, and outputs the sum without passing through the output switch 23.
  • The arithmetic processing device 4 configured as described above can make the systolic array cells 21 execute the convolution operation of the intermediate layers Lm1, Lm2, ... by switching the plurality of input switches 22 to the second input connection state and the plurality of output switches 23 to the second output connection state. By switching the plurality of input switches 22 to the first input connection state and the plurality of output switches 23 to the first output connection state, it can make the systolic array cells 21 execute the fully connected operation of the fully connected layers Lj1, Lj2, ....
  • In other words, merely by adding the input switches 22 and the output switches 23, and without changing the configuration of the systolic array cells 21 themselves, the arithmetic processing device 4 can make the systolic array cells 21 perform both the convolution operation of the intermediate layers Lm1, Lm2, ... and the fully connected operation of the fully connected layers Lj1, Lj2, ....
  • The changes in circuit configuration needed to match one of the circuit for the convolution operation and the circuit for the fully connected operation to the other, so that the two can be shared, are thereby kept to a minimum.
  • As a result, the arithmetic processing device 4 can share the circuit that performs the convolution operation of the intermediate layers Lm1, Lm2, ... and the circuit that performs the fully connected operation of the fully connected layers Lj1, Lj2, ... while suppressing both an increase in circuit board area and a decrease in arithmetic processing speed.
  • the total coupling layer group G2 includes all coupling layers Lj1, Lj2,..., Ljv, and all the coupling layers Lj1, Lj2,..., Ljv are respectively matrixes W 1 , W 2 ,. .., Suppose that Wv is used to perform a full join operation.
  • the weighting coefficient w i, j shown in the above equation (9) is set.
  • the arithmetic processing unit 4 does not need to execute all the coupling operations of all the coupling layers Lj1, Lj2,..., Ljv in the order of all the coupling layers Lj1, Lj2,. All join operations of the layers Lj1, Lj2,..., Ljv can be executed together. For this reason, the arithmetic processing unit 4 can reduce the amount of calculation required for executing the fully connected calculation of all the connected layers Lj1, Lj2,..., Ljv.
  • total binding layer group G2 is composed of two full bonds layer Lj1, LJ2, total binding layer Lj1 is, a matrix X of s rows ⁇ m 0 columns, the matrix of m o rows ⁇ m 1 column It is assumed that a matrix product Y with W 1 is calculated, and that all coupling layers Lj2 calculate a matrix product Z of a matrix Y with s rows ⁇ m 1 columns and a matrix W 2 with m 1 rows ⁇ m 2 columns.
  • W (= W_1 × W_2) is calculated in advance.
  • since W is the matrix product of the matrix W_1 with m_0 rows × m_1 columns and the matrix W_2 with m_1 rows × m_2 columns, W is a matrix with m_0 rows × m_2 columns.
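The effect of merging the two layers can be checked numerically. The following is a minimal sketch in plain Python, assuming, as in the text, that each fully connected layer is a pure matrix product. The helper `matmul`, the example matrices, and the sizes s = 1, m_0 = 6, m_1 = 4, m_2 = 2 are illustrative assumptions and do not appear in the original.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "shape mismatch"
    return [[sum(a[i][k] * b[k][j] for k in range(inner))
             for j in range(len(b[0]))]
            for i in range(len(a))]

# Illustrative sizes (assumptions): s = 1, m0 = 6, m1 = 4, m2 = 2.
s, m0, m1, m2 = 1, 6, 4, 2
X = [[1, 2, 3, 4, 5, 6]]                                   # s x m0
W1 = [[(r + c) % 3 for c in range(m1)] for r in range(m0)]  # m0 x m1
W2 = [[r - c for c in range(m2)] for r in range(m1)]        # m1 x m2

# Layer-by-layer evaluation: Y = X * W1, then Z = Y * W2.
Z_layered = matmul(matmul(X, W1), W2)

# Merged evaluation: W = W1 * W2 is computed once, then Z = X * W.
W = matmul(W1, W2)                                          # m0 x m2
Z_merged = matmul(X, W)

# Multiplications needed per input matrix X in each case.
mults_layered = s * m0 * m1 + s * m1 * m2
mults_merged = s * m0 * m2
```

With these sizes the merged form needs 12 multiplications per input instead of 32. In general the merged form is cheaper per input only when m_0 × m_2 is smaller than m_1 × (m_0 + m_2), and computing W itself costs m_0 × m_1 × m_2 multiplications once.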
  • the input terminal 22a of the input switch 22 of the systolic array cell 21 located in the first row is connected to the full connection operation control unit 16. Further, the input terminal 22a of the input switch 22 of the systolic array cell 21 located in the second or a subsequent row is connected to the output terminal 23b of the output switch 23 of the systolic array cell 21 located in the row whose number is one smaller and in the same column.
  • when the elapsed time t is n × Δt (n is an integer of 0 or more), the full connection operation control unit 16 performs control so that x_{n−j+1, j−1} is input to the input terminal 22a of the input switch 22 of the systolic array cell 21 located in the j-th column of the first row. Here, x_{n−j+1, j−1} is convolution operation result data represented by a two-dimensional matrix, and its value is 0 when (n − j + 1) is less than 0. As a result, the systolic array 11 can calculate the data R_{i−1} shown in the above equation (10) as the full connection operation result data.
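The skewed input timing described above can be sketched as follows. This is a minimal illustration, assuming that x is stored as a list of rows and that j is counted from 1. The function name `first_row_input` and the rule that 0 is also fed once the row subscript runs past the end of x are assumptions made for this example; the text only states the case where (n − j + 1) is less than 0.

```python
def first_row_input(x, n, j):
    """Value fed at time t = n * dt to the first-row cell in column j (1-based)."""
    row = n - j + 1
    col = j - 1
    if row < 0:
        return 0          # stated in the text: the value is 0 when n - j + 1 < 0
    if row >= len(x):
        return 0          # assumption: nothing is left to feed after the last row
    return x[row][col]

# Example data (assumption): two rows of three convolution results.
x = [[1, 2, 3],
     [4, 5, 6]]

# One list per time step n, one entry per column j = 1, 2, 3.
schedule = [[first_row_input(x, n, j) for j in (1, 2, 3)] for n in range(4)]
```

Each column receives the same row of x one step later than the column to its left, which is the usual diagonal wavefront of a systolic array.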
  • the input terminal 22a is the first input terminal
  • the input terminal 22b is the second input terminal
  • the output terminal 22c is the third input terminal
  • the input terminal 23a is the first output terminal
  • the output terminal 23b is the second output terminal
  • the output terminal 23c is a third output terminal.
  • the driving support device 1 of the second embodiment is different from the first embodiment in the configuration of the systolic array 11.
  • the systolic array 11 of the second embodiment includes two array cell groups 61 and 62 as shown in FIG.
  • the array cell group 61 includes nine systolic array cells 21 (systolic array cells a_{0,0}, a_{0,1}, a_{0,2}, ..., a_{2,2}) arranged in a two-dimensional matrix with 3 rows × 3 columns.
  • the array cell group 62 includes nine systolic array cells 21 (see systolic array cells b_{0,0}, b_{0,1}, b_{0,2}) arranged in a two-dimensional matrix with 3 rows × 3 columns. In FIG. 10, among the nine systolic array cells 21 included in the array cell group 62, the systolic array cells b_{0,0}, b_{0,1}, b_{0,2} are shown.
  • the systolic array 11 includes switches 71, 72, 73, 74, adders 81, 82, 83, 84, flip-flop circuits 91, 92, 93, 94, and switches 101, 102, 103, 104.
  • the switches 71, 72, 73, and 74 include input terminals 71a, 72a, 73a, and 74a, input terminals 71b, 72b, 73b, and 74b, and output terminals 71c, 72c, 73c, and 74c, respectively.
  • the switches 101, 102, 103, and 104 include input terminals 101a, 102a, 103a, and 104a, output terminals 101b, 102b, 103b, and 104b, and output terminals 101c, 102c, 103c, and 104c, respectively.
  • Data indicating a preset initial value is input to the two input terminals 71 a and 71 b of the switch 71.
  • the adder 81 adds the data from the output terminal 71c of the switch 71 and the data from the systolic array cell 21 located in the third column of the first row of the array cell group 61 (see systolic array cell a_{0,2} in FIG. 10), and outputs data indicating the addition result.
  • the flip-flop circuit 91 adjusts the timing at which data indicating the addition result from the adder 81 is output. Data indicating the addition result from the adder 81 is input to the input terminal 101 a of the switch 101 via the flip-flop circuit 91.
  • the output terminal 101b and the output terminal 101c of the switch 101 are connected to the input terminal 73a of the switch 73 and the input terminal 72a of the switch 72, respectively.
  • Data indicating an initial value set in advance is input to the input terminal 72 b of the switch 72.
  • the adder 82 adds the data from the output terminal 72c of the switch 72 and the data from the systolic array cell 21 located in the third column of the second row of the array cell group 61 (see systolic array cell a_{1,2} in FIG. 10), and outputs data indicating the addition result.
  • the flip-flop circuit 92 adjusts the timing at which data indicating the addition result from the adder 82 is output.
  • Data indicating the addition result from the adder 82 is input to the input terminal 102 a of the switch 102 via the flip-flop circuit 92.
  • the output terminal 102b and the output terminal 102c of the switch 102 are connected to the input terminal 73b of the switch 73 and the input terminal 74b of the switch 74, respectively.
  • the adder 83 adds the data from the output terminal 73c of the switch 73 and the data from the systolic array cell 21 located in the third column of the third row of the array cell group 61 (see systolic array cell a_{2,2} in FIG. 10), and outputs data indicating the addition result.
  • the flip-flop circuit 93 adjusts the timing at which data indicating the addition result from the adder 83 is output. Data indicating the addition result from the adder 83 is input to the input terminal 103 a of the switch 103 via the flip-flop circuit 93.
  • the output terminal 103 c of the switch 103 is connected to the input terminal 74 a of the switch 74.
  • the flip-flop circuit 94 adjusts the timing at which data indicating the addition result from the adder 84 is output. Data indicating the addition result from the adder 84 is input to the input terminal 104 a of the switch 104 via the flip-flop circuit 94.
  • the convolution operation control unit 15 switches the switches 71 to 74 and the switches 101 to 104 when executing the convolution operation. Specifically, the convolution operation control unit 15 causes the switch 71 to select the input terminal 71a of the two input terminals 71a and 71b, thereby switching the data input/output path so that the data input from the input terminal 71a is output from the output terminal 71c. In addition, the convolution operation control unit 15 causes the switch 101 to select the output terminal 101c of the two output terminals 101b and 101c, thereby switching the data input/output path so that the data input from the input terminal 101a is output from the output terminal 101c.
  • the convolution operation control unit 15 selects the input terminals 72a, 73b, and 74a for the switches 72, 73, and 74, respectively.
  • the convolution operation control unit 15 selects the output terminals 102b, 103c, and 104b for the switches 102, 103, and 104, respectively.
  • the full connection operation control unit 16 switches the switches 71 to 74 and the switches 101 to 104 when executing the full connection operation. Specifically, the full connection operation control unit 16 selects the input terminals 71b, 72b, 73a, and 74b for the switches 71, 72, 73, and 74, respectively. Further, the full connection operation control unit 16 selects the output terminals 101b, 102c, 103b, and 104c for the switches 101, 102, 103, and 104, respectively.
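The two sets of switch selections can be traced to see which adders end up chained together in each mode. The following is a rough sketch, not a description of the actual circuit. The `WIRES` table copies the connections stated in the text between the switches 101 to 103 and the switches 72 to 74. The rule that adder 8k receives the output of switch 7k is stated for adders 81 to 83 and is assumed here for adder 84 as well. The names `WIRES`, `SELECT`, and `adder_chains` are made up for this example.

```python
# Connections stated in the text: output terminal of switch 10x -> input terminal of switch 7y.
WIRES = {"101b": "73a", "101c": "72a", "102b": "73b", "102c": "74b", "103c": "74a"}

# Terminals selected by each control unit, as listed in the text.
SELECT = {
    "convolution": {
        "in": ["71a", "72a", "73b", "74a"],
        "out": ["101c", "102b", "103c", "104b"],
    },
    "full_connection": {
        "in": ["71b", "72b", "73a", "74b"],
        "out": ["101b", "102c", "103b", "104c"],
    },
}

def adder_chains(mode):
    """Return the groups of adders that are connected in series in the given mode."""
    selected_inputs = set(SELECT[mode]["in"])
    feeds = {}  # adder number -> adder number it feeds
    for out in SELECT[mode]["out"]:
        dst = WIRES.get(out)
        if dst in selected_inputs:
            src_adder = 80 + int(out[2])   # "101c" -> adder 81
            dst_adder = 80 + int(dst[1])   # "72a"  -> adder 82
            feeds[src_adder] = dst_adder
    fed = set(feeds.values())
    chains = []
    for start in (81, 82, 83, 84):
        if start in fed:
            continue                        # not the head of a chain
        chain = [start]
        while chain[-1] in feeds:
            chain.append(feeds[chain[-1]])
        chains.append(chain)
    return chains
```

Under these assumptions, `adder_chains("convolution")` gives one chain through all four adders, while `adder_chains("full_connection")` gives two chains, 81 to 83 and 82 to 84, which end at the switches 103 and 104.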
  • the full connection operation control unit 16 sets the weighting coefficient w_{i,j} shown in the above equation (9) in the register 41 of the systolic array cell 21.
  • the weight coefficients w_{0,0}, w_{0,1}, w_{0,2} are set in the registers 41 of the systolic array cells 21 located in the first, second, and third columns of the first row of the array cell group 61 (systolic array cells a_{0,0}, a_{0,1}, a_{0,2} in FIG. 10), respectively.
  • the weighting factors w_{0,3}, w_{0,4}, w_{0,5} are set in the registers 41 of the systolic array cells 21 located in the first, second, and third columns of the third row of the array cell group 61 (systolic array cells a_{2,0}, a_{2,1}, a_{2,2} in FIG. 10), respectively.
  • the weighting factors w_{1,0}, w_{1,1}, w_{1,2} are set in the registers 41 of the systolic array cells 21 located in the first, second, and third columns of the second row of the array cell group 61 (systolic array cells a_{1,0}, a_{1,1}, a_{1,2} in FIG. 10), respectively.
  • the weight coefficients w_{1,3}, w_{1,4}, w_{1,5} are set in the registers 41 of the systolic array cells 21 located in the first, second, and third columns of the first row of the array cell group 62 (systolic array cells b_{0,0}, b_{0,1}, b_{0,2} in FIG. 10), respectively.
  • the full connection operation control unit 16 outputs the convolution operation result data to the systolic array 11 every time a preset output cycle Δt elapses.
  • when (n − j) < 0, x_{n−j, j+2} = 0.
  • data (R_0 + R_2) represented by the following expression (11) is output from the output terminal 103b of the switch 103. Further, data (R_1 + R_3) represented by the following expression (12) is output from the output terminal 104c of the switch 104.
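The weight placement and the two summed outputs of the second embodiment can be modelled functionally, ignoring clock timing. The following is a sketch under several assumptions: the weights form a table w[i][j] with i = 0, 1 and j = 0, ..., 5; the rows a_0 and a_1 of the array cell group 61 receive the first three input values; the rows a_2 and b_0 receive the last three; and the four row sums play the role of R_0 to R_3. The function names and the example numbers are illustrative only.

```python
def place_weights(w):
    """Map w[i][j] onto four physical rows of three cells, in the order a0, a1, a2, b0."""
    rows = [[0] * 3 for _ in range(4)]
    for i in range(2):
        for j in range(6):
            # w[0][0..2] -> a0, w[1][0..2] -> a1, w[0][3..5] -> a2, w[1][3..5] -> b0
            rows[2 * (j // 3) + i][j % 3] = w[i][j]
    return rows

def full_connection(x, w):
    """Timing-free model: row sums are combined as (a0 + a2) and (a1 + b0)."""
    rows = place_weights(w)
    inputs = [x[0:3], x[0:3], x[3:6], x[3:6]]   # assumption: inputs seen by a0, a1, a2, b0
    partial = [sum(wv * xv for wv, xv in zip(rows[r], inputs[r])) for r in range(4)]
    return partial[0] + partial[2], partial[1] + partial[3]

# Example values (assumptions).
w = [[1, 2, 3, 4, 5, 6],
     [6, 5, 4, 3, 2, 1]]
x = [1, 1, 2, 2, 3, 3]
outputs = full_connection(x, w)
```

Each output equals the ordinary dot product of one row of w with x, which is what a fully connected layer with six inputs and two outputs computes.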
  • a plurality of systolic array cells 21 are arranged in a two-dimensional matrix so that the number of rows is six and the number of columns is three. Further, the full connection operation is performed using a full connection operation matrix, which is a matrix with 6 rows and 2 columns.
  • the value obtained by dividing 6, which is the number of rows of the full connection operation matrix, by 3, which is the number of columns of the plurality of systolic array cells 21, with any fractional part rounded to an integer, is 2.
  • the full connection operation control unit 16 then performs control so that the six pieces of convolution operation result data are input to the six systolic array cells 21 arranged in two of the six rows of the plurality of systolic array cells 21.
  • the arithmetic processing unit 4 can execute the full connection operation using the systolic array 11 even when the plurality of systolic array cells 21 constituting the systolic array 11 are arranged for the convolution operation.
  • a value calculated by the matrix product of the matrices W_1, W_2, ..., W_v is set as the weight coefficient w_{i,j} in the registers 41 of the plurality of systolic array cells 21.
  • the arithmetic processing unit 4 can reduce the amount of calculation required for executing the full connection operations of the fully connected layers Lj1, Lj2, ..., Ljv.
  • the plurality of systolic array cells 21 arranged in a two-dimensional matrix are divided into two row sets, with two adjacent rows as one row set. Further, the input terminal 22a of the input switch 22 of the systolic array cell 21 located in the row having the smallest row number among the rows constituting the row set is connected to the full connection operation control unit 16.
  • the input terminal 22a of the input switch 22 of the systolic array cell 21 located in a row other than the row having the smallest row number among the rows constituting the row set is connected to the output terminal 23b of the output switch 23 of the systolic array cell 21 located in the row whose number is one smaller and in the same column.
  • the row set with division number 1 consists of the systolic array cells 21 located in the first and second rows of the array cell group 61.
  • the row set with division number 2 consists of the systolic array cells 21 located in the third row of the array cell group 61 and the first row of the array cell group 62.
  • x_{n−j−w+2, 3×(w−1)+j−1} is input to the input terminal 22a of the input switch 22 of the systolic array cell 21 located in the j-th column of the row with the smallest row number.
  • x_{n−j−w+2, 3×(w−1)+j−1} is convolution operation result data represented by a two-dimensional matrix, and its value is 0 when (n − j − w + 2) is less than 0.
  • the systolic array 11 can calculate the data (R_0 + R_2) represented by the above equation (11) and the data (R_1 + R_3) represented by the above equation (12) as the full connection operation result data.
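The subscript rule for the row sets can be written out as a small helper. This is a sketch that assumes w denotes the division number of the row set (1 or 2) and that j is the 1-based column number; the text uses w without defining it at this point. The function name `row_set_input_index` is hypothetical.

```python
def row_set_input_index(n, j, w):
    """Subscripts (row, column) of the x value fed at step n to column j of row set w.

    Returns None when the row subscript is negative, in which case the text
    states that the value 0 is input instead.
    """
    row = n - j - w + 2
    col = 3 * (w - 1) + j - 1
    if row < 0:
        return None
    return (row, col)

# For w = 1 the rule reduces to x_{n-j+1, j-1}; for w = 2 it reduces to x_{n-j, j+2}.
first_set = [row_set_input_index(3, j, 1) for j in (1, 2, 3)]
second_set = [row_set_input_index(3, j, 2) for j in (1, 2, 3)]
```

Under this reading, the first row set is fed from columns 0 to 2 of x and the second row set from columns 3 to 5, with the second set delayed by one step relative to the first.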
  • the driving support device 1 of the third embodiment is different from the first embodiment in the configuration of the systolic array 11.
  • the systolic array 11 of the third embodiment includes two array cell groups 61 and 62 as shown in FIG.
  • the array cell group 61 includes nine systolic array cells 21 (systolic array cells a_{0,0}, a_{0,1}, a_{0,2}, ..., a_{2,2}) arranged in a two-dimensional matrix with 3 rows × 3 columns.
  • the array cell group 62 includes nine systolic array cells 21 (see systolic array cells b_{0,0}, b_{0,1}, b_{0,2}) arranged in a two-dimensional matrix with 3 rows × 3 columns.
  • FIG. 12 shows the systolic array cells b_{0,0}, b_{0,1}, b_{0,2} among the nine systolic array cells 21 included in the array cell group 62.
  • the input terminal 22a of the input switch 22 corresponding to each systolic array cell 21 of the array cell group 61 is connected to the full connection operation control unit 16. Further, the input terminal 22a of the input switch 22 corresponding to each systolic array cell 21 located in the first row of the array cell group 62 is connected to the full connection operation control unit 16. The convolution operation result data is then input from the full connection operation control unit 16.
  • the systolic array 11 includes switches 111 and 112, adders 121 and 122, flip-flop circuits 131 and 132, adders 141 and 142, and flip-flop circuits 151 and 152.
  • the switches 111 and 112 include input terminals 111a and 112a, input terminals 111b and 112b, and output terminals 111c and 112c, respectively.
  • the adder 121 adds the data from the output terminal 111c of the switch 111 and the data from the systolic array cell 21 located in the third column of the first row of the array cell group 61 (see the systolic array cell a_{0,2} in FIG. 12), and outputs data indicating the addition result.
  • the flip-flop circuit 131 adjusts the timing at which data indicating the addition result from the adder 121 is output.
  • the adder 141 includes data indicating the addition result input from the adder 121 via the flip-flop circuit 131, and the systolic array cell 21 (the systolic cell in FIG.
  • the flip-flop circuit 151 adjusts the timing at which data indicating the addition result from the adder 141 is output.
  • Data indicating the addition result from the adder 141 is input to the input terminal 112 a of the switch 112 via the flip-flop circuit 151.
  • Data indicating a preset initial value is input to the input terminal 112 b of the switch 112.
  • the flip-flop circuit 132 adjusts the timing at which data indicating the addition result from the adder 122 is output.
  • the adder 142 adds the data indicating the addition result input from the adder 122 via the flip-flop circuit 132 and the data from the systolic array cell 21 located in the third column of the first row of the array cell group 62 (the systolic array cell b_{0,2} in FIG. 12), and outputs data indicating the addition result.
  • the flip-flop circuit 152 adjusts the timing at which data indicating the addition result from the adder 142 is output.
  • the convolution operation control unit 15 switches the switches 111 and 112 when executing the convolution operation. Specifically, the convolution operation control unit 15 causes the switch 111 to select the input terminal 111a of the two input terminals 111a and 111b, thereby switching the data input/output path so that the data input from the input terminal 111a is output from the output terminal 111c. Further, the convolution operation control unit 15 causes the switch 112 to select the input terminal 112a of the two input terminals 112a and 112b, thereby switching the data input/output path so that the data input from the input terminal 112a is output from the output terminal 112c.
  • the full connection operation control unit 16 switches the switches 111 and 112 when executing the full connection operation. Specifically, the full connection operation control unit 16 causes the switch 111 to select the input terminal 111b of the two input terminals 111a and 111b, thereby switching the data input/output path so that the data input from the input terminal 111b is output from the output terminal 111c. Further, the full connection operation control unit 16 causes the switch 112 to select the input terminal 112b of the two input terminals 112a and 112b, thereby switching the data input/output path so that the data input from the input terminal 112b is output from the output terminal 112c.
  • the full connection operation control unit 16 sets the weighting coefficient w_{i,j} shown in the above equation (9) in the register 41 of the systolic array cell 21.
  • the weight coefficients w 0,0 , w 0,1 , w 0,2 are respectively systolic array cells positioned in the first column, the second column, and the third column of the first row of the array cell group 61. 21 (systolic array cells a 0,0 , a 0,1 , a 0,2 in FIG. 12) are set in the register 41.
  • the weighting factors w 0,3 , w 0,4 , w 0,5 are respectively assigned to the systolic array cells 21 (see FIG.
  • the weight coefficients w_{1,0}, w_{1,1}, w_{1,2} are set in the registers 41 of the systolic array cells 21 located in the first, second, and third columns of the third row of the array cell group 61 (systolic array cells a_{2,0}, a_{2,1}, a_{2,2} in FIG. 12), respectively.
  • the weight coefficients w_{1,3}, w_{1,4}, w_{1,5} are set in the registers 41 of the systolic array cells 21 located in the first, second, and third columns of the first row of the array cell group 62 (systolic array cells b_{0,0}, b_{0,1}, b_{0,2} in FIG. 12), respectively.
  • the full connection operation control unit 16 then inputs the convolution operation result data output from the intermediate layer group G1 to the fully connected layer group G2 into the systolic array cells 21 located in the first, second, and third rows of the array cell group 61 and the first row of the array cell group 62.
  • the full connection operation control unit 16 outputs the convolution operation result data to the systolic array 11 every time a preset output cycle Δt elapses.
  • when (n − j) < 0, x_{n−j, j+2} = 0.
  • a plurality of systolic array cells 21 are arranged in a two-dimensional matrix so that the number of rows is six and the number of columns is three. Further, the full connection operation is performed using a full connection operation matrix, which is a matrix with 6 rows and 2 columns.
  • the value obtained by dividing 6, which is the number of rows of the full connection operation matrix, by 3, which is the number of columns of the plurality of systolic array cells 21, with any fractional part rounded to an integer, is 2.
  • the full connection operation control unit 16 assigns two systolic array cells 21 to each of the six pieces of convolution operation result data used for the full connection operation, and performs control so that the corresponding piece of convolution operation result data is input to each of the two assigned systolic array cells 21.
  • the arithmetic processing unit 4 can execute the full connection operation using the systolic array 11 even when the plurality of systolic array cells 21 constituting the systolic array 11 are arranged for the convolution operation.
  • a value calculated by the matrix product of the matrices W_1, W_2, ..., W_v is set as the weight coefficient w_{i,j} in the registers 41 of the plurality of systolic array cells 21.
  • the arithmetic processing unit 4 can reduce the amount of calculation required for executing the full connection operations of the fully connected layers Lj1, Lj2, ..., Ljv.
  • the value obtained by dividing 6, which is the number of rows of the full connection operation matrix, by 3, which is the number of columns of the plurality of systolic array cells 21, with any fractional part rounded to an integer, is 2.
  • the rows are divided into two row sets, with two rows as one row set. If division numbers 1 and 2 are assigned to the two row sets, the row set with division number 1 consists of the systolic array cells 21 located in the first and third rows of the array cell group 61.
  • the row set with division number 2 consists of the systolic array cells 21 located in the second row of the array cell group 61 and the first row of the array cell group 62.
  • the systolic array 11 can calculate the data (R_0 + R_1) represented by the above equation (13) and the data (R_2 + R_3) represented by the above equation (14) as the full connection operation result data.
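The third embodiment can be modelled in the same timing-free way as a different placement of the same weights. The following is a sketch under explicit assumptions: the text places w_{0,0} to w_{0,2} in row a_0, w_{1,0} to w_{1,2} in row a_2, and w_{1,3} to w_{1,5} in row b_0, but the destination of w_{0,3} to w_{0,5} is cut off in the source, so the remaining row a_1 is assumed here. It is also assumed that the rows of division number 1 receive the first three input values, that the rows of division number 2 receive the last three, and that the four row sums play the role of R_0 to R_3. All names and example numbers are illustrative.

```python
def place_weights_third(w):
    """Map w[i][j] onto the physical rows a0, a1, a2, b0 (a1 is an assumed destination)."""
    rows = [[0] * 3 for _ in range(4)]
    for i in range(2):
        for j in range(6):
            # w[0][0..2] -> a0, w[0][3..5] -> a1 (assumed), w[1][0..2] -> a2, w[1][3..5] -> b0
            rows[2 * i + j // 3][j % 3] = w[i][j]
    return rows

def full_connection_third(x, w):
    """Timing-free model: row sums are combined as (a0 + a1) and (a2 + b0)."""
    rows = place_weights_third(w)
    inputs = [x[0:3], x[3:6], x[0:3], x[3:6]]   # assumption: inputs seen by a0, a1, a2, b0
    partial = [sum(wv * xv for wv, xv in zip(rows[r], inputs[r])) for r in range(4)]
    return partial[0] + partial[1], partial[2] + partial[3]

# Example values (assumptions).
w3 = [[1, 2, 3, 4, 5, 6],
      [6, 5, 4, 3, 2, 1]]
x3 = [1, 1, 2, 2, 3, 3]
outputs_third = full_connection_third(x3, w3)
```

With this placement, adjacent rows belong to the same output, which matches the pairing of the sums as (R_0 + R_1) and (R_2 + R_3) rather than (R_0 + R_2) and (R_1 + R_3).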
  • Modification 1: For example, in the above embodiment, the value calculated by the matrix product of the matrices W_1, W_2, ..., W_v is set in the register 41 of the systolic array cell 21 as the weight coefficient w_{i,j}. However, by sequentially setting the weighting factors of the matrices W_1, W_2, ..., W_v in the register 41 of the systolic array cell 21, the full connection operations of the fully connected layers Lj1, Lj2, ..., Ljv may be sequentially executed by the systolic array 11.
  • the functions of one component in the above embodiment may be distributed as a plurality of components, or the functions of a plurality of components may be integrated into one component.
  • at least a part of the configuration of the above embodiment may be replaced with a configuration having the same function.
  • at least a part of the configuration of the above embodiment may be added to or replaced with the configuration of the other embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The invention relates to an arithmetic processing device for executing the arithmetic of a convolutional neural network, the arithmetic processing device being provided with a plurality of systolic array cells (21), a plurality of input switches (22), a plurality of output switches (23), a convolution operation control unit, and a full connection operation control unit. The plurality of input switches (22), provided in correspondence with each of the plurality of systolic array cells (21), have input terminals (22a, 22b) and an output terminal (22c), and are switched to one of a state in which the input terminal (22a) and the output terminal (22c) are connected and a state in which the input terminal (22b) and the output terminal (22c) are connected. The plurality of output switches (23), provided in correspondence with each of the plurality of systolic array cells (21), have an input terminal (23a) and output terminals (23b, 23c), and are switched to one of a state in which the input terminal (23a) and the output terminal (23b) are connected and a state in which the input terminal (23a) and the output terminal (23c) are connected.
PCT/JP2016/002680 2015-07-08 2016-06-02 Dispositif de traitement arithmétique WO2017006512A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015137102A JP6387913B2 (ja) 2015-07-08 2015-07-08 演算処理装置
JP2015-137102 2015-07-08

Publications (1)

Publication Number Publication Date
WO2017006512A1 true WO2017006512A1 (fr) 2017-01-12

Family

ID=57684937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/002680 WO2017006512A1 (fr) 2015-07-08 2016-06-02 Dispositif de traitement arithmétique

Country Status (2)

Country Link
JP (1) JP6387913B2 (fr)
WO (1) WO2017006512A1 (fr)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832841A (zh) * 2017-11-14 2018-03-23 福州瑞芯微电子股份有限公司 一种神经网络芯片的功耗优化方法及电路
WO2019156746A1 (fr) * 2018-02-08 2019-08-15 Western Digital Technologies, Inc. Moteur de réseau neuronal systolique capable de rétropropagation
CN110609804A (zh) * 2018-06-15 2019-12-24 瑞萨电子株式会社 半导体器件和控制半导体器件的方法
US10796198B2 (en) 2018-02-08 2020-10-06 Western Digital Technologies, Inc. Adjusting enhancement coefficients for neural network engine
US10929058B2 (en) 2019-03-25 2021-02-23 Western Digital Technologies, Inc. Enhanced memory device architecture for machine learning
US20210150311A1 (en) * 2019-11-19 2021-05-20 Alibaba Group Holding Limited Data layout conscious processing in memory architecture for executing neural network model
CN114977353A (zh) * 2021-02-26 2022-08-30 威强电工业电脑股份有限公司 电源管理电路及其系统
US11783176B2 (en) 2019-03-25 2023-10-10 Western Digital Technologies, Inc. Enhanced storage device memory architecture for machine learning

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107578095B (zh) * 2017-09-01 2018-08-10 中国科学院计算技术研究所 神经网络计算装置及包含该计算装置的处理器
WO2019215907A1 (fr) * 2018-05-11 2019-11-14 オリンパス株式会社 Dispositif de traitement arithmétique
KR20200107295A (ko) * 2019-03-07 2020-09-16 에스케이하이닉스 주식회사 시스톨릭 어레이 및 프로세싱 시스템
JP7062617B2 (ja) 2019-06-26 2022-05-06 株式会社東芝 演算装置および演算方法
KR102393916B1 (ko) * 2019-06-27 2022-05-02 주식회사 사피온코리아 위노그라드 알고리즘에 기반한 행렬 곱셈 방법 및 장치
JP7253468B2 (ja) * 2019-07-26 2023-04-06 株式会社メガチップス ニューラルネットワーク用プロセッサ、ニューラルネットワーク用処理方法、および、プログラム

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NING LI ET AL.: "An FPGA Implementation of Deep Convolutional Neural Network using Synchronous Shift Data Transfer", IEICE TECHNICAL REPORT RECONF2014-46 - RECONF2014-85 RECONFIGURABLE SYSTEM, vol. 114, no. 428, 22 January 2015 (2015-01-22), pages 175 - 180, XP055342357, ISSN: 0913-5685 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832841A (zh) * 2017-11-14 2018-03-23 福州瑞芯微电子股份有限公司 一种神经网络芯片的功耗优化方法及电路
US11494620B2 (en) 2018-02-08 2022-11-08 Western Digital Technologies, Inc. Systolic neural network engine capable of backpropagation
US11551064B2 (en) 2018-02-08 2023-01-10 Western Digital Technologies, Inc. Systolic neural network engine capable of forward propagation
US11769042B2 (en) 2018-02-08 2023-09-26 Western Digital Technologies, Inc. Reconfigurable systolic neural network engine
US11741346B2 (en) 2018-02-08 2023-08-29 Western Digital Technologies, Inc. Systolic neural network engine with crossover connection optimization
US11494582B2 (en) 2018-02-08 2022-11-08 Western Digital Technologies, Inc. Configurable neural network engine of tensor arrays and memory cells
US11164074B2 (en) 2018-02-08 2021-11-02 Western Digital Technologies, Inc. Multi-core systolic processor system for neural network processing
US11164072B2 (en) 2018-02-08 2021-11-02 Western Digital Technologies, Inc. Convolution engines for systolic neural network processor
US11461579B2 (en) 2018-02-08 2022-10-04 Western Digital Technologies, Inc. Configurable neural network engine for convolutional filter sizes
US10796198B2 (en) 2018-02-08 2020-10-06 Western Digital Technologies, Inc. Adjusting enhancement coefficients for neural network engine
WO2019156746A1 (fr) * 2018-02-08 2019-08-15 Western Digital Technologies, Inc. Moteur de réseau neuronal systolique capable de rétropropagation
US11164073B2 (en) 2018-02-08 2021-11-02 Western Digital Technologies, Inc. Systolic neural network processor with feedback control
CN110609804A (zh) * 2018-06-15 2019-12-24 瑞萨电子株式会社 半导体器件和控制半导体器件的方法
US10929058B2 (en) 2019-03-25 2021-02-23 Western Digital Technologies, Inc. Enhanced memory device architecture for machine learning
US11372577B2 (en) 2019-03-25 2022-06-28 Western Digital Technologies, Inc. Enhanced memory device architecture for machine learning
US11783176B2 (en) 2019-03-25 2023-10-10 Western Digital Technologies, Inc. Enhanced storage device memory architecture for machine learning
US20210150311A1 (en) * 2019-11-19 2021-05-20 Alibaba Group Holding Limited Data layout conscious processing in memory architecture for executing neural network model
CN114977353A (zh) * 2021-02-26 2022-08-30 威强电工业电脑股份有限公司 电源管理电路及其系统

Also Published As

Publication number Publication date
JP6387913B2 (ja) 2018-09-12
JP2017021483A (ja) 2017-01-26

Similar Documents

Publication Publication Date Title
JP6387913B2 (ja) 演算処理装置
JP6987860B2 (ja) ハードウェアにおけるカーネルストライドの実行
US10394929B2 (en) Adaptive execution engine for convolution computing systems
EP3872747B1 (fr) Procédé de super résolution de vidéo
US20180197067A1 (en) Methods and apparatus for matrix processing in a convolutional neural network
EP3816929A1 (fr) Procédé et appareil de restauration d'une image
KR101894651B1 (ko) 데이터 취득 모듈과 방법, 데이터 처리 유닛, 구동기 및 디스플레이 디바이스
JP7014393B2 (ja) データ処理装置、及びこれにおけるデータ処理方法
CN109147036A (zh) 一种基于深度学习的集成成像微图像阵列快速生成方法
Villalpando et al. FPGA implementation of stereo disparity with high throughput for mobility applications
CN106683043B (zh) 一种多通道光学探测系统的并行图像拼接方法、装置
CN1198206C (zh) 时分型矩阵计算器
GB2470740A (en) A method and system for performing an image transform using reverse transformation and interpolation
US7746519B2 (en) Method and device for scanning images
JP2017027314A (ja) 並列演算装置、画像処理装置及び並列演算方法
US20140198989A1 (en) Method and device for determining values which are suitable for distortion correction of an image, and for distortion correction of an image
Ioannou et al. High throughput spatial convolution filters on FPGAs
JP2001195564A (ja) 画像検出処理装置
Van der Wal Technical overview of the Sarnoff Acadia II vision processor
JPH0364279A (ja) 画像ブレ検知装置
JP2020160377A (ja) 表示媒体、処理装置および処理プログラム
KR100300338B1 (ko) 2차원 이산 웨이브렛 변환을 위한 초고밀도 집적회로 구조
CN116108902B (zh) 采样操作实现系统、方法、电子设备及存储介质
WO2023112581A1 (fr) Dispositif d'inférence
JPH06318194A (ja) 並列データ処理方式

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16820992

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16820992

Country of ref document: EP

Kind code of ref document: A1