WO2024048868A1

WO2024048868A1 - Computation method in neural network and device therefor

Info

Publication number: WO2024048868A1
Application number: PCT/KR2022/021128
Authority: WO
Inventors: 정태영
Original assignee: 오픈엣지테크놀로지 주식회사
Priority date: 2022-08-30
Filing date: 2022-12-22
Publication date: 2024-03-07
Also published as: KR20240030359A

Abstract

Disclosed is an instruction execution method by which a computing device executes an instruction. The method comprises: a first step of, when the instruction is called, comparing a first variable with a predetermined first constant; and a second step of storing, in a first register, a predetermined padding value or a value stored in a memory, and then terminating execution of the instruction.

Description

Neural network computation method and device therefor

The present invention relates to a method of performing calculations in a computing device to implement a neural network and a hardware accelerator to which this method is applied.

A convolutional neural network (CNN) performs multiple computational steps, including pooling operations. U.S. Patent No. US10713816 proposes an object detection method that uses a deep CNN pooling layer as a feature.

Figure 1 shows the computational structure of a CNN according to one embodiment. The following description refers to FIG. 1.

First, convolutional layers 52 can be created by performing a convolution operation on the input image data 51 stored in the internal memory, using a plurality of kernels. Generating the convolutional layers 52 may include applying a non-linear operation (e.g., ReLU, sigmoid, or tanh) to the feature maps obtained from the convolution operation.

Next, pooling layers 53 can be created by performing pooling on the convolutional layers 52. Each convolutional layer 52 may contain data that can be expressed as an M*N matrix. To perform pooling, a pooling window that is smaller than the convolutional layer 52 may be defined. The pooling window may have sizes Mp and Np in the row and column directions, respectively, and is smaller than the convolutional layer (M>=Mp and N>Np, or M>Mp and N>=Np). Pooling is an operation that generates fewer data, for example a single value, from the Mp*Np data selected by overlaying the pooling window on the convolutional layer. For example, MAX pooling selects and outputs the largest of the Mp*Np values, and average pooling outputs their average. Pooling that follows other rules may also be defined. The pooling window can be overlaid on the convolutional layer at many different positions, and depending on the embodiment, the rules for moving the pooling window over the convolutional layer may be restricted. For example, if the pooling window is restricted to moving by SM columns at a time along the row direction of the convolutional layer, the row-direction stride of the pooling operation is SM; if it is restricted to moving by SN rows at a time along the column direction, the column-direction stride is SN (SM and SN are natural numbers). The larger the stride, the smaller the pooling layer produced by the pooling operation. Further details of the pooling operation are well described in the prior art on CNNs.
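As an illustration of the window-and-stride rule described above, here is a minimal sketch of 2D pooling without padding. The parameter names `win_h`, `win_w`, `stride_v`, and `stride_h` are hypothetical; mapping them onto Mp, Np, SM, and SN is an assumption, chosen to avoid committing to one row/column convention. Any reduction (MAX, average, or another rule) can be plugged in.

```python
from typing import Callable, List, Sequence

Matrix = List[List[float]]


def pool2d(layer: Matrix, win_h: int, win_w: int, stride_v: int, stride_h: int,
           reduce: Callable[[Sequence[float]], float]) -> Matrix:
    """Apply `reduce` to every win_h x win_w window placed on `layer`.

    The window moves stride_h columns at a time along a row and stride_v
    rows at a time down the columns; no padding is applied.
    """
    rows, cols = len(layer), len(layer[0])
    out: Matrix = []
    for top in range(0, rows - win_h + 1, stride_v):
        out_row = []
        for left in range(0, cols - win_w + 1, stride_h):
            window = [layer[r][c]
                      for r in range(top, top + win_h)
                      for c in range(left, left + win_w)]
            out_row.append(reduce(window))
        out.append(out_row)
    return out


def average(values: Sequence[float]) -> float:
    return sum(values) / len(values)


layer = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, 10, 11, 12],
         [13, 14, 15, 16]]
print(pool2d(layer, 2, 2, 2, 2, max))      # [[6, 8], [14, 16]]
print(pool2d(layer, 2, 2, 1, 1, max))      # stride 1 gives a 3 x 3 output
print(pool2d(layer, 2, 2, 2, 2, average))  # [[3.5, 5.5], [11.5, 13.5]]
```

Reducing the stride from 2 to 1 grows the output from 2 x 2 to 3 x 3 and makes neighbouring windows overlap, which is the repeated work the text refers to.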

Next, flattening can be performed on the pooling layers 53 to create an array to be input to the neural network 54.

The array can then be input into the neural network 54 to generate an output from the neural network 54.

Although Figure 1 shows one embodiment of a CNN, a CNN can be implemented in various other ways. Additionally, although the pooling operation is used to implement a CNN in Figure 1, it can also be used in computing fields other than CNNs.

The amount of computation for a pooling operation tends to increase as the pooling window becomes larger and as the stride becomes smaller. In addition, the smaller the stride, the more likely it is that the same operation is repeated during the pooling operation.

In the process of generating the convolutional layer and executing the pooling, an array, referred to herein as a variable array, can be created and used as follows.

That is, to create a convolutional layer, a convolution operation is performed between first input data and a first kernel. This convolution generates N1 output values, and a variable array is created to generate each output value. The variable array is an array that contains part or all of the first input data. An operation is then performed between the elements of the variable array and the elements of the first kernel; the variable array may have the same size as the first kernel. In generating all N1 output values, the operation between the elements of the variable array and the elements of the first kernel is executed N1 times, and the values of the variable array may change each time it is executed.

Additionally, N2 output values can be generated by pooling the generated convolutional layer. A variable array can be created to generate each of these output values; it contains some or all of the data of the convolutional layer. Pooling is then performed on the elements of the variable array, whose size may be the same as that of the pooling window. In generating all N2 output values, pooling of the elements of the variable array is performed N2 times, and the values of the variable array may change each time pooling is performed.

The present invention seeks to provide a method for determining the values of the data that must be prepared to execute operations such as convolution and pooling in a neural network, and an instruction for executing this method.

According to one aspect of the present invention, an instruction for preparing the data needed for computation in a neural network can be provided. Executing this instruction once determines the value of one specific element of the array required for that computation. In this specification, the array required for computation in the neural network may be referred to as a 'variable array'.

If the size of the variable array is M, all values of the variable array can be determined by executing the instruction M times.

Once the values of all elements of the variable array are determined, the variable array can be used in various calculation processes such as convolution and pooling operations in a neural network.

The instructions provided according to one aspect of the present invention may be stored in a memory referenced by the CPU of the computing device and may be executed by the CPU. Alternatively, the instruction may be stored in a memory referenced by the control unit of the hardware accelerator of the computing device and may be executed by the control unit.

According to one aspect of the present invention, an instruction execution method in which a computing device executes an instruction may be provided. The method includes: a first step of comparing a first variable (lpad) with a predetermined first constant when the instruction is called; and a second step of storing either a predetermined padding value or a value stored in memory in a first register and then terminating execution of the instruction. In the second step, if the first variable is greater than the first constant, the predetermined padding value is stored in the first register, the first variable is decreased by 1, and execution of the instruction is terminated. If the first variable is not greater than the first constant, a second variable (valid) is compared with a predetermined second constant. If the second variable is greater than the second constant, the value stored at the memory address indicated by a third variable (s_idx) is stored in the first register, the second variable is decreased by 1, the third variable is increased by a predetermined third value, and execution of the instruction is terminated. If the second variable is not greater than the second constant, the predetermined padding value is stored in the first register and execution of the instruction is terminated.
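The branching of the second step can be sketched in Python as follows. This is a minimal sketch, assuming both constants are 0 and the third value is the normalized address step 1 (options the text itself gives), and modelling memory as a list indexed by normalized address. `InstrState`, `execute_instruction`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import List

FIRST_CONST = 0   # predetermined first constant (0 in the text's example)
SECOND_CONST = 0  # predetermined second constant (0 in the text's example)
THIRD_VALUE = 1   # predetermined third value: normalized address step


@dataclass
class InstrState:
    lpad: int          # first variable: left paddings still to emit
    valid: int         # second variable: target-array data still to read
    s_idx: int         # third variable: address of the next target datum
    reg1: float = 0.0  # first register


def execute_instruction(st: InstrState, memory: List[float], pad: float) -> None:
    """One call of the instruction: store exactly one value in reg1, then return."""
    if st.lpad > FIRST_CONST:          # first step: compare lpad with the first constant
        st.reg1 = pad
        st.lpad -= 1
    elif st.valid > SECOND_CONST:      # compare valid with the second constant
        st.reg1 = memory[st.s_idx]
        st.valid -= 1
        st.s_idx += THIRD_VALUE
    else:
        st.reg1 = pad


memory = [-1.0, 10.0, 20.0, 30.0, 40.0, 50.0]  # target array at addresses 1..5
st = InstrState(lpad=3, valid=5, s_idx=1)
values = []
for _ in range(10):
    execute_instruction(st, memory, pad=0.0)
    values.append(st.reg1)
print(values)  # [0.0, 0.0, 0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 0.0, 0.0]
```

Successive calls yield lpad padding values, then the valid target data in address order, then padding for every further call, so the caller never has to test array bounds itself.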

The method may further include initializing the first variable (lpad), the second variable (valid), and the third variable (s_idx) before the instruction is called. The first variable (lpad) is initialized to the maximum number of paddings used in a pre-planned operation, the second variable (valid) is initialized to the size of the target array on which the pre-planned operation is performed, and the third variable (s_idx) may be initialized to the first of the addresses of the memory area that holds the data of the target array.

The predetermined third value may be the difference between the addresses of two adjacent pieces of data in one area of the memory.

The first constant may be 0 (zero), and the second constant may be 0 (zero).

The pre-planned operation may be a pooling operation.

Alternatively, the pre-planned operation may be a convolution operation.

According to another aspect of the present invention, an operation method in which a computing device executes a pre-planned operation may be provided. The method includes: initializing a first variable (lpad), a second variable (valid), and a third variable (s_idx); determining the values of all elements of a predetermined variable array by executing a predetermined instruction execution method once or repeatedly; and executing the pre-planned operation on the variable array whose element values have all been determined, thereby generating data for at least part of a pre-planned output array. The predetermined instruction execution method includes an instruction execution step, comprising a first step of comparing the first variable (lpad) with a predetermined first constant when the predetermined instruction is called and a second step of storing a predetermined padding value or a value stored in memory in a first register and then terminating execution of the instruction, and a step of writing the value stored in the first register to the variable array.

Here, the first variable (lpad) is initialized to the maximum number of paddings used in the pre-planned operation, the second variable (valid) is initialized to the size of the target array on which the pre-planned operation is performed, and the third variable (s_idx) may be initialized to the first of the addresses of the memory area that holds the data of the target array.

In the second step, if the first variable is greater than the first constant, the predetermined padding value is stored in the first register, the first variable is decreased by 1, and execution of the instruction is terminated. If the first variable is not greater than the first constant, the second variable (valid) is compared with a predetermined second constant. If the second variable is greater than the second constant, the value stored at the memory address indicated by the third variable (s_idx) is stored in the first register, the second variable is decreased by 1, the third variable is increased by a predetermined third value, and execution of the instruction is terminated. If the second variable is not greater than the second constant, the predetermined padding value is stored in the first register and execution of the instruction is terminated.

The first constant may be 0 (zero), the second constant may be 0 (zero), and the predetermined third value may be the difference between the addresses of two adjacent pieces of data in one area of the memory.

The pre-planned operation may be a pooling operation, in which case the size of the variable array may be the same as the size of the pooling window used for the pooling operation.

Alternatively, the pre-planned operation may be a convolution operation, in which case the size of the variable array may be the same as the size of the kernel used for the convolution operation.

Each time the instruction execution method is executed, the value stored in the first register may be written to the element of the variable array pointed to by a predetermined first pointer, and the element pointed to by the first pointer may change with each execution.
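One way the pieces could fit together for a stride-1 convolution is sketched below. It assumes, beyond what this passage states, that the first pointer advances cyclically after each write, that an output is computed every time the variable array holds K fresh values, and that the oldest element (the one the pointer now points to) is aligned with p0; the instruction is called lpad + size + rpad times. All names are hypothetical. The kernel [1, 10, 100, 1000] is chosen so each output's digits show which target datum met which kernel element.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class InstrVars:
    lpad: int   # first variable
    valid: int  # second variable
    s_idx: int  # third variable


def run_instruction(v: InstrVars, memory: List[float], pad: float) -> float:
    """One execution of the instruction; returns the first-register value."""
    if v.lpad > 0:
        v.lpad -= 1
        return pad
    if v.valid > 0:
        value = memory[v.s_idx]
        v.valid -= 1
        v.s_idx += 1  # normalized address step
        return value
    return pad


def convolve(memory: List[float], start: int, size: int, kernel: List[float],
             lpad: int, rpad: int, pad: float) -> List[float]:
    k = len(kernel)
    v = InstrVars(lpad=lpad, valid=size, s_idx=start)
    var_array = [pad] * k  # variable array, same size as the kernel
    ptr = 0                # first pointer (0-based list position here)
    outputs = []
    for n in range(lpad + size + rpad):
        var_array[ptr] = run_instruction(v, memory, pad)  # write the first register
        ptr = (ptr + 1) % k                               # cyclic increase
        if n + 1 >= k:  # the variable array now holds k written values
            # The oldest element sits at ptr; align it with kernel[0].
            window = [var_array[(ptr + j) % k] for j in range(k)]
            outputs.append(sum(p * x for p, x in zip(kernel, window)))
    return outputs


memory = [0, 1, 2, 3, 4, 5]  # v1..v5 = 1..5 at addresses 1..5
kernel = [1, 10, 100, 1000]  # p0..p3
print(convolve(memory, 1, 5, kernel, lpad=3, rpad=3, pad=0))
# [1000, 2100, 3210, 4321, 5432, 543, 54, 5]
```

Each loop iteration costs one instruction call and one write, instead of re-copying K values from the target array for every output.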

According to another aspect of the present invention, a computing device including a main processor 160 and a storage unit 170 may be provided. Instruction code for executing the above-described instruction execution method or operation method is recorded in the storage unit 170, and the main processor is configured to read the instruction code and execute the instruction execution method or the operation method.

According to another aspect of the present invention, a computing device including a hardware accelerator 110 may be provided. The hardware accelerator is configured to execute the above-described instruction execution method or the above-described operation method.

According to another aspect of the present invention, an instruction execution method in which a computing device executes an instruction may be provided. This method includes: comparing a first variable (lpad) with a predetermined first constant (0) when the instruction is called; if the first variable is greater than the first constant, storing a predetermined padding value in a first register, decreasing the first variable by 1, and then terminating execution of the instruction; if the first variable is not greater than the first constant, comparing a second variable (valid) with a predetermined second constant (0); if the second variable is greater than the second constant, storing the value at the memory address indicated by a third variable (s_idx) in the first register, decreasing the second variable by 1, increasing the third variable by a predetermined third value, and terminating execution of the instruction; and if the second variable is not greater than the second constant, storing the predetermined padding value in the first register and terminating execution of the instruction.

According to the present invention, a method for determining the values of the data that must be prepared to execute operations such as convolution and pooling in a neural network, and an instruction for executing this method, can be provided.

Figure 2 shows the main structure of part of a computing device used in one embodiment of the present invention.

Figure 3 shows the concepts of a target array, kernel, and variable array used when executing a convolution operation according to an embodiment of the present invention.

Figure 4 is a diagram explaining the concept of a method of performing a convolution operation using a predetermined target array and kernel.

FIGS. 5A, 5B, 5C, and 5D are flowcharts showing a convolution operation method provided according to an embodiment of the present invention.

Figure 6 shows an actual example of the convolution operation method provided according to an embodiment of the present invention shown in Figures 5a, 5b, 5c, and 5d.

Figure 7 shows the concepts of a target array and a variable array used when performing a pooling operation according to an embodiment of the present invention.

Figure 8 is a diagram explaining the concept of a method of performing a pooling operation on a predetermined target array using a pooling window.

FIGS. 9A, 9B, 9C, and 9D are flowcharts showing a pooling operation method provided according to an embodiment of the present invention.

Figure 10 shows an actual example of the pooling operation method provided according to an embodiment of the present invention shown in Figures 9a, 9b, 9c, and 9d.

Figure 11a shows the process of initializing three variables required for execution of the first instruction provided according to an embodiment of the present invention.

Figure 11b is a flowchart showing a one-time execution process of the first instruction provided according to an embodiment of the present invention.

Figure 12a shows the process of initializing four variables required for execution of the second instruction provided according to another embodiment of the present invention.

Figure 12b is a flowchart showing a one-time execution process of the second instruction provided according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the attached drawings. However, the present invention is not limited to the embodiments described herein and may be implemented in various other forms. The terms used in this specification are intended to aid understanding of the embodiments and are not intended to limit the scope of the present invention. Additionally, as used herein, singular forms include plural forms unless the context clearly indicates otherwise.

The computing device 1 may include a dynamic random access memory (DRAM) 130, a hardware accelerator 110, a bus 700 connecting the DRAM 130 and the hardware accelerator 110, and other hardware 99 and a main processor 160 connected to the bus 700.

In addition, the computing device 1 may further include a power supply unit, a communication unit, a user interface, a storage unit 170, and peripheral device units not shown. The bus 700 may be shared by the hardware accelerator 110, other hardware 99, and the main processor 160.

The storage unit 170 may be integrally coupled to the computing device 1 or may be detachably coupled to the computing device 1.

The hardware accelerator 110 may include a DMA (direct memory access) unit 20, a control unit 40, an internal memory 30, an input buffer 650, a data operation unit 610, and an output buffer 640.

Some or all of the data temporarily stored in the internal memory 30 may be provided from the DRAM 130 through the bus 700. To move data stored in the DRAM 130 to the internal memory 30, the control unit 40 and the DMA unit 20 may control the internal memory 30 and the DRAM 130.

Data stored in the internal memory 30 may be provided to the data operation unit 610 through the input buffer 650.

Output values generated by the data operation unit 610 may be stored in the internal memory 30 through the output buffer 640. The output values stored in the internal memory 30 may be written to the DRAM 130 under the control of the control unit 40 and the DMA unit 20.

The control unit 40 can collectively control the operations of the DMA unit 20, the internal memory 30, and the data operation unit 610.

In one implementation example, the data operation unit 610 may perform a first operation function during a first time period and a second operation function during a second time period.

In FIG. 2, one data operation unit 610 is shown within the hardware accelerator 110. However, in a modified embodiment not shown, a plurality of the data operation units 610 shown in FIG. 2 may be provided in the hardware accelerator 110 to perform operations requested by the control unit 40 in parallel.

In one implementation example, the data operation unit 610 may output its output data sequentially, in a given order over time, rather than all at once.

Reference number 80 denotes memory data 80, that is, data stored in memory. In reference number 80, the first row represents memory addresses, and the second row represents the values of the data stored at those addresses. For example, 'v0', 'v1', etc. may each be one piece of data. The memory may be the DRAM 130 or the internal memory 30 shown in FIG. 2.

Reference number 81 denotes the target array 81, which is data read from the memory and is the target of the convolution operation. In the target array 81, the first row represents the index of the target array 81, and the second row represents the value of the data stored at that index. For example, the values constituting the target array 81 may be read from the DRAM 130 and stored in the internal memory 30.

The indices of the target array 81 may each correspond to addresses of portions of the memory data 80 corresponding to the target array 81.

Reference number 90 is a kernel 90 composed of values used for convolution operations. In the kernel 90, the first row represents the index of the kernel 90, and the second row represents the value of data stored at the index. In one embodiment, the values constituting the kernel 90 may be predetermined constant values. For example, the values constituting the kernel 90 may be read from the DRAM 130 and stored in the internal memory 30 .

Reference number 70 denotes the variable array 70, which stores the values that are actually operated on with the kernel 90. In the variable array 70, the first row represents the index of the variable array 70, and the second row represents the value of the data stored at that index. All values constituting the variable array 70 may be obtained from the target array 81. Alternatively, some of the values constituting the variable array 70 may be predetermined padding values, and the rest may be obtained from the target array 81. The padding value may be a predetermined value such as '0'.

Reference number 60 denotes the output array 60, which is data generated by performing a convolution operation on the target array 81 using the kernel 90. The size of the output array 60 may be determined according to the specific definition of the convolution operation method to be executed.

In one embodiment of the present invention, when an operation is performed using the kernel 90, the data obtained by referring to an index of the kernel 90 and the data obtained by referring to an index of the variable array 70 may be used in the operation.

The size of the variable array 70 and the kernel 90 may be smaller than the size of the target array 81.

In FIG. 4, for convenience of explanation, the target array 81 and the kernel 90 shown in FIG. 3 are used as the predetermined target array and kernel. The size of the kernel 90 is 4.

The convolution operation may consist of a total of 8 steps, steps 21 to 28. At each step, the position of the kernel 90 relative to the target array 81 shifts by a stride of 1.

In each step, at least one piece of data of the target array 81 corresponds to a piece of data of the kernel 90, and corresponding data of the target array 81 and the kernel 90 are multiplied together. At each step, as many multiplications as the size of the kernel 90, i.e., four, may be performed. If, at a given step, no data of the target array 81 corresponds to a particular piece of data of the kernel 90 (i.e., such target data is not defined), that kernel data is multiplied by a predetermined padding value (vp). The padding value (vp) may be, for example, 1.

In step 21, data p3 of kernel 90 corresponds to data v1 of target array 81. However, the data of the target array 81 corresponding to the data p0, p1, and p2 of the kernel 90 is not defined. At this time, the convolution operation data generated by step 21 is co[0]= p0*vp + p1*vp + p2*vp + p3*v1.

In step 22, data p2 and p3 of the kernel 90 correspond to data v1 and v2 of the target array 81, respectively. However, the data of the target array 81 corresponding to the data p0 and p1 of the kernel 90 is not defined. At this time, the convolution operation data generated by step 22 is co[1]= p0*vp + p1*vp + p2*v1 + p3*v2.

In step 23, data p1, p2, and p3 of kernel 90 correspond to data v1, v2, and v3 of target array 81, respectively. However, the data of the target array 81 corresponding to data p0 of the kernel 90 is not defined. At this time, the convolution operation data generated by step 23 is co[2]= p0*vp + p1*v1 + p2*v2 + p3*v3.

In step 24, data p0, p1, p2, and p3 of kernel 90 correspond to data v1, v2, v3, and v4 of target array 81, respectively. At this time, the convolution operation data generated by step 24 is co[3]=p0*v1 + p1*v2 + p2*v3 + p3*v4.

In step 25, data p0, p1, p2, and p3 of kernel 90 correspond to data v2, v3, v4, and v5 of target array 81, respectively. At this time, the convolution operation data generated by step 25 is co[4]=p0*v2 + p1*v3 + p2*v4 + p3*v5.

In step 26, data p0, p1, and p2 of kernel 90 correspond to data v3, v4, and v5 of target array 81, respectively. However, the data of the target array 81 corresponding to data p3 of the kernel 90 is not defined. At this time, the convolution operation data generated in step 26 is co[5]=p0*v3 + p1*v4 + p2*v5 + p3*vp.

In step 27, data p0 and p1 of the kernel 90 correspond to data v4 and v5 of the target array 81, respectively. However, the data of the target array 81 corresponding to data p2 and p3 of the kernel 90 is not defined. At this time, the convolution operation data generated by step 27 is co[6]= p0*v4 + p1*v5 + p2*vp + p3*vp.

In step 28, data p0 of kernel 90 corresponds to data v5 of target array 81. However, the target array 81 data corresponding to the data p1, p2, and p3 of the kernel 90 is not defined. At this time, the convolution operation data generated by step 28 is co[7]=p0*v5 + p1*vp + p2*vp + p3*vp.
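The eight expressions above follow a single indexing rule, which the sketch below uses to regenerate them symbolically. It assumes, as in FIG. 4, that target data are indexed from 1 and that output k (k = 0 for step 21) pairs kernel element p_j with target index k + j - lpad + 1, falling back to vp when that index lies outside the target array. `conv_terms` is a hypothetical helper name.

```python
from typing import List


def conv_terms(kernel_len: int, target_len: int, lpad: int, rpad: int) -> List[str]:
    """Build the symbolic expression for every output of the padded 1D convolution."""
    n_out = lpad + target_len + rpad - kernel_len + 1  # stride 1
    outputs = []
    for k in range(n_out):
        terms = []
        for j in range(kernel_len):
            i = k + j - lpad + 1  # 1-based index into the target array
            operand = f"v{i}" if 1 <= i <= target_len else "vp"
            terms.append(f"p{j}*{operand}")
        outputs.append(" + ".join(terms))
    return outputs


for k, expr in enumerate(conv_terms(4, 5, 3, 3)):
    print(f"co[{k}] = {expr}")
# first line:  co[0] = p0*vp + p1*vp + p2*vp + p3*v1
# last line:   co[7] = p0*v5 + p1*vp + p2*vp + p3*vp
```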

In the convolution operation method provided according to an embodiment of the present invention, the values 'maximum number of left padding' and 'maximum number of right padding' are defined.

In the first step (step 21) of the convolution operation method, the number of data of the kernel 90 that do not correspond to any value of the target array (e.g., p0, p1, p2) is defined as the maximum number of left paddings. In the example of the convolution operation method shown in FIG. 4, the maximum number of left paddings is 3.

In the last step (step 28) of the convolution operation method, the number of data of the kernel 90 that do not correspond to any value of the target array (e.g., p1, p2, p3) can be defined as the maximum number of right paddings. In the example of the convolution operation method shown in FIG. 4, the maximum number of right paddings is 3.

As described above, when the maximum number of left paddings is 3, the maximum number of right paddings is 3, and the stride is 1, the size of the output array 60 is determined to be 8.
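The output size of 8 can be checked with the usual sliding-window count. The general formula below is an assumption stated for illustration (the text gives only the example values), with target size n = 5, kernel size K = 4, and stride s = 1:

```latex
\[
N_{\mathrm{out}}
  = \left\lfloor \frac{l_{\mathrm{pad}} + n + r_{\mathrm{pad}} - K}{s} \right\rfloor + 1
  = \frac{3 + 5 + 3 - 4}{1} + 1
  = 8
\]
```

In this example l_pad = r_pad = K - 1 = 3, because in step 21 only p3 overlaps the target array and in step 28 only p0 does.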

FIGS. 5A, 5B, 5C, and 5D are flowcharts showing a convolution operation method provided according to an embodiment of the present invention. FIGS. 5A, 5B, 5C, and 5D may be collectively referred to as FIG. 5.

Hereinafter, the description will be made with reference to FIGS. 3, 4, and 5.

In step S110, the main processor 160 or the control unit 40 may obtain the start address and end address where the target array 81 is stored among the addresses of the memory.

Here, the target array 81 may refer to a set of data that is the target of an operation according to the present invention among data stored in the memory. And the data constituting the target array 81 may be stored in consecutive memory addresses. For example, data constituting the target array 81 of FIG. 3 may be stored in addresses 1 to 5 of the memory.

The addresses of two adjacent pieces of data in the target array 81 may differ by 1. If they differ by some other value k, the difference k can be normalized to 1. That is, in the present invention, the difference between the address values of two adjacent pieces of data in the target array is defined as 1. For example, in the data denoted by reference number 80 in FIG. 3, the actual addresses at which data v1 and the adjacent data v2 are stored may be addr1 and addr2, and addr2 minus addr1 may not actually be 1. Even in this case, as shown in FIG. 3, one embodiment of the present invention treats the difference addr2 - addr1 as scaled to 1.
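The normalization can be expressed as a simple affine mapping from raw addresses to indices. This is a sketch under assumed values: the 4-byte word size and the base address 0x1000 are hypothetical, and `normalized_index` is an illustrative name.

```python
def normalized_index(addr: int, first_addr: int, step: int, first_index: int = 1) -> int:
    """Map a raw memory address to the normalized index used in the text.

    `step` is the raw address difference k between adjacent data; after
    normalization, adjacent data differ by exactly 1.
    """
    offset = addr - first_addr
    if step <= 0 or offset % step != 0:
        raise ValueError("address is not aligned to the data layout")
    return first_index + offset // step


# Hypothetical layout: v1..v5 stored as 4-byte words starting at 0x1000.
raw_addrs = [0x1000 + 4 * i for i in range(5)]
print([normalized_index(a, 0x1000, 4) for a in raw_addrs])  # [1, 2, 3, 4, 5]
```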

The target array 81 may also be referred to as a third array in this specification.

Meanwhile, in the convolution operation method according to an embodiment of the present invention, four variables are defined and used: source index (s_idx), lpad, valid, and rpad.

Additionally, in the convolution operation method according to an embodiment of the present invention, two pointers, a variable array pointer (first pointer) and an output array pointer (second pointer), may be further defined.

The computing device used to implement the convolution operation method according to an embodiment of the present invention may allocate, in its memory space, a source index register (s_idx_reg), an Lpad register (lpad_reg), a valid register (valid_reg), and an Rpad register (rpad_reg), which store the values of the source index (s_idx), lpad, valid, and rpad, respectively. Storing a specific value in a register can be regarded as assigning that value to the corresponding variable, and changing the value stored in a register can be regarded as changing the value of the corresponding variable. Accordingly, to change the value of a variable, the value stored in the corresponding register is changed.

In step S120, the computing device may perform an initialization operation, which includes the tasks listed below.

First, the first index of the target array can be assigned to the source index (s_idx). The first index of the target array may be an address of the memory where the first data of the target array is stored. In the examples of FIGS. 3 and 4, the first index of the target array 81 is '1' (s_idx=1).

Second, the maximum number of left paddings used in a pre-planned convolution operation can be assigned to lpad. In the examples of Figures 3 and 4, the maximum number of left paddings is 3 (lpad=3).

Third, the size of the target array can be assigned to the valid. In the examples of FIGS. 3 and 4, the size of the target array is 5 (valid=5).

Fourth, the maximum number of right paddings used in the planned convolution operation can be assigned to the rpad. In the examples of Figures 3 and 4, the maximum number of right paddings is 3 (rpad=3).

Fifth, a variable array (second array) 70 of the same size as the kernel 90 can be prepared. That is, the computing device can allocate memory space for the variable array 70. In the example of FIG. 3, the size of the variable array 70 is 4. The variable array 70 may be referred to as a second array in this specification.

Sixth, the variable array pointer (first pointer) may be initialized to point to the first index of the variable array 70. In the example of FIG. 3, the first pointer may be initialized to point to index 0, which is the first index of the variable array 70.

Seventh, the value of the output array pointer (second pointer) can be initialized so that it points to the first index of the output array 60 prepared in advance. In the example of FIG. 3, the second pointer may be initialized to point to index 0, which is the first index of the output array 60.

In step S130, it is determined whether lpad is greater than 0. If lpad is greater than 0, the process proceeds to step S210. Otherwise, the process proceeds to step S310.

In step S310, it is determined whether valid is greater than 0. If valid is greater than 0, the process proceeds to step S320. Otherwise, the process proceeds to step S410.

Referring to FIG. 5B, in step S210, a predetermined padding value (vp) may be stored in the position indicated by the variable array pointer in the variable array 70.

In step S220, lpad may be decreased by a first value. The first value may be, for example, 1.

In step S230, the value of the variable array pointer (first pointer) may be increased by the word size. This increase may be a cyclic increase. The process may then return to step S130 of FIG. 5A.

Here, a cyclic increase means that, if the first pointer currently points to the last index of the variable array 70, increasing it makes it point to the first index of the variable array 70. The value of the last index is greater than the value of the first index.
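As an illustration only, the cyclic increase can be modeled for a byte-addressed pointer that advances by one word at a time. The function name, the base address, and the 4-byte word size below are assumptions for the sketch, not values taken from the specification.

```python
def cyclic_increase(ptr: int, base: int, length: int, word_size: int) -> int:
    """Advance a byte address by one word inside an array of `length` words
    starting at `base`, wrapping from the last element back to the first."""
    offset = ptr - base + word_size
    return base + offset % (length * word_size)

# Hypothetical 4-element array of 4-byte words starting at address 0x100.
print(hex(cyclic_increase(0x104, 0x100, 4, 4)))  # 0x108
print(hex(cyclic_increase(0x10C, 0x100, 4, 4)))  # 0x100 (last index wraps to first)
```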

Referring to FIG. 5C, in step S320, the value of the element of the target array 81 pointed to by the source index (s_idx) can be stored at the position of the variable array 70 pointed to by the variable array pointer (first pointer).

In step S330, valid may be decreased by a second value, and the source index (s_idx) may be increased by a third value.

The second value may be, for example, 1. The third value may be the difference between the addresses of two adjacent pieces of data in one area of the memory.

In step S340, the result of an operation between the variable array 70 and the kernel 90 may be stored at the position of the output array 60 pointed to by the output array pointer (second pointer). This operation may be a convolution operation.

In step S350, the value of the variable array pointer (first pointer) may be cyclically increased by the word size, and the output array pointer (second pointer) may be increased by the word size. Then, the process may return to step S130 of FIG. 5A.

Referring to FIG. 5D, in step S410, it is determined whether rpad is greater than 1. If rpad is greater than 1, the process proceeds to step S420. Otherwise, the convolution operation method ends.

In step S420, a predetermined padding value (vp) may be stored in the position indicated by the variable array pointer in the variable array 70.

In step S430, rpad may be decreased by a fourth value. The fourth value may be, for example, 1.

In step S440, the result of an operation between the variable array 70 and the kernel 90 may be stored at the position of the output array 60 pointed to by the output array pointer (second pointer). This operation may be a convolution operation.

In step S450, the value of the variable array pointer may be cyclically increased by the word size, and the output array pointer (second pointer) may be increased by the word size. Then, the process may return to step S130 of FIG. 5A.
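The flow of FIG. 5 (steps S120 to S450) can be sketched in Python as follows. This is a minimal software model under stated assumptions: pointers are element indices rather than byte addresses, the first, second, and fourth values are 1, the third value is an assumed address step `ndv`, the operation in steps S340 and S440 is taken to be a multiply-accumulate, and the maximum number of left paddings is assumed to be one less than the kernel size so that the variable array is full before the first output. The function name `conv_flow` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence

def conv_flow(memory: Sequence[float], first_addr: int, size: int,
              kernel: Sequence[float], max_lpad: int, max_rpad: int,
              vp: float = 0, ndv: int = 1) -> List[float]:
    """Sketch of steps S120 to S450 of FIG. 5 with a circular variable array."""
    k = len(kernel)
    # S120: initialize the four variables, the variable array, and the pointers.
    s_idx, lpad, valid, rpad = first_addr, max_lpad, size, max_rpad
    var: List[Optional[float]] = [None] * k   # None plays the role of '?'
    p1 = 0                                    # first pointer (element index)
    out: List[float] = []                     # appending advances the second pointer

    def mac() -> float:
        # var[p1] (newest value) pairs with kernel[k-1]; the element n positions
        # further on, cyclically, pairs with kernel[(k-1+n) % k].
        return sum(kernel[(k - 1 + n) % k] * var[(p1 + n) % k] for n in range(k))

    while True:
        if lpad > 0:                   # S130
            var[p1] = vp               # S210
            lpad -= 1                  # S220 (first value = 1)
            p1 = (p1 + 1) % k          # S230 cyclic increase
        elif valid > 0:                # S310
            var[p1] = memory[s_idx]    # S320
            valid -= 1                 # S330 (second value = 1)
            s_idx += ndv               # S330 (third value = ndv)
            out.append(mac())          # S340
            p1 = (p1 + 1) % k          # S350
        elif rpad > 1:                 # S410, as stated in the text
            var[p1] = vp               # S420
            rpad -= 1                  # S430 (fourth value = 1)
            out.append(mac())          # S440
            p1 = (p1 + 1) % k          # S450
        else:
            return out                 # the method ends
```

For example, with hypothetical data v1 to v5 = 1 to 5 stored at memory addresses 1 to 5, kernel values [1, 2, 3, 4], and vp = 0, `conv_flow([0, 1, 2, 3, 4, 5], 1, 5, [1, 2, 3, 4], 3, 3)` returns [4, 11, 20, 30, 40, 26, 14]. Because the sketch applies the `rpad > 1` test of step S410 as written, the right-padding branch runs rpad - 1 = 2 times, so seven outputs are produced; a test of `rpad > 0` would also produce the window [v5, vp, vp, vp].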

The first value, the second value, and the fourth value may refer to index difference values between adjacent elements in a given array.

Figure 6 shows an actual example of the convolution operation method provided according to an embodiment of the present invention shown in Figure 5.

FIG. 6 is an example of applying the convolution operation method shown in FIG. 5 to the target array 81 and kernel 90 illustrated in FIG. 3.

FIG. 6 presents the register table 50, which shows the values of the source index (s_idx), lpad, valid, and rpad, together with the variable array 70, the kernel 90, and the output array 60. For the output array 60, only the elements at which the output array pointer (second pointer) was located are shown; the remaining elements are omitted.

In steps 0 to 3 shown in FIG. 6, the value of the variable array 70 has not yet been determined.

Each value of the output array 60 illustrated in FIG. 3 can be determined by an operation between the values of the variable array 70 and the kernel 90. Since at least some element values of the variable array 70 are not yet determined in steps 0 to 3, no element of the output array 60 can be computed in those steps.

Each element of the output array 60 is determined in one of a total of eight steps, steps 4 to 11, among steps 1 to 11 shown in FIG. 6.

In step 0 of FIG. 6, the values of the source index (s_idx), lpad, valid, and rpad are set to 1, 3, 5, and 3, respectively, by step S120 of FIG. 5A, and the first pointer is initialized to point to the first index (0) of the variable array 70. In FIG. 6, the position of the first pointer is indicated by a black inverted triangle.

Each k-th step (step k) (k=1, ..., 11) may start from step S130 of FIG. 5.

Hereinafter, the values of the elements of the register table 50 shown to the right of the k-th step (k=1, ..., 11) represent the state just before step S130 is executed in the k-th step.

Likewise, the position of the first pointer (black inverted triangle) on the variable array 70 shown to the right of the k-th step (k=1, ..., 11) represents the state just before step S130 is executed in the k-th step.

Likewise, the position of the second pointer (arrow) above the output array 60 shown to the right of the k-th step (k=1, ..., 11) represents the state just before step S130 is executed in the k-th step.

And the value displayed for each element of the variable array 70 shown to the right of the k-th step (k=1, ..., 11) represents the value when the k-th step is completed. 'vp' is a predefined padding value, and the symbol '?' indicates that the value of the element is not a valid value.

The state shown in step 1 of FIG. 6 is the result of executing steps S130, S210, S220, and S230 of FIG. 5. As a result, the padding value is stored at index 0 of the variable array 70.

For example, lpad is decreased by 1 in step S220 during step 1, but the decreased value is shown in step 2. Likewise, the first pointer is increased by 1 in step S230 during step 1, but its new position is shown in step 2.

The state shown in step 2 of FIG. 6 is the result of sequentially executing steps S130, S210, S220, and S230 in FIG. 5. As a result, the padding value is stored at index 1 of the variable array 70.

The state shown in step 3 of FIG. 6 is the result of sequentially executing steps S130, S210, S220, and S230 in FIG. 5. As a result, the padding value is stored at index 2 of the variable array 70.

The states shown in steps 4 to 8 of FIG. 6 are the results of sequentially executing steps S130, S310, S320, S330, S340, and S350 of FIG. 5. In each of these steps, the result value (co[ ]) of the operation between the values stored in the variable array 70 and the values stored in the kernel 90 is stored at the position of the output array 60 indicated by the second pointer.

The states shown in steps 9 and 10 of FIG. 6 are the results of sequentially executing steps S130, S310, S410, S420, S430, S440, and S450 of FIG. 5. In each of these steps, the result value (co[ ]) of the operation between the values stored in the variable array 70 and the values stored in the kernel 90 is stored at the position of the output array 60 indicated by the second pointer.

The state shown in step 11 of FIG. 6 is the result of sequentially executing steps S130, S310, S410, and S500 of FIG. 5. A result value (co[ ]) of the operation between the values stored in the variable array 70 and the values stored in the kernel 90 is stored at the position of the output array 60 pointed to by the second pointer, and the convolution operation method ends.

In steps 4 to 11, the last index of the kernel 90 corresponds to the index of the variable array 70 pointed to by the first pointer, and the index obtained by cyclically increasing the last index of the kernel 90 by n corresponds to the index obtained by cyclically increasing the index pointed to by the first pointer by n. When the kernel 90 and the variable array 70 are operated on with each other in steps 4 to 11, corresponding values are operated on with each other. That is, in FIG. 6, elements of the kernel 90 and the variable array 70 marked with the same icon (a square, a circle, a triangle, or a black inverted triangle) are operated on with each other.
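The index correspondence described above can be written as a small helper. It assumes 0-based indices, a kernel of size k, and `p1` equal to the index of the variable array 70 pointed to by the first pointer; the function name is hypothetical.

```python
from typing import List, Tuple

def aligned_pairs(p1: int, k: int) -> List[Tuple[int, int]]:
    """Pair kernel indices with variable-array indices.

    The last kernel index (k-1) pairs with p1, the index pointed to by the
    first pointer; advancing both cyclically by n keeps them paired.
    """
    return [((k - 1 + n) % k, (p1 + n) % k) for n in range(k)]

print(aligned_pairs(2, 4))  # [(3, 2), (0, 3), (1, 0), (2, 1)]
```

For k = 4 and p1 = 2, kernel index 3 pairs with variable-array index 2, kernel index 0 with index 3, kernel index 1 with index 0, and kernel index 2 with index 1.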

Concepts in FIG. 7 that are the same as those presented in FIG. 3 are explained only briefly here.

Reference number 80 illustrates memory data 80, which is data stored in memory.

Reference number 81 denotes the target array 81, which is data read from the memory and is the target of the pooling operation.

Reference number 70 is a variable array 70. All values constituting the variable array 70 may be obtained from the target array 81. Alternatively, some of the values constituting the variable array 70 may be composed of predetermined padding values, and the remainder may be obtained from the target array 81. The padding value may be a predetermined value such as '0'.

Reference number 160 indicates the output array 160, which is data generated by performing a pooling operation on the target array 81. The size of the output array 160 may be determined according to the specific definition of the pooling operation method to be executed.

The size of the variable array 70 may be smaller than the size of the target array 81.

The size of the variable array 70 is the same as the size of the pooling window used in the defined pooling operation. In Figure 7, an example where the size of the pooling window is 4 is presented.

In FIG. 8, for convenience of explanation, the target array 81 shown in FIG. 7 is used as the predetermined target array. The size of the pooling window is 4, and the position of the pooling window is indicated by a dotted line.

The pooling operation may consist of a total of eight steps, steps 121 to 128. At each step, the pooling window moves relative to the target array 81 by a stride of 1.

For convenience of explanation, in the example shown in FIG. 8, v1, v2, v3, v4, and v5 are assumed to have values of 1, 2, 3, 4, and 5, respectively. It is assumed that the pooling operation applied in FIG. 8 is MAX pooling, which outputs the largest value among the values existing in the pooling window.

According to the above conditions, the pooling operation data generated by step 121 is po[0]=v1.

According to the above conditions, the pooling operation data generated by step 122 is po[1]=v2.

According to the above conditions, the pooling operation data generated by step 123 is po[2]=v3.

According to the above conditions, the pooling operation data generated by step 124 is po[3]=v4.

According to the above conditions, the pooling operation data generated by step 125 is po[4]=v5.

According to the above conditions, the pooling operation data generated by step 126 is po[5]=v5.

According to the above conditions, the pooling operation data generated by step 127 is po[6]=v5.

According to the above conditions, the pooling operation data generated by step 128 is po[7]=v5.
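The values po[0] to po[7] above can be checked with a plain reference implementation that builds the padded sequence explicitly, which the method of this embodiment avoids. The sketch assumes stride 1, v1 to v5 = 1 to 5, and a padding value of negative infinity for MAX pooling; `max_pool_padded` is a hypothetical name.

```python
from typing import List, Sequence

def max_pool_padded(target: Sequence[float], window: int, lpad: int, rpad: int,
                    stride: int = 1, vp: float = float("-inf")) -> List[float]:
    """Reference MAX pooling over an explicitly padded copy of the target array."""
    padded = [vp] * lpad + list(target) + [vp] * rpad
    return [max(padded[i:i + window])
            for i in range(0, len(padded) - window + 1, stride)]

print(max_pool_padded([1, 2, 3, 4, 5], window=4, lpad=3, rpad=3))
# [1, 2, 3, 4, 5, 5, 5, 5]
```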

In the pooling operation method provided according to an embodiment of the present invention, two values are defined: the 'maximum number of left paddings' and the 'maximum number of right paddings'.

In the first step (step 121) of the pooling operation, the number of positions in the pooling window that do not correspond to any value of the target array is defined as the maximum number of left paddings. In the example shown in FIG. 8, the maximum number of left paddings is 3.

In the last step (step 128) of the pooling operation, the number of positions in the pooling window that do not correspond to any value of the target array can be defined as the maximum number of right paddings. In the example shown in FIG. 8, the maximum number of right paddings is 3.

As described above, when the maximum number of left paddings is 3, the maximum number of right paddings is 3, and the stride is 1, the size of the output array 160 is determined to be 8.
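For a pooling window of size W and a stride S, the number of window positions follows the usual sliding-window count. This formula is a standard relation added for illustration rather than one stated in the specification; with W = 4, S = 1, and a target array size (valid) of 5, it reproduces the value 8:

```latex
N_{\mathrm{out}} = \left\lfloor \frac{\mathrm{lpad} + \mathrm{valid} + \mathrm{rpad} - W}{S} \right\rfloor + 1
= \left\lfloor \frac{3 + 5 + 3 - 4}{1} \right\rfloor + 1 = 8
```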

FIGS. 9A, 9B, 9C, and 9D are flowcharts showing a pooling operation method provided according to an embodiment of the present invention. FIGS. 9A, 9B, 9C, and 9D may be collectively referred to as FIG. 9.

Hereinafter, the description will be made with reference to FIGS. 7, 8, and 9.

In step S1110, the main processor 160 or the control unit 40 may obtain the start address and end address of the area in the memory where data of the target array 81 is stored.

Data constituting the target array 81 of FIG. 7 may be stored in addresses 1 to 5 of the memory.

Meanwhile, in the pooling operation method according to an embodiment of the present invention, four variables are defined and used: source index (s_idx), lpad, valid, and rpad.

Additionally, in the pooling operation method according to an embodiment of the present invention, two pointers, a variable array pointer (first pointer) and an output array pointer (second pointer), may be further defined.

The computing device used to implement the pooling operation method according to an embodiment of the present invention may allocate, in the memory space, a source index register (s_idx_reg), an Lpad register (lpad_reg), a valid register (valid_reg), and an Rpad register (rpad_reg), which store the values of the source index (s_idx), lpad, valid, and rpad, respectively.

In step S1120, the computing device may perform an initialization operation, which includes the following tasks.

First, the first index of the target array 81 can be assigned to the source index (s_idx). The first index of the target array 81 may be the address of the memory where the first data of the target array 81 is stored. In the examples of FIGS. 7 and 8, the first index of the target array 81 is '1' (s_idx=1).

Second, the maximum number of left paddings used in a pre-planned pooling operation can be assigned to lpad. In the examples of Figures 7 and 8, the maximum number of left paddings is 3 (lpad=3).

Third, the size of the target array 81 can be assigned to valid. In the examples of FIGS. 7 and 8, the size of the target array is 5 (valid=5).

Fourth, the maximum number of right paddings used in the planned pooling operation can be assigned to the rpad. In the examples of Figures 7 and 8, the maximum number of right paddings is 3 (rpad=3).

Fifth, a variable array (second array) 70 having the same size as the size of the pooling window used in the pooling operation can be prepared.

Sixth, the variable array pointer (first pointer) can be initialized to point to the first index of the variable array 70. In the example of FIG. 7, the first pointer may be initialized to point to index 0, which is the first index of the variable array 70.

Seventh, the value of the output array pointer (second pointer) can be initialized so that it points to the first index of the output array 160 prepared in advance. In the example of FIG. 7, the second pointer may be initialized to point to index 0, which is the first index of the output array 160.

In step S1130, it is determined whether lpad is greater than 0. If lpad is greater than 0, the process proceeds to step S1210. Otherwise, the process proceeds to step S1310.

In step S1310, it is determined whether valid is greater than 0. If valid is greater than 0, the process proceeds to step S1320. Otherwise, the process proceeds to step S1410.

Referring to FIG. 9B, in step S1210, a predetermined padding value (vp) may be stored in the position indicated by the variable array pointer in the variable array 70.

As shown in FIG. 8, if the pooling operation is MAX pooling, the padding value (vp) may be the smallest value that can be expressed.

In another embodiment, the padding value (vp) may be defined differently. For example, if the pooling operation is average pooling, which outputs the average of the data within the pooling window, the padding value (vp) may be 0.
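One way to choose vp for a given pooling type is sketched below. The helper name and the use of a signed two's-complement integer width are assumptions for illustration; for floating-point data, negative infinity is used here as the smallest value.

```python
import math
from typing import Optional

def padding_value(pool_kind: str, int_bits: Optional[int] = None) -> float:
    """Return a padding value vp that does not distort the pooling result."""
    if pool_kind == "max":
        # Smallest representable value: -2**(n-1) for a signed n-bit integer,
        # or negative infinity for floating-point data.
        return -(1 << (int_bits - 1)) if int_bits else -math.inf
    if pool_kind == "avg":
        # Zero adds nothing to the sum; the divisor still counts padded slots.
        return 0
    raise ValueError(f"unsupported pooling kind: {pool_kind}")
```

For example, `padding_value("max", 8)` gives -128 for int8 data, so a window such as [-128, 3] still yields 3 under MAX pooling.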

In step S1220, lpad may be reduced by the first value.

In step S1230, the value of the variable array pointer (first pointer) may be increased by the word size. This increase may be a cyclic increase. The process may then return to step S1130 of FIG. 9A.

Referring to FIG. 9C, in step S1320, the value of the element of the target array 81 pointed to by the source index (s_idx) may be stored at the position of the variable array 70 pointed to by the variable array pointer (first pointer).

In step S1330, valid may be decreased by the second value, and the source index (s_idx) may be increased by the third value.

In step S1340, the result of the pooling operation on the data of the variable array 70 may be stored in the position pointed to by the output array pointer (second pointer) of the output array 160.

In step S1350, the value of the variable array pointer (first pointer) may be cyclically increased by the word size, and the output array pointer (second pointer) may be increased by the word size. Then, the process may return to step S1130 of FIG. 9A.

Referring to FIG. 9D, in step S1410, it is determined whether rpad is greater than 1. If rpad is greater than 1, the process proceeds to step S1420. Otherwise, the process proceeds to step S1500, and the pooling operation method ends.

In step S1420, the predetermined padding value (vp) may be stored in the position indicated by the variable array pointer in the variable array 70.

In step S1430, the rpad may be decreased by the fourth value.

In step S1440, the result of the pooling operation on the data of the variable array 70 may be stored in the position pointed to by the output array pointer (second pointer) of the output array 160.

In step S1450, the value of the variable array pointer may be cyclically increased by the word size, and the output array pointer (second pointer) may be increased by the word size. Then, the process may return to step S1130 of FIG. 9A.
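A corresponding sketch of the FIG. 9 flow is shown below under the same kind of assumptions: element-index pointers, first, second, and fourth values of 1, an assumed address step `ndv`, and a pooling function passed in as `pool`. Because MAX and average pooling do not depend on element order, the circular variable array can be passed to `pool` directly. The function and parameter names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

def pooling_flow(memory: Sequence[float], first_addr: int, size: int,
                 window: int, max_lpad: int, max_rpad: int,
                 pool: Callable[[List[float]], float] = max,
                 vp: float = -math.inf, ndv: int = 1) -> List[float]:
    """Sketch of steps S1120 to S1450 of FIG. 9 with a circular variable array."""
    # S1120: initialize the four variables, the variable array, and the pointers.
    s_idx, lpad, valid, rpad = first_addr, max_lpad, size, max_rpad
    var: List[Optional[float]] = [None] * window  # same size as the pooling window
    p1 = 0                                        # first pointer (element index)
    out: List[float] = []                         # appending advances the second pointer
    while True:
        if lpad > 0:                   # S1130
            var[p1] = vp               # S1210
            lpad -= 1                  # S1220
            p1 = (p1 + 1) % window     # S1230 cyclic increase
        elif valid > 0:                # S1310
            var[p1] = memory[s_idx]    # S1320
            valid -= 1                 # S1330
            s_idx += ndv
            out.append(pool(var))      # S1340: order-independent pooling
            p1 = (p1 + 1) % window     # S1350
        elif rpad > 1:                 # S1410, as stated in the text
            var[p1] = vp               # S1420
            rpad -= 1                  # S1430
            out.append(pool(var))      # S1440
            p1 = (p1 + 1) % window     # S1450
        else:
            return out                 # S1500: the method ends
```

With v1 to v5 = 1 to 5 stored at addresses 1 to 5, `pooling_flow([None, 1, 2, 3, 4, 5], 1, 5, 4, 3, 3)` returns [1, 2, 3, 4, 5, 5, 5]; the `rpad > 1` test of step S1410, applied as written, yields two right-padded windows, so the last window of FIG. 8 (step 128) is not produced by this trace. For average pooling, `pool=lambda w: sum(w) / len(w)` and `vp=0.0` can be passed instead.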

Figure 10 shows an actual example of the pooling operation method provided according to an embodiment of the present invention shown in Figure 9.

FIG. 10 is an example of applying the pooling operation method shown in FIG. 9 to the target array 81 illustrated in FIG. 7.

FIG. 10 presents the register table 50, which shows the values of the source index (s_idx), lpad, valid, and rpad, together with the variable array 70 and the output array 160. For the output array 160, only the elements at which the output array pointer (second pointer) was located are shown; the remaining elements are omitted.

In steps 100 to 103 shown in FIG. 10, the values of the variable array 70 have not yet been determined.

Each value of the output array 160 illustrated in FIG. 7 may be determined by performing a pooling operation targeting the variable array 70. Since at least some of the element values of the variable array 70 are not determined in steps 100 to 103, an operation for determining an element of the output array 160 cannot be performed in steps 100 to 103.

In a total of eight steps, steps 104 to 111 shown in FIG. 10, each element of the output array 160 is determined.

In step 100 of FIG. 10, the values of the source index (s_idx), lpad, valid, and rpad are set to 1, 3, 5, and 3, respectively, by step S1120 of FIG. 9A, and the first pointer is initialized to point to the first index (0) of the variable array 70. In FIG. 10, the position of the first pointer is indicated by a black inverted triangle.

Each k-th step (step k) (k=101, ..., 111) may start from step S1130 of FIG. 9.

Hereinafter, the values of the elements of the register table 50 shown to the right of the k-th step (k=101, ..., 111) represent the state just before step S1130 is executed in the k-th step.

Likewise, the position of the first pointer (black inverted triangle) on the variable array 70 shown to the right of the k-th step (k=101, ..., 111) represents the state just before step S1130 is executed in the k-th step.

Likewise, the position of the second pointer (arrow) above the output array 160 shown to the right of the k-th step (k=101, ..., 111) represents the state just before step S1130 is executed in the k-th step.

And the value displayed for each element of the variable array 70 shown to the right of the k-th step (k=101, ..., 111) represents the value when the k-th step is completed. 'vp' is a predefined padding value, and the symbol '?' indicates that the value of the element is not a valid value.

The state shown in step 101 of FIG. 10 is the result of executing steps S1130, S1210, S1220, and S1230 of FIG. 9. As a result, the padding value is stored at index 0 of the variable array 70.

The state shown in step 102 of FIG. 10 is the result of sequentially executing steps S1130, S1210, S1220, and S1230 in FIG. 9. As a result, the padding value is stored at index 1 of the variable array 70.

The state shown in step 103 of FIG. 10 is the result of sequentially executing steps S1130, S1210, S1220, and S1230 in FIG. 9. As a result, the padding value is stored at index 2 of the variable array 70.

The states shown in steps 104 to 108 of FIG. 10 are the results of sequentially executing steps S1130, S1310, S1320, S1330, S1340, and S1350 of FIG. 9. In each of these steps, the result value (po[ ]) of the pooling operation on the values stored in the variable array 70 is stored at the position of the output array 160 pointed to by the second pointer.

The states shown in steps 109 and 110 of FIG. 10 are the results of sequentially executing steps S1130, S1310, S1410, S1420, S1430, S1440, and S1450 of FIG. 9. In each of these steps, the result value (po[ ]) of the pooling operation on the values stored in the variable array 70 is stored at the position of the output array 160 pointed to by the second pointer.

The state shown in step 111 of FIG. 10 is the result of sequentially executing steps S1130, S1310, S1410, and S1500 in FIG. 9. The result value (po[ ]) of the pooling operation on the values stored in the variable array 70 is stored in the location of the output array 160 pointed to by the second pointer, and the pooling operation method is terminated.

The above-described embodiment of the convolution operation and the embodiment of the pooling operation include a process for determining the value to be stored in the variable array 70. This process is explained with reference to Figures 11A and 11B.

A computing device executing the first instruction may mean that the computing device executes the steps defined by the first instruction.

Before the computing device can execute the first instruction, the three variables required by the first instruction may need to be initialized, and the first instruction may need to be associated with these three variables so that it can control their values. After the three variables are initialized, each execution of the first instruction may determine a value required for a pre-planned operation and load it into the first register.

When the first instruction is executed once, a value stored in memory may be loaded into the first register, but under certain conditions, a specific predetermined constant value may be loaded into the first register. In this specification, the specific predetermined constant value may be referred to as a padding value. The name of the first instruction provided according to an embodiment of the present invention may be, for example, LwP1 or LoadWithPad1, but is not limited thereto.

Now, with reference to FIG. 11A, a method of initializing variables for executing the first instruction provided according to an embodiment of the present invention will be described.

The variable initialization method for executing the first instruction provided according to an embodiment of the present invention includes the following steps.

In step S2110, the computing device may obtain the first address of the area of the memory corresponding to the target array that is the target of the pre-planned operation.

In one embodiment, the memory may be DRAM or SRAM.

In this specification, the address of the memory may be referred to as an index.

In step S2120, the computing device may assign the obtained first address to s_idx, the maximum number of left paddings used in the pre-planned operation to lpad, and the size of the target array to valid.

The steps S2110 and S2120 may be understood as initializing a set of variables including lpad, s_idx, and valid.

The steps S2110 and S2120 can be explained by integrating them into step S2100. At this time, step S2100 can be understood as a step of initializing a set of variables including lpad, s_idx, and valid.

The size of the target array may be the number of data elements included in the target array.

Now, with reference to FIG. 11B, a method of executing the first instruction provided according to an embodiment of the present invention will be described.

When the computing device calls the first instruction, the method for executing the first instruction begins.

The method of executing the first instruction requires control rights over a set of variables including lpad, s_idx, and valid, write rights to a predetermined first register, and read rights to the memory.

A method of executing the first instruction provided according to an embodiment of the present invention includes the following steps.

In step S2130, it is checked whether lpad is greater than 0 (zero). If it is greater than 0, proceed to step (S2210), and if not greater than 0, proceed to step (S2310).

In step S2310, it is checked whether valid is greater than 0 (zero). If it is greater than 0, proceed to step (S2320), and if not greater than 0, proceed to step (S2420).

In step S2210, a predetermined padding value is stored in the first register. Next, in step S2220, lpad is decreased by the first value and then execution of the first instruction is terminated.

In step S2320, the value stored at the memory address indicated by s_idx is stored in the first register. Next, in step S2330, valid is decreased by the second value, s_idx is increased by the third value, and execution of the first instruction is terminated.

Here, it is assumed that the difference between two adjacent address values of the memory is 1. If the nominal value of the difference between two adjacent address values of the memory is ndv, s_idx may be increased by ndv instead of 1 in step S2330.

In step S2420, the predetermined padding value is stored in the first register and then execution of the first instruction is terminated.

When the method of executing the first instruction provided according to an embodiment of the present invention shown in FIG. 11B is executed once, the value stored in the first register is updated once.
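The variable initialization of FIG. 11A and one execution of the first instruction in FIG. 11B can be modeled in software as follows. The first register is represented by the function's return value, the first and second values are taken to be 1, and the third value is an assumed address step `ndv`; the class and function names are hypothetical and do not describe an actual instruction encoding.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass
class LwP1State:
    """Hypothetical model of the three variables used by the first instruction."""
    s_idx: int  # memory address (index) of the next target-array element
    lpad: int   # left paddings still to be loaded
    valid: int  # target-array elements still to be loaded

def lwp1_init(first_addr: int, max_lpad: int, size: int) -> LwP1State:
    """Step S2100 (S2110 and S2120): initialize s_idx, lpad, and valid."""
    return LwP1State(s_idx=first_addr, lpad=max_lpad, valid=size)

def lwp1(st: LwP1State, memory: Sequence[float], vp: float, ndv: int = 1) -> float:
    """One execution of the first instruction; returns the first-register value."""
    if st.lpad > 0:              # S2130
        reg = vp                 # S2210: load the padding value
        st.lpad -= 1             # S2220
    elif st.valid > 0:           # S2310
        reg = memory[st.s_idx]   # S2320: load from memory
        st.valid -= 1            # S2330
        st.s_idx += ndv
    else:
        reg = vp                 # S2420: load the padding value
    return reg
```

Calling `lwp1` ten times on a state initialized with s_idx = 1, lpad = 3, and valid = 5, over a memory holding 10, 20, 30, 40, 50 at addresses 1 to 5, yields three padding values, the five memory values, and then padding values again.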

When the first instruction is executed once, at least one of the lpad, the valid, and the s_idx is changed. In this way, the first instruction may be executed again with at least one of the lpad, valid, and s_idx changed.

In one embodiment, a higher-level application using the first instruction may call the first instruction N times after executing the variable initialization method for the first instruction once. As a result, the value stored in the first register is updated N times. The higher-level application can use each updated value of the first register immediately, or store it in a separate array (e.g., the variable array) and use it for a pre-planned operation.

For example, the pre-planned operation may be a convolution operation, and the higher-level application may be a program that executes the convolution operation. In another example, the pre-planned operation may be a pooling operation and the higher-level application may be a program that executes the pooling operation. In another example, the pre-planned operation may be an arbitrary operation and the higher-level application may be a program that executes the arbitrary operation.
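The following sketch shows one possible higher-level application of the kind described above: it initializes the first instruction once, calls it N = lpad + valid + rpad times, writes each loaded value into a circular variable array, and performs MAX pooling once the array is full. The `LwP1` class is a hypothetical software model of the first instruction, and the choice of N and of negative infinity as the padding value are assumptions.

```python
import math
from typing import List, Optional, Sequence

class LwP1:
    """Hypothetical software model of the first instruction (FIGS. 11A and 11B)."""

    def __init__(self, memory: Sequence[float], first_addr: int,
                 max_lpad: int, size: int, vp: float, ndv: int = 1):
        self.memory, self.vp, self.ndv = memory, vp, ndv
        # S2100: initialize s_idx, lpad, and valid once.
        self.s_idx, self.lpad, self.valid = first_addr, max_lpad, size
        self.reg: Optional[float] = None  # stands in for the first register

    def execute(self) -> None:
        if self.lpad > 0:                    # S2130 -> S2210, S2220
            self.reg = self.vp
            self.lpad -= 1
        elif self.valid > 0:                 # S2310 -> S2320, S2330
            self.reg = self.memory[self.s_idx]
            self.valid -= 1
            self.s_idx += self.ndv
        else:                                # S2420
            self.reg = self.vp

def max_pool_app(memory: Sequence[float], first_addr: int, size: int,
                 window: int, max_lpad: int, max_rpad: int) -> List[float]:
    """Hypothetical higher-level application: MAX pooling fed by LwP1."""
    instr = LwP1(memory, first_addr, max_lpad, size, vp=-math.inf)
    var: List[Optional[float]] = [None] * window  # circular variable array
    p1 = 0                                        # first pointer (element index)
    out: List[float] = []
    n_calls = max_lpad + size + max_rpad          # assumed N
    for call in range(n_calls):
        instr.execute()
        var[p1] = instr.reg
        p1 = (p1 + 1) % window                    # cyclic increase
        if call >= window - 1:                    # array fully populated
            out.append(max(var))
    return out
```

Applied to v1 to v5 = 1 to 5 with a window of 4 and lpad = rpad = 3, `max_pool_app([None, 1, 2, 3, 4, 5], 1, 5, 4, 3, 3)` returns [1, 2, 3, 4, 5, 5, 5, 5], matching po[0] to po[7] of FIG. 8. Because the first instruction has no rpad variable, the number of trailing padding loads here is set by how many times the application calls it.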

Before the computing device can execute the second instruction, the four variables required by the second instruction may need to be initialized, and the second instruction may need to be associated with these four variables so that it can control their values. After the four variables are initialized, each execution of the second instruction may determine a value required for a pre-planned operation and load it into the first register.

The name of the second instruction provided according to another embodiment of the present invention may be, for example, LwP2 or LoadWithPad2, but is not limited thereto.

Now, with reference to FIG. 12A, a method of initializing variables for executing a second instruction provided according to another embodiment of the present invention will be described.

A variable initialization method for executing a second instruction provided according to another embodiment of the present invention includes the following steps.

In step S2120, the computing device may assign the obtained first address to s_idx, the maximum number of left paddings used in the pre-planned operation to lpad, the size of the target array to valid, and the maximum number of right paddings used in the pre-planned operation to rpad.

The steps S2110 and S2120 may be understood as initializing a set of variables including lpad, s_idx, valid, and rpad.

The steps S2110 and S2120 can be explained by integrating them into step S2100. At this time, step S2100 can be understood as a step of initializing a set of variables including lpad, s_idx, valid, and rpad.

Now, with reference to FIG. 12B, a method of executing the second instruction provided according to another embodiment of the present invention will be described.

When the computing device calls the second instruction, the second instruction execution method begins.

The second instruction execution method requires control rights over a set of variables including lpad, s_idx, valid, and rpad, write rights to a predetermined first register, and read rights to the memory.

A method of executing the second instruction provided according to another embodiment of the present invention includes the following steps.

In step S2130, it is checked whether lpad is greater than 0 (zero). If it is, the method proceeds to step S2210; otherwise, it proceeds to step S2310.

In step S2310, it is checked whether valid is greater than 0 (zero). If it is, the method proceeds to step S2320; otherwise, it proceeds to step S2410.

In step S2410, it is checked whether rpad is greater than 1. If it is, the method proceeds to step S2420; otherwise, execution of the second instruction ends.

In step S2210, a predetermined padding value is stored in the first register. Next, in step S2220, lpad is decreased by the first value and then execution of the second instruction is terminated.

In step S2320, the value stored at the memory address indicated by s_idx is stored in the first register. Next, in step S2330, valid is decreased by the second value, s_idx is increased by the third value, and execution of the second instruction is terminated.

If ndv denotes the nominal difference between the addresses of two adjacent data items in the memory, s_idx can be increased by ndv in step S2330.

In step S2420, the predetermined padding value is stored in the first register. Next, in step S2430, rpad is decreased by the fourth value, and execution of the second instruction is terminated.
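The branch structure of steps S2130 to S2430 can be modeled as follows. This is a hedged sketch, not the accelerator's actual implementation: the padding value PAD = 0, the dictionary memory model, and the names LwP2State and lwp2 are assumptions; the first, second and fourth values are taken to be 1, and the third value is ndv. The threshold in step S2410 follows the text as written (rpad > 1).

```python
from dataclasses import dataclass
from typing import Dict, Optional

PAD = 0  # hypothetical predetermined padding value


@dataclass
class LwP2State:
    lpad: int
    s_idx: int
    valid: int
    rpad: int


def lwp2(state: LwP2State, memory: Dict[int, int], ndv: int) -> Optional[int]:
    """One call of the second instruction.

    Returns the value stored in the first register, or None when the
    instruction ends without writing the register.
    """
    if state.lpad > 0:                  # S2130
        state.lpad -= 1                 # S2210 stores PAD, S2220 decrements lpad
        return PAD
    if state.valid > 0:                 # S2310
        value = memory[state.s_idx]     # S2320 loads the value at address s_idx
        state.valid -= 1                # S2330 decrements valid ...
        state.s_idx += ndv              # ... and advances s_idx by ndv
        return value
    if state.rpad > 1:                  # S2410, threshold 1 as stated in the text
        state.rpad -= 1                 # S2420 stores PAD, S2430 decrements rpad
        return PAD
    return None                         # ends without writing the first register


memory = {0x100 + 4 * i: v for i, v in enumerate([7, 8, 9])}  # 32-bit words, byte addresses
state = LwP2State(lpad=2, s_idx=0x100, valid=3, rpad=3)
trace = [lwp2(state, memory, ndv=4) for _ in range(8)]
print(trace)  # [0, 0, 7, 8, 9, 0, 0, None]
```

Here ndv = 4 because each 32-bit word occupies four byte addresses. Because step S2410 compares rpad with 1, an initial rpad of 3 yields two right paddings in this trace before the instruction ends without writing the register.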

Using the above-described embodiments, those skilled in the art to which the present invention pertains will be able to make various changes and modifications without departing from the essential characteristics of the present invention. The contents of each claim may be combined with other claims that it does not cite, within the scope understandable from this specification.

Claims

An instruction execution method in which a computing device executes an instruction, the method comprising:

a first step of comparing, when the instruction is called, a first variable (lpad) with a predetermined first constant; and

a second step of storing, in a first register, a predetermined padding value or a value stored in a memory, and then terminating execution of the instruction;

wherein,

in the second step,

If the first variable is greater than the first constant, the predetermined padding value is stored in the first register, the first variable is decreased by a first value, and execution of the instruction is then terminated,

If the first variable is not greater than the first constant, the second variable (valid) is compared with a predetermined second constant,

If the second variable is greater than the second constant, the value stored at the memory address indicated by the third variable (s_idx) is stored in the first register, the second variable is decreased by a second value, the third variable is increased by a predetermined third value, and execution of the instruction is then terminated, and

If the second variable is not greater than the second constant, the predetermined padding value is stored in the first register and execution of the instruction is then terminated.

Instruction execution method.
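The instruction of claim 1 can be modeled as below. Unlike the second instruction, it has no rpad: once valid reaches the second constant, every further call stores the padding value. This sketch assumes the constants and values of claims 3, 4 and 8 (both constants 0, first and second values 1), a word-addressed memory modeled as a Python list so that the third value is 1, and a padding value of 0; the class LoadWithPad is hypothetical. Its constructor arguments correspond to the initialization described in claim 2.

```python
from typing import List, Optional

PAD = 0  # hypothetical predetermined padding value


class LoadWithPad:
    """Hypothetical model of the claim-1 instruction and the variables it controls."""

    def __init__(self, memory: List[int], lpad: int, valid: int, s_idx: int, step: int = 1):
        self.memory = memory
        self.lpad = lpad                     # first variable
        self.valid = valid                   # second variable
        self.s_idx = s_idx                   # third variable
        self.step = step                     # predetermined third value
        self.register: Optional[int] = None  # first register

    def call(self) -> int:
        if self.lpad > 0:                    # first constant assumed to be 0
            self.register = PAD
            self.lpad -= 1                   # first value assumed to be 1
        elif self.valid > 0:                 # second constant assumed to be 0
            self.register = self.memory[self.s_idx]
            self.valid -= 1                  # second value assumed to be 1
            self.s_idx += self.step
        else:
            self.register = PAD              # target array exhausted: pad from here on
        return self.register


ins = LoadWithPad([5, 6], lpad=1, valid=2, s_idx=0)
print([ins.call() for _ in range(5)])  # [0, 5, 6, 0, 0]
```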
The method according to claim 1,

further comprising, before the instruction is called, initializing the first variable (lpad), the second variable (valid), and the third variable (s_idx), wherein

the first variable (lpad) is initialized to the maximum number of paddings used in a predetermined operation,

The second variable (valid) is initialized to have the size of the target array that is the target of the predetermined operation, and

The third variable (s_idx) is initialized to have the first address among addresses representing an area of memory containing information corresponding to the target array,

Instruction execution method.
The method of claim 1, wherein the predetermined third value is a difference value between addresses representing two adjacent pieces of data in an area of the memory.
The method of claim 1, wherein the first constant is 0 (zero) and the second constant is 0 (zero).
A calculation method in which a computing device executes a predetermined operation, the method comprising:

Initializing a first variable (lpad), a second variable (valid), and a third variable (s_idx);

determining the values of all elements of a predetermined variable array by executing a predetermined instruction execution method once or repeatedly; and

executing the predetermined operation on the variable array for which the values of all the elements have been determined, thereby generating data for at least part of a predetermined output array;

wherein

the predetermined instruction execution method includes:

an instruction execution step including a first step of comparing, when a call to a predetermined instruction occurs, the first variable (lpad) with a predetermined first constant, and a second step of storing the predetermined padding value or the value stored in the memory in the first register and then terminating execution of the instruction; and

a step of writing the value stored in the first register to the variable array.

Calculation method.
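The three steps of the calculation method can be sketched for a single output element. This minimal sketch assumes the instruction behaves as in claims 7 and 8 (with a word-addressed list as memory, so the third value is 1), uses a padding value of 0, and uses Python's built-in max as a stand-in for the predetermined operation; compute_one_output and load_with_pad are hypothetical names.

```python
from typing import Callable, Dict, List

PAD = 0  # hypothetical predetermined padding value


def load_with_pad(state: Dict[str, int], memory: List[int]) -> int:
    """One call of the instruction; the return value plays the role of the first register."""
    if state["lpad"] > 0:
        state["lpad"] -= 1
        return PAD
    if state["valid"] > 0:
        value = memory[state["s_idx"]]
        state["valid"] -= 1
        state["s_idx"] += 1
        return value
    return PAD


def compute_one_output(memory: List[int], lpad: int, valid: int, s_idx: int,
                       size: int, op: Callable[[List[int]], int]) -> int:
    state = {"lpad": lpad, "valid": valid, "s_idx": s_idx}  # initialization step
    variable_array = [PAD] * size
    for i in range(size):                                   # execute the instruction repeatedly
        variable_array[i] = load_with_pad(state, memory)    # write the register into the array
    return op(variable_array)                               # predetermined operation


print(compute_one_output([3, 1, 4, 1, 5], lpad=1, valid=5, s_idx=0, size=3, op=max))  # 3
```

The variable array here becomes [0, 3, 1]: one left padding followed by the first two elements of the target array.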
The method according to claim 5, wherein

the first variable (lpad) is initialized to the maximum number of paddings used in the predetermined operation,

The second variable (valid) is initialized to have the size of the target array that is the target of the predetermined operation, and

The third variable (s_idx) is initialized to have the first address among addresses representing an area of memory containing information corresponding to the target array,

Calculation method.
The method according to claim 5, wherein,

in the second step,

If the first variable is greater than the first constant, the predetermined padding value is stored in the first register, the first variable is decreased by a first value, and execution of the instruction is then terminated,

If the first variable is not greater than the first constant, the second variable (valid) is compared with a predetermined second constant,

If the second variable is greater than the second constant, the value stored at the memory address indicated by the third variable (s_idx) is stored in the first register, the second variable is decreased by a second value, the third variable is increased by a predetermined third value, and execution of the instruction is then terminated, and

If the second variable is not greater than the second constant, the predetermined padding value is stored in the first register and execution of the instruction is then terminated.

Calculation method.
The method according to claim 7, wherein

The first constant is 0 (zero),

The second constant is 0 (zero),

The first value is 1,

The second value is 1,

The predetermined third value is a difference value between addresses representing two adjacent data in an area of the memory,

Calculation method.
The method according to claim 5, wherein the predetermined operation is a pooling operation.
The method of claim 9, wherein the size of the variable array is equal to the size of a pooling window for the pooling operation.
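For a one-dimensional MAX pooling with stride 1, the variable array can be kept at the size of the pooling window and refilled one instruction call at a time. This is a sketch under stated assumptions, not the patent's implementation: the stride is fixed at 1, the same number pad of paddings is used on each side, the window is assumed to fit within the padded row, the padding value is negative infinity so that padded positions never win the MAX, and the function name max_pool_1d is hypothetical.

```python
from typing import List

PAD = float("-inf")  # hypothetical padding value; never selected by MAX


def max_pool_1d(row: List[float], window: int, pad: int) -> List[float]:
    state = {"lpad": pad, "valid": len(row), "s_idx": 0}  # lpad, valid, s_idx

    def load_with_pad() -> float:
        # Claim-1 style instruction: left padding, then data, then padding from then on.
        if state["lpad"] > 0:
            state["lpad"] -= 1
            return PAD
        if state["valid"] > 0:
            value = row[state["s_idx"]]
            state["valid"] -= 1
            state["s_idx"] += 1
            return value
        return PAD

    n_out = len(row) + 2 * pad - window + 1
    variable_array = [load_with_pad() for _ in range(window)]  # size == pooling window
    outputs = [max(variable_array)]
    for _ in range(n_out - 1):
        # Slide by one: drop the oldest element and append the next loaded value.
        variable_array = variable_array[1:] + [load_with_pad()]
        outputs.append(max(variable_array))
    return outputs


print(max_pool_1d([1, 3, 2, 5], window=3, pad=1))  # [3, 3, 5, 5]
```

Because the instruction keeps returning padding once the target array is exhausted, exactly pad right paddings are consumed over the len(row) + 2 * pad loads.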
The method according to claim 5, wherein the predetermined operation is a convolution operation.
The method of claim 11, wherein the size of the variable array is the same as the size of the kernel used for the convolution operation.
The method according to claim 5, wherein

Whenever the instruction execution method is executed, the value stored in the first register is written to an element pointed to by a predetermined first pointer in the variable array,

Each time the instruction execution method is executed, the element of the variable array pointed to by the first pointer is changed.

Calculation method.
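Claim 13 leaves open how the first pointer moves; one plausible reading is a ring buffer whose pointer advances by one element after every write. The sketch below assumes that reading and applies it to a one-dimensional convolution in the CNN sense (cross-correlation, without flipping the kernel), with the variable array sized to the kernel as in claim 12, stride 1, zero padding, and a hypothetical function name conv1d_ring.

```python
from typing import List

PAD = 0  # hypothetical zero padding value


def conv1d_ring(row: List[int], kernel: List[int], pad: int) -> List[int]:
    k = len(kernel)
    state = {"lpad": pad, "valid": len(row), "s_idx": 0}

    def load_with_pad() -> int:
        if state["lpad"] > 0:
            state["lpad"] -= 1
            return PAD
        if state["valid"] > 0:
            value = row[state["s_idx"]]
            state["valid"] -= 1
            state["s_idx"] += 1
            return value
        return PAD

    variable_array = [PAD] * k  # same size as the kernel
    ptr = 0                     # hypothetical "first pointer"
    outputs = []
    for t in range(len(row) + 2 * pad):
        variable_array[ptr] = load_with_pad()  # write the register at the first pointer
        ptr = (ptr + 1) % k                    # pointer changes after every execution
        if t >= k - 1:
            # ptr now points at the oldest element, so reading from ptr restores window order.
            window = [variable_array[(ptr + j) % k] for j in range(k)]
            outputs.append(sum(w * x for w, x in zip(kernel, window)))
    return outputs


print(conv1d_ring([1, 2, 3], kernel=[1, 0, -1], pad=1))  # [-2, -2, 2]
```

The ring buffer avoids shifting the whole variable array on every load; only the pointer arithmetic changes, which is why the read-out starts at the pointer.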
A computing device comprising a hardware accelerator (110),

The hardware accelerator is configured to execute the instruction execution method of any one of claims 1 to 4 or the calculation method of any one of claims 5 to 13,

Computing device.