WO2023042989A1

WO2023042989A1 - Add operation method considering data scale, hardware accelerator therefor, and computing device using same

Info

Publication number: WO2023042989A1
Application number: PCT/KR2022/006216
Authority: WO
Inventors: 정태영
Original assignee: 오픈엣지테크놀로지 주식회사
Priority date: 2021-09-16
Filing date: 2022-04-29
Publication date: 2023-03-23
Also published as: KR102395744B1

Abstract

Disclosed is an operation method comprising the steps of: generating one set of pieces of convolution data by convoluting, for each input channel, one set of first split data obtained by splitting first input data for each input channel and one set of second split data obtained by splitting a particular output channel of second input data for each input channel; determining a scale representing each piece of convolution data on the basis of a statistical value of values configuring each piece of convolution data of the one set of pieces of convolution data; generating intermediate data by performing an add operation of first convolution data represented in a first scale and second convolution data represented in a second scale among the one set of pieces of convolution data; and after the step of generating the intermediate data, performing, by a computing device, an add operation of the intermediate data and third convolution data represented in a third scale among the one set of pieces of convolution data to calculate channel-specific output data for a particular output channel corresponding to the particular output channel of the second input data. The third scale is not smaller than the first scale, and the third scale is not smaller than the second scale.

Description

Addition operation method considering data scale, hardware accelerator for the same, and computing device using the same

The present invention relates to a technique for performing an operation in a computing device, and more particularly, to an addition operation technique considering the scale of a number.

Signal processing technology used to implement artificial intelligence such as neural networks may be implemented as software or as a hardware accelerator for fast processing. A neural network used for machine learning has many layers that perform various calculations, and a large amount of data may be processed in each layer. When these data operations are performed by a hardware accelerator, a problem may arise because the size of the internal memory or internal buffer provided inside the hardware accelerator is limited. That is, if the size of one set of data to be operated on is larger than the size of the internal memory or internal buffer, the set of data must be divided into two or more subsets, each subset must be calculated separately to obtain sub-result values, and the sub-result values must then be combined again. At this time, the sub-result values are stored in the internal memory or internal buffer and then read again, and in this process unwanted quantization errors may occur in the data. Such a quantization error would not occur if the set of data were not divided into subsets and calculated separately.

Prior art related to quantization of data in neural network technology includes Korean Patent Application Nos. 1020217011986, 1020200110330, 1020170150707, 1020200082108, and 1020207038081.

Hereinafter, technical contents that should be known for the understanding of the present invention will be briefly described using FIGS. 1 to 5. These contents are prior knowledge known to the inventor of the present invention, and at least some of them may be contents that have not been disclosed to unspecified persons at the time of filing the present patent application.

The present invention uses the concept of a scale of numbers or data used by a computing device. Computing devices express numbers in the form of N-bit binary numbers. In this case, the N-bit number includes a most significant bit (MSB) and a least significant bit (LSB). Here, the scale of the N-bit number may be defined as the size of the number represented by the LSB of the N-bit number, that is, the minimum nonzero value that can be represented by the N-bit number. To help understand the concept of the scale, consider, for example, two decimal numbers '128' and '1', each represented by 2 bits. Here, the decimal number '128' may be expressed as '01' in binary notation, and the decimal number '1' may also be expressed as '01' in binary notation. In this case, the first scale, which is the scale of the 2-bit number representing the decimal number '128', is a value proportional to the decimal number 128, and the second scale, which is the scale of the 2-bit number representing the decimal number '1', is a value proportional to the decimal number 1. That is, the first scale is 128 times larger than the second scale.
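The scale concept described above can be illustrated with a minimal sketch. The helper name `represented_value` is hypothetical and is introduced only for illustration; it assumes an unsigned bit pattern whose LSB is worth exactly one scale unit.

```python
def represented_value(bits: str, scale: float) -> float:
    """Return the number represented by an unsigned bit pattern.

    Assumption for illustration: the LSB of the pattern is worth `scale`,
    so the represented number is (integer value of the bits) * scale.
    """
    return int(bits, 2) * scale


# The same 2-bit pattern '01' stands for different numbers under different scales.
value_a = represented_value("01", scale=128)  # first scale: LSB worth 128
value_b = represented_value("01", scale=1)    # second scale: LSB worth 1

print(value_a, value_b, value_a / value_b)
```

In this sketch the two stored bit patterns are identical, and only the scale attached to each pattern distinguishes the represented numbers.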

FIG. 1A illustrates the configuration of an input activation 710, which is one of the objects of mathematical operation according to the present invention. In this specification, the input activation may also be referred to as first input data.

The input activation 710 may be a three-dimensional array consisting of a first dimension, a second dimension, and a third dimension. The first dimension, the second dimension, and the third dimension of the input activation 710 may be referred to as an input channel dimension, a height dimension, and a width dimension, respectively. The input activation 710 shown in FIG. 1A is an example in which the size ci of the first dimension, the size h of the second dimension, and the size w of the third dimension are 3, 2, and 4, respectively. The data size of the input activation 710 shown in FIG. 1A is proportional to ci*h*w.

In this specification, the portion of the input activation 710 where the index of the input channel is k can be expressed as 'input activation _{[ci] = k}'. In FIG. 1A, input activation _{[ci] = 1} is indicated by reference numeral 711, input activation _{[ci] = 2} is indicated by reference numeral 712, and input activation _{[ci] = 3} is indicated by reference numeral 713.

Scales of the input activation _{[ci] = 1} (711), the input activation _{[ci] = 2} (712), and the input activation _{[ci] = 3} (713) are the same. That is, the scales of all numbers constituting the input activation 710 are the same, and the scales can be expressed as 'sc_ai1'. For example, sc_ai1 may be the decimal number 1 or the decimal number 128.

FIG. 1B shows the configuration of a weight 740, which is another one of the objects of mathematical operation according to the present invention.

In this specification, the weight 740 may also be referred to as second input data.

The weight 740 may be a 4-dimensional array consisting of a first dimension, a second dimension, a third dimension, and a fourth dimension. The first dimension, the second dimension, the third dimension, and the fourth dimension of the weight 740 may be referred to as an output channel dimension, an input channel dimension, a height dimension, and a width dimension, respectively. The weight 740 shown in FIG. 1B is an example in which the size co of the first dimension, the size ci of the second dimension, the size r of the third dimension, and the size s of the fourth dimension are 2, 3, 2, and 2, respectively. The data size of the weight 740 shown in FIG. 1B is proportional to co*ci*r*s.

In this specification, the portion of the weight 740 where the index of the output channel is k can be expressed as 'weight _{[co] = k}'. In FIG. 1B, the weight _{[co] = 1} is indicated by reference numeral 741, and the weight _{[co] = 2} is indicated by reference numeral 742.

All numbers constituting the weight _{[co] = 1} (741) have the same scale, and this first scale can be expressed as 'sc_w1'.

All numbers constituting the weight _{[co] = 2} (742) have the same scale, and this second scale can be expressed as 'sc_w2'.

The first scale sc_w1 and the second scale sc_w2 are values that can be set independently of each other.

For example, the first scale sc_w1 may be proportional to the decimal number 1, and the second scale sc_w2 may be proportional to the decimal number 128.

FIG. 1C shows an example in which the input activation 710 includes six input channels 711 to 716.

FIG. 1D shows an example in which the weight 740 is composed of two output channels 741 and 742, and each output channel is composed of six input channels (e.g., 7411 to 7416).

FIGS. 2A to 2C are conceptual diagrams illustrating a convolution operation between the input activation 710 and the weight 740.

In FIGS. 2A to 2C and in this specification, the circular symbol surrounding the letter 'x' represents the convolution operation between a first object of mathematical operation disposed to the left of the symbol and a second object of mathematical operation disposed to the right of the symbol.

As shown in FIG. 2A, an output activation 750 may be generated by performing a convolution operation on the input activation 710 and the weight 740. In this specification, the output activation may also be referred to as output data.

The output activation 750 may be a three-dimensional array consisting of a first dimension, a second dimension, and a third dimension. The first dimension, the second dimension, and the third dimension of the output activation 750 may be referred to as an output channel dimension, a height dimension, and a width dimension, respectively. The output activation 750 shown in FIGS. 2A to 2C is an example in which the size co of the first dimension, the size ho of the second dimension, and the size wo of the third dimension are 2, 2, and 3, respectively. The data size of the output activation 750 shown in FIGS. 2A to 2C is proportional to co*ho*wo.

In this specification, the portion of the output activation 750 where the index of the output channel is k can be expressed as 'output activation _{[co] = k}'. In FIGS. 2A to 2C, output activation _{[co] = 1} is indicated by reference numeral 751, and output activation _{[co] = 2} is indicated by reference numeral 752.

FIG. 2B shows how the output activation _{[co] = 1} (751) is generated. The output activation _{[co] = 1} (751) is generated by a convolution operation on the input activation 710 and the weight _{[co] = 1} (741).

FIG. 2C shows how the output activation _{[co] = 2} (752) is generated. The output activation _{[co] = 2} (752) is generated by a convolution operation on the input activation 710 and the weight _{[co] = 2} (742).

FIG. 3A shows the main structure of a part of a computing device used in an embodiment of the present invention.

The computing device 1 may include a dynamic random access memory (DRAM) 130, a hardware accelerator 110, a bus 700 connecting the DRAM 130 and the hardware accelerator 110, other hardware 99 connected to the bus 700, and a main processor 160. Here, the DRAM 130 may be referred to as a memory 130.

In addition, the computing device 1 may further include a power supply unit, a communication unit, a user interface, a storage unit 170, and peripheral units not shown. The bus 700 may be shared by the hardware accelerator 110, the other hardware 99, and the main processor 160.

The hardware accelerator 110 may include a direct memory access (DMA) unit 20, a control unit 40, an internal memory 30, an input buffer 650, a data operation unit 610, and an output buffer 640.

Some or all of the data temporarily stored in the internal memory 30 may be provided from the DRAM 130 through the bus 700. At this time, the control unit 40 and the DMA unit 20 may control the internal memory 30 and the DRAM 130 to move data stored in the DRAM 130 to the internal memory 30.

Data stored in the internal memory 30 may be provided to the data operation unit 610 through the input buffer 650.

Output values generated by the operation of the data operation unit 610 may be stored in the internal memory 30 via the output buffer 640. The output values stored in the internal memory 30 may be written to the DRAM 130 under the control of the control unit 40 and the DMA unit 20.

The control unit 40 may collectively control the operations of the DMA unit 20, the internal memory 30, and the data operation unit 610.

In one embodiment, the data operation unit 610 may perform a first operation function during a first time period and a second operation function during a second time period.

In FIG. 3A, one data operation unit 610 is presented within the hardware accelerator 110. However, in a modified embodiment not shown, a plurality of the data operation units 610 shown in FIG. 3A may be provided in the hardware accelerator 110, each performing operations requested by the control unit 40 in parallel.

In one embodiment, the data operation unit 610 may output the output data sequentially over time in a given order rather than outputting them all at once.

FIGS. 3B to 3E compare the size of a storage space for storing an object of mathematical operation for a convolution operation with the size of that object. The storage space may be a buffer that is part of the internal memory 30 shown in FIG. 3A. For example, in the internal memory 30, a first storage space allocated for the input activation 710 may be defined, and a second storage space allocated for the weight 740 may be defined. The sizes of the first storage space and the second storage space may be limited.

As shown in FIG. 3B, when the size of the input activation 710 is greater than the size of the first storage space, a problem arises in that the entirety of the input activation 710 cannot be stored in the first storage space. To solve this problem, as shown in FIG. 3C, the input activation 710 may be split for each input channel, and only the input activations 711 and 712, for example, may be stored in the first storage space and used.

Likewise, as shown in FIG. 3D, when the size of the weight 740 is larger than the size of the second storage space, a problem arises in that the entirety of the weight 740 cannot be stored in the second storage space. To solve this problem, as shown in FIG. 3E, the weight 740 may be split for each input channel so that, for example, only the input channels 7411 and 7412 of the first output channel and the input channels 7421 and 7422 of the second output channel are stored in the second storage space and used.

Due to the small size of the internal memory, if one of the input activation 710 and the weight 740 is split for each input channel, the other one may also need to be split for each input channel.

FIG. 4 illustrates the concept of splitting a weight for each input channel for a convolution operation.

In this specification, the portion of the weight 740 in which the output channel index is k and the input channel index is j can be expressed as 'weight _{[co] = k, [ci] = j}'. In FIG. 4, weight _{[co] = 1, [ci] = 1}, weight _{[co] = 1, [ci] = 2}, weight _{[co] = 1, [ci] = 3}, weight _{[co] = 2, [ci] = 1}, weight _{[co] = 2, [ci] = 2}, and weight _{[co] = 2, [ci] = 3} are denoted by reference numerals 7411, 7412, 7413, 7421, 7422, and 7423, respectively.

FIG. 5 illustrates a method of calculating the output activation 750 shown in FIG. 2(b) using the split data.

In the embodiment shown in FIG. 5, since the input activation 710 and the weight 740 are split for each input channel, it can be assumed that the buffer is large enough for each convolution operation.

The following description refers to FIG. 5.

By performing a convolution operation on the input activation _{[ci] = 1} (711) and the weight _{[co] = 1, [ci] = 1} (7411), the output activation _{[co] = 1, [ci] = 1} (7511) can be calculated.

By performing a convolution operation on the input activation _{[ci] = 2} (712) and the weight _{[co] = 1, [ci] = 2} (7412), the output activation _{[co] = 1, [ci] = 2} (7512) can be calculated.

By performing a convolution operation on the input activation _{[ci] = 3} (713) and the weight _{[co] = 1, [ci] = 3} (7413), the output activation _{[co] = 1, [ci] = 3} (7513) can be calculated.

By performing a convolution operation on the input activation _{[ci] = 1} (711) and the weight _{[co] = 2, [ci] = 1} (7421), the output activation _{[co] = 2, [ci] = 1} (7521) can be calculated.

By performing a convolution operation on the input activation _{[ci] = 2} (712) and the weight _{[co] = 2, [ci] = 2} (7422), the output activation _{[co] = 2, [ci] = 2} (7522) can be calculated.

By performing a convolution operation on the input activation _{[ci] = 3} (713) and the weight _{[co] = 2, [ci] = 3} (7423), the output activation _{[co] = 2, [ci] = 3} (7523) can be calculated.

Now, by performing an element-wise addition operation (P101) on the output activation _{[co] = 1, [ci] = 1} (7511), the output activation _{[co] = 1, [ci] = 2} (7512), and the output activation _{[co] = 1, [ci] = 3} (7513), the output activation _{[co] = 1} (751) may be calculated.

Likewise, by performing an element-wise addition operation (P102) on the output activation _{[co] = 2, [ci] = 1} (7521), the output activation _{[co] = 2, [ci] = 2} (7522), and the output activation _{[co] = 2, [ci] = 3} (7523), the output activation _{[co] = 2} (752) may be calculated.

Then, the output activation 750 may be generated by combining the output activation _{[co] = 1} (751) and the output activation _{[co] = 2} (752).
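The split-and-add procedure of FIG. 5 can be sketched as follows. This is only an illustration under stated assumptions: the helper names `conv2d`, `add_elementwise`, and `conv_output_channel` are hypothetical, the convolution is taken to be a stride-1 sliding-window product sum without padding, and the array sizes (three 3x4 input channels, 2x2 kernels) and integer values are chosen for the example and are not taken from the figures.

```python
from typing import List

Matrix = List[List[int]]


def conv2d(x: Matrix, k: Matrix) -> Matrix:
    """Stride-1, no-padding sliding-window product sum of one channel (assumed form)."""
    ho = len(x) - len(k) + 1
    wo = len(x[0]) - len(k[0]) + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(len(k)) for b in range(len(k[0])))
             for j in range(wo)]
            for i in range(ho)]


def add_elementwise(p: Matrix, q: Matrix) -> Matrix:
    """Element-wise addition of two equally sized 2-D arrays."""
    return [[u + v for u, v in zip(rp, rq)] for rp, rq in zip(p, q)]


def conv_output_channel(activation: List[Matrix], weight_co: List[Matrix]) -> Matrix:
    """Convolve each input channel separately, then add the partial results."""
    partials = [conv2d(x, k) for x, k in zip(activation, weight_co)]
    out = partials[0]
    for partial in partials[1:]:
        out = add_elementwise(out, partial)
    return out


# Illustrative data: ci=3 input channels of size 3x4, co=2 output channels, 2x2 kernels.
activation = [[[c + i + j for j in range(4)] for i in range(3)] for c in range(3)]
weight = [[[[(o + 1) * (c + 1) + a - b for b in range(2)] for a in range(2)]
           for c in range(3)] for o in range(2)]

# One 2-D result per output channel; stacking them gives the full output activation.
output = [conv_output_channel(activation, weight[o]) for o in range(2)]
print(output)
```

Because the example uses exact integers, the sum of the per-channel partial results equals the direct sum over all input channels; the quantization issue discussed in this specification arises only when the partial results are stored with limited precision.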

In the course of the above-mentioned element-wise addition operations (P101, P102), each output activation _{[co] = k, [ci] = j} may be recorded in a buffer. In this process, quantization errors may occur in the data.
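How recording a value in a buffer at a given scale can introduce a quantization error may be sketched as follows. The functions `quantize` and `dequantize` are hypothetical names, and the rounding-and-clamping rule and the signed 8-bit width are assumptions made for illustration; this specification does not prescribe them.

```python
from typing import List


def quantize(values: List[float], scale: float, n_bits: int = 8) -> List[int]:
    """Store values as signed n-bit codes whose LSB is worth `scale` (assumed rule)."""
    lo = -(1 << (n_bits - 1))
    hi = (1 << (n_bits - 1)) - 1
    return [max(lo, min(hi, round(v / scale))) for v in values]


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Read the stored codes back as numbers."""
    return [c * scale for c in codes]


values = [3, 5, 129]   # illustrative partial results
scale = 4              # illustrative scale of the buffer representation

codes = quantize(values, scale)
restored = dequantize(codes, scale)
errors = [r - v for r, v in zip(restored, values)]
print(codes, restored, errors)
```

In this sketch the values read back from the buffer differ from the values written, and the difference is bounded by half of the scale as long as no clamping occurs.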

An object of the present invention is to provide a technique for reducing quantization errors that occur when data is divided into two or more groups and processed during calculation or processing of the data in a hardware accelerator.

An operation method provided according to an aspect of the present invention relates to a specific method of performing the element-wise addition operations (P101, P102) so as to reduce the above-described quantization error.

An operation method provided according to one aspect of the present invention may include the steps of: generating, by a computing device, a set of convolution data (7012, 7034, and 7056, or 7511 to 7513) by convolving first input data 710 and second input data 740 for each input channel; determining, by the computing device, scales (sc_co_ci1,2, sc_co_ci3,4, and sc_co_ci5,6) representing each convolution data based on statistical values of the values constituting each convolution data of the set of convolution data; generating, by the computing device, intermediate data 750p by performing an addition operation on first convolution data 7012 represented in a first scale and second convolution data 7034 represented in a second scale among the set of convolution data; and, after generating the intermediate data, calculating, by the computing device, output data 750 by performing an addition operation on the intermediate data and third convolution data 7056 represented in a third scale among the set of convolution data. In this case, the third scale may be no smaller than the first scale, and the third scale may be no smaller than the second scale.

In this case, the step of generating the set of convolution data may include generating the set of convolution data by convolving, for each input channel, one set of first split data (711 to 716) obtained by splitting the first input data for each input channel and one set of second split data (7411 to 7416) obtained by splitting one set of output channels (741) of the second input data for each input channel; and the step of calculating the output data may include calculating output data corresponding to the set of output channels of the second input data by performing an addition operation on the third convolution data 7056 and the intermediate data.

At this time, the set of output channels may be any one specific output channel among a plurality of output channels constituting the second input data, and the output data corresponding to the set of output channels of the second input data may be output data corresponding to the specific output channel.

In this case, the step of generating the set of convolution data by performing convolution for each input channel may include: generating a set of input channel convolution data (7501 to 7506, or 7511 to 7516), consisting of input channel convolution data corresponding to each input channel, by convolving the set of first split data and the set of second split data for each input channel; and generating the set of convolution data by grouping the set of input channel convolution data.

At this time, each convolution data may be identical to one input channel convolution data among the set of input channel convolution data, or may be calculated by performing an element-by-element addition operation on two or more input channel convolution data among the set of input channel convolution data.

At this time, in order to determine the groups, the operation method may include: determining, by the computing device, a set of ranges (rg_w_co_ci1 to rg_w_co_ci6) by calculating the range of values of the elements constituting each of the second split data; and grouping, by the computing device, the set of input channel convolution data based on the set of ranges.

Alternatively, in order to determine the groups, the operation method may include: determining, by the computing device, a set of ranges (rg_co_ci1 to rg_co_ci6) by calculating the range of values of the elements constituting each of the input channel convolution data; and grouping, by the computing device, the set of input channel convolution data based on the set of ranges.
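Grouping based on a set of ranges might be sketched as follows. The text above does not fix a specific grouping rule, so the rule used here (sort the input channels by range and place neighbours in the same group) is purely an assumption for illustration, as are the helper names `value_range` and `group_by_range`, the definition of a range as maximum minus minimum, and the example range values.

```python
from typing import List


def value_range(data: List[List[float]]) -> float:
    """Range of the element values of one 2-D array, taken here as max minus min."""
    flat = [v for row in data for v in row]
    return max(flat) - min(flat)


def group_by_range(ranges: List[float], group_size: int) -> List[List[int]]:
    """Assumed rule: sort channel indices by range and chunk them into groups."""
    order = sorted(range(len(ranges)), key=lambda i: ranges[i])
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


# Illustrative ranges for six input channels (zero-based indices 0 to 5).
ranges = [5, 100, 7, 90, 40, 45]
groups = group_by_range(ranges, group_size=2)
print(groups)
```

Under this assumed rule, channels whose values span similar ranges end up in the same group, so the partial sums formed within a group can share a similar scale.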

In this case, the step of generating the set of convolution data may include generating a set of input channel convolution data, consisting of input channel convolution data corresponding to each input channel, by convolving the set of first split data and the set of second split data for each input channel, wherein each convolution data may be identical to one input channel convolution data among the set of input channel convolution data.

At this time, the computing device may be configured to perform the step of generating the set of convolution data, the step of determining, the step of generating the intermediate data, and the step of calculating for every output channel included in the second input data, and to combine the channel-specific output data generated for each output channel included in the second input data to generate output data including all the output channels.

In this case, the first input data is an input activation, the second input data is a weight, the output data is an output activation, and a dimension of the weight may be greater than a dimension of the input activation.

At this time, the input activation includes a plurality of first input channel data, each of the first input channel data is a two-dimensional array, the weight includes a plurality of output channel data, and each of the output channel data is It includes a plurality of second input channel data, and each of the second input channel data may be a two-dimensional array.

A computing device provided according to one aspect of the present invention includes a hardware accelerator 110, wherein the hardware accelerator is configured to: obtain first input data and second input data; generate a set of convolution data by convolving the first input data and the second input data for each input channel; determine a scale representing each convolution data based on a statistical value of the values constituting each convolution data of the set of convolution data; generate intermediate data by performing an addition operation on first convolution data represented in a first scale and second convolution data represented in a second scale among the set of convolution data; and, after generating the intermediate data, generate output data by performing an addition operation on the intermediate data and third convolution data represented in a third scale among the set of convolution data. In this case, the third scale may be no smaller than the first scale, and the third scale may be no smaller than the second scale.

At this time, generating the set of convolution data may include generating the set of convolution data by convolving, for each input channel, one set of first split data obtained by splitting the first input data for each input channel and one set of second split data obtained by splitting one set of output channels of the second input data for each input channel; and calculating the output data may include calculating output data corresponding to the set of output channels of the second input data by performing an addition operation on the third convolution data and the intermediate data. Further, the hardware accelerator 110 may include an internal memory 30, and the size of the internal memory may be smaller than the data size of the entire second input data and greater than the size of the split data obtained by splitting one set of output channels of the second input data for each input channel.

According to the present invention, when calculating or processing data in a hardware accelerator, it is possible to provide a technique for reducing quantization errors occurring in the process of dividing data into two or more groups and processing them.

FIG. 1A shows the configuration of an input activation, which is one of the objects of mathematical operation according to the present invention.

FIG. 1B shows the configuration of a weight, which is another one of the objects of mathematical operation according to the present invention.

FIG. 1C shows an example in which the input activation consists of six input channels.

FIG. 1D shows an example in which the weight is composed of two output channels and each output channel is composed of six input channels.

FIGS. 2A to 2C are conceptual diagrams illustrating a convolution operation between the input activation and the weight.

FIG. 3A shows the main structure of a part of a computing device used in an embodiment of the present invention, and FIGS. 3B to 3E compare the size of a storage space for storing an object of mathematical operation for a convolution operation with the size of that object.

FIG. 5 illustrates a method of calculating the output activation shown in FIG. 2(b) using the split data.

FIGS. 6A to 6C are flowcharts illustrating a method of calculating an output activation by performing a convolution operation on an input activation and a weight according to an embodiment of the present invention.

FIGS. 7A to 7C illustrate a method of calculating an output activation by performing a convolution operation on an input activation and a weight according to an embodiment of the present invention.

FIG. 8 is a flowchart illustrating a method of generating output data by performing an operation on two input data according to an embodiment of the present invention.

FIG. 9A illustrates a convolution operation process between an input activation composed of six input channels and the first output channel of a weight composed of two output channels.

FIG. 9B illustrates a convolution operation process between the input activation and the second output channel.

FIG. 10 illustrates an embodiment of a specific method for determining the input channels belonging to a specific group shown in FIG. 9A.

FIG. 11 shows another embodiment of a specific method for determining the input channels belonging to a specific group shown in FIG. 9A.

FIG. 12 shows an operation method provided according to another embodiment of the present invention.

FIG. 13 illustrates an embodiment of a specific method for determining the input channels belonging to a specific group shown in FIG. 12.

FIG. 14 shows another embodiment of a specific method for determining the input channels belonging to a specific group shown in FIG. 12.

FIG. 15 is a flowchart illustrating an operation method provided according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. However, the present invention is not limited to the embodiments described herein and may be implemented in various other forms. Terms used in this specification are intended to aid understanding of the embodiments, and are not intended to limit the scope of the present invention. Also, the singular forms used herein include the plural forms unless the phrases clearly dictate the contrary.

In this specification, FIGS. 6A, 6B, and 6C may collectively be referred to as FIG. 6, and FIGS. 7A, 7B, and 7C may collectively be referred to as FIG. 7.

FIG. 6 is a flowchart illustrating a method of calculating an output activation by performing a convolution operation on an input activation and a weight according to an embodiment of the present invention.

In FIG. 6, the input activation may be 3D data having the same structure as the input activation 710 illustrated in FIG. 1A, and the weight may be 4D data having the same structure as the weight 740 illustrated in FIG. 1B.

FIG. 7 illustrates a method of calculating an output activation by performing a convolution operation on an input activation and a weight according to an embodiment of the present invention.

Contents presented in FIGS. 6A, 6B, and 6C correspond to those presented in FIGS. 7A, 7B, and 7C, respectively.

Hereinafter, it will be described with reference to FIGS. 6A and 7A together.

In step S110, a set of first split data 711, 712, and 713 may be obtained by splitting the input activation 710 for each input channel, and a set of second split data 7411, 7412, and 7413 may be obtained by splitting the first output channel 741 of the weight 740 for each input channel. Further, the set of first split data 711, 712, and 713 and the set of second split data 7411, 7412, and 7413 may be convolved for each input channel to generate a set of convolution data 7511, 7512, and 7513.

In step S120, a scale representing each convolution data may be determined based on statistical values of the values constituting each convolution data of the set of convolution data 7511, 7512, and 7513.

For example, it can be easily understood that the scale sc_co1_ci1 to be applied for the representation of the convolution data 7511 can be determined based on the distribution of values of 6 elements constituting the convolution data 7511 . Similarly, it can be easily understood that the scale sc_co1_ci2 to be applied for the representation of the convolutional data 7512 can be determined based on the distribution of the six elements constituting the convolutional data 7512 .

As such, the scales applied to the convolution data 7511, the convolution data 7512, and the convolution data 7513 may be determined as sc_co1_ci1, sc_co1_ci2, and sc_co1_ci3, respectively. Here, sc_co1_ci1, sc_co1_ci2, and sc_co1_ci3 are values that can be independently determined. Accordingly, sc_co1_ci1, sc_co1_ci2, and sc_co1_ci3 may be the same or different.
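Determining a scale from a statistical value can be sketched as follows. The statistic used here (the maximum absolute value) and the signed 8-bit target representation are assumptions made for illustration, since the text only states that the scale is based on a statistical value of the elements; `scale_from_max_abs` is a hypothetical helper, and the element values are invented for the example.

```python
from typing import List


def scale_from_max_abs(data: List[List[float]], n_bits: int = 8) -> float:
    """Pick a scale so that the largest magnitude fits a signed n-bit code.

    Assumed statistic: the maximum absolute value of the elements.
    """
    max_abs = max(abs(v) for row in data for v in row)
    if max_abs == 0:
        return 1.0  # any scale represents an all-zero array exactly
    return max_abs / ((1 << (n_bits - 1)) - 1)


# Two illustrative 2x3 arrays standing in for the convolution data 7511 and 7512.
conv_7511 = [[-3, 10, 127], [0, 5, -64]]
conv_7512 = [[254, 0, 0], [0, 0, -100]]

sc_co1_ci1 = scale_from_max_abs(conv_7511)
sc_co1_ci2 = scale_from_max_abs(conv_7512)
print(sc_co1_ci1, sc_co1_ci2)
```

Each scale in this sketch depends only on the elements of its own array, which mirrors the statement that sc_co1_ci1, sc_co1_ci2, and sc_co1_ci3 can be determined independently.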

In step S130, specific expression values of the set of

convolution data

7511, 7512, and 7513 may be determined according to the determined scales sc_co1_ci1, sc_co1_ci2, and sc_co1_ci3.

In step S140, among the set of

convolution data

7511, 7512, and 7513, first convolution data 7511 expressed as 'first scale (sc_co1_ci1)' and 'second scale (sc_co1_ci2)' Intermediate data 751p may be generated by performing an addition operation on the second convolutional data 7512 represented by .

In step S150, the output activation 751 for the first output channel ([co]=1), corresponding to the first output channel ([co]=1) of the weight 740, can be calculated by performing an addition operation on the intermediate data 751p and the third convolution data 7513 expressed in the 'third scale (sc_co1_ci3)' among the set of convolution data 7511, 7512, and 7513. That is, the output activation _{[co]=1} (751) can be calculated.

FIG. 7A shows an example in which the third scale (sc_co1_ci3) is not smaller than the first scale (sc_co1_ci1) and the third scale (sc_co1_ci3) is not smaller than the second scale (sc_co1_ci2).

Here, step S140 may be performed before step S150.
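The effect of the addition order can be illustrated with a small fixed-point sketch. It is a hypothetical model, not the disclosed hardware: each operand is a pair of integer mantissas and a power-of-two scale exponent, the sum is expressed in the larger of the two scales, and the operand with the smaller scale is shifted right with rounding. The names `rescale` and `add_scaled` and the numbers are assumptions.

```python
def rescale(mantissa, shift):
    """Move a mantissa to a scale that is `shift` bits coarser, rounding half up."""
    if shift == 0:
        return mantissa
    return (mantissa + (1 << (shift - 1))) >> shift


def add_scaled(x, y):
    """Add two (mantissas, exponent) operands; the result uses the larger exponent."""
    (mx, ex), (my, ey) = x, y
    e = max(ex, ey)
    return [rescale(a, e - ex) + rescale(b, e - ey) for a, b in zip(mx, my)], e


first = ([3], 0)    # represents 3 in a small scale
second = ([3], 0)   # represents 3 in a small scale
third = ([10], 3)   # represents 80 in a larger scale

# Order of steps S140 then S150: the two small-scale operands are added first.
small_first = add_scaled(add_scaled(first, second), third)
# Alternative order for comparison: a small-scale operand meets the large one first.
large_first = add_scaled(add_scaled(first, third), second)
```

Under this model the exact sum is 86. Adding the small-scale data first yields a mantissa of 11 in the scale with exponent 3 (88), while the alternative order yields 10 (80), because each small operand is rounded away individually when it is aligned to the larger scale.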

Hereinafter, it will be described with reference to FIGS. 6B and 7B together.

In step S210, a set of first split data 711, 712, and 713 obtained by splitting the input activation 710 for each input channel and a set of second split data 7421, 7422, and 7423 obtained by splitting the second output channel 742 of the weight 740 for each input channel may be convolved for each input channel to generate a set of convolution data 7521, 7522, and 7523.

In step S220, a scale representing each convolution data may be determined based on statistical values of the values constituting each convolution data of the set of convolution data 7521, 7522, and 7523.

For example, the scale sc_co2_ci1 to be applied for the representation of the convolution data 7521 can be determined based on the distribution of the values of the six elements constituting the convolution data 7521. Similarly, the scale sc_co2_ci2 to be applied for the representation of the convolution data 7522 can be determined based on the distribution of the values of the six elements constituting the convolution data 7522.

As such, the scales applied to the convolution data 7521, the convolution data 7522, and the convolution data 7523 may be determined as sc_co2_ci1, sc_co2_ci2, and sc_co2_ci3, respectively. Here, sc_co2_ci1, sc_co2_ci2, and sc_co2_ci3 are values that can be independently determined. Accordingly, sc_co2_ci1, sc_co2_ci2, and sc_co2_ci3 may be the same or different.

In step S230, specific expression values of the set of convolution data 7521, 7522, and 7523 may be determined according to the determined scales sc_co2_ci1, sc_co2_ci2, and sc_co2_ci3.

In step S240, intermediate data 752p may be generated by performing an addition operation on the first convolution data 7521 expressed in the 'first scale (sc_co2_ci1)' and the second convolution data 7522 expressed in the 'second scale (sc_co2_ci2)' among the set of convolution data 7521, 7522, and 7523.

In step S250, the output activation 752 for the second output channel ([co]=2), corresponding to the second output channel ([co]=2) of the weight 740, can be calculated by performing an addition operation on the intermediate data 752p and the third convolution data 7523 expressed in the 'third scale (sc_co2_ci3)' among the set of convolution data 7521, 7522, and 7523. That is, the output activation _{[co]=2} (752) can be calculated.

FIG. 7B shows an example in which the third scale (sc_co2_ci3) is not smaller than the first scale (sc_co2_ci1) and the third scale (sc_co2_ci3) is not smaller than the second scale (sc_co2_ci2).

In a preferred embodiment, step S240 may be performed before step S250.

Hereinafter, it will be described with reference to FIGS. 6C and 7C together.

In step S310, the output activation 750 may be calculated by combining the calculated output activation _{[co]=1} (751) and the output activation _{[co]=2} (752).

The first process provided according to an embodiment of the present invention may include steps S110, S120, S130, S140, and S150.

The second process provided according to an embodiment of the present invention may include steps S210, S220, S230, S240, and S250.

The third process provided according to an embodiment of the present invention may include the step S310.

The first process and the second process may be performed in parallel or sequentially with a precedence relationship. The third process may be executed after both the first process and the second process are completed.
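The ordering constraint among the three processes can be sketched as follows. This is a simplified illustration only: `channel_process` is a hypothetical stand-in that merely sums partial results element by element (scales are ignored), and the use of a thread pool is one arbitrary way of running the first and second processes in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def channel_process(partials):
    """Stand-in for the first or second process: element-wise sum of partial results."""
    return [sum(elems) for elems in zip(*partials)]


def run_processes(partials_co1, partials_co2):
    with ThreadPoolExecutor(max_workers=2) as pool:
        first = pool.submit(channel_process, partials_co1)    # first process
        second = pool.submit(channel_process, partials_co2)   # second process
        # Third process: combining starts only after both results are available.
        return [first.result(), second.result()]


# Hypothetical partial results for two output channels.
output_activation = run_processes(
    [[1, 2], [3, 4], [5, 6]],
    [[10, 20], [30, 40], [50, 60]],
)
```

Running the two calls one after the other instead of submitting them to a pool gives the same combined result, which corresponds to the sequential alternative mentioned above.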

In this case, the above-described first process, second process, and third process may be executed by the main processing unit of the computing device. The computing device reads command codes for executing the first process, the second process, and the third process from storage and stores them in a volatile memory, and the main processing unit executes the command codes to carry out the first process, the second process, and the third process. The above-described buffer may be provided, according to the command codes, in a part of an internal memory of the main processing unit or of the volatile memory. Also, the input activation 710 and the weight 740 may be stored in the internal memory of the main processing unit or in a part of the volatile memory.

Alternatively, the above-described first process, second process, and third process may be executed by a dedicated hardware accelerator included in the computing device. In this case, the computing device reads the instruction codes for executing the first process, the second process, and the third process from storage and stores them in a volatile memory, and the main processing unit executes the instruction codes so that the hardware accelerator acquires the input activation 710 and the weight 740 from the volatile memory or a non-volatile memory. In this case, the buffer may exist inside the hardware accelerator.

The method may include step S100 and step S200.

In step S100, the computing device may perform a predefined calculation process P10 for each output channel of the second input data having M output channels to generate channel-specific output data for each output channel.

In step S200, the computing device may generate output data by combining the M pieces of output data for each channel.

At this time, the calculation process (P10) may include step (S10), step (S20), step (S30), step (S40), and step (S50).

In step S10, the computing device may generate a set of convolution data by convolving, for each input channel, a set of first split data obtained by splitting the first input data for each input channel and a set of second split data obtained by splitting a specific output channel of the second input data for each input channel.

In step S20, the computing device may determine a scale representing each convolution data based on a statistical value of values constituting each convolution data of the set of convolution data.

In step S30, the computing device may determine an expression value of each convolution data according to the determined scale.

In step S40, the computing device may generate intermediate data by performing an addition operation on first convolution data expressed in a first scale and second convolution data expressed in a second scale among the set of convolution data.

In step S50, the computing device may calculate output data for a specific output channel corresponding to the specific output channel of the weight by performing an addition operation on the intermediate data and third convolution data expressed in a third scale among the set of convolution data.

At this time, the third scale is not smaller than the first scale, and the third scale is not smaller than the second scale.

In a preferred embodiment, step S40 necessarily precedes step S50.
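Steps S40 and S50, repeated for each of the M output channels and followed by the combination of step S200, can be sketched as below. The sketch rests on several assumptions that go beyond the description: data are modeled as integer mantissas with a power-of-two scale exponent, the partial results are fully sorted by scale (which is merely one way to guarantee that the operand added last has a scale not smaller than the earlier ones), and all names and numbers are hypothetical.

```python
def rescale(mantissa, shift):
    """Move a mantissa to a scale that is `shift` bits coarser, rounding half up."""
    if shift == 0:
        return mantissa
    return (mantissa + (1 << (shift - 1))) >> shift


def accumulate_by_scale(partials):
    """Add (mantissas, exponent) operands in ascending order of exponent."""
    ordered = sorted(partials, key=lambda p: p[1])
    acc, acc_e = list(ordered[0][0]), ordered[0][1]
    for mantissas, e in ordered[1:]:
        # e >= acc_e because of the sort, so only the accumulator is rescaled.
        acc = [rescale(a, e - acc_e) + m for a, m in zip(acc, mantissas)]
        acc_e = e
    return acc, acc_e


def process_all_output_channels(partials_per_output_channel):
    """Apply the per-channel process to every output channel and collect the results."""
    return [accumulate_by_scale(p) for p in partials_per_output_channel]


# Hypothetical partial results for M = 2 output channels.
outputs = process_all_output_channels([
    [([10], 3), ([3], 0), ([3], 0)],
    [([1, 2], 1), ([4, 8], 1), ([5, 5], 2)],
])
```

For the first output channel the two operands with exponent 0 are added before the operand with exponent 3, matching the order of steps S40 and S50.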

The first input data may be, for example, the input activation 710 described in FIG. 7.

The second input data may be, for example, the weight 740 described in FIG. 7.

Hereinafter, an embodiment of grouping and processing the input channels of the input activation 710 and the weight 741 will be described.

For convenience of description, the output activation _{[co]=1,[ci]=1} (7511), the output activation _{[co]=1,[ci]=2} (7512), and the output activation _{[co]=1,[ci]=3} (7513) shown in the figure may be referred to as the partial output activation _{[co]=1,[ci]=1} (7511), the partial output activation _{[co]=1,[ci]=2} (7512), and the partial output activation _{[co]=1,[ci]=3} (7513), respectively.

FIG. 7A shows an example in which one partial output activation is determined by one input channel.

In contrast, FIGS. 9A and 9B show an example in which one partial output activation is determined by a plurality of input channels, that is, one input channel group.

The examples shown in FIGS. 9A and 9B are useful when the number of input channels is large.

FIG. 9A illustrates a convolution operation process between an input activation 710 composed of six input channels and a first output channel of a weight composed of two output channels.

The example shown in FIG. 9A is different from the example shown in FIG. 7A in that the number of input channels constituting the input activation 710 is six. In FIG. 7A, the number of input channels is three.

In the example shown in FIG. 9A, the input channels constituting the input activation 710 are grouped. In the example of FIG. 9A, input activation _{[ci]=1} (711) and input activation _{[ci]=2} (712) are classified as a first group (G1), input activation _{[ci]=3} (713) and input activation _{[ci]=4} (714) are classified as a second group (G2), and input activation _{[ci]=5} (715) and input activation _{[ci]=6} (716) are classified as a third group (G3).

A detailed method of determining input channels belonging to a specific group will be described later with reference to FIGS. 10 and 11 .

FIG. 9A shows that two input channels belong to one group, but the present invention is not limited to this configuration, and each group may consist of one or more input channels.

Steps S110, S120, S130, S140, and S150 shown in FIG. 9A are the same as steps S110, S120, S130, S140, and S150 shown in FIG. 7A, respectively.

In step S110, a set of first split data 711 to 716 obtained by splitting the input activation 710 for each input channel may be obtained. In addition, a set of second split data 7411 to 7416 may be obtained by splitting the first output channel 741 of the weight 740 for each input channel. Then, the set of first split data 711 to 716 and the set of second split data 7411 to 7416 may be convolved for each input channel to generate a set of input channel convolution data 7511 to 7516.

Input channel convolution data generated from input activation belonging to the xth group Gx is also considered to belong to the xth group Gx. For example, the input channel convolution data 7511 generated from the input activation 711 belonging to the first group G1 is also regarded as belonging to the first group G1.

In step S115, convolutional data of a specific group is generated by performing an element-by-element addition operation on the plurality of input channel convolution data belonging to the specific group.

For example, in FIG. 9A, an element-by-element addition operation is performed on the plurality of input channel convolution data 7511 and 7512 belonging to the first group G1 to generate the first group of convolution data 7112. Likewise, the second group of convolution data 7134 and the third group of convolution data 7156 are generated for the second group G2 and the third group G3, respectively. Here, the convolution data 7112 of the first group, the convolution data 7134 of the second group, and the convolution data 7156 of the third group may each be referred to as group-specific convolution data.
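The element-by-element addition of step S115 can be sketched as follows. The function name, the zero-based channel indices, and the sample values are hypothetical; each input channel convolution data is flattened to a list of six elements for brevity.

```python
def group_convolution_data(input_channel_conv, groups):
    """Sum, element by element, the input channel convolution data within each group."""
    return [
        [sum(elems) for elems in zip(*(input_channel_conv[ci] for ci in group))]
        for group in groups
    ]


# Hypothetical data: six input channels, channel ci filled with the value ci + 1.
input_channel_conv = [[ci + 1] * 6 for ci in range(6)]
# Grouping as in FIG. 9A, using zero-based indices: G1 = {1, 2}, G2 = {3, 4}, G3 = {5, 6}.
groups = [(0, 1), (2, 3), (4, 5)]

group_conv = group_convolution_data(input_channel_conv, groups)
```

The three resulting lists correspond to the group-specific convolution data 7112, 7134, and 7156, each of which then receives its own scale.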

In step S120, a scale representing each of the group-specific convolution data 7112, 7134, and 7156 may be determined based on statistical values of the values constituting each of the set of group-specific convolution data 7112, 7134, and 7156.

For example, the scale sc_co1_ci1,2 to be applied for the expression of the first group of convolution data 7112 can be determined based on the distribution of the values of the six elements constituting the first group of convolution data 7112.

As such, the scales applied to the first group of convolution data 7112, the second group of convolution data 7134, and the third group of convolution data 7156 may be determined as sc_co1_ci1,2, sc_co1_ci3,4, and sc_co1_ci5,6, respectively. Here, sc_co1_ci1,2, sc_co1_ci3,4, and sc_co1_ci5,6 are values that can be independently determined. Accordingly, sc_co1_ci1,2, sc_co1_ci3,4, and sc_co1_ci5,6 may be the same or different.

In step S130, specific expression values of the set of group-specific convolution data 7112, 7134, and 7156 may be determined according to the determined scales sc_co1_ci1,2, sc_co1_ci3,4, and sc_co1_ci5,6.

In step S140, intermediate data 751p may be generated by performing an addition operation on the first group of convolution data 7112 expressed in the 'first scale (sc_co1_ci1,2)' and the second group of convolution data 7134 expressed in the 'second scale (sc_co1_ci3,4)'.

In step S150, the output activation 751 for the first output channel ([co]=1), corresponding to the first output channel ([co]=1) of the weight 740, can be calculated by performing an addition operation on the intermediate data 751p and the third group of convolution data 7156 expressed in the 'third scale (sc_co1_ci5,6)' among the set of group-specific convolution data 7112, 7134, and 7156. That is, the output activation _{[co]=1} (751) can be calculated.

FIG. 9A shows an example in which the third scale (sc_co1_ci5,6) is not smaller than the first scale (sc_co1_ci1,2) and the third scale (sc_co1_ci5,6) is not smaller than the second scale (sc_co1_ci3,4).

As described above, FIG. 9A shows a modified example by applying the concept of grouping input channels to the method described in FIG. 7A.

FIG. 9B illustrates a convolution operation process between the input activation 710 and the second output channel.

Steps indicated by reference numerals S210, S215, S220, S230, S240, and S250 in FIG. 9B correspond to steps indicated by reference numerals S110, S115, S120, S130, S140, and S150 in FIG. 9A, respectively.

Components indicated by reference numerals 7421 to 7426, 7521 to 7526, 7212, 7234, 7256, 752p, and 752 in FIG. 9B correspond to components indicated by reference numerals 7411 to 7416, 7511 to 7516, 7112, 7134, 7156, 751p, and 751 in FIG. 9A, respectively.

Components indicated by reference numerals sc_co2_ci1,2, sc_co2_ci3,4, and sc_co2_ci5,6 in FIG. 9B correspond to components indicated by reference numerals sc_co1_ci1,2, sc_co1_ci3,4, and sc_co1_ci5,6 in FIG. 9A, respectively.

Output activation 750 may be generated by combining the output activation _{[co]=1} (751) and the output activation _{[co]=2} (752) shown in FIGS. 9A and 9B, respectively.

For each of the set of input channel convolution data 7511 to 7516 presented in FIG. 9A , the computing device may calculate statistical values of elements constituting the corresponding input channel convolution data.

For example, the first range (rg_co1_ci1) of the first input channel convolution data 7511 may be determined based on the minimum and maximum values of the six elements constituting the first input channel convolution data 7511. For example, if the minimum value of the first input channel convolution data 7511 is 1 and the maximum value is 5, the first range rg_co1_ci1 may be 4, which is the difference between the maximum value and the minimum value, or 1, which is the minimum value, or 5, which is the maximum value.

Similarly, a second range (rg_co1_ci2), a third range (rg_co1_ci3), a fourth range (rg_co1_ci4), a fifth range (rg_co1_ci5), and a sixth range (rg_co1_ci6) may be determined for the second input channel convolution data 7512, the third input channel convolution data 7513, the fourth input channel convolution data 7514, the fifth input channel convolution data 7515, and the sixth input channel convolution data 7516, respectively.

Above, an example of using the minimum and maximum values of the six elements constituting the first input channel convolution data 7511 as a criterion has been presented, but various other statistical parameters obtained from the six elements may be used to determine the first range.

The computing device may group the set of input channel convolution data 7511 to 7516, or the input channels 711 to 716 constituting the input activation 710, based on the values of the ranges rg_co1_ci1 to rg_co1_ci6.

For example, when the first range (rg_co1_ci1), the second range (rg_co1_ci2), the third range (rg_co1_ci3), and the fourth range (rg_co1_ci4) are 4, 5, 400, and 500, respectively, the first range (rg_co1_ci1) and the second range (rg_co1_ci2) may be grouped into the first group, and the third range (rg_co1_ci3) and the fourth range (rg_co1_ci4) may be grouped into the second group. The first group G1, the second group G2, and the third group G3 shown in FIG. 9 may be determined through this process.
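A minimal sketch of range-based grouping is given below. The description states only that grouping is based on the range values, so the concrete rule used here (sort the channels by range and cut the sorted list into groups of a fixed size) is an assumption, as are the function names. The range is computed as the difference between the maximum and minimum values, one of the options mentioned above.

```python
def value_range(values):
    """Range of a data block, taken here as maximum minus minimum."""
    return max(values) - min(values)


def group_by_range(ranges, group_size=2):
    """Sort channels by range and cut the sorted list into fixed-size groups."""
    ordered = sorted(ranges, key=ranges.get)
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


# Minimum 1 and maximum 5 give a range of 4, as in the example above.
rg_example = value_range([1, 3, 2, 5, 4, 1])

# Ranges 4, 5, 400, and 500 for input channels 1 to 4, listed out of order on purpose.
ranges = {1: 4, 3: 400, 2: 5, 4: 500}
groups = group_by_range(ranges)
```

Channels with ranges 4 and 5 end up in one group and channels with ranges 400 and 500 in another, so data of similar magnitude share a scale.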

FIG. 11 is an embodiment modified from FIG. 10, in which the set of split data 7411 to 7416 presented in FIG. 9A is used as the criterion for calculating the statistical value instead of the set of input channel convolution data 7511 to 7516.

For example, a first range (rg_w_co1_ci1), a second range (rg_w_co1_ci2), a third range (rg_w_co1_ci3), a fourth range (rg_w_co1_ci4), a fifth range (rg_w_co1_ci5), and a sixth range (rg_w_co1_ci6) may be determined for the split data 7411 to 7416, respectively, based on the minimum and maximum values of the four elements constituting each of the split data 7411 to 7416.

The computing device may group the split data 7411 to 7416, or the input channels 711 to 716 constituting the input activation 710, based on the values of the ranges rg_w_co1_ci1 to rg_w_co1_ci6. The first group G1, the second group G2, and the third group G3 shown in FIG. 9A may be determined through this process.

The method presented in FIGS. 10 and 11 can also be applied to FIG. 9B.

FIG. 12 shows a method of integrating and performing the methods presented in FIGS. 9A and 9B.

FIG. 12 illustrates a convolution operation process between an input activation 710 composed of six input channels 711 to 716 and a weight composed of two output channels. The first output channel includes six input channels 7411 to 7416, and the second output channel includes six input channels 7421 to 7426.

In the example shown in FIG. 12, the input channels constituting the input activation 710 are grouped. In the example of FIG. 12, input activation _{[ci]=1} (711) and input activation _{[ci]=2} (712) are classified as a first group (G1), input activation _{[ci]=3} (713) and input activation _{[ci]=4} (714) are classified as a second group (G2), and input activation _{[ci]=5} (715) and input activation _{[ci]=6} (716) are classified as a third group (G3).

A detailed method of determining input channels belonging to a specific group is as described above.

Steps S310, S315, S320, S330, S340, and S350 shown in FIG. 12 correspond to steps S110, S115, S120, S130, S140, and S150 shown in FIG. 9A, respectively.

In step S310, a set of first split data 711 to 716 obtained by splitting the input activation 710 for each input channel may be obtained. In addition, a set of second split data 7411 to 7416 and 7421 to 7426 may be obtained by splitting the output channels 741 and 742 of the weight 740 for each input channel. The set of second split data 7411 to 7416 and 7421 to 7426 is composed of the split data 7411 to 7416 corresponding to the first output channel of the weight and the split data 7421 to 7426 corresponding to the second output channel of the weight.

In addition, the set of first split data 711 to 716 and the set of second split data 7411 to 7416 and 7421 to 7426 may be convolved for each input channel to generate a set of input channel convolution data 7501 to 7506.

The input channel convolution data corresponding to each input channel is composed of convolution data corresponding to the first output channel of the weight and convolution data corresponding to the second output channel of the weight. For example, the input channel convolution data 7501 corresponding to the first input channel is composed of the convolution data 7511 corresponding to the first output channel of the weight and the convolution data 7521 corresponding to the second output channel of the weight. Here, the convolution data 7511 is calculated by a convolution operation between the split data 711 and the split data 7411, and the convolution data 7521 is calculated by a convolution operation between the split data 711 and the split data 7421.

Input channel convolution data generated from an input activation belonging to the xth group (Gx) is also considered to belong to the xth group (Gx).

In step S315, convolutional data of a specific group is generated by performing an element-by-element addition operation on the plurality of input channel convolution data belonging to the specific group.

For example, in FIG. 12, the first group of convolution data 7012 may be generated by performing an element-by-element addition operation on the plurality of input channel convolution data 7501 and 7502 belonging to the first group G1. Here, among the first group of convolution data 7012, the first output channel data 7112 is calculated by performing an element-by-element addition operation between the input channel convolution data 7511 and the input channel convolution data 7512. Among the first group of convolution data 7012, the second output channel data 7212 is calculated by performing an element-by-element addition operation between the convolution data 7521 and the convolution data 7522.

In step S320, a scale representing the convolution data of each group may be determined based on statistical values of the values constituting the group-specific convolution data 7012, 7034, and 7056.

For example, the scale sc_co_ci1,2 to be applied for the expression of the first group of convolution data 7012 can be determined based on the distribution of the values of the 12 elements constituting the first group of convolution data 7012.

As such, the scales applied to the first group of convolution data 7012, the second group of convolution data 7034, and the third group of convolution data 7056 may be determined as sc_co_ci1,2, sc_co_ci3,4, and sc_co_ci5,6, respectively. Here, sc_co_ci1,2, sc_co_ci3,4, and sc_co_ci5,6 are values that can be independently determined.
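In the integrated method, one scale covers both output channel portions of a group, so it is derived from all 12 elements together. The sketch below illustrates this under the same hypothetical assumptions as before: a power-of-two scale exponent chosen from the largest absolute value with a mantissa limit of 127, and made-up sample values.

```python
def choose_scale_exponent(values, q_max=127):
    """Return the smallest integer e such that max|v| / 2**e <= q_max."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return 0
    e = 0
    while peak / 2 ** e > q_max:
        e += 1
    while peak / 2 ** (e - 1) <= q_max:
        e -= 1
    return e


# Hypothetical first group of convolution data: two output channel portions of 6 elements.
group_7012 = {
    "co1": [1, 2, 3, 4, 5, 5],
    "co2": [100, 400, 248, 300, 500, 120],
}

all_values = [v for portion in group_7012.values() for v in portion]
sc_co_ci12 = choose_scale_exponent(all_values)          # shared scale from 12 elements
sc_co1_only = choose_scale_exponent(group_7012["co1"])  # for comparison only
```

With these numbers the shared exponent is governed by the larger portion (2 rather than -4), which shows the trade-off of sharing one scale across output channels: fewer scales to manage, at the cost of coarser resolution for the smaller portion.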

In step S340, intermediate data 750p may be generated by performing an addition operation on the first convolution data 7012 expressed in the 'first scale (sc_co_ci1,2)' and the second convolution data 7034 expressed in the 'second scale (sc_co_ci3,4)' among the set of convolution data 7012, 7034, and 7056.

The first output channel portion 751p of the intermediate data 750p may be calculated by adding, for each element, the first output channel portion of the first convolution data 7012 and the first output channel portion of the second convolution data 7034. Similarly, the second output channel portion 752p of the intermediate data 750p may be calculated by adding, for each element, the second output channel portion of the first convolution data 7012 and the second output channel portion of the second convolution data 7034.

In step S350, the output activation 750 may be calculated by performing an addition operation on the intermediate data 750p and the third convolution data 7056 expressed in the 'third scale (sc_co_ci5,6)' among the set of convolution data 7012, 7034, and 7056.

The first output channel portion 751 of the output activation 750 may be calculated by adding, for each element, the first output channel portion 751p of the intermediate data 750p and the first output channel portion of the third convolution data 7056. Similarly, the second output channel portion 752 of the output activation 750 may be calculated by adding, for each element, the second output channel portion 752p of the intermediate data 750p and the second output channel portion of the third convolution data 7056.

FIG. 12 shows an example in which the third scale (sc_co_ci5,6) is not smaller than the first scale (sc_co_ci1,2) and the third scale (sc_co_ci5,6) is not smaller than the second scale (sc_co_ci3,4).

In a preferred embodiment, step S340 may be performed before step S350.

For each of the set of input channel convolution data 7501 to 7506 presented in FIG. 13, the computing device may calculate statistical values of the elements constituting the corresponding input channel convolution data. Based on the statistical values, ranges rg_co_ci1 to rg_co_ci6 may be determined for the input channel convolution data 7501 to 7506, respectively.

The computing device may group the set of input channel convolution data 7501 to 7506, or the input channels 711 to 716 constituting the input activation 710, based on the values of the ranges rg_co_ci1 to rg_co_ci6. The first group G1, the second group G2, and the third group G3 shown in FIG. 12 may be determined through this process.

FIG. 14 is an embodiment modified from FIG. 13, in which the set of split data 7411 to 7416 and 7421 to 7426 shown in FIG. 12 is used as the criterion for calculating the statistical value instead of the set of input channel convolution data 7501 to 7506, so that ranges rg_w_co_ci1 to rg_w_co_ci6 can be determined for the respective split data.

The computing device may group the set of split data 7411 to 7416 and 7421 to 7426, or the input channels 711 to 716 constituting the input activation 710, based on the values of the ranges rg_w_co_ci1 to rg_w_co_ci6. The first group G1, the second group G2, and the third group G3 shown in FIG. 12 may be determined through this process.

Hereinafter, it will be described with reference to FIGS. 9A, 12, and 15 together.

In step S410, the computing device may generate a set of convolution data by convolving the first input data and the second input data 740 for each input channel.

In step S420, the computing device may determine a scale representing each convolution data based on a statistical value of values constituting each convolution data of the set of convolution data.

In step S430, the computing device may generate intermediate data by performing an addition operation on first convolution data expressed in a first scale and second convolution data expressed in a second scale among the set of convolution data.

In step S440, after the step of generating the intermediate data, the computing device may calculate output data by performing an addition operation on the intermediate data and third convolution data expressed in a third scale among the set of convolution data.

In the first embodiment, the first input data may be the input data 711 to 716, the second input data may be the data 7411 to 7416 and 7421 to 7426, and the set of convolution data may be the data 7012, 7034, and 7056, all shown in FIG. 12. The scale representing each convolution data may be the scales (sc_co_ci1,2, sc_co_ci3,4, and sc_co_ci5,6) shown in FIG. 12. Also, the first convolution data, the second convolution data, and the intermediate data may be the data 7012, 7034, and 750p shown in FIG. 12, respectively. Also, the third convolution data and the output data may be the data 7056 and 750 shown in FIG. 12, respectively.

In the second embodiment, the first input data may be the input data 711 to 716 shown in FIG. 9A, the second input data may be the data 7411 to 7416 shown in FIG. 9A, and the set of convolution data may be the data 7112, 7134, and 7156 shown in FIG. 9A. The scale representing each convolution data may be the scales (sc_co1_ci1,2, sc_co1_ci3,4, and sc_co1_ci5,6) presented in FIG. 9A. Also, the first convolution data, the second convolution data, and the intermediate data may be the data 7112, 7134, and 751p shown in FIG. 9A, respectively. Further, the third convolution data and the output data may be the data 7156 and 751 presented in FIG. 9A, respectively.

In the third embodiment, the first input data may be the input data 711 to 713 shown in FIG. 7A, the second input data may be the data 7411 to 7413 shown in FIG. 7A, and the set of convolution data may be the data 7511 to 7513 presented in FIG. 7A. The scale representing each convolution data may be the scales (sc_co1_ci1 to sc_co1_ci3) presented in FIG. 7A. The first convolution data, the second convolution data, and the intermediate data may be the data 7511, 7512, and 751p presented in FIG. 7A, respectively. Further, the third convolution data and the output data may be the data 7513 and 751 presented in FIG. 7A, respectively.

At this time, step S410 may include a step (S411) of generating the set of convolution data by convolving, for each input channel, a set of first split data obtained by splitting the first input data for each input channel and a set of second split data obtained by splitting a set of output channels of the second input data for each input channel. In addition, the step of calculating the output data may include a step (S412) of calculating output data corresponding to the set of output channels of the second input data by performing an addition operation on the third convolution data and the intermediate data.

In the first embodiment, the set of first split data, the set of second split data, and the output data corresponding to the set of output channels of the second input data may be the data 711 to 716, the data 7411 to 7416 and 7421 to 7426, and the data 750 shown in FIG. 12, respectively.

In the second embodiment, the set of first split data, the set of second split data, and the output data corresponding to the set of output channels of the second input data may be the data 711 to 716, the data 7411 to 7416, and the data 751 shown in FIG. 9A, respectively.

In the third embodiment, the set of first split data, the set of second split data, and the output data corresponding to the set of output channels of the second input data may be the data 711 to 713, the data 7411 to 7413, and the data 751 shown in FIG. 7A, respectively.

Alternatively, step S410 may include: a step (S413) of generating a set of input channel convolution data, consisting of input channel convolution data corresponding to each input channel, by convolving the set of first split data and the set of second split data for each input channel; and a step (S414) of generating the set of convolution data by grouping the set of input channel convolution data.

In the first embodiment, the set of input channel convolution data may be data 7501 to 7506 shown in FIG. 12 .

In the second embodiment, the set of input channel convolution data may be data 7511 to 7516 shown in FIG. 9A.

At this time, in order to determine the groups, the method may include: a step in which the computing device determines a set of ranges (rg_w_co_ci1 to rg_w_co_ci6) by calculating a range of values of the elements constituting each of the second split data; and a step in which the computing device groups the set of input channel convolution data based on the set of ranges. In this case, the set of ranges may be the ranges (rg_w_co1_ci1 to rg_w_co1_ci6) shown in FIG. 11 or the ranges (rg_w_co_ci1 to rg_w_co_ci6) shown in FIG. 14.

Alternatively, in order to determine the groups, the method may include: a step in which the computing device determines a set of ranges (rg_co_ci1 to rg_co_ci6) by calculating a range of values of the elements constituting each of the input channel convolution data; and a step in which the computing device groups the set of input channel convolution data based on the set of ranges. In this case, the set of ranges may be the ranges (rg_co1_ci1 to rg_co1_ci6) shown in FIG. 10 or the ranges (rg_co_ci1 to rg_co_ci6) shown in FIG. 13.

In this case, the step of generating the set of convolution data may include generating a set of input channel convolution data, consisting of input channel convolution data corresponding to each input channel, by convolving the set of first split data and the set of second split data for each input channel. Further, each of the convolution data may be the same as one input channel convolution data among the set of input channel convolution data.

Using the above-described embodiments of the present invention, those skilled in the art to which the present invention belongs will be able to easily implement various changes and modifications without departing from the essential characteristics of the present invention. The content of each claim may be combined with other claims to which it does not refer, within the scope understandable through this specification.

<Acknowledgement>

The present invention was developed by Open Edge Technology Co., Ltd. (project performing organization) in the course of carrying out the research project "Development of a sensory-based context predictive mobile artificial intelligence processor" (task number 2020001310, task number 2020-0-01310, research period 2020.04.01 ~ 2024.12.31). The project belongs to the next-generation intelligent semiconductor technology development (design) - artificial intelligence processor business, a research program supported by the Ministry of Science and ICT and the National Research Foundation of Korea's Information and Communication Planning and Evaluation Institute.

Claims

1. An operation method comprising:

generating, by a computing device, a set of convolution data by performing a convolution on first input data and second input data for each input channel;

determining, by the computing device, a scale representing each convolution data of the set of convolution data based on a statistical value of the values constituting each convolution data of the set of convolution data;

generating, by the computing device, intermediate data by performing an addition operation on first convolution data represented in a first scale and second convolution data represented in a second scale among the set of convolution data; and

calculating, by the computing device, after the generating of the intermediate data, output data by performing an addition operation on the intermediate data and third convolution data represented in a third scale among the set of convolution data,

wherein the third scale is not smaller than the first scale, and the third scale is not smaller than the second scale.
2. The operation method of claim 1,

wherein the generating of the set of convolution data comprises generating the set of convolution data by convolving, for each input channel, a set of first split data obtained by splitting the first input data for each input channel and a set of second split data obtained by splitting a set of output channels of the second input data for each input channel, and

wherein the calculating of the output data comprises calculating output data corresponding to the set of output channels of the second input data by performing the addition operation on the third convolution data and the intermediate data.
3. The operation method of claim 2,

wherein the set of output channels is one specific output channel among a plurality of output channels constituting the second input data, and

wherein the output data corresponding to the set of output channels of the second input data is output data corresponding to the specific output channel.
4. The operation method of claim 2,

wherein the generating of the set of convolution data by performing the convolution for each input channel comprises:

generating a set of input channel convolution data, consisting of input channel convolution data corresponding to each input channel, by convolving the set of first split data and the set of second split data for each input channel; and

generating the set of convolution data by grouping the set of input channel convolution data.
5. The operation method of claim 4, wherein each of the convolution data is identical to one input channel convolution data of the set of input channel convolution data, or is calculated by performing an element-by-element addition operation on at least two input channel convolution data of the set of input channel convolution data.
6. The operation method of claim 4, further comprising, in order to determine the grouping:

determining, by the computing device, a set of ranges by calculating a range of values of the elements constituting each of the second split data; and

grouping, by the computing device, the set of input channel convolution data based on the set of ranges.
7. The operation method of claim 4, further comprising, in order to determine the grouping:

determining, by the computing device, a set of ranges by calculating a range of values of the elements constituting each of the input channel convolution data; and

grouping, by the computing device, the set of input channel convolution data based on the set of ranges.
8. The operation method of claim 1,

wherein the generating of the set of convolution data comprises generating a set of input channel convolution data, consisting of input channel convolution data corresponding to each input channel, by convolving the set of first split data and the set of second split data for each input channel, and

wherein each of the convolution data is the same as one input channel convolution data among the set of input channel convolution data.
9. The operation method of claim 8,

wherein the computing device is configured to execute the generating of the set of convolution data, the determining, the generating of the intermediate data, and the calculating for every output channel included in the second input data, and

wherein the computing device is configured to generate output data including all output channels by combining the channel-specific output data generated for each output channel included in the second input data.
10. The operation method of claim 1,

wherein the first input data is an input activation,

the second input data is a weight,

the output data is an output activation, and

the dimension of the weight is greater than the dimension of the input activation.
11. The operation method of claim 10,

wherein the input activation includes a plurality of first input channel data, each of the first input channel data being a two-dimensional array, and

wherein the weight includes a plurality of output channel data, each of the output channel data including a plurality of second input channel data, and each of the second input channel data being a two-dimensional array.
12. A computing device comprising a hardware accelerator,

wherein the hardware accelerator is configured to:

obtain first input data and second input data;

generate a set of convolution data by convolving the first input data and the second input data for each input channel;

determine a scale representing each convolution data based on a statistical value of the values constituting each convolution data of the set of convolution data;

generate intermediate data by performing an addition operation on first convolution data represented in a first scale and second convolution data represented in a second scale among the set of convolution data; and

after generating the intermediate data, calculate output data by performing an addition operation on the intermediate data and third convolution data represented in a third scale among the set of convolution data,

wherein the third scale is not smaller than the first scale, and the third scale is not smaller than the second scale.
13. The computing device of claim 12,

wherein the generating of the set of convolution data comprises generating the set of convolution data by convolving, for each input channel, a set of first split data obtained by splitting the first input data for each input channel and a set of second split data obtained by splitting a set of output channels of the second input data for each input channel,

wherein the calculating of the output data comprises calculating output data corresponding to the set of output channels of the second input data by performing the addition operation on the third convolution data and the intermediate data,

wherein the hardware accelerator includes an internal memory, and

wherein the size of the internal memory is smaller than the data size of the entire second input data and larger than the size of the split data obtained by splitting one set of output channels of the second input data for each input channel.