KR20230099190A

KR20230099190A - Apparatus and method for address generation of multi-dimensional tensor

Info

Publication number: KR20230099190A
Application number: KR1020210188353A
Authority: KR
Inventors: 이상헌; 이혁재
Original assignee: 서울대학교산학협력단 (Seoul National University R&DB Foundation)
Priority date: 2021-12-27
Filing date: 2021-12-27
Publication date: 2023-07-04

Abstract

A device for generating an address of a multi-dimensional tensor according to one embodiment comprises: an input unit that receives information about a multi-dimensional tensor in a memory and a coordinate to be accessed; a determination unit that determines the operations required to generate an address for the coordinate and generates a control signal corresponding to the determined operations; and an operation unit that generates the address corresponding to the coordinate according to the control signal. The invention can thereby shorten the overall processing time.

Description

Apparatus and method for generating an address of a multi-dimensional tensor {APPARATUS AND METHOD FOR ADDRESS GENERATION OF MULTI-DIMENSIONAL TENSOR}

The present invention relates to an apparatus for generating addresses of a multi-dimensional tensor, and to a method by which the apparatus generates those addresses.

In the field of artificial intelligence, deep-learning algorithms such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are widely used because of their excellent performance. However, they require a large amount of computation, so dedicated hardware is needed to accelerate them. In real-world applications such as self-driving cars in particular, the processing speed of deep-learning inference is critical, so dedicated inference hardware focuses on increasing processing speed.

Dedicated deep-learning hardware generally consists of a systolic array for matrix operations and a vector processor for vector operations. Because deep-learning algorithms require so much computation, they also access data in memory frequently. A systolic array has a separate unit that reads the data needed for computation from memory and feeds it into the array, but a vector processor must compute memory addresses with its own internal resources, which can increase processing time.

Republic of Korea Patent Publication No. 10-2019-0113973, published October 8, 2019.

According to an embodiment, an apparatus and a method are provided for generating the address that corresponds to a coordinate of a multi-dimensional tensor stored in memory.

The problems to be solved by the present invention are not limited to those mentioned above; other problems not mentioned here will be clearly understood by those skilled in the art from the description below.

An apparatus for generating an address of a multi-dimensional tensor according to a first aspect includes: an input unit that receives a coordinate of a multi-dimensional tensor in a memory; a determination unit that determines the operations required to generate an address for the coordinate and generates a control signal corresponding to the determined operations; and an operation unit that generates the address corresponding to the coordinate according to the control signal.

A method of generating a multi-dimensional tensor address, performed by the address generating apparatus according to a second aspect, includes: receiving a coordinate of a multi-dimensional tensor in a memory; and determining the operations required to generate an address for the coordinate and generating the address corresponding to the coordinate according to the determined operations.

According to a third aspect, a computer program stored on a computer-readable recording medium includes instructions that, when executed by a processor, cause the processor to perform the above method of generating an address of a multi-dimensional tensor.

According to an embodiment, the apparatus generates and provides the address corresponding to a coordinate of a multi-dimensional tensor in memory, so the processor can access the tensor by its coordinates without computing memory addresses itself. Furthermore, by reusing previous address results, or performing only part of the computation, depending on the memory access pattern, the apparatus can shorten the processing time of memory accesses and reduce the energy consumed by the computation.

As a result, the processor can perform other operations during the time it would otherwise spend computing memory addresses, which shortens the overall processing time.

In addition, the processor registers that would otherwise be needed for memory address computation become available for other operations, which makes compiler scheduling easier.

FIG. 1 is a block diagram of an apparatus for generating an address of a multi-dimensional tensor according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a method of generating an address of a multi-dimensional tensor according to an embodiment of the present invention.
FIG. 3 is a structural diagram of a storage device that may be included in the determination unit of the apparatus for generating an address of a multi-dimensional tensor according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the operation of the determination unit of the apparatus for generating an address of a multi-dimensional tensor according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating how the determination unit of the apparatus for generating an address of a multi-dimensional tensor determines its output values according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the computation performed by the operation unit of the apparatus for generating an address of a multi-dimensional tensor according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become clear with reference to the embodiments described below in conjunction with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; these embodiments are provided only so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which it belongs. The present invention is defined only by the scope of the claims.

The terms used in this specification are briefly described first, and then the present invention is described in detail.

The terms used in the present invention were chosen, as far as possible, from general terms currently in wide use, taking their functions in the present invention into account; however, they may vary depending on the intention of those skilled in the art, legal precedent, the emergence of new technologies, and the like. In certain cases the applicant has also selected terms arbitrarily, and their meanings are then described in detail in the corresponding part of the description. Therefore, the terms used in the present invention should be defined based on their meaning and on the content of the present invention as a whole, not simply on their names.

Throughout the specification, when a part is said to 'include' an element, this means that it may further include other elements, rather than excluding them, unless stated otherwise.

In addition, the term 'unit' used in the specification means software or a hardware component such as an FPGA or ASIC, and a 'unit' performs certain roles. However, a 'unit' is not limited to software or hardware. A 'unit' may be configured to reside in an addressable storage medium and may be configured to run on one or more processors. Thus, as an example, a 'unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuits, data, databases, data structures, tables, arrays, and variables. The functionality provided by the components and 'units' may be combined into fewer components and 'units' or further separated into additional components and 'units'.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention belongs can easily carry them out. Parts irrelevant to the description are omitted from the drawings in order to explain the present invention clearly.

FIG. 1 is a block diagram of an apparatus for generating an address of a multi-dimensional tensor according to an embodiment of the present invention.

Referring to FIG. 1, an apparatus 100 for generating an address of a multi-dimensional tensor according to an embodiment includes an input unit 110, a determination unit 120, and an operation unit 130, and may further include an output unit 140. The determination unit 120 and/or the operation unit 130 may include a computing means such as a microprocessor or a central processing unit (CPU).

The input unit 110 receives coordinates of a multi-dimensional tensor in memory. For example, the input unit 110 may receive information about the multi-dimensional tensor from the processor 10 or from a higher-level device that drives the processor 10, receive coordinates of the multi-dimensional tensor from the processor 10 that needs to access it, and provide the received coordinate information to the determination unit 120. For instance, the input unit 110 may receive the tensor information and pass it to the determination unit 120 when the program on the processor 10 starts or before the program first accesses the multi-dimensional tensor; thereafter, whenever the tensor is accessed, it passes the coordinates received from the processor 10 to the determination unit 120.

The determination unit 120 determines the operations required to generate an address for the coordinates received from the input unit 110, generates a control signal corresponding to the determined operations, and provides the control signal to the operation unit 130. For example, the determination unit 120 may store the coordinates received by the input unit 110 and the addresses generated by the operation unit 130 as previous operation results in a storage device (not shown), and may also store the information about the multi-dimensional tensor in that storage device. When determining the operations required to generate an address for a coordinate, the determination unit 120 may compare the coordinate with the previous operation results and select only the operations for the dimensions whose coordinates have changed. For example, the information about the multi-dimensional tensor may include the size, stride, and address offset of each dimension of the tensor, and, according to the determined operations, the determination unit 120 may provide the operation unit 130 with the control signal, an offset, and as many valid coordinates and strides as the tensor has dimensions.
When at least part of a previous operation result can be reused, the determination unit 120 may place that previous result in the output offset. In addition, if the addresses along some dimension of the multi-dimensional tensor are discontinuous with respect to the coordinate, the determination unit 120 may store each discontinuous section in a separate entry of the storage device; when a coordinate is later input, it computes which entry holds the section containing that dimension's coordinate and retrieves that entry from the storage device. The determination unit 120 may also generate control signals so that the operation unit 130 generates addresses in an order different from the order in which the processor 10 requested them.

The operation unit 130 generates the address corresponding to the coordinates of the multi-dimensional tensor according to the control signal generated by the determination unit 120. The operation unit 130 may also include at least part of a previous operation result, delivered through the output offset, in computing the address corresponding to the coordinates.

The output unit 140 accesses the memory 20 using the address generated by the operation unit 130, so that the memory 20 delivers the data at that address to the processor 10.

FIG. 2 is a flowchart illustrating a method of generating an address of a multi-dimensional tensor according to an embodiment of the present invention; FIG. 3 is a structural diagram of a storage device that may be included in the determination unit 120 of the apparatus 100 according to an embodiment of the present invention; FIG. 4 is a diagram showing the operation of the determination unit 120 of the apparatus 100 according to an embodiment of the present invention; FIG. 5 is a flowchart illustrating how the determination unit 120 of the apparatus 100 determines its output values according to an embodiment of the present invention; and FIG. 6 is a diagram showing the computation performed by the operation unit 130 of the apparatus 100 according to an embodiment of the present invention.

Hereinafter, the method by which the apparatus 100 according to an embodiment of the present invention generates addresses of a multi-dimensional tensor is described in detail with reference to FIGS. 1 to 6.

Image processing is one of the important applications of artificial-intelligence algorithms such as convolutional neural networks. When an input image passes through each layer of a convolutional neural network, it is passed to the next layer in the form of a feature map. Because the input image and the feature maps consist of width, height, and channel and are stored in memory in that form, they can be represented as three-dimensional tensors.

A vector processor accesses the data at coordinates given by a specific width, height, and channel, and performs the required operations. Computing the memory address of the data at each coordinate requires information about the tensor: the size of each dimension (width, height, channel); the stride of each dimension, which is the amount by which the memory address changes when that dimension's coordinate changes by 1; and the address offset, which is the starting address of the tensor.

Computing the address of a three-dimensional tensor from this information requires three multiplications (each dimension's coordinate times its stride) and three additions (adding each dimension's product to the offset). When consecutive coordinates are accessed, the compiler that generates the code executed by the processor can eliminate some of these operations and reduce the amount of computation. In simple neural-network layers such as pooling, which extracts only the maximum value within a given range of the tensor, however, address computation takes a relatively large share of the time, which can increase the processing time of the entire neural network.
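As a worked illustration of the three multiplications and three additions described above, the following Python sketch computes a full address as offset + Σ coordinate × stride. The function name `full_address`, the channel-innermost (HWC) layout, the 8 × 4 × 16 tensor size, the 1-byte elements, and the base address 0x1000 are hypothetical values chosen for the example, not taken from the patent.

```python
from typing import Sequence, Tuple


def full_address(coord: Sequence[int], strides: Sequence[int],
                 offset: int) -> Tuple[int, int, int]:
    """Full address computation: offset + sum(coord[d] * strides[d]).

    Returns (address, number_of_multiplications, number_of_additions).
    """
    if len(coord) != len(strides):
        raise ValueError("coordinate and stride lists must have the same rank")
    address, muls, adds = offset, 0, 0
    for c, s in zip(coord, strides):
        address += c * s  # one multiplication and one addition per dimension
        muls += 1
        adds += 1
    return address, muls, adds


# Hypothetical feature map: width 8, height 4, 16 channels, 1-byte elements,
# stored channel-innermost, so strides (width, height, channel) = (16, 128, 1).
STRIDES = (16, 8 * 16, 1)
address, muls, adds = full_address((2, 1, 3), STRIDES, 0x1000)
print(hex(address), muls, adds)  # 0x10a3 3 3
```

Every access pays the same three multiply-add steps regardless of how the previous access relates to this one; that is the cost the apparatus aims to avoid.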

To solve this problem, the apparatus 100 according to an embodiment of the present invention allows the processor 10 to obtain the same result when it accesses the memory 20 using only tensor coordinates, without performing any address computation.

First, the processor 10 or a higher-level device driving the processor 10 provides information about the multi-dimensional tensor to the apparatus 100, and the input unit 110 of the apparatus 100 receives this information and provides it to the determination unit 120. This may happen when the program on the processor 10 starts or before the program first accesses the multi-dimensional tensor. The tensor information may consist of the size, stride, and address offset of each dimension of the tensor, which are needed for address computation (S210).

Next, the input unit 110 receives coordinates of the multi-dimensional tensor from the processor 10, which needs to access the tensor, and provides the coordinate information to the determination unit 120. The coordinate information passed to the determination unit 120 may also include the number of the tensor whose address is to be computed (S220).

To determine the required operations, the determination unit 120 may include a storage device 302 that stores previous input coordinates and the results of their address computations, and a tensor information storage device 301 that stores the tensor information needed for address computation. Alternatively, the previous input coordinates, their address results, and the tensor information may be stored together in a single storage device. The number of tensors whose information can be stored, and the number of previous results kept for each tensor, can be set according to the needs of the system.
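A minimal Python sketch of the two storage structures, assuming one dictionary keyed by tensor number for the tensor information storage device 301 and a bounded per-tensor history for the previous-result storage device 302. The class and field names and the default capacity of four results per tensor are hypothetical; the text leaves both capacities to the system designer.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Tuple

Coord = Tuple[int, ...]


@dataclass(frozen=True)
class TensorInfo:
    """One entry of the tensor information storage (301)."""
    sizes: Tuple[int, ...]    # size of each dimension
    strides: Tuple[int, ...]  # address change per unit step in each dimension
    offset: int               # start address of the tensor


@dataclass
class DeterminationStorage:
    """Hypothetical model of storage devices 301 and 302."""
    results_per_tensor: int = 4  # assumed capacity; system-dependent
    tensor_info: Dict[int, TensorInfo] = field(default_factory=dict)
    previous: Dict[int, Deque[Tuple[Coord, int]]] = field(default_factory=dict)

    def register_tensor(self, tensor_id: int, info: TensorInfo) -> None:
        self.tensor_info[tensor_id] = info
        # A bounded deque silently drops the oldest (coordinate, address) pair.
        self.previous[tensor_id] = deque(maxlen=self.results_per_tensor)

    def record(self, tensor_id: int, coord: Coord, address: int) -> None:
        self.previous[tensor_id].append((tuple(coord), address))


store = DeterminationStorage(results_per_tensor=2)
store.register_tensor(0, TensorInfo(sizes=(8, 4, 16), strides=(16, 128, 1), offset=0x1000))
store.record(0, (0, 0, 0), 0x1000)
store.record(0, (0, 0, 1), 0x1001)
store.record(0, (0, 0, 2), 0x1002)  # evicts the oldest result
print(list(store.previous[0]))
```

Oldest-first eviction is only one possible replacement policy; the description does not specify one.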

For a coordinate that has been input (S510), the determination unit 120 determines the operations needed to generate its address, generates a control signal, and passes it to the operation unit 130. The required operations are found by comparing the input coordinate with the stored address results of previous coordinates (S520): only the dimensions whose coordinates differ need additional computation (S530 to S550), and a control signal for those operations may be provided to the operation unit 130 (S570). If there is no stored previous result, or if the coordinates differ in every dimension, a control signal for the full computation may be generated (S560) and passed to the operation unit 130 (S570).
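The comparison in steps S510 to S570 can be sketched as follows. This is an assumption-laden illustration: the function name `determine` is hypothetical, and picking the stored entry that differs in the fewest dimensions is one plausible selection policy. The text only says that the input is compared with stored results and that dimensions with differing coordinates need additional computation.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[int, ...]
Entry = Tuple[Coord, int]  # (stored coordinate, stored address)


def determine(coord: Sequence[int],
              stored: Sequence[Entry]) -> Tuple[Optional[Entry], List[int]]:
    """Return (entry to reuse or None, dimensions that still need computing)."""
    ndim = len(coord)
    best: Optional[Entry] = None
    best_dims = list(range(ndim))  # default: full computation (S560)
    for prev_coord, prev_addr in stored:  # compare with stored results (S520)
        dims = [d for d in range(ndim) if prev_coord[d] != coord[d]]
        # Reuse only if it saves work; if every dimension differs, keep the full computation.
        if len(dims) < len(best_dims):
            best, best_dims = (tuple(prev_coord), prev_addr), dims
    return best, best_dims


history = [((2, 1, 3), 0x10A3)]
print(determine((2, 1, 3), history))  # identical coordinate: nothing left to compute
print(determine((2, 2, 3), history))  # only the height (dimension 1) differs
print(determine((0, 0, 0), history))  # every dimension differs: full computation
```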

While generating control signals, the determination unit 120 can calculate in advance how long the address computation for each control signal will take. It can therefore generate control signals so that addresses are computed in the order requested by the processor 10 or, if required, so that they are computed regardless of the request order in order to shorten processing time.
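Because the determination unit knows in advance how long each control signal will take, it can choose an issue order. The sketch below contrasts in-order issue with one possible reordering policy, shortest estimated latency first, in a simple model where requests are processed one after another. The policy, the cycle estimates, and the function names are assumptions for illustration only; in this simplified serial model the benefit shows up as a lower average completion time.

```python
from typing import Dict, List, Sequence


def issue_order(est_cycles: Sequence[int], in_order: bool = True) -> List[int]:
    """Indices of requests in the order they are sent to the operation unit."""
    order = list(range(len(est_cycles)))
    if not in_order:
        # Stable sort: requests with equal estimates keep their request order.
        order.sort(key=lambda i: est_cycles[i])
    return order


def completion_times(est_cycles: Sequence[int], order: Sequence[int]) -> Dict[int, int]:
    """Cycle at which each request finishes when requests are processed serially."""
    t, done = 0, {}
    for i in order:
        t += est_cycles[i]
        done[i] = t
    return done


# Hypothetical estimates: a full 3-D computation, a one-dimension update, a two-dimension update.
cycles = [4, 1, 2]
for flag in (True, False):
    order = issue_order(cycles, in_order=flag)
    done = completion_times(cycles, order)
    print(order, sum(done.values()) / len(done))
```

Results computed out of order would have to carry their request index so they can be matched back to the processor's requests; how the hardware does this is not described in the text.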

According to the determination result, the determination unit 120 may pass to the operation unit 130 an output offset, a control signal, and as many valid coordinates and strides as the tensor has dimensions. The valid coordinates and strides are not tied to particular dimensions of the tensor; according to the determination result, they may be output together with the control signal only for the dimensions that require computation. If part of a stored previous result can be reused, the previous result may be output as the output offset, and the differences between the input coordinates and the stored coordinates may be output as the valid coordinates. If no previous result can be reused, the tensor's address offset may be output as the output offset, and the input coordinates may be output unchanged as the valid coordinates. FIG. 4 illustrates the data 402 that the determination unit 120 passes to the operation unit 130 for the input tensor coordinates 401.
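The packing of the output offset, valid coordinates, and strides can be sketched as below. The function name `make_request` and the tuple layout are hypothetical, and the control signal is modelled simply as the number of valid (coordinate, stride) pairs. Reusing a result produces signed coordinate differences, so an implementation would presumably need signed valid-coordinate fields; that is an inference from the description, not something it states.

```python
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[int, ...]
Request = Tuple[int, List[int], List[int], int]  # (output offset, valid coords, strides, control)


def make_request(coord: Sequence[int], strides: Sequence[int], tensor_offset: int,
                 reuse: Optional[Tuple[Coord, int]], dims: Sequence[int]) -> Request:
    """Build the data passed from the determination unit to the operation unit."""
    if reuse is None:
        # Nothing reusable: start from the tensor's address offset and send the raw coordinates.
        return tensor_offset, list(coord), list(strides), len(coord)
    prev_coord, prev_addr = reuse
    # Reuse: start from the stored address and send only the changed dimensions,
    # each as (input coordinate - stored coordinate) with that dimension's stride.
    valid = [coord[d] - prev_coord[d] for d in dims]
    valid_strides = [strides[d] for d in dims]
    return prev_addr, valid, valid_strides, len(dims)


STRIDES = (16, 128, 1)  # hypothetical HWC strides of an 8 x 4 x 16 tensor
print(make_request((2, 3, 3), STRIDES, 0x1000, ((2, 1, 3), 0x10A3), [1]))
print(make_request((1, 0, 2), STRIDES, 0x1000, None, [0, 1, 2]))
```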

In addition, if some dimension of the tensor contains sections in which the address is discontinuous with respect to the coordinate, the determination unit 120 may store each discontinuous section in a separate entry of the tensor information storage device 301. When a coordinate is later input, the determination unit 120 computes the number of the tensor-information entry assigned to the section that contains that dimension's coordinate and retrieves the entry from the tensor information storage device 301. Here, a dimension's address is discontinuous when incrementing that dimension's coordinate by 1 changes the address by something other than that dimension's stride. The number of discontinuous sections may be predetermined according to the requirements of the system (S230).
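A minimal sketch of the section lookup, assuming one concrete case of discontinuity: a tensor whose channel dimension is split across two separately allocated buffers (channels 0 to 7 at 0x1000 and channels 8 to 15 at 0x8000, each stored channel-innermost with 8 channels and width 8). The buffer layout, the `Section` tuple, and the use of binary search (`bisect_right`) to find the entry number are assumptions; the text only states that each section is stored as its own tensor-information entry and that the entry number is computed from the coordinate.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

# One tensor-information entry per section:
# (first coordinate of the section in the split dimension, base offset, strides)
Section = Tuple[int, int, Tuple[int, ...]]


def section_entry(starts: Sequence[int], c: int) -> int:
    """Number of the entry whose section contains coordinate c (starts must be sorted)."""
    idx = bisect_right(starts, c) - 1
    if idx < 0:
        raise IndexError("coordinate lies before the first section")
    return idx


def address_with_sections(coord: Sequence[int], split_dim: int,
                          sections: List[Section]) -> int:
    starts = [s[0] for s in sections]
    first, offset, strides = sections[section_entry(starts, coord[split_dim])]
    local = list(coord)
    local[split_dim] -= first  # coordinate relative to the start of its section
    return offset + sum(c * s for c, s in zip(local, strides))


# Hypothetical split of the channel dimension (index 2) across two buffers.
SECTIONS = [(0, 0x1000, (8, 64, 1)), (8, 0x8000, (8, 64, 1))]
print(hex(address_with_sections((1, 0, 7), 2, SECTIONS)))  # last channel of buffer 0
print(hex(address_with_sections((1, 0, 9), 2, SECTIONS)))  # second channel of buffer 1
```

Stepping from channel 7 to channel 8 jumps from one buffer to the other instead of advancing by the channel stride of 1, which is exactly the kind of discontinuity the text describes.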

Next, the operation unit 130 performs computation on its input data according to the control signal received from the determination unit 120. For example, as shown in FIG. 6, the operation unit 130 may use a plurality of multipliers 601, 602, and 603, a plurality of adders 604, 605, and 606, and a multiplexer 607. The number of operations performed varies with the control signal, which also includes information that allows the resulting address to be passed to the output unit 140 as soon as the required operations are complete. For example, if a result address is already stored for the input coordinate, the previous result delivered through the output offset may be passed to the output unit 140 as-is, without further computation. If a stored result address differs from the input coordinate only in height, the address computation is completed by computing only the address contribution of the height difference supplied at valid coordinate 1 in FIG. 6 and adding it to the previous result in the output offset. Reusing part or all of a previous result shortens the computation time and thus increases processing speed (S240).
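The datapath of FIG. 6 can be modelled behaviourally as below. The wiring is an assumption: the three multipliers form products of valid coordinates and strides, the three adders are chained starting from the output offset, and the multiplexer selects the adder-chain stage after the number of additions named by the control signal (modelled here as an integer `k`). With `k = 0` the stored result passes straight through, matching the reuse case in the text.

```python
from typing import Sequence

LANES = 3  # multipliers 601-603 and adders 604-606


def operation_unit(output_offset: int, valid: Sequence[int],
                   strides: Sequence[int], k: int) -> int:
    """Behavioural model of a 3-lane multiply/add chain followed by a multiplexer."""
    if not 0 <= k <= LANES or len(valid) < k or len(strides) < k:
        raise ValueError("control signal k must name at most three populated lanes")
    # Lanes beyond k contribute 0 in this model; the multiplexer output ignores them anyway.
    products = [valid[i] * strides[i] if i < k else 0 for i in range(LANES)]
    stages = [output_offset]  # stage 0: the output offset alone
    for p in products:
        stages.append(stages[-1] + p)  # each adder adds one product to the running sum
    return stages[k]  # multiplexer 607: take the result after k additions


print(operation_unit(0x10A3, [], [], 0))                   # stored result reused as-is
print(operation_unit(0x10A3, [2], [128], 1))               # only the height changed
print(operation_unit(0x1000, [1, 0, 2], [16, 128, 1], 3))  # full computation
```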

The output unit 140 accesses the memory 20 using the address generated by the operation unit 130, so that the memory 20 delivers the data at that address to the processor 10 (S250).

When the processor 10 finishes accessing memory, the address generation process for the multi-dimensional tensor also ends; if the processor 10 still has memory accesses to make, steps S220 to S250 may be repeated (S260).
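Putting steps S210 to S260 together, the toy model below keeps a single previous (coordinate, address) pair per tensor, reuses it when at least one dimension is unchanged, and counts multiplications. The class name, the one-entry history, and the 2 × 2 × 2 tensor are hypothetical; the point is only to show how a raster-order access pattern lets many accesses skip some of the multiplications.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[int, ...]


class AddressGeneratorModel:
    """Toy model of steps S210-S260 with a one-entry result history per tensor."""

    def __init__(self) -> None:
        self.info: Dict[int, Tuple[Tuple[int, ...], int]] = {}  # tensor id -> (strides, offset)
        self.last: Dict[int, Tuple[Coord, int]] = {}           # tensor id -> (coord, address)
        self.multiplications = 0

    def register(self, tensor_id: int, strides: Sequence[int], offset: int) -> None:
        """S210: receive tensor information before the first access."""
        self.info[tensor_id] = (tuple(strides), offset)
        self.last.pop(tensor_id, None)

    def access(self, tensor_id: int, coord: Sequence[int]) -> int:
        """S220-S240: return the address for one coordinate."""
        strides, offset = self.info[tensor_id]
        coord = tuple(coord)
        prev = self.last.get(tensor_id)
        changed = list(range(len(coord)))
        if prev is not None:
            changed = [d for d in range(len(coord)) if prev[0][d] != coord[d]]
        if prev is not None and len(changed) < len(coord):
            # Partial reuse: previous address plus the changed dimensions only.
            address = prev[1] + sum((coord[d] - prev[0][d]) * strides[d] for d in changed)
        else:
            # No history, or every dimension changed: full computation.
            changed = list(range(len(coord)))
            address = offset + sum(c * s for c, s in zip(coord, strides))
        self.multiplications += len(changed)
        self.last[tensor_id] = (coord, address)
        return address


# Hypothetical 2 x 2 x 2 tensor (width, height, channel), channel-innermost, base address 100.
gen = AddressGeneratorModel()
gen.register(0, strides=(2, 4, 1), offset=100)
addresses: List[int] = []
for h in range(2):  # S260: repeat while the processor keeps accessing memory
    for w in range(2):
        for c in range(2):
            addresses.append(gen.access(0, (w, h, c)))
print(addresses, gen.multiplications, "vs", 3 * len(addresses))
```

In this model the eight raster-order accesses need 14 multiplications instead of 24; the actual savings in hardware would depend on the access pattern and on how many previous results are kept.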

Meanwhile, each step of the method of generating an address of a multi-dimensional tensor performed by the apparatus 100 according to the above-described embodiment may be implemented on a computer-readable recording medium that stores a computer program including instructions for performing these steps.

As described above, an embodiment of the present invention generates and provides addresses corresponding to the coordinates of a multi-dimensional tensor in memory. The processor can therefore access the tensor by its coordinates without calculating memory addresses itself. Further, depending on the memory access pattern, address calculation results can be reused or only part of the calculation performed. This reduces both the processing time of memory accesses and the energy consumed by the calculation.

In addition, processor registers that would otherwise be needed for memory address calculation become available for other operations, which makes compiler scheduling easier.

Combinations of the steps in each flowchart attached to the present invention may be performed by computer program instructions. These instructions may be loaded into a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment. When executed by that processor, they create means for performing the functions described in each step of the flowchart. The instructions may also be stored in a computer-usable or computer-readable recording medium that can direct a computer or other programmable data processing equipment to implement functions in a particular manner. The instructions stored in that medium can then produce an article of manufacture containing instruction means that perform the functions described in each step of the flowchart. The instructions may also be loaded onto a computer or other programmable data processing equipment, so that a series of operational steps performed on it produces a computer-executed process. The instructions executed on that equipment then provide steps for executing the functions described in each step of the flowchart.

Further, each step may represent a module, segment, or portion of code that includes one or more executable instructions for executing the specified logical function(s). It should also be noted that in some alternative embodiments the functions mentioned in the steps may occur out of order. For example, two steps shown in succession may in fact be performed substantially concurrently, or they may sometimes be performed in reverse order, depending on the function involved.

The above description merely illustrates the technical idea of the present invention. Those of ordinary skill in the art to which the present invention pertains can make various modifications and variations without departing from its essential qualities. Therefore, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and they do not limit the scope of that technical idea. The scope of protection of the present invention should be construed according to the claims below. All technical ideas within the scope of their equivalents should be construed as falling within the scope of the present invention.

100: multi-dimensional tensor address generator
110: input unit
120: determination unit
130: calculation unit
140: output unit

Claims

An input unit for receiving coordinates of a multi-dimensional tensor in memory;
a determination unit determining an operation necessary to generate an address for the coordinates and generating a control signal corresponding to the determined operation;
Comprising a calculation unit for generating an address corresponding to the coordinates according to the control signal
Address generator for multi-dimensional tensors.

According to claim 1,
Further comprising an output unit that accesses the memory using the address and causes the memory to transmit the data at the address to the processor
Address generator for multi-dimensional tensors.

According to claim 1,
The input unit receives information about the multi-dimensional tensor at the start of a program of the processor, or before the multi-dimensional tensor is first accessed in the program, and provides the information to the determination unit
Address generator for multi-dimensional tensors.

According to claim 3,
The determination unit includes a storage device that stores, as an existing operation result, the coordinates received by the input unit and the address generated by the calculation unit, and that stores information about the multi-dimensional tensor
Address generator for multi-dimensional tensors.

According to claim 4,
When determining the operation necessary to generate an address for the coordinates, the determination unit compares the coordinates with the existing operation result and determines the operation required for each dimension whose coordinate has changed
Address generator for multi-dimensional tensors.

According to claim 4,
The information on the multi-dimensional tensor includes the size, stride, and address offset of each dimension of the tensor,
The determination unit, according to the determined operation, provides the calculation unit with the control signal together with as many effective coordinates, address offsets, and strides as there are tensor dimensions
Address generator for multi-dimensional tensors.

According to claim 6,
When determining the operation required to generate an address for the coordinates, the determination unit includes the existing operation result in an output offset if at least part of the existing operation result can be reused
Address generator for multi-dimensional tensors.

According to claim 4,
If, in any dimension of the multi-dimensional tensor, the address is discontinuous with respect to the coordinate, the determination unit divides that dimension into its discontinuous sections and stores each section in a separate item of the storage device. Thereafter, when coordinates are input, it identifies the tensor information assigned to the discontinuous section containing the coordinate of that dimension and retrieves that section from the storage device
Address generator for multi-dimensional tensors.
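Claim 8 covers a dimension whose addresses are not contiguous. An assumed example is a tensor assembled from two separately allocated buffers. The sketch below shows one way such a dimension could be split into sections, each stored as its own entry. A binary search (Python's `bisect` module) selects the section that contains a requested coordinate. The `Section` fields and the per-section base-plus-stride formula are illustrative assumptions, not details from the patent.

```python
import bisect
from typing import List, NamedTuple


class Section(NamedTuple):
    start: int   # first coordinate covered by this section
    base: int    # address of the section's first element
    stride: int  # address step per coordinate inside the section


class SegmentedDimension:
    """Hypothetical storage of one discontinuous dimension, one entry per section.

    No upper bound is checked: the last section is assumed to extend to the
    end of the dimension.
    """

    def __init__(self, sections: List[Section]) -> None:
        self.sections = sorted(sections, key=lambda s: s.start)
        self.starts = [s.start for s in self.sections]

    def lookup(self, coord: int) -> Section:
        # Rightmost section whose start is <= coord.
        idx = bisect.bisect_right(self.starts, coord) - 1
        if idx < 0:
            raise IndexError(f"coordinate {coord} precedes every section")
        return self.sections[idx]

    def address(self, coord: int) -> int:
        sec = self.lookup(coord)
        return sec.base + (coord - sec.start) * sec.stride


# Assumed layout: coordinates 0-3 live in a buffer at 0x1000 and
# coordinates 4-7 in a separate buffer at 0x8000, 64 bytes per step.
dim = SegmentedDimension([Section(0, 0x1000, 64), Section(4, 0x8000, 64)])
print(hex(dim.address(3)), hex(dim.address(4)))  # 0x10c0 0x8000
```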

According to claim 1,
The determination unit generates the control signal so that the calculation unit generates addresses in an order different from the order in which the processor requests them
Address generator for multi-dimensional tensors.

According to claim 7,
The calculation unit includes at least part of the existing operation result delivered through the output offset in the address corresponding to the coordinates
Address generator for multi-dimensional tensors.

A method for generating an address of a multi-dimensional tensor, performed by an apparatus for generating an address of a multi-dimensional tensor, comprising:
Receiving coordinates of a multi-dimensional tensor in memory; and
Determining an operation necessary to generate an address for the coordinates, and generating an address corresponding to the coordinates according to the determined operation
Method for generating addresses of multi-dimensional tensors.

According to claim 11,
Accessing the memory using the address, and causing the memory to transmit the data at the address to the processor
Method for generating addresses of multi-dimensional tensors.

According to claim 11,
Receiving information about the multi-dimensional tensor at the start of a program of the processor, or before the multi-dimensional tensor is first accessed in the program
Method for generating addresses of multi-dimensional tensors.

According to claim 13,
Storing the input coordinates and the generated address as an existing operation result, and storing information about the multi-dimensional tensor
Method for generating addresses of multi-dimensional tensors.

According to claim 14,
When determining the operation necessary to generate an address for the coordinates, comparing the coordinates with the existing operation result to determine the operation required for each dimension whose coordinate has changed
Method for generating addresses of multi-dimensional tensors.

According to claim 14,
The information on the multi-dimensional tensor includes the size, stride, and address offset of each dimension of the tensor,
According to the determined operation, as many effective coordinates, address offsets, and strides as there are tensor dimensions are used in the operation
Method for generating addresses of multi-dimensional tensors.

According to claim 16,
When determining the operation necessary to generate an address for the coordinates, including the existing operation result in an output offset if at least part of the existing operation result can be reused
Method for generating addresses of multi-dimensional tensors.

According to claim 14,
If, in any dimension of the multi-dimensional tensor, the address is discontinuous with respect to the coordinate, dividing that dimension into its discontinuous sections and storing each section as a separate item; and thereafter, when coordinates are input, identifying the tensor information assigned to the discontinuous section containing the coordinate of that dimension and retrieving that section from among the stored items
Method for generating addresses of multi-dimensional tensors.

According to claim 11,
Generating the addresses in an order different from the request order of the processor.
Method for generating addresses of multi-dimensional tensors.

According to claim 17,
Including at least part of the existing operation result delivered through the output offset in the address corresponding to the coordinates
Method for generating addresses of multi-dimensional tensors.

A computer program stored on a computer readable recording medium,
Wherein the computer program, when executed by a processor,
Comprises instructions for causing the processor to perform a method for generating an address of a multi-dimensional tensor, the method comprising: receiving coordinates of a multi-dimensional tensor in memory; and determining an operation necessary to generate an address for the coordinates and generating an address corresponding to the coordinates according to the determined operation
Computer program.